Opened 10 years ago
Closed 4 years ago
#381 closed defect (fixed)
SIRSI translator not working at some sites
| Reported by: | stakats | Owned by: | mcburton |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | translators | Version: | 1.5 |
| Keywords: | | Cc: | |
Description (last modified by ajlyon)
detectWeb continues to work, but doWeb does not. There are reports of other SIRSI sites also ceasing to function.
Some sites detect and don't scrape, others don't detect at all, and still others don't detect on search pages.
Attachments (1)
Change History (16)
Changed 10 years ago by stakats
comment:1 Changed 10 years ago by stakats
- Type changed from enhancement to defect
comment:2 Changed 10 years ago by stakats
- Description modified (diff)
- Summary changed from SIRSI scraper not working at William & Mary to SIRSI scraper no longer working at William & Mary
comment:3 Changed 10 years ago by stakats
comment:4 Changed 10 years ago by dstillman
- Resolution set to fixed
- Status changed from new to closed
(In [883]) Fixes #377, Problems scraping from Hubmed/PubMed
Fixes #381, SIRSI scraper no longer working at William & Mary
And new Amazon scraper. And a few COinS errors. And possibly some others.
It turns out Firefox has a bug in which DOM nodeValues longer than 4096 characters are split into multiple nodes, so any scrapers pulled from the repository with 'code' fields longer than 4K were being truncated. We didn't see it during testing of repo code because most scrapers are smaller than that.
Calling normalize() on the node combines the nodes, so future releases won't have the problem regardless of when it's fixed in Firefox. For existing installs, I managed to get PubMed, COinS, SIRSI 2003+, and, with quite a lot of effort, Amazon, under 4096 characters, hopefully without breaking anything. I removed all other scrapers from the repository for now.
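The split-and-normalize behaviour described above can be reproduced outside Firefox. The following is a minimal sketch in Python, using the standard library's `xml.dom.minidom` as a stand-in for the Firefox DOM. The 4096-character chunking is simulated by hand, and the helper names (`build_split_element`, `read_first_node`, `read_normalized`) are hypothetical, not part of the actual scraper code.

```python
from xml.dom.minidom import Document

CHUNK = 4096  # split size described in the ticket


def build_split_element(code, chunk=CHUNK):
    """Return a <code> element whose text is split into several text nodes.

    This imitates the reported bug by hand; minidom does not split on its own.
    """
    doc = Document()
    element = doc.createElement("code")
    doc.appendChild(element)
    for start in range(0, len(code), chunk):
        element.appendChild(doc.createTextNode(code[start:start + chunk]))
    return element


def read_first_node(element):
    """Naive read: only the first text node, so long values come back truncated."""
    return element.firstChild.nodeValue


def read_normalized(element):
    """Merge adjacent text nodes with normalize(), then read the full value."""
    element.normalize()
    return element.firstChild.nodeValue


if __name__ == "__main__":
    source = "x" * 10000  # stand-in for a scraper 'code' field longer than 4K
    element = build_split_element(source)
    print(len(element.childNodes), len(read_first_node(element)))
    print(len(read_normalized(element)), len(element.childNodes))
```

Reading only the first child returns at most one chunk, which matches the truncation seen in existing installs. After `normalize()`, the element holds a single text node with the complete value.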
comment:5 Changed 10 years ago by stakats
- Resolution fixed deleted
- Status changed from closed to reopened
The DOM issue was not the only problem with our SIRSI translator. The SIRSI -2003 and 2003+ translators have been replaced with a single translator in [923]. This new translator fixes many of the outstanding issues with SIRSI, but we still have problems with some user-set view preferences. I am working with Mack Lundy to try to move away from our scraper function toward a pure MARC solution.
comment:6 Changed 10 years ago by stakats
- Owner changed from simon to stakats
- Status changed from reopened to new
comment:7 Changed 10 years ago by stakats
- Milestone changed from 1.0 Beta 3 to 1.0 Final
Pushing the milestone back to 1.0 Final, since Mack Lundy is still working on a MARC binary solution for the situations that still require screen scraping.
comment:8 Changed 9 years ago by stakats
- Milestone changed from 1.0 Final to 1.0 RC 1
- Owner changed from stakats to mikowitz
comment:9 Changed 9 years ago by stakats
- Component changed from ingester to translators
- Milestone changed from 1.0 RC 1 to 1.0 Final
- Priority changed from major to minor
comment:10 Changed 9 years ago by stakats
- Milestone changed from 1.0 Final to 1.5 Beta
- Version changed from 1.0 to 1.5
comment:11 Changed 9 years ago by stakats
- Summary changed from SIRSI scraper no longer working at William & Mary to SIRSI translator not working at some sites
comment:12 Changed 8 years ago by stakats
- Owner changed from mikowitz to mcburton
- Status changed from new to assigned
comment:13 Changed 6 years ago by ajlyon
- Description modified (diff)
- Priority changed from minor to major
More SIRSI sites that don't work:
University of Toronto: http://toroprod.library.utoronto.ca/ (possibly separate issue, only search results are broken, see ticket #1312)
Pennsylvania State University: http://cat.libraries.psu.edu/ (nothing works, see ticket #1327 for local contact info)
And sites using a newer version, SIRSI Dynix Symphony 3.3.1 (reported at http://forums.zotero.org/discussion/13093/):
Indiana University: http://www.iucat.iu.edu/
Mississippi State: http://catalog.library.msstate.edu/
In light of the rising number of unsupported sites, I'm going to bump up the priority of this and see what I can do to address the issue.
comment:14 Changed 6 years ago by dstillman
- Milestone 2.0 Beta 3 deleted
comment:15 Changed 4 years ago by simon
- Resolution set to fixed
- Status changed from assigned to closed
Other SIRSI sites not working:
Brigham Young University
http://catalog.lib.byu.edu/
University of Virginia
http://virgo.lib.virginia.edu/
Randolph Macon
http://libcatalog.rmc.edu/
Rutgers University
http://www.iris.rutgers.edu/
Piedmont College
https://mayflower.piedmont.edu/