Opened 10 years ago
Closed 10 years ago
#377 closed defect (fixed)
Problems scraping from Hubmed/PubMed
| Reported by: | dstillman | Owned by: | simon |
|---|---|---|---|
| Priority: | major | Milestone: | 1.0 Beta 3 |
| Component: | ingester | Version: | 1.0 |
| Keywords: | | Cc: | |
Description
Two or three problems (depending on whether the first two are related) on Hubmed/PubMed, reported by a user.
The first two are "Could not save item" errors that I can't reproduce, but I can see them in the notification log:
05d07af9-105a-4572-99f6-a8e231c0daef 2006-10-02 17:00:00 81.151.78.150 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1) Gecko/20061010 Firefox/2.0
QueryInterface => function QueryInterface() {
[native code]
}
message =>
result => 2152398858
filename => chrome://zotero/content/xpcom/translate.js
lineNumber => 571
columnNumber => 0
initialize => function initialize() {
[native code]
}
url => http://www.hubmed.org/display.cgi?uids=17054214
extensions.zotero.cacheTranslatorData => true
extensions.zotero.automaticSnapshots => true 2006-10-27 06:34:26
fcf41bed-0cbc-3704-85c7-8062a0068a7a 2006-10-23 00:23:00 128.32.177.180 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
message => missing ) after argument list
fileName =>
lineNumber => 108
stack => doWeb()@:0 @:0 @:0
name => SyntaxError
url => http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=;&db=PubMed&cmd=search&term=takeout
extensions.zotero.cacheTranslatorData => true
extensions.zotero.automaticSnapshots => true
2006-10-27 03:11:30
Here's the third:
In Hubmed there's an Export link below the abstract. Click this and choose the RDF export. For me, the details then show as plain text in my browser window. Click the icon to add to Zotero, and you now get "Saving item..." The item doesn't get saved, and the "Saving item..." message will only go away with a Firefox restart.
This one I can reproduce. It throws an error at line 225 of translate.js: doc.location has no properties
I can access doc.location's properties via Venkman, so it looks like it may be a sandbox problem.
For what it's worth, I'm also getting "doc.domain has no properties" at line 175 in lots of places, though that's probably unrelated.
Change History (4)
comment:1 Changed 10 years ago by simon
(In [844]) addresses #377, Problems scraping from Hubmed/PubMed
makes scrape icon disappear when navigating away from a page
comment:2 Changed 10 years ago by simon
The third is now resolved; the first and second I need to figure out how to reproduce.
comment:3 Changed 10 years ago by dstillman
According to user feedback, this happens without being logged into My NCBI and without other extensions installed.
comment:4 Changed 10 years ago by dstillman
- Resolution set to fixed
- Status changed from new to closed
(In [883]) Fixes #377, Problems scraping from Hubmed/PubMed
Fixes #381, SIRSI scraper no longer working at William & Mary
Also includes a new Amazon scraper, fixes for a few COinS errors, and possibly some others.
It turns out Firefox has a bug in which DOM nodeValues longer than 4096 characters are split into multiple nodes, so any scrapers pulled from the repository with 'code' fields larger than 4K were being truncated. We didn't see it during testing of repository code because most scrapers are smaller than that.
Calling normalize() on the node combines the nodes, so future releases won't have the problem regardless of when it's fixed in Firefox. For existing installs, I managed to get PubMed, COinS, SIRSI 2003+, and, with quite a lot of effort, Amazon, under 4096 characters, hopefully without breaking anything. I removed all other scrapers from the repository for now.
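For illustration, here's a minimal sketch of that workaround in translator-style JavaScript. The `<code>` element name, the `xmlDoc` variable, and the surrounding setup are assumptions for the example, not the actual repository or Zotero code:

```javascript
// Illustrative sketch only: reading a scraper's 'code' field out of an
// XML response. The element and variable names here are assumptions.
var codeElement = xmlDoc.getElementsByTagName("code")[0];

// Affected Firefox builds split text content longer than 4096 characters
// into several adjacent text nodes, so firstChild.nodeValue alone would
// return only the first chunk and the scraper code would be truncated.
codeElement.normalize();    // merge adjacent text nodes back into one

var scraperCode = codeElement.firstChild.nodeValue;    // full, untruncated code
```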