#379 closed defect (wontfix)
Amazon translator not working on some pages
| Reported by: | dstillman | Owned by: | simon |
|---|---|---|---|
| Priority: | major | Milestone: | 1.0 Beta 2 |
| Component: | ingester | Version: | 1.0 |
| Keywords: | | Cc: | stakats |
Description
With a fresh profile and r2, I don't see a scraping icon on this Amazon page:
http://www.amazon.com/exec/obidos/ASIN/0307237699/ref=amb_link_3613522_/102-5410905-6666567
Attachments (1)
Change History (13)
comment:1 Changed 10 years ago by stakats
- Resolution set to fixed
- Status changed from new to closed
comment:2 Changed 10 years ago by stakats
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening this ticket since we're reverting to r2 translator for the repository.
comment:3 Changed 10 years ago by stakats
- Milestone changed from 1.0 Beta 3 to 1.0 Beta 2
comment:4 Changed 10 years ago by dstillman
- Resolution set to fixed
- Status changed from reopened to closed
This is fixed--no need to reopen. I'll update the repo to Sean's translator once we have repo version logic (r412).
Changed 10 years ago by stakats
- Attachment amazonr2new.txt added
comment:5 Changed 10 years ago by dstillman
Patched r2 version added to repository
comment:6 Changed 10 years ago by stakats
The attached file fixes the detectWeb issues by replacing the absolute XPath with a search for all matching descendant XPaths. It also fixes a problem where the selection dialog presented users with extraneous "Get it by" options.
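A minimal sketch of the kind of change described here, not the attached patch itself; the XPath expressions and item types below are illustrative assumptions:

```javascript
// Sketch only: expressions are placeholders, not the real translator's.
function detectWeb(doc, url) {
	var nsResolver = doc.createNSResolver(doc.documentElement);

	// A brittle absolute path like /html/body/div[2]/table/... breaks whenever
	// Amazon rearranges its layout; a descendant search ("//...") finds the
	// node wherever it ends up in the tree.
	var searchNodes = doc.evaluate('//td[@class="searchTitle"]', doc,
		nsResolver, XPathResult.ANY_TYPE, null);
	if (searchNodes.iterateNext()) {
		return "multiple";
	}

	// Product pages: match any descendant carrying the ASIN rather than
	// assuming its exact position.
	var asinNode = doc.evaluate('//input[@name="ASIN"]', doc,
		nsResolver, XPathResult.ANY_TYPE, null);
	if (asinNode.iterateNext()) {
		return "book";
	}
	return false;
}
```

Restricting the search-result XPath to title links only would also keep the extraneous "Get it by" rows out of the selection dialog.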
comment:7 Changed 10 years ago by dstillman
- Cc stakats added
- Resolution fixed deleted
- Status changed from closed to reopened
I can't reproduce it, but it looks like some people are now getting a different error with the new old Amazon translator:
```
message    => doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext() has no properties
fileName   =>
lineNumber => 67
stack      => scrape([object XPCNativeWrapper])@:67
              doWeb([object XPCNativeWrapper],"http://www.amazon.com/Magdalene-Unveiled-Ancient-Modern-Prostitute/dp/B000FG5MFM/ref=rsl_mainw_dpl/103-1187003-1233416?ie=UTF8&m=ATVPDKIKX0DER")@:106
name       => TypeError
url        => http://www.amazon.com/Magdalene-Unveiled-Ancient-Modern-Prostitute/dp/B000FG5MFM/ref=rsl_mainw_dpl/103-1187003-1233416?ie=UTF8&m=ATVPDKIKX0DER
extensions.zotero.cacheTranslatorData => true
extensions.zotero.automaticSnapshots  => true
```
This corresponds to line 86 of amazonr2new.txt above. It doesn't seem to be a truncation error. Maybe some people are being served a different DOM?
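For what it's worth, "has no properties" is what you get when iterateNext() returns null and the code then dereferences it; a defensive guard along these lines (placeholder names, not the translator's actual code) would at least fail gracefully:

```javascript
// Sketch with placeholder names; "xpath" stands for whatever expression the
// translator evaluates at its line 67.
var node = doc.evaluate(xpath, doc, nsResolver,
	XPathResult.ANY_TYPE, null).iterateNext();
if (!node) {
	// The expected element is missing, e.g. because Amazon served a different
	// DOM variant, so bail out instead of throwing inside scrape().
	return;
}
var value = node.textContent;
```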
comment:8 Changed 10 years ago by stakats
- Resolution set to wontfix
- Status changed from reopened to closed
comment:9 Changed 10 years ago by dstillman
Sorry, bad example--but in that case, can we change it not to show the URL icon for non-books for the next couple of weeks until Beta 3 is out?
How about this book, which also appears in the broken log (and, more importantly, works for me with the new old translator):
comment:10 Changed 10 years ago by stakats
I can't reproduce any problems importing from that page using Firefox 2.0 and Zotero 1.0b2 on either Mac OS X or Windows XP. Something is squirrely with that guy's installation. It's possible that his scrapers table is somehow out of whack, and perhaps we need to think about a means of manually flushing it. Indeed, right now we have no way to remove scrapers, only to overwrite them, since we just use REPLACE INTO statements. If our automated refresh dropped the table before replacing it, we could solve this problem, but we would also delete any user-installed scrapers.
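Roughly, the refresh today only does something like this (sketch only; the column names and db.query() helper are assumptions, not Zotero's actual schema or API):

```javascript
// The automated refresh can only add or overwrite rows:
db.query("REPLACE INTO scrapers (scraperID, label, target, priority, code) "
	+ "VALUES (?, ?, ?, ?, ?)",
	[scraperID, label, target, priority, code]);

// A full flush would clear out stale repository scrapers, but it would also
// wipe anything the user installed or is developing locally:
// db.query("DELETE FROM scrapers");   // too destructive as-is
```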
comment:11 Changed 10 years ago by dstillman
Actually, there are a bunch of people experiencing the problem, so it's not just him. Amazon runs lots of A/B tests, so I wouldn't be all that surprised if it were a DOM issue, but that may be too easy an answer.
As for deleting scrapers, we don't drop the table for exactly the reason you mention--we can't delete scrapers people are developing. My trick in the meantime has been to change the scraper in the repository to priority 0, with target regex "nomatch" and empty detectWeb() and doWeb(), and I'm going to add a line to the update mechanism that deletes the local version of any repo scraper with priority 0.
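In other words, the repository copy becomes an inert stub along these lines (sketch; the metadata field names and the client-side helper are assumptions):

```javascript
// "Neutered" repository entry: priority 0 and a target regex that never
// matches, so the translator is never offered or run.
//   priority: 0
//   target:   "nomatch"
function detectWeb(doc, url) {
	// intentionally empty: never show a scraping icon
}

function doWeb(doc, url) {
	// intentionally empty: never scrape
}

// Client side, the planned update step would treat priority 0 as "delete the
// local copy", e.g. (hypothetical helper):
// if (scraper.priority == 0) deleteLocalScraper(scraper.scraperID);
```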
comment:12 Changed 10 years ago by stakats
It would make me feel much better if this turns out to be a matter of Amazon adjusting their page layout. In any event, our b3 translator only grabs the ASIN off the page, which seems to be very stable in comparison; everything else comes from the API. So, long story short, this problem will solve itself as soon as we release b3.
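For reference, grabbing the ASIN is just a matter of matching the URL, along these lines (sketch, assuming the usual /dp/ and /ASIN/ URL patterns):

```javascript
// Pull the 10-character ASIN out of the product URL; everything else would
// then come from the Amazon web services API keyed on that ASIN.
function extractASIN(url) {
	var m = url.match(/\/(?:dp|ASIN)\/([A-Z0-9]{10})/i);
	return m ? m[1] : null;
}

// extractASIN("http://www.amazon.com/exec/obidos/ASIN/0307237699/ref=...")
//   => "0307237699"
```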
Fixed in [880] with the new Amazon translator.