Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#379 closed defect (wontfix)

Amazon translator not working on some pages

Reported by: dstillman
Owned by: simon
Priority: major
Milestone: 1.0 Beta 2
Component: ingester
Version: 1.0
Keywords:
Cc: stakats

Description

With a fresh profile and r2, I don't see a scraping icon on this Amazon page:

http://www.amazon.com/exec/obidos/ASIN/0307237699/ref=amb_link_3613522_/102-5410905-6666567

Attachments (1)

amazonr2new.txt (4.5 KB) - added by stakats 10 years ago.

Change History (13)

comment:1 Changed 10 years ago by stakats

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in [880] with new Amazon translator.

comment:2 Changed 10 years ago by stakats

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening this ticket since we're reverting to r2 translator for the repository.

comment:3 Changed 10 years ago by stakats

  • Milestone changed from 1.0 Beta 3 to 1.0 Beta 2

comment:4 Changed 10 years ago by dstillman

  • Resolution set to fixed
  • Status changed from reopened to closed

This is fixed--no need to reopen. I'll update the repo to Sean's translator once we have repo version logic (r412).

Changed 10 years ago by stakats

comment:5 Changed 10 years ago by dstillman

Patched r2 version added to the repository.

comment:6 Changed 10 years ago by stakats

The attached file fixes the detectWeb issues by replacing the absolute XPath with a search for all matching descendant nodes. It also fixes a problem where the selection dialog presented users with extraneous "Get it by" options.

comment:7 Changed 10 years ago by dstillman

  • Cc stakats added
  • Resolution fixed deleted
  • Status changed from closed to reopened

I can't reproduce it, but it looks like some people are now getting a different error with the new old Amazon translator:

message => doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext() has no properties
fileName => 
lineNumber => 67
stack => scrape([object XPCNativeWrapper])@:67
doWeb([object XPCNativeWrapper],"http://www.amazon.com/Magdalene-Unveiled-Ancient-Modern-Prostitute/dp/B000FG5MFM/ref=rsl_mainw_dpl/103-1187003-1233416?ie=UTF8&m=ATVPDKIKX0DER")@:106

name => TypeError
url => http://www.amazon.com/Magdalene-Unveiled-Ancient-Modern-Prostitute/dp/B000FG5MFM/ref=rsl_mainw_dpl/103-1187003-1233416?ie=UTF8&m=ATVPDKIKX0DER
extensions.zotero.cacheTranslatorData => true
extensions.zotero.automaticSnapshots => true

Corresponds to line 86 of amazonr2new.txt above. This doesn't seem to be a truncation error. Maybe people are getting served a different DOM?
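
For reference, "has no properties" is most likely what you get when a property is read from null, and iterateNext() returns null when the XPath matches nothing on the page. A minimal sketch of a guard, assuming the scraper reads text from the first match. The helper names firstNode and firstText are hypothetical and are not taken from amazonr2new.txt:

```javascript
// XPathResult.ANY_TYPE has the value 0; it is spelled out here so the
// sketch does not depend on a browser global.
const ANY_TYPE = 0;

// Hypothetical helper: returns the first node matching xpath,
// or null if the served page has no such node.
function firstNode(doc, xpath, nsResolver) {
  const result = doc.evaluate(xpath, doc, nsResolver, ANY_TYPE, null);
  return result.iterateNext(); // null when nothing matches
}

// Hypothetical helper: returns the trimmed text of the first match,
// or the fallback value instead of throwing a TypeError.
function firstText(doc, xpath, nsResolver, fallback) {
  const node = firstNode(doc, xpath, nsResolver);
  return node === null ? fallback : node.textContent.trim();
}
```

With a guard like this, a page served with a different DOM would yield a missing field rather than an aborted scrape. Whether a partial item is preferable to a visible error is a separate decision.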

comment:8 Changed 10 years ago by stakats

  • Resolution set to wontfix
  • Status changed from reopened to closed

Our r2 translator does not support the import of Amazon items other than books. Closing this ticket since our r3 translator already works.

comment:9 Changed 10 years ago by dstillman

Sorry, bad example--though can we change it to not show the URL icon for non-books then for the next couple weeks before Beta 3 is out?

How about this book, which also appears in the broken log (and, more importantly, works for me with the new old translator):

http://www.amazon.com/Nature-Girl-Carl-Hiaasen/dp/0307262995/sr=8-1/qid=1164771412/ref=pd_bbs_sr_1/103-8400978-5012646?ie=UTF8&s=books

comment:10 Changed 10 years ago by stakats

I can't reproduce any problems importing from that page using Firefox 2.0 and Zotero 1.0b2 on either Mac OS X or Windows XP. Something is squirrelly with that guy's installation. It's possible that his scrapers table is somehow out of whack, and perhaps we need to think about a means of manually flushing the table. Indeed, right now we do not have a way to remove scrapers, only to overwrite them, since we just use REPLACE INTO statements. If our automated refresh dropped the table before replacing, we could solve this problem but would delete any user-installed scrapers.

comment:11 Changed 10 years ago by dstillman

Actually there are a bunch of people experiencing the problem, so it's not just him. Amazon does do lots of A/B tests, so I wouldn't be all that surprised if it was a DOM issue, but that may be too easy an answer.

As for deleting scrapers, we don't drop the table for exactly the reason you mention--we can't delete scrapers people are developing. My trick in the meantime has been to change the scraper in the repository to priority 0, regex "nomatch", and empty detectWeb() and doWeb(), and I'm going to be adding a line in the update mechanism to delete the local version of any repo scraper with priority 0.
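
A rough sketch of that update rule, using an in-memory Map as a stand-in for the local scrapers table. The function name applyRepoUpdate and the field names id, priority and code are assumptions for illustration, not the actual schema or update code:

```javascript
// localScrapers: Map keyed by scraper id (stand-in for the scrapers table).
// repoScrapers: array of scraper records received from the repository.
function applyRepoUpdate(localScrapers, repoScrapers) {
  for (const scraper of repoScrapers) {
    if (scraper.priority === 0) {
      // Priority 0 marks a scraper retired in the repository:
      // delete the local copy instead of overwriting it.
      localScrapers.delete(scraper.id);
    } else {
      // Same effect as REPLACE INTO: insert, or overwrite an existing row.
      localScrapers.set(scraper.id, scraper);
    }
  }
  // Scrapers that the repository never mentions are left untouched.
  return localScrapers;
}
```

Because only ids listed by the repository are touched, scrapers people are developing locally survive the refresh, which is the property that dropping the table would lose.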

comment:12 Changed 10 years ago by stakats

It would make me feel much better if it turns out to be a matter of Amazon adjusting their page layout. In any event, our b3 translator only grabs the ASIN off of the page, which seems to be very stable in comparison. Everything else comes from the API. So long story short, this problem will solve itself as soon as we release b3.
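
To illustrate why the ASIN is a stable handle: the ticket does not show how the b3 translator locates it, so the following is only a sketch that assumes the ASIN can be read from the URL forms quoted in this ticket (/ASIN/... and /dp/...). The function name extractAsin and the pattern are hypothetical:

```javascript
// Assumes an ASIN is 10 uppercase alphanumeric characters following
// a /dp/ or /ASIN/ path segment. Returns null if no such segment exists.
function extractAsin(url) {
  const match = /\/(?:dp|ASIN)\/([A-Z0-9]{10})(?:[/?]|$)/.exec(url);
  return match === null ? null : match[1];
}
```

A lookup keyed on this value does not depend on where Amazon places the title or author in the page markup.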
