Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#409 closed defect (fixed)

Google Books translator broken after site update

Reported by: dstillman
Owned by: stakats
Priority: critical
Milestone: 1.0 Beta 3
Component: ingester
Version: 1.0
Keywords:
Cc: stakats

Description

Assigning to Simon, but whoever has time to fix this should Accept it.

Change History (6)

comment:1 Changed 10 years ago by dcohen

It looks like each Google Books page has a standard link, "Find this book in a library," that goes to the canonical WorldCat/OCLC page for that ISBN, and that page provides the bibliographic information.
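As a rough illustration of the idea above, here is a minimal Python sketch that looks for such a link in a page's HTML by its anchor text. It is not the translator code: the class name `LibraryLinkFinder`, the function `find_library_link`, and the assumption that the link can be recognized by the phrase "in a library" are all hypothetical, and the real Google Books markup may differ.

```python
from html.parser import HTMLParser


class LibraryLinkFinder(HTMLParser):
    """Remembers the href of the first <a> whose text contains 'in a library'."""

    def __init__(self):
        super().__init__()
        self._current_href = None   # href of the anchor being read, if any
        self._current_text = []     # text fragments seen inside that anchor
        self.library_href = None    # result; stays None if no link matches

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Normalize whitespace and case before matching the link text.
            text = " ".join("".join(self._current_text).split()).lower()
            if self.library_href is None and "in a library" in text:
                self.library_href = self._current_href
            self._current_href = None


def find_library_link(page_html):
    """Return the link target, or None when the page has no such link."""
    finder = LibraryLinkFinder()
    finder.feed(page_html)
    return finder.library_href
```

Returning None rather than raising leaves it to the caller to decide what to do with a page that lacks the link.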

comment:2 Changed 10 years ago by stakats

  • Owner changed from simon to stakats
  • Status changed from new to assigned

comment:3 Changed 10 years ago by stakats

I now have a working translator for single books that pulls COinS metadata from WorldCat. Two problems remain:
1) Some Google Books pages do not include the "Find in a Library" link to WorldCat. I am not sure why, but see, for example, Dena Goodman's Marie Antoinette: Writings in the Body of a Queen. In cases like this, we can either fall back to page scraping or simply fail gracefully.
2) Importing items from the search results page requires a double jump: a first request to the individual book page to get the WorldCat link, then a second request to fetch the WorldCat page. I am stumped about how to work around the apparent limit of one asynchronous operation per translator, since the only way to resume from Zotero.wait() is to call Zotero.done(), which stops the translator completely.
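To make the two problems concrete, here is a minimal Python sketch of the double jump written in callback style, with the completion signal deferred until the second response has been handled. It does not use the Zotero translator API and does not show whether the translator framework permits this nesting. The names `parse_coins`, `import_from_search_result`, `fetch`, `find_worldcat_link`, `on_item`, and `on_done` are hypothetical. The one outside convention it relies on is COinS itself: a `span` with class `Z3988` whose `title` attribute holds an OpenURL ContextObject as a query string.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collects the title attribute of every <span class="Z3988">."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and attr_map.get("title"):
            self.context_objects.append(attr_map["title"])


def parse_coins(page_html):
    """Return one dict of OpenURL key -> list of values per COinS span."""
    parser = CoinsParser()
    parser.feed(page_html)
    return [parse_qs(title) for title in parser.context_objects]


def import_from_search_result(book_url, fetch, find_worldcat_link, on_item, on_done):
    """Chain two requests; on_done is called exactly once, after the last step.

    fetch(url, callback) is assumed to call callback(html) when the page arrives.
    """

    def after_worldcat_page(worldcat_html):
        for item in parse_coins(worldcat_html):
            on_item(item)
        on_done()

    def after_book_page(book_html):
        worldcat_url = find_worldcat_link(book_html)
        if worldcat_url is None:
            on_done()  # problem 1: no WorldCat link, so finish with no items
            return
        fetch(worldcat_url, after_worldcat_page)  # problem 2: the second jump

    fetch(book_url, after_book_page)
```

The missing-link case here simply finishes without items; falling back to page scraping would replace that branch.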

comment:4 Changed 10 years ago by stakats

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [890]) closes #409, google books translator broken after site update

comment:5 Changed 10 years ago by stakats

OCLC WorldCat links turned out not to be a reliable source of metadata. I have contacted Google Books about exposing their metadata more transparently. For now we will continue to scrape the page as before.

comment:6 Changed 10 years ago by dstillman

(In [891]) Pushed updated NYT and Google Books translators to repo

Refs #409, Google Books translator broken after site update
Refs #380, Archived New York Times articles accessed via TimesSelect aren't detected
