Opened 10 years ago
Closed 10 years ago
#88 closed enhancement (fixed)
migrate scrapers away from RDF
| Reported by: | simon | Owned by: | simon |
|---|---|---|---|
| Priority: | major | Milestone: | 1.0 Beta 1 |
| Component: | ingester | Version: | 1.0 |
| Keywords: | | Cc: | |
Description
Dan Cohen says all we need is the ability to run our scrapers from within Piggy Bank, not the ability to run Piggy Bank scrapers from within Firefox Scholar. Thus, as soon as Dan S. implements #87, I can get rid of all this RDF mess and replace it with something much cleaner. Then, if we get the Mellon grant, we can just make the web repository code modify our scrapers (by adding a function at the top) to support RDF. We get a simple, consistent API and SIMILE gets the messy RDF one.
Depends on #87
Change History (2)
comment:1 Changed 10 years ago by simon
- Component changed from uncategorized to ingester
- Milestone set to 1.0 Alpha 2
- Owner changed from nobody to simon
- Version set to 1.0
comment:2 Changed 10 years ago by simon
- Resolution set to fixed
- Status changed from new to closed
(In [364]) closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects
API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled-up, un-namespaced Piggy Bank functions
scrapers no longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators
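
The "sum of 1/2/4" type value and the doImport/doExport/doWeb functions listed above can be illustrated with a minimal sketch. The constant names, the describeType() and missingFunctions() helpers, and the exampleScraper object are hypothetical and are not taken from the Scholar.Translate code; only the numeric values 1, 2, and 4 and the three function names come from this commit message.

```javascript
// Hypothetical names; the commit message only gives the values 1/2/4.
const TYPE_IMPORT = 1;
const TYPE_EXPORT = 2;
const TYPE_WEB = 4;

// Decode a scraper's type value into the capabilities it declares.
function describeType(type) {
  const capabilities = [];
  if (type & TYPE_IMPORT) capabilities.push("import");
  if (type & TYPE_EXPORT) capabilities.push("export");
  if (type & TYPE_WEB) capabilities.push("web");
  return capabilities;
}

// Hypothetical scraper that supports import and web translation (1 + 4 = 5).
const exampleScraper = {
  type: TYPE_IMPORT | TYPE_WEB,
  doImport() { return "imported"; },
  doWeb() { return "scraped"; },
};

// List the functions a scraper declares through its type but does not define.
function missingFunctions(scraper) {
  const required = { import: "doImport", export: "doExport", web: "doWeb" };
  return describeType(scraper.type)
    .map((capability) => required[capability])
    .filter((name) => typeof scraper[name] !== "function");
}
```

Because each capability occupies its own bit, a single integer can describe any combination, and a check such as missingFunctions(exampleScraper) can confirm that every declared capability has a matching function.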
new features:
import now works
rudimentary RDF (unqualified Dublin Core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing
apologies for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.