Differences

This shows you the differences between two versions of the page.

dev:translators:framework [2011/07/23 10:53] – [Functions] ajlyon
dev:translators:framework [2017/11/18 16:20] (current) – Add legacy notice adamsmith
Line 1: Line 1:
 +**While the Translator Framework still works, new translators using the Framework are no longer accepted, and we are migrating existing translators away from the format.**
 +
 +**This page exists as legacy documentation only.**
 +
 ====== Translator Framework ======
 The translator framework is a way to build web translators that lets translator authors avoid most of the boilerplate usually required for new translators, making it possible to write simple content scrapers in just a few lines of JavaScript.
  
-The framework was written and contributed by Erik Hetzner and is licensed under the AGPLv3+. It currently resides at http://e6h.org/~egh/hg/zotero-transfw/, but there are plans to include it in Zotero itself.
+The framework was written and contributed by Erik Hetzner and is licensed under the AGPLv3+. It currently resides at https://gitlab.com/egh/zotero-transfw, but there are plans to include it in Zotero itself.
  
-To use the framework, simply insert the framework code at the beginning of your translator, after the translator information block (JSON header). If you are using [[dev/translators/Scaffold]] to develop your translator, you won't see the information block, and you can just insert the framework at the top of the code box. The latest version of the code is [[http://e6h.org/~egh/hg/zotero-transfw/raw-file/tip/framework.js|here]].
+To use the framework, simply insert the framework code at the beginning of your translator, after the translator information block (JSON header). If you are using [[dev/translators/Scaffold]] to develop your translator, you won't see the information block, and can click "Uses translator framework" on Scaffold's Metadata tab to automatically include the code.
  
 You'll start writing beneath the line that reads:
Line 54: Line 58:
  
 === FW.Scraper ===
-  * Required keys: ''detect'', ''itemType''
+  * Required keys: ''detect'', ''itemType'' ([[http://aurimasv.github.io/z2csl/typeMap.xml|list of itemType options]])
-  * Optional keys: ''attachments'', all [[http://gsl-nagoya-u.net/http/pub/csl-fields/index.html|Zotero item fields]]
+  * Optional keys: ''attachments''
  
 === FW.MultiScraper ===
Line 84: Line 88:
 </code>
 Here the option is used to guarantee that the multiple item page has links to the BibTeX files that the translator uses.
- 
-== Delegation == 
-It is possible to have a translator using this framework delegate processing to another translator, by setting the key ''itemTrans'', as in this example from the framework-derived version of the Google Scholar translator: 
- 
-<code javascript> 
-itemTrans : FW.DelegateTranslator({ translatorType : "import", 
-                                    translatorId   : "9cb70025-a888-4a29-a210-93ec52da40d4"}), 
-</code> 
- 
-This delegation method can only be used with a ''MultiScraper'' -- each response will be sent to the specified translator to be processed, instead of being matched against the other scrapers defined in the current translator. 
  
 === Attachments ===
Line 130: Line 124:
 In the example above, the scraper had been written to save two potential date fields, one as "runningTime" and one as "date". The post-processing function sets the item's "date" property to the valid one of these two choices. It also checks if the author last names are in all-caps and fixes them if they are. Both of these tasks are a little hard to do within the framework.
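The post-processing code for that example is not part of this excerpt. As a rough illustration of the two fixes described, here is a standalone sketch that works on a plain item object. The function names ''postProcess'', ''looksLikeDate'' and ''fixAllCaps'' are hypothetical, and treating a value that contains a four-digit year as the "valid" date is an assumption; this is not the framework's own hook API.

<code javascript>
// Hypothetical post-processing sketch; not the framework's hook API.

// Assumption: a usable date contains a four-digit year.
function looksLikeDate(value) {
  return typeof value === "string" && /\b\d{4}\b/.test(value);
}

// Turn an all-caps name such as "O'BRIEN" into "O'Brien"; leave mixed-case names alone.
function fixAllCaps(name) {
  if (typeof name !== "string" || !/\p{Lu}/u.test(name) || name !== name.toUpperCase()) {
    return name;
  }
  return name
    .toLowerCase()
    .replace(/(^|[\s'-])(\p{Ll})/gu, (match, sep, letter) => sep + letter.toUpperCase());
}

function postProcess(item) {
  // Keep whichever of the two candidate fields holds a usable date.
  if (!looksLikeDate(item.date) && looksLikeDate(item.runningTime)) {
    item.date = item.runningTime;
  }
  // Normalise all-caps last names on every creator.
  for (const creator of item.creators || []) {
    creator.lastName = fixAllCaps(creator.lastName);
  }
  return item;
}

const item = postProcess({
  date: "",
  runningTime: "June 2004",
  creators: [{ firstName: "Ann", lastName: "O'BRIEN", creatorType: "author" }],
});
// item.date is now "June 2004"; item.creators[0].lastName is "O'Brien"
</code>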
 ===== Functions =====
-To use the framework, just chain together functions from the list below until you get the desired output.
+To use the framework, just chain together functions from the list below until you get the desired output. Note that JavaScript functions not in this list will not work within the scrapers.
 === Main functions ===
   * ''FW.PageText ( )'' Provides the HTML source of the current document as a string.
Line 150: Line 144:
   * ''split ( regex )'' Split the string into multiple strings on the [[dev:technologies#regular_expressions|regular expression]].
   * ''join ( separator )'' Join all the strings into one, placing the specified separator between them.
-  * ''cleanAuthor ( type, [ useComma ] )'' Makes creator objects of the specified type (e.g., ''author'', ''editor'', ''translator'', ''contributor'', ''bookAuthor'', ''director''). If the second argument is true, the input will be split into first and last names on a comma, if one is present.
+  * ''cleanAuthor ( type, [ useComma ] )'' Makes creator objects of the specified type (e.g., ''author'', ''editor'', ''translator'', ''contributor'', ''bookAuthor'', ''director''). If the second argument is true, the input will be split into first and last names on a comma, if one is present. See the [[http://gimranov.com/research/zotero/creator-types|list of valid creator types for each item type]].
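To make the ''useComma'' switch concrete, here is a simplified, hypothetical stand-in for the name-splitting step. It is not the framework's code, which may rely on Zotero's own name-cleaning utilities and handle more cases; the function ''makeCreator'' is invented for illustration and only assumes Zotero-style creator objects with ''firstName'', ''lastName'' and ''creatorType'' keys.

<code javascript>
// Simplified, hypothetical stand-in for cleanAuthor; not the framework's code.
function makeCreator(name, type, useComma) {
  // Collapse runs of whitespace so the splits below see single spaces.
  const clean = name.trim().replace(/\s+/g, " ");
  const commaAt = clean.indexOf(",");

  if (useComma && commaAt !== -1) {
    // "Last, First" form: split on the first comma.
    return {
      lastName: clean.slice(0, commaAt).trim(),
      firstName: clean.slice(commaAt + 1).trim(),
      creatorType: type,
    };
  }

  // Otherwise treat the final word as the last name.
  const spaceAt = clean.lastIndexOf(" ");
  return {
    lastName: spaceAt === -1 ? clean : clean.slice(spaceAt + 1),
    firstName: spaceAt === -1 ? "" : clean.slice(0, spaceAt),
    creatorType: type,
  };
}

console.log(makeCreator("Doe, Jane", "author", true));
// { lastName: 'Doe', firstName: 'Jane', creatorType: 'author' }
console.log(makeCreator("Jane Doe", "editor"));
// { lastName: 'Doe', firstName: 'Jane', creatorType: 'editor' }
</code>

With ''useComma'' left false, a name written as "Doe, Jane" would be split on its last space instead, which is why the flag matters for sites that list authors surname-first.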
 + 
 +=== Putting things together === 
 +''FW.Xpath()'' and ''FW.Url()'' are the main functions you'll call; each returns an object that, when processed by the framework, selects some text from the page or from the current URL, respectively.
 + 
 +You can also call a method on this object, e.g.: 
 + 
 +  FW.Xpath("//xpath/expression").split(/,/)
 + 
 +This modifies the object to include a filter that splits the 
 +text on /,/. You can chain these together: 
 + 
 +  FW.Xpath("//xpath/expression").text().split(/,/).trim().cleanAuthor("author")
 + 
 +This will split the text returned by the XPath into an array and call 
 +the filters (trim, then cleanAuthor) on each member of the array. This 
 +way we can add multiple creators to the item. 
 + 
 +If you want to add an arbitrary filter, this should work: 
 + 
 +  FW.Xpath("//xpath/expression").text().addFilter(function (s) { return s + "HELLO WORLD"; }) 
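The chaining behaviour described above can be pictured as an object that collects filters and, when evaluated, applies each one to every value it currently holds, so that ''split'' turns one string into several and later filters run on each piece. The following self-contained sketch models only that idea: the ''Chain'' class and its ''run'' method are hypothetical, and a plain string stands in for the text an XPath would select.

<code javascript>
// Hypothetical model of filter chaining; not the framework's real implementation.
class Chain {
  constructor() {
    this.filters = [];
  }

  // Append a filter and return the same object so calls can be chained.
  addFilter(fn) {
    this.filters.push(fn);
    return this;
  }

  trim() {
    return this.addFilter((s) => s.trim());
  }

  // split() returns an array, so every later filter runs on each piece.
  split(re) {
    return this.addFilter((s) => s.split(re));
  }

  // Apply the filters in order; flatMap flattens arrays produced by split().
  run(text) {
    let values = [text];
    for (const fn of this.filters) {
      values = values.flatMap((v) => fn(v));
    }
    return values;
  }
}

const names = new Chain()
  .split(/,/)
  .trim()
  .addFilter((s) => s.toUpperCase())
  .run("Ann Lee, Bo Chan ,Cy Diaz");
// names is ["ANN LEE", "BO CHAN", "CY DIAZ"]
</code>

Appending filters to the same object mirrors the "modifies the object" wording above; a real selector also has to fetch the text from the page before any filter runs, which this sketch leaves out.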
 ===== Templates =====
 Just paste the following templates into your translator, fill in the appropriate fields, and delete the unnecessary fields.
Line 172: Line 187:
 </code>
 ==== FW.Scraper ====
-For possible values of ''itemType'' and the legal fields for each type, see the schema description at http://gsl-nagoya-u.net/http/pub/csl-fields/index.html .
+For possible values of ''itemType'' and the legal fields for each type, see the schema description at http://aurimasv.github.io/z2csl/typeMap.xml .
 <code javascript>
 FW.Scraper({
dev/translators/framework.1311432811.txt.gz · Last modified: 2011/07/23 10:53 by ajlyon