Differences

This shows you the differences between two versions of the page.

dev:translator_framework [2011/04/05 12:54] – more details on FW ajlyon
dev:translator_framework [2017/11/12 19:53] (current) – external edit 127.0.0.1
Line 1: Line 1:
-===== Translator Framework ===== +<html><p id="zotero-5-update-warning" style="color: red; font-weight: bold">We’re 
-The translator framework is a way to build web translators that lets translator authors avoid most of the boilerplate that is usually required for new translators, making it possible to write simple content scrapers in just a few lines of JavaScript. +in the process of updating the documentation for 
 +<a href="https://www.zotero.org/blog/zotero-5-0">Zotero 5.0</a>. Some documentation 
 +may be outdated in the meantime. Thanks for your understanding.</p></html>
  
-The framework was written and contributed by Erik Hetzner and is licensed under the GPLv3+. It currently resides at http://e6h.org/~egh/hg/zotero-transfw/, but there are plans to include it in Zotero itself. 
  
-To use the framework, simply insert the framework code at the beginning of your translator, after the translator information block (JSON header). If you are using [[dev/scaffold|Scaffold]] to develop your translator, you won't see the information block, and you can just insert the framework at the top of the code box. The latest version of the code is [[http://e6h.org/~egh/hg/zotero-transfw/raw-file/tip/framework.js|here]]. +See [[dev/translators/Framework]].
- +
-You'll start writing beneath the line that reads: +
-''/* End generic code */'' +
- +
-===Example Translator=== +
-From APN.ru.js (GPLv3+ licensed): +
-<code javascript> +
-function detectWeb(doc, url) { return FW.detectWeb(doc, url); } +
-function doWeb(doc, url) { return FW.doWeb(doc, url); } +
- +
-/** Articles */ +
-FW.Scraper({ +
-itemType         : 'newspaperArticle', +
-detect           : FW.Xpath('//div[@class="block_div"]/div/*[@class="article_title"]'), +
-title            : FW.Xpath('//div[@class="block_div"]/div/*[@class="article_title"]').text().trim(), +
-attachments      : FW.Url().replace(/article/,"print").makeAttachment("text/html", "APN.ru Printable"), +
-creators         : FW.Xpath('//div[@class="block_div"]/div/a[@class="pub_aname"]').text().cleanAuthor("author"), +
-date             : FW.Xpath('//div[@class="block_div"]/div/span[@class="pub_date"]').text(), +
-publicationTitle : "Агенство политических новостей" +
-}); +
- +
-/** Search results */ +
-FW.MultiScraper({ +
-itemType  : "multiple", +
-detect    : FW.Xpath('//div[@class="search_content"]'), +
-titles    : FW.Xpath('//div[@class="search_content"]/div/a[@class="searchtitle"]').text(), +
-urls    : FW.Xpath('//div[@class="search_content"]/div/a[@class="searchtitle"]').key('href').text() +
-}); +
-</code> +
- +
-This is the functional portion of a real, working web translator using the translator framework. It defines two scrapers, in this case one for newspaper articles and one for multiple result pages. +
- +
-This is the general model for creating a translator using the framework -- define several scrapers that are triggered by different kinds of page content or URLs. +
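The model described above (several scrapers, each triggered by its own detect condition) can be sketched without the framework itself. The snippet below is a hypothetical stand-in, not framework code: ''registerScraper'' and ''detectItemType'' are invented names, pages are plain objects rather than DOM documents, and the "first matching scraper wins" rule is an assumption that the text above does not confirm.

```javascript
// Hypothetical stand-in, NOT the real framework: it only mimics the idea of
// several scrapers that are each triggered by their own detect condition.
const scrapers = [];

function registerScraper(init) {
  // The text lists itemType and detect as required keys for both scraper kinds.
  if (!init.itemType || typeof init.detect !== "function") {
    throw new Error("scraper needs itemType and detect");
  }
  scrapers.push(init);
}

// Assumption of this sketch: the first scraper whose detect condition
// matches decides the item type; false means nothing matched.
function detectItemType(page) {
  for (const scraper of scrapers) {
    if (scraper.detect(page)) return scraper.itemType;
  }
  return false;
}

registerScraper({
  itemType: "newspaperArticle",
  detect: (page) => page.html.includes('class="article_title"'),
});
registerScraper({
  itemType: "multiple",
  detect: (page) => page.html.includes('class="search_content"'),
});

const articlePage = { html: '<h1 class="article_title">Title</h1>' };
const searchPage = { html: '<div class="search_content"></div>' };
const otherPage = { html: "<p>nothing to scrape</p>" };

console.log(detectItemType(articlePage));
console.log(detectItemType(searchPage));
console.log(detectItemType(otherPage));
```

In the framework, the detect conditions would be expressions such as ''FW.Xpath(...)'' evaluated against the page, rather than the plain string checks used here.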
-  +
-===Scrapers=== +
-As the example translator above shows, there are two kinds of scrapers in the framework, defined using the functions ''FW.Scraper()'' and ''FW.MultiScraper()''. The first kind identifies item metadata for a single item from a single page, while the second kind identifies item page URLs on a single page and is usually used for things like search results or journal issue tables of contents. +
- +
-Both kinds of scrapers are defined by passing an object with the scraper's item type (''itemType''), detect conditions (''detect'') and other keys to the corresponding function. +
- +
-== FW.Scraper == +
-  * Required keys: ''detect'', ''itemType'' +
-  * Optional keys: ''attachments'', all [[http://gsl-nagoya-u.net/http/pub/csl-fields/index.html|Zotero item fields]] +
- +
-== FW.MultiScraper == +
-  * Required keys: ''detect'', ''itemType'', ''titles'', ''urls'' +
-  * Optional keys: ''attachments'' +
- +
-== Delegation == +
-It is possible to have a translator using this framework delegate processing to another translator, by setting the key ''itemTrans'', as in this example from the framework-derived version of the Google Scholar translator: +
- +
-<code javascript> +
-itemTrans : FW.DelegateTranslator({ translatorType : "import", +
-                                    translatorId   : "9cb70025-a888-4a29-a210-93ec52da40d4"}), +
-</code> +
- +
-==== Functions ==== +
-FIXME Functions that can be used with the framework. +
-=== Main functions === +
-  * ''FW.PageText ( )'' +
-  * ''FW.Url ( )'' +
-  * ''FW.Xpath ( expression )'' +
-  * ''FW.Scraper ( {..} )'' +
-  * ''FW.MultiScraper ( {..} )'' +
-=== String functions === +
-  * ''prepend ( text )'' +
-  * ''append ( text )'' +
-  * ''remove ( regex, flags )'' (note that empty entries are dropped silently, so this can also be used to filter) +
-  * ''trim ()'' +
-  * ''trimInternal ()'' +
-  * ''match ( regex, [ group ] )'' +
-  * ''capitalizeTitle ( )'' FIXME Should support flag? +
-  * ''unescapeHTML ( text )'' +
-  * ''unescape ( text )'' +
-  * ''key ( key )'' +
-  * ''split ( regex )'' +
-  * ''join ( separator )'' +
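To illustrate how string functions like these chain together, here is a minimal self-contained sketch. It is not the framework's implementation: ''StringChain'' is a hypothetical class, it covers only ''trim'', ''remove'', ''prepend'', ''append'' and ''join'', and its behavior (including ''join'' returning a plain string) is an assumption based solely on the names and the note about empty entries in the list above.

```javascript
// Hypothetical helper, not framework code: wraps a list of strings and
// applies each step to every entry, mirroring the chaining style used above.
class StringChain {
  constructor(values) {
    this.values = values;
  }
  prepend(text) {
    return new StringChain(this.values.map((s) => text + s));
  }
  append(text) {
    return new StringChain(this.values.map((s) => s + text));
  }
  trim() {
    return new StringChain(this.values.map((s) => s.trim()));
  }
  // Strip every match of the pattern, then silently drop entries that
  // became empty, which is what lets remove() double as a filter.
  remove(regex, flags) {
    const pattern = new RegExp(regex, flags);
    const stripped = this.values.map((s) => s.replace(pattern, ""));
    return new StringChain(stripped.filter((s) => s !== ""));
  }
  // Assumption of this sketch: join ends the chain with a plain string.
  join(separator) {
    return this.values.join(separator);
  }
}

const result = new StringChain(["  By Jane Doe ", "By ", " By John Roe"])
  .trim()
  .remove("^By\\s*", "i")
  .join("; ");

console.log(result); // prints "Jane Doe; John Roe"
```

The middle entry consists only of the byline prefix, so it becomes empty after ''remove'' and is dropped before the join.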
-=== Zotero functions === +
-  * ''cleanAuthor ( text, useComma )'' +
-  * ''makeAttachment ( type, title )''+
dev/translator_framework.1302022482.txt.gz · Last modified: 2011/04/05 12:54 by ajlyon