Differences

dev:translator_coding [2011/04/20 16:52]
ajlyon adding section on cross-domain
dev:translator_coding [2017/11/12 19:53] (current)
Line 1: Line 1:
-Below we will describe how the ''detect*'' and ''do*'' functions of Zotero [[translators]] can and should be coded. If you are unfamiliar with JavaScript, make sure to check out [[https://developer.mozilla.org/en/JavaScript/A_re-introduction_to_JavaScript|JavaScript tutorial]] to get familiar with the syntax. In addition to the information on this page, it can often be very informative to look at existing translators to see how things are done.
+<html><p id="zotero-5-update-warning" style="color: red; font-weight: bold">We’re
+in the process of updating the documentation for
+<a href="https://www.zotero.org/blog/zotero-5-0">Zotero 5.0</a>. Some documentation
+may be outdated in the meantime. Thanks for your understanding.</p></html>
  
-====== Web Translators ====== 
  
-===== detectWeb ===== +See [[dev/​translators/​Coding]].
- +
-''detectWeb'' is run to determine whether item metadata can be retrieved from the webpage. The return value of this function should be the detected item type (e.g. "journalArticle", see the [[http://gsl-nagoya-u.net/http/pub/csl-fields/index.html|overview of Zotero item types]]), or, if multiple items are found, "multiple". +
- +
-''detectWeb'' receives two arguments, the webpage document object and URL (typically named ''doc'' and ''url''). In some cases, the URL provides all the information needed to determine whether item metadata is available, allowing for a simple ''detectWeb'' function, e.g. (example from ''Cell Press.js''): +
- +
-<code javascript>function detectWeb(doc, url) { +
-	if (url.indexOf("search/results") != -1) { +
-		return "multiple"; +
-	} else if (url.indexOf("content/article") != -1) { +
-		return "journalArticle"; +
-	} +
-}</code> +
- +
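As a rough sketch of how such a URL-only check behaves, the snippet below runs a variant of the function above against three made-up URLs. The ''example.org'' addresses and the explicit ''return false'' fallback are assumptions added for illustration; they are not taken from ''Cell Press.js''.

```javascript
// Sketch only: same URL checks as the example above, plus an explicit
// fallback for pages where nothing is detected (an assumption).
function detectWeb(doc, url) {
	if (url.indexOf("search/results") != -1) {
		return "multiple";
	}
	if (url.indexOf("content/article") != -1) {
		return "journalArticle";
	}
	return false;
}

// Hypothetical URLs. doc is unused by this check, so null stands in for it.
var samples = [
	"http://example.org/search/results?q=zotero",
	"http://example.org/content/article/123",
	"http://example.org/about"
];
var detected = samples.map(function (url) {
	return detectWeb(null, url);
});
```

Here ''detected'' holds one result per sample URL, in order: a multiple-item page, a single journal article, and a page the translator does not handle.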
-===== doWeb ===== +
- +
-''doWeb'' is run when a user, wishing to save one or more items, activates the selected translator. Setting aside for now how item metadata is retrieved, we first look at how ''doWeb'' saves retrieved item metadata (as well as attachments and notes) to your Zotero library. +
- +
-==== Saving Single Items ==== +
- +
-=== Metadata === +
- +
-The first step towards saving an item is to create an item object of the desired [[http://gsl-nagoya-u.net/http/pub/csl-fields/index.html|item type]] (examples from "NCBI PubMed.js"): +
- +
-<code javascript>var newItem = new Zotero.Item("journalArticle");</code> +
- +
-Metadata can then be stored in the properties of the object. Of the different fields available for the chosen item type (see the [[http://gsl-nagoya-u.net/http/pub/csl-fields/index.html|Field Index]]), only the title is required. E.g.: +
- +
-<code javascript>var title = article.ArticleTitle.text().toString(); +
-newItem.title = title; +
- +
-var PMID = citation.PMID.text().toString(); +
-newItem.url = "http://www.ncbi.nlm.nih.gov/pubmed/" + PMID;</code> +
- +
-After all metadata has been stored in the item object, the item can be saved: +
- +
-<code javascript>newItem.complete();</code> +
- +
-This process can be repeated (e.g. using a loop) to save multiple items. +
- +
-=== Attachments === +
- +
-Attachments may be saved alongside item metadata via the item object's ''attachments'' property. Common attachment types are full-text PDFs, links and snapshots. An example from "Pubmed Central.js": +
- +
-<code javascript>var linkurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/"; +
-newItem.attachments = [{ +
-	url: linkurl, +
-	title: "PubMed Central Link", +
-	mimeType: "text/html", +
-	snapshot: false +
-}]; +
- +
-var pdfurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/pdf/" + pdfFileName; +
-newItem.attachments.push({ +
-	title: "PubMed Central Full Text PDF", +
-	mimeType: "application/pdf", +
-	url: pdfurl +
-});</code> +
- +
-An attachment can only be saved if the source is indicated. The source is often a URL (set on the ''url'' property), but can also be a file path (set on ''path'') or a document object (set on ''document''). Other properties that can be set are ''mimeType'' ("text/html" for webpages, "application/pdf" for PDFs), ''title'', and ''snapshot'' (if the latter is set to ''false'', an attached webpage is always saved as a link). +
- +
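The attachment properties described above can be sketched as follows. To keep the snippet runnable outside Zotero, a plain object stands in for the item that ''new Zotero.Item(...)'' would create; the ''example.org'' URLs, the titles and the ''missingSource'' check are hypothetical and only illustrate the rule that every attachment needs a source.

```javascript
// Stand-in for the item object; in a translator this would come from
// new Zotero.Item("journalArticle").
var newItem = { attachments: [] };

// Hypothetical URL, for illustration only.
var articleUrl = "http://example.org/articles/123/";

// A webpage that should be saved as a link only (snapshot: false).
newItem.attachments.push({
	url: articleUrl,
	title: "Example Article Link",
	mimeType: "text/html",
	snapshot: false
});

// A full-text PDF, identified by its URL.
newItem.attachments.push({
	url: articleUrl + "fulltext.pdf",
	title: "Example Full Text PDF",
	mimeType: "application/pdf"
});

// Collect attachments that lack a source (url, path or document).
var missingSource = newItem.attachments.filter(function (attachment) {
	return !(attachment.url || attachment.path || attachment.document);
});
```

A check like ''missingSource'' is not part of Zotero; it is just one way to catch attachments that would not be saved before the item is completed.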
-=== Notes === +
- +
-Notes are saved similarly to attachments. Each note is an object added to the item's ''notes'' array; its content, which should be a string, is stored in the object's ''note'' property. A title, stored in the ''title'' property, is optional. E.g.: +
- +
-<code javascript>bbCite = "Bluebook citation: " + bbCite + "."; +
-newItem.notes.push({note: bbCite});</code> +
- +
-==== Saving Multiple Items ==== +
- +
-Some webpages, such as those showing search results or the index of a journal issue, list multiple items. For these pages, web translators can be written to a) allow the user to select one or more items and b) batch save the selected items to the user's Zotero library. +
- +
-=== Item Selection === +
- +
-To present the user with a selection window that shows all the items that have been found on the webpage, a JavaScript object should be created. Then, for each item, an item ID and label should be stored in the object as a property/value pair. The item ID is used internally by the translator, and can be a URL, DOI, or any other identifier, whereas the label is shown to the user (this will usually be the item's title). Passing the object to the ''Zotero.selectItems'' function will trigger the selection window, and the function will return the items that the user selected. An example from ESpacenet.js: +
- +
-<code javascript>if (detectWeb(doc, url) == "multiple") { +
-	var items = new Object(); +
-	... +
-	while (next_title = titles.iterateNext()) { +
-		items[next_title.href] = Zotero.Utilities.trim(next_title.textContent); +
-	} +
-	items = Zotero.selectItems(items); +
-	... +
-}</code> +
- +
-For compatibility with Zotero Connectors, ''Zotero.selectItems'' should preferably be called with a callback function as the second parameter. This callback function receives the object with the selected items. FIXME We need an example of a translator that does this. +
- +
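A minimal sketch of the callback form might look like the following. It is not taken from an existing translator. The ''Zotero'' object here is a stand-in that simply "selects" every item so the snippet runs outside Zotero, the ''scrape'' helper and the ''example.org'' URLs are hypothetical, and the check for an empty selection is an assumption about what the callback receives when the user cancels.

```javascript
// Stand-in for the Zotero API so the sketch runs outside Zotero.
// It selects every item; the real function shows a selection window.
var Zotero = {
	selectItems: function (items, callback) {
		callback(items);
	}
};

var saved = [];

// Hypothetical helper: a real translator would fetch and scrape each URL.
function scrape(url, label) {
	saved.push({ url: url, title: label });
}

function doWeb(doc, url) {
	// Item ID (here a URL) mapped to the label shown to the user.
	var items = {
		"http://example.org/item/1": "First example title",
		"http://example.org/item/2": "Second example title"
	};
	Zotero.selectItems(items, function (selectedItems) {
		// Assumption: nothing is passed if the user cancels.
		if (!selectedItems) {
			return;
		}
		for (var id in selectedItems) {
			scrape(id, selectedItems[id]);
		}
	});
}

doWeb(null, "http://example.org/search/results");
```

The point of this form is that all work on the selection happens inside the callback, so nothing after the ''Zotero.selectItems'' call depends on its return value.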
-=== Batch Saving === +
- +
-== Asynchronous == +
-You will often need to make additional requests to fetch all the metadata needed, either to make multiple items, or to get additional information on a single item. The most common and reliable way to make such requests is with the utility functions ''Zotero.Utilities.doGet'', ''Zotero.Utilities.doPost'' and ''Zotero.Utilities.processDocuments''. +
- +
-''Zotero.Utilities.doGet(url, callback, onDone, charset)'' sends a GET request to the specified URL or to each in an array of URLs, and then calls the function ''callback'' with three arguments: the response string, the response object, and the URL. This function is frequently used to fetch standard representations of items in formats like RIS and BibTeX. The function ''onDone'' is called when the input URLs have all been processed. The optional ''charset'' argument forces the response to be interpreted in the specified character set. +
- +
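The ''doGet'' call pattern described above can be sketched like this. The ''Zotero.Utilities.doGet'' defined here is a stand-in that serves canned text synchronously instead of making HTTP requests, so the snippet runs outside Zotero; the URLs, the RIS snippets and the title extraction are made up for illustration.

```javascript
// Canned responses keyed by hypothetical URLs.
var cannedResponses = {
	"http://example.org/item/1.ris": "TY  - JOUR\nTI  - First example title\nER  - ",
	"http://example.org/item/2.ris": "TY  - JOUR\nTI  - Second example title\nER  - "
};

// Stand-in for Zotero.Utilities.doGet: it follows the argument order
// described above but does no network access and runs synchronously.
var Zotero = {
	Utilities: {
		doGet: function (urls, callback, onDone) {
			[].concat(urls).forEach(function (url) {
				var text = cannedResponses[url];
				callback(text, { responseText: text }, url);
			});
			if (onDone) {
				onDone();
			}
		}
	}
};

var titles = [];
var finished = false;

Zotero.Utilities.doGet(
	Object.keys(cannedResponses),
	function (text, response, url) {
		// Pull the RIS title line out of the response string.
		var match = /^TI  - (.*)$/m.exec(text);
		if (match) {
			titles.push(match[1]);
		}
	},
	function () {
		// Called once, after every URL has been processed.
		finished = true;
	}
);
```

In a real translator the requests are asynchronous, so anything that depends on all responses having arrived belongs in the ''onDone'' function rather than after the ''doGet'' call.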
-''Zotero.Utilities.doPost(url, postdata, callback, charset)'' sends a POST request to the specified URL (not an array), with the POST string defined in ''postdata'', and then calls the function ''callback'' with two arguments: the response string and the response object. The optional ''charset'' argument forces the response to be interpreted in the specified character set. +
- +
-''Zotero.Utilities.processDocuments(url, callback, onDone, charset)'' sends a GET request to the specified URL or to each in an array of URLs, and then calls the function ''callback'' with XXXXX arguments: DOM document object, URL, and XXXX. FIXME The optional ''charset'' argument forces the response to be interpreted in the specified character set. This is approximately the equivalent of ''doGet'', except that it returns DOM document objects instead of strings. +
- +
-**Note:** The response objects passed to the callbacks above are [[https://developer.mozilla.org/en/XMLHttpRequest|described in detail in the MDC Documentation]]. +
- +
-''Zotero.Utilities.processAsync(sets, callbacks, onDone)'' can be used from translators to make it easier to correctly chain sets of asynchronous callbacks, since many translators that require multiple callbacks do it incorrectly. [text from commit message, r4262] +
- +
-== Synchronous == +
- +
-**Note:** While synchronous loading of sources is easier to implement, it should be avoided in new code to ensure compatibility with Zotero Connectors. +
- +
-Webpages can be loaded synchronously with ''Zotero.Utilities.retrieveDocument'', which requires a URL as its argument, and returns a DOM document object, e.g. (example from ''Nagoya University OPAC.js''): +
- +
-<code javascript>for (var url in items) { +
-	var doc = Zotero.Utilities.retrieveDocument(url); +
-	scrapeAndParse(doc, url); +
-}</code> +
- +
-Metadata documents can be loaded synchronously using ''Zotero.Utilities.retrieveSource''. This function can be called with only a URL, in which case a GET request is executed, or with additional ''body'', ''headers'' and ''responseCharset'' parameters, in which case a POST request is executed. These parameters are, respectively, the request body to POST to the URL, the HTTP headers to include in the request, and the character set to force on the response. An example of ''Zotero.Utilities.retrieveSource'' used for a GET request (from ''Google Scholar.js''): +
- +
-<code javascript>var bibtexData = Zotero.Utilities.retrieveSource(this.bibtexLink);</code> +
- +
-== Cross-Domain Restrictions == +
-Note that all the above functions are affected by [[https://developer.mozilla.org/en/http_access_control|Firefox's HTTP Access Control]]. See the linked article at the Mozilla Developer Center for more details. In short, ''Zotero.Utilities.retrieveDocument'' and ''Zotero.Utilities.processDocuments'' will not in general work when a page on one domain requests documents from another domain. Such arrangements are actually fairly common for site index and search pages. The other functions, like ''Zotero.Utilities.doGet'', will work, but the response will be a simple text string which will usually have to be processed using regular expressions, not XPath or other DOM-based approaches. +
- +
-When such HTTP Access Control prevents an action, you will see an error like this in the error console or debug output: +
-<code>00:41:50 Translation using Test failed: +
-         message => Permission denied to access property 'documentElement' +
-         fileName => chrome://zotero/content/xpcom/translation/browser_firefox.js +
-         lineNumber => 451</code> +
-====== Import Translators ====== +
-====== Export Translators ====== +
-====== Search Translators ====== +
-====== Utility functions ====== +
-Zotero provides several utility functions for translators to use. Some of them are used for asynchronous and synchronous HTTP requests; those are [[#batch_saving|discussed above]]. In addition to those HTTP functions and the many standard functions provided by JavaScript, Zotero provides: +
-  * ''Zotero.Utilities.capitalizeTitle(title, ignorePreference)'' +
-  * ''Zotero.Utilities.cleanAuthor(author, creatorType, hasComma)'' +
-  * ''Zotero.Utilities.trimInternal(text)'' +
- +
-====== Working with the Translator object ====== +
-=== Methods === +
-  * ''Zotero.loadTranslator(type)'' +
-  * ''translator.setSearch(OpenURL ContextObject)'' +
-For search translators. Takes a skeleton item object and ... FIXME +
-  * ''translator.setString(string)'' +
-For import translators. Sets the string that the translator will import from. +
-  * ''translator.getTranslators()'' +
-Returns an array of translators that should be able to run on the given data. That is, those translators that return a non-false value for ''detectImport'', ''detectSearch'' or ''detectWeb'' when passed the input given with ''setString'', ''setSearch'', etc. +
-  * ''translator.setTranslator(translator)'' +
-Takes a translator object (returned by ''getTranslators(..)'') or the UUID of a translator. +
-  * ''translator.setHandler(event, callback)'' +
-  * ''translator.translate()'' +
- +
-=== Calling an import translator === +
-use RIS as an example, then maybe MARC +
-=== Calling a search translator === +
-(from COinS.js:53-67) +
-<code javascript> +
-var search = Zotero.loadTranslator("search"); +
-search.setHandler("itemDone", function(obj, item) { +
-	newItems.push(item); +
-}); +
-search.setHandler("done", function() { +
-	retrieveNextCOinS(needFullItems, newItems, couldUseFullItems, doc); +
-}); +
-search.setSearch(item); +
- +
-// look for translators +
-var translators = search.getTranslators(); +
-search.setTranslator(translators); +
-search.translate(); +
-</code> +
-===== Translator Framework ===== +
-Many web translators can be written in a simplified form by using the [[dev/translator_framework|Translator Framework]], a library for translator development. Translators written in this way consist of simple sets of rules for scraping item metadata from specified portions of the page. See the [[dev/translator_framework|Translator Framework]] page for details. +
dev/translator_coding.txt · Last modified: 2017/11/12 19:53 (external edit)