

Below we describe how the detect* and do* functions of Zotero translators can and should be coded. If you are unfamiliar with JavaScript, work through a JavaScript tutorial first to get familiar with the syntax. In addition to the information on this page, it is often very informative to look at existing translators to see how things are done.

Web Translators

detectWeb

detectWeb is run to determine whether item metadata can be retrieved from the webpage. It should return the detected item type (e.g. “journalArticle”; see the overview of Zotero item types) or, if multiple items are found, “multiple”. If no item is found, it should return false (or nothing at all).

detectWeb receives two arguments, the webpage document object and URL (typically named doc and url). In some cases, the URL provides all the information needed to determine whether item metadata is available, allowing for a simple detectWeb function, e.g. (example from Cell Press.js):

function detectWeb(doc, url) {
 
	if (url.indexOf("search/results") != -1) {
		return "multiple";
	} else if (url.indexOf("content/article") != -1) {
		return "journalArticle";
	}
}
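When the URL alone does not tell you whether a page holds item metadata, detectWeb can inspect the document instead. The following is a minimal sketch under assumptions: the CSS selectors a.result-title and #article-title are hypothetical placeholders for the target site's markup, and makeFakeDoc is a throwaway stand-in for a real DOM document so that the function can be tried outside a browser.

```javascript
// Sketch: decide the item type from page content rather than from the URL.
// The selectors "a.result-title" and "#article-title" are hypothetical.
function detectWeb(doc, url) {
  // Search results page: at least one result link is present.
  if (doc.querySelectorAll("a.result-title").length > 0) {
    return "multiple";
  }
  // Single article page: an article title element is present.
  if (doc.querySelector("#article-title")) {
    return "journalArticle";
  }
  // Nothing recognizable: return false so the translator is not offered.
  return false;
}

// Stand-in for a DOM document, ONLY for running the sketch outside a browser.
function makeFakeDoc(resultCount, hasArticleTitle) {
  return {
    querySelectorAll: function (selector) {
      return selector === "a.result-title" ? new Array(resultCount).fill({}) : [];
    },
    querySelector: function (selector) {
      return selector === "#article-title" && hasArticleTitle ? {} : null;
    },
  };
}

console.log(detectWeb(makeFakeDoc(5, false), "http://example.com/search")); // multiple
console.log(detectWeb(makeFakeDoc(0, true), "http://example.com/article/1")); // journalArticle
console.log(detectWeb(makeFakeDoc(0, false), "http://example.com/about")); // false
```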

doWeb

doWeb is run when a user who wishes to save one or more items activates the selected translator. Setting aside the retrieval of item metadata for now, we first focus on how doWeb can be used to save retrieved item metadata (as well as attachments and notes) to your Zotero library.

Saving Single Items

Metadata

The first step towards saving an item is to create an item object of the desired item type (examples from “NCBI PubMed.js”):

var newItem = new Zotero.Item("journalArticle");

Metadata can then be stored in the properties of the object. Of the fields available for the chosen item type (see the Field Index), only the title is required. For example:

var title = article.ArticleTitle.text().toString();
newItem.title = title;
 
var PMID = citation.PMID.text().toString();
newItem.url = "http://www.ncbi.nlm.nih.gov/pubmed/" + PMID;

After all metadata has been stored in the item object, the item can be saved:

newItem.complete();

This process can be repeated (e.g. using a loop) to save multiple items.
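A minimal sketch of such a loop, assuming the metadata has already been scraped into a hypothetical records array. The Zotero object defined at the top is only a stand-in that collects completed items so the snippet can run outside Zotero; in a real translator Zotero supplies Zotero.Item and the stub must be removed.

```javascript
// Stand-in for Zotero.Item, ONLY for running this sketch outside Zotero.
// Inside a translator, Zotero provides the real object; delete this stub.
var savedItems = [];
var Zotero = {
  Item: function (itemType) {
    this.itemType = itemType;
    this.creators = [];
    this.notes = [];
    this.attachments = [];
    this.complete = function () {
      savedItems.push(this); // record the item instead of saving it
    };
  },
};

// Hypothetical records already scraped from a page or metadata response.
var records = [
  { title: "First article", pmid: "111" },
  { title: "Second article", pmid: "222" },
];

// Create, fill and save one item per record.
for (var i = 0; i < records.length; i++) {
  var newItem = new Zotero.Item("journalArticle");
  newItem.title = records[i].title; // the only required field
  newItem.url = "http://www.ncbi.nlm.nih.gov/pubmed/" + records[i].pmid;
  newItem.complete(); // hand the finished item over for saving
}

console.log(savedItems.length); // 2
```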

Attachments

Attachments may be saved alongside item metadata via the item object's attachments property. Common attachment types are full-text PDFs, links and snapshots. An example from “Pubmed Central.js”:

var linkurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/";
newItem.attachments = [{
  url: linkurl,
  title: "PubMed Central Link",
  mimeType: "text/html",
  snapshot: false
}];
 
var pdfurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/pdf/" + pdfFileName;
newItem.attachments.push({
  title: "PubMed Central Full Text PDF",
  mimeType: "application/pdf",
  url: pdfurl
});

An attachment can only be saved if its source is specified. The source is often a URL (set on the url property), but can also be a file path (set on path) or a document object (set on document). Other properties that can be set are mimeType (“text/html” for webpages, “application/pdf” for PDFs), title, and snapshot (if snapshot is set to false, an attached webpage is always saved as a link).
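The different ways of indicating a source can be mixed on one item. This sketch assumes that doc is the document passed to doWeb and that the page and PDF URLs are hypothetical; addAttachments is an illustrative helper name, and a plain object stands in for the Zotero item so the snippet runs outside Zotero.

```javascript
// Sketch: attach a PDF, a link, and a snapshot of the current page.
// `item` is a plain stand-in for a Zotero item; URLs and titles are hypothetical.
function addAttachments(item, doc, pageUrl, pdfUrl) {
  // Full-text PDF, identified by URL (only if one was found).
  if (pdfUrl) {
    item.attachments.push({
      url: pdfUrl,
      title: "Full Text PDF",
      mimeType: "application/pdf",
    });
  }
  // snapshot: false means the webpage is saved as a link, not a copy.
  item.attachments.push({
    url: pageUrl,
    title: "Publisher Link",
    mimeType: "text/html",
    snapshot: false,
  });
  // Snapshot of the document the translator is already looking at.
  item.attachments.push({
    document: doc,
    title: "Snapshot",
  });
}

var item = { attachments: [] };
var fakeDoc = {}; // stands in for the real DOM document
addAttachments(item, fakeDoc, "http://example.com/article/1", "http://example.com/article/1.pdf");
console.log(item.attachments.length); // 3
```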

Notes

Notes are saved similarly to attachments. Each note is an object in the item's notes array, and the note's content (a string) is stored in that object's note property. A title, stored in the title property, is optional. E.g.:

bbCite = "Bluebook citation: " + bbCite + ".";
newItem.notes.push({note:bbCite});

Saving Multiple Items

Some webpages, such as those showing search results or the index of a journal issue, list multiple items. For these pages, web translators can be written to a) allow the user to select one or more items and b) batch save the selected items to the user's Zotero library.

Item Selection

To present the user with a selection window listing all the items found on the webpage, create a JavaScript object and store, for each item, an item ID and a label in it as a property/value pair. The item ID is used internally by the translator and can be a URL, DOI, or any other identifier, whereas the label is shown to the user (usually the item's title). Passing the object to the Zotero.selectItems function opens the selection window, and the function returns an object containing the items that the user selected. An example from ESpacenet.js:

if (detectWeb(doc, url) == "multiple") {
  var items = {};
  ...
  while (next_title = titles.iterateNext()) {
    items[next_title.href] = Zotero.Utilities.trim(next_title.textContent);
  }
  items = Zotero.selectItems(items);
  ...
}

For compatibility with Zotero Connectors, Zotero.selectItems should preferably be called with a callback function as the second parameter. This callback function receives the object with the selected items. FIXME We need an example of a translator that does this.
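Below is a sketch of the callback form, not taken from an existing translator. It assumes a hypothetical getSearchResults helper that builds the ID/label object and a hypothetical scrapeItem function that processes each selected ID. The Zotero.selectItems defined here is a stand-in that selects everything at once, whereas the real function opens the selection window; treating a falsy argument as a cancelled dialog is also an assumption.

```javascript
// Stand-in for Zotero.selectItems, ONLY so this sketch runs outside Zotero.
// It "selects" every offered item and calls the callback immediately.
var Zotero = {
  selectItems: function (items, callback) {
    callback(items);
  },
};

// Hypothetical: map of item IDs (here URLs) to labels shown to the user.
function getSearchResults(doc) {
  return {
    "http://example.com/article/1": "First article",
    "http://example.com/article/2": "Second article",
  };
}

// Hypothetical per-item handler; a real translator would fetch and save the item.
var processed = [];
function scrapeItem(id) {
  processed.push(id);
}

function doWeb(doc, url) {
  Zotero.selectItems(getSearchResults(doc), function (selected) {
    // Assumed: a falsy value means the user cancelled the dialog.
    if (!selected) return;
    // The keys of the returned object are the IDs of the chosen items.
    for (var id in selected) {
      scrapeItem(id);
    }
  });
}

doWeb({}, "http://example.com/search?q=test");
console.log(processed.length); // 2
```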

Batch Saving

Asynchronous

You will often need to make additional requests to fetch all the metadata you need, either to create multiple items or to get additional information on a single item. The most common and reliable way to make such requests is with the utility functions Zotero.Utilities.doGet, Zotero.Utilities.doPost, and Zotero.Utilities.processDocuments.

Zotero.Utilities.doGet(url, callback, onDone, charset) sends a GET request to the specified URL or to each in an array of URLs, and then calls the function callback with three arguments: the response string, the response object, and the URL. This function is frequently used to fetch standard representations of items in formats like RIS and BibTeX. The function onDone is called when all the input URLs have been processed. The optional charset argument forces the response to be interpreted in the specified character set.
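Because callback runs once per URL and onDone only after all URLs have been processed, work that needs every response belongs in onDone. In the sketch below, the export URLs are hypothetical and Zotero.Utilities.doGet is replaced by a synchronous stand-in that returns invented RIS text in the documented order; the real function issues HTTP requests asynchronously, so code placed after the doGet call must not rely on the responses.

```javascript
// Stand-in for Zotero.Utilities.doGet, ONLY for running this sketch outside Zotero.
// It mimics the documented call order: callback(text, response, url) once per URL,
// then onDone() after the last one. The canned response text is invented.
var Zotero = {
  Utilities: {
    doGet: function (urls, callback, onDone) {
      if (!Array.isArray(urls)) urls = [urls];
      urls.forEach(function (u) {
        callback("TY  - JOUR\nUR  - " + u + "\nER  - \n", null, u);
      });
      if (onDone) onDone();
    },
  },
};

// Hypothetical export URLs for two selected items.
var exportUrls = [
  "http://example.com/export/1.ris",
  "http://example.com/export/2.ris",
];

var collected = [];
var finished = false;
Zotero.Utilities.doGet(
  exportUrls,
  function (text, response, url) {
    // Runs once per URL: keep each response for later processing.
    collected.push({ url: url, text: text });
  },
  function () {
    // Runs after every URL has been handled: all responses are available here.
    finished = true;
    console.log("Fetched " + collected.length + " records"); // Fetched 2 records
  }
);
```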

Zotero.Utilities.doPost(url, postdata, callback, charset) sends a POST request to the specified URL (not an array), with the POST string defined in postdata, and then calls the function callback with two arguments: the response string and the response object. The optional charset argument forces the response to be interpreted in the specified character set.

Zotero.Utilities.processDocuments(url, callback, onDone, charset) sends a GET request to the specified URL or to each in an array of URLs, and then calls the function callback with XXXXX arguments: DOM document object, URL, and XXXX. FIXME: The optional charset argument forces the response to be interpreted in the specified character set. This is approximately equivalent to doGet, except that the callback receives DOM document objects instead of strings.

Note: The response objects passed to the callbacks above are described in detail in the MDC Documentation.

Zotero.Utilities.processAsync(sets, callbacks, onDone) can be used from translators to make it easier to chain sets of asynchronous callbacks correctly, since many translators that require multiple callbacks do it incorrectly. [text from commit message, r4262]

Synchronous

Note: While synchronous loading of sources is easier to implement, it should be avoided in new code to ensure compatibility with Zotero Connectors.

Webpages can be loaded synchronously with Zotero.Utilities.retrieveDocument, which requires a URL as its argument, and returns a DOM document object, e.g. (example from Nagoya University OPAC.js):

for (var url in items) {
  var doc = Zotero.Utilities.retrieveDocument(url);
  scrapeAndParse(doc, url);
}

Metadata documents can be loaded synchronously using Zotero.Utilities.retrieveSource. This function can be called with only a URL, in which case a GET request is executed, or with additional body, headers and responseCharset parameters, in which case a POST request is executed. The body, headers and responseCharset parameters are, respectively, the request body to POST to the URL, the HTTP headers to include in the request, and the character set to force on the response. An example of Zotero.Utilities.retrieveSource used for a GET request (from Google Scholar.js):

var bibtexData = Zotero.Utilities.retrieveSource(this.bibtexLink);

Cross-Domain Restrictions

Note that all the above functions are affected by Firefox's HTTP Access Control. See the linked article at the Mozilla Developer Center for more details; the gist is that Zotero.Utilities.retrieveDocument and Zotero.Utilities.processDocuments will generally not work when called from a page on one domain to request documents from another domain. Such cross-domain arrangements are fairly common for site index and search pages. The other functions, such as Zotero.Utilities.doGet, will work, but the response will be a plain text string, which usually has to be processed with regular expressions rather than XPath or other DOM-based approaches.
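A minimal sketch of regex-based extraction from such a text response. The helper names extractTitle and extractMeta are hypothetical, and the patterns assume simple, well-formed markup (for instance, that the name attribute precedes content in the meta tag); real pages may need more tolerant patterns.

```javascript
// Sketch: pull fields out of an HTML string with regular expressions,
// for responses that arrive as plain text (e.g. from doGet) rather than
// as a DOM document.
function extractMeta(html, name) {
  // Escape regex metacharacters in the tag name before building the pattern.
  var safeName = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Assumes name="..." comes before content="..." in the tag.
  var re = new RegExp('<meta\\s+name="' + safeName + '"\\s+content="([^"]*)"', "i");
  var m = html.match(re);
  return m ? m[1] : null;
}

function extractTitle(html) {
  var m = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  // Collapse internal whitespace, since titles often span several lines.
  return m ? m[1].replace(/\s+/g, " ").trim() : null;
}

var page =
  "<html><head><title>\n  An Example  Article\n</title>" +
  '<meta name="citation_doi" content="10.1234/example"></head></html>';

console.log(extractTitle(page)); // An Example Article
console.log(extractMeta(page, "citation_doi")); // 10.1234/example
console.log(extractMeta(page, "citation_date")); // null
```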

When such HTTP Access Control prevents an action, you will see an error like this in the error console or debug output:

00:41:50 Translation using Test failed: 
         message => Permission denied to access property 'documentElement'
         fileName => chrome://zotero/content/xpcom/translation/browser_firefox.js
         lineNumber => 451

Import Translators

Export Translators

Search Translators

Utility functions

Zotero provides several utility functions for translators to use. Some of them are used for asynchronous and synchronous HTTP requests; those are discussed above. In addition to those HTTP functions and the many standard functions provided by JavaScript, Zotero provides:

  • Zotero.Utilities.capitalizeTitle(title, ignorePreference)
  • Zotero.Utilities.cleanAuthor(author, creatorType, hasComma)
  • Zotero.Utilities.trimInternal(text)

Working with the Translator object

Methods

  • Zotero.loadTranslator(type)
  • translator.setSearch(OpenURL ContextObject)

For search translators. Takes a skeleton item object and … FIXME

  • translator.setString(string)

For import translators. Sets the string that the translator will import from.

  • translator.getTranslators()

Returns an array of translators that should be able to run on the given data. That is, those translators that return a non-false value for detectImport, detectSearch or detectWeb when passed the input given with setString, setSearch, etc.

  • translator.setTranslator(translator) Takes a translator object (as returned by getTranslators()) or the UUID of a translator.
  • translator.setHandler(event, callback)
  • translator.translate()

Calling an import translator

FIXME: Use RIS as an example, then maybe MARC.
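A sketch of the usual pattern, built only from the methods listed above: load an import translator, select RIS by UUID, pass it the text with setString, and save each item from the itemDone handler. The UUID is believed to be the RIS translator's ID but should be checked against the header of RIS.js; importRIS is a hypothetical helper name, and the Zotero.loadTranslator defined here is a stand-in that emits one fake item so the snippet can run outside Zotero.

```javascript
// Stand-in for Zotero.loadTranslator, ONLY for running this sketch outside Zotero.
// It records calls and, on translate(), emits one fake item through "itemDone".
var calls = [];
var completed = [];
var Zotero = {
  loadTranslator: function (type) {
    var handlers = {};
    return {
      setTranslator: function (id) { calls.push(["setTranslator", id]); },
      setString: function (s) { calls.push(["setString", s]); },
      setHandler: function (event, fn) { handlers[event] = fn; },
      translate: function () {
        var item = {
          itemType: "journalArticle",
          title: "Parsed title",
          complete: function () { completed.push(this); },
        };
        if (handlers.itemDone) handlers.itemDone(this, item);
        if (handlers.done) handlers.done();
      },
    };
  },
};

// Sketch: hand RIS text (e.g. fetched with doGet) to the RIS import translator.
function importRIS(risText) {
  var translator = Zotero.loadTranslator("import");
  // UUID believed to be the RIS translator's ID; verify it against RIS.js.
  translator.setTranslator("32d59d2d-b65a-4da4-b0a3-bdd3cfb979e7");
  translator.setString(risText);
  translator.setHandler("itemDone", function (obj, item) {
    // Adjust the parsed item here if needed, then save it.
    item.complete();
  });
  translator.translate();
}

importRIS("TY  - JOUR\nTI  - Parsed title\nER  - \n");
console.log(completed.length); // 1
```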

Calling a search translator

(from COinS.js:53-67)

var search = Zotero.loadTranslator("search");
search.setHandler("itemDone", function(obj, item) {
   newItems.push(item);
});
search.setHandler("done", function() {
   retrieveNextCOinS(needFullItems, newItems, couldUseFullItems, doc);
});
search.setSearch(item);
 
// look for translators
var translators = search.getTranslators();
search.setTranslator(translators);
search.translate();

Translator Framework

Many web translators can be written in a simplified form by using the Translator Framework, a library for translator development. Translators written in this way consist of simple sets of rules for scraping item metadata from specified portions of the page. See the Translator Framework page for details.