Writing Translator Code

Below we describe how the detect* and do* functions of Zotero translators can and should be coded. If you are unfamiliar with JavaScript, work through a JavaScript tutorial first to get familiar with the syntax. In addition to the information on this page, it is often very informative to look at existing translators to see how things are done. A particularly helpful guide with up-to-date recommendations on best coding practices is provided by the Wikimedia Foundation, whose tool Citoid uses Zotero translators.

While translators can be written with any text editor, the Zotero add-on Scaffold can make writing them much easier, as it provides the option to test and troubleshoot translators relatively quickly.

Web Translators

detectWeb

detectWeb is run to determine whether item metadata can indeed be retrieved from the webpage. The return value of this function should be the detected item type (e.g. “journalArticle”, see the overview of Zotero item types), or, if multiple items are found, “multiple”.

detectWeb receives two arguments, the webpage document object and URL (typically named doc and url). In some cases, the URL provides all the information needed to determine whether item metadata is available, allowing for a simple detectWeb function, e.g. (example from Cell Press.js):

function detectWeb(doc, url) {
 
	if (url.indexOf("search/results") != -1) {
		return "multiple";
	} else if (url.indexOf("content/article") != -1) {
		return "journalArticle";
	}
}
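
When the URL alone is not enough, detectWeb can inspect the document instead. The following is a minimal sketch and is not taken from an existing translator; the two CSS selectors are hypothetical and would have to be adapted to the markup of the target site.

```javascript
// Sketch of a detectWeb that inspects the page rather than the URL.
// Both selectors are hypothetical placeholders for a real site's markup.
function detectWeb(doc, url) {
	// Several result links on the page: offer a selection window
	if (doc.querySelectorAll("a.result-title").length > 1) {
		return "multiple";
	}
	// A citation_title meta tag suggests a single article page
	if (doc.querySelector('meta[name="citation_title"]')) {
		return "journalArticle";
	}
	// Nothing recognizable: report that no item was detected
	return false;
}
```

Returning false tells Zotero that this translator does not apply to the page.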

doWeb

doWeb is run when a user, wishing to save one or more items, activates the selected translator. Sidestepping the retrieval of item metadata, we'll first focus on how doWeb can be used to save retrieved item metadata (as well as attachments and notes) to your Zotero library.

Saving Single Items

Metadata

The first step towards saving an item is to create an item object of the desired item type (examples from “NCBI PubMed.js”):

var newItem = new Zotero.Item("journalArticle");

Metadata can then be stored in the properties of the object. Of the different fields available for the chosen item type (see the Field Index), only the title is required. E.g.:

var title = article.ArticleTitle.text().toString();
newItem.title = title;
 
var PMID = citation.PMID.text().toString();
newItem.url = "http://www.ncbi.nlm.nih.gov/pubmed/" + PMID;

After all metadata has been stored in the item object, the item can be saved:

newItem.complete();

This process can be repeated (e.g. using a loop) to save multiple items.
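
Such a loop might look like the sketch below. The records array and its title and url fields are made-up sample data. The Zotero object is passed in as a parameter (called zotero here) only so that the function can be exercised outside Zotero; inside a translator you would use the global Zotero directly.

```javascript
// Saves one journalArticle per record and returns how many were saved.
// `records` is hypothetical sample data; `zotero` stands for the global Zotero object.
function saveRecords(zotero, records) {
	var count = 0;
	for (var i = 0; i < records.length; i++) {
		var newItem = new zotero.Item("journalArticle");
		newItem.title = records[i].title; // the only required field
		if (records[i].url) {
			newItem.url = records[i].url;
		}
		newItem.complete(); // nothing is saved until complete() is called
		count++;
	}
	return count;
}
```

In a translator this could be called as saveRecords(Zotero, records).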

Attachments

Attachments may be saved alongside item metadata via the item object's attachments property. Common attachment types are full-text PDFs, links and snapshots. An example from “Pubmed Central.js”:

var linkurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/";
newItem.attachments = [{
	url: linkurl,
	title: "PubMed Central Link",
	mimeType: "text/html",
	snapshot: false
}];
 
var pdfurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/pdf/" + pdfFileName;
newItem.attachments.push({
	title:"Full Text PDF",
	mimeType:"application/pdf",
	url:pdfurl
});

An attachment can only be saved if the source is indicated. The source is often a URL (set on the url property), but can also be a file path (set on path) or a document object (set on document). Other properties that can be set are mimeType (“text/html” for webpages, “application/pdf” for PDFs), title, and snapshot (if the latter is set to false, an attached webpage is always saved as a link).

In the very common case of saving the current page as an attachment, set document to the current document, so that Zotero doesn't have to make an additional request:

newItem.attachments.push({
	title: "Snapshot",
	document: doc
});
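
The attachment properties described above can be combined in a small helper. This is only a sketch: the function name buildAttachments and the convention that pdfUrl may be null are assumptions made for illustration.

```javascript
// Builds the attachments array for one item.
// `pdfUrl` may be null or undefined when the page offers no PDF (assumed input).
function buildAttachments(doc, pdfUrl) {
	var attachments = [{
		title: "Snapshot",
		document: doc // reuse the already loaded page, no extra request
	}];
	if (pdfUrl) {
		attachments.push({
			title: "Full Text PDF",
			mimeType: "application/pdf",
			url: pdfUrl
		});
	}
	return attachments;
}
```

The result can be assigned directly, e.g. newItem.attachments = buildAttachments(doc, pdfurl);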

Notes

Notes are saved similarly to attachments, via the item object's notes property. Each note is an object whose note property holds the content of the note as a string. A title, stored in the title property, is optional. E.g.:

bbCite = "Bluebook citation: " + bbCite + ".";
newItem.notes.push({note:bbCite});

When saving more than one item from a single source, relationships can be established between the items being saved. These relationships are established using two properties of the item object: seeAlso and itemID. To establish a relationship, set itemID to some unique value on one or more of the item objects, and assign an array of the IDs of related items to the seeAlso property of another item object.

Note: The itemID used here is completely ad hoc; it has nothing to do with the internal ID that Zotero assigns items once they are saved. Also, it is not possible to establish a relationship to an item previously saved to Zotero, since non-export translators have no access to the local library.

When the item objects are saved via item.complete(), the relationships will be established. The following code illustrates a simple seeAlso relationship:

function doWeb(doc, url) {
	var item, items, i, ilen, j, jlen;
 
	Zotero.debug("Simple example of setting seeAlso relations");
 
	items = [];
 
	// Real data acquisition would happen here
	var titles = ["Book A", "Book B"];
	for (i = 0, ilen = 2; i < ilen; i += 1) {
		item = new Zotero.Item();
		item.itemType = "book";
		item.title = titles[i];
		items.push(item);
	}
 
	// Assign a bogus itemID to each item in the set
	for (i = 0, ilen = items.length; i < ilen; i += 1) {
		items[i].itemID = "" + i;
	}
 
	// Set bogus itemIDs in each item's seeAlso
	// field (skipping the item's own ID)
	for (i = 0, ilen = items.length; i < ilen; i += 1) {
		for (j = 0, jlen = items.length; j < jlen; j += 1) {
			if (i === j) {
				continue;
			}
			items[i].seeAlso.push("" + j);
		}
	}
 
	// Save the items
	for (i = 0, ilen = items.length; i < ilen; i += 1) {
		items[i].complete();
	}
};

Saving Multiple Items

Some webpages, such as those showing search results or the index of a journal issue, list multiple items. For these pages, web translators can be written to a) allow the user to select one or more items and b) batch save the selected items to the user's Zotero library.

Item Selection

To present the user with a selection window that shows all the items that have been found on the webpage, a JavaScript object should be created. Then, for each item, an item ID and label should be stored in the object as a property/value pair. The item ID is used internally by the translator, and can be a URL, DOI, or any other identifier, whereas the label is shown to the user (this will usually be the item's title). Passing the object to the Zotero.selectItems function will trigger the selection window, and the function passed as the second argument will receive an object with the selected items, as in this example from the IMDb translator:

Zotero.selectItems(items, function(items) {
	if(!items) return true;
	for (var i in items) {
		ids.push(i);
	}
	apiFetch(ids);
});

Here, Zotero.selectItems(..) is called with an anonymous function as the callback. As in many translators, the selected items are simply loaded into an array and passed off to a processing function that makes requests for each of them.
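
The object passed to Zotero.selectItems is usually built by scanning the page for result links. The sketch below assumes a hypothetical selector, a.result-title, and uses each link's URL as the item ID and its text as the label.

```javascript
// Collects href => title pairs from the result links on a search page.
// The "a.result-title" selector is a hypothetical placeholder.
// Returns false when no usable link was found.
function getSearchResults(doc) {
	var items = {};
	var found = false;
	var links = doc.querySelectorAll("a.result-title");
	for (var i = 0; i < links.length; i++) {
		var href = links[i].href;
		var title = (links[i].textContent || "").trim();
		if (!href || !title) continue; // skip links without a target or label
		items[href] = title;
		found = true;
	}
	return found ? items : false;
}
```

The return value can then be handed to the selection window, e.g. Zotero.selectItems(getSearchResults(doc), callback).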

Batch Saving

You will often need to make additional requests to fetch all the metadata needed, either to make multiple items, or to get additional information on a single item. The most common and reliable way to make such requests is with the utility functions Zotero.Utilities.doGet, Zotero.Utilities.doPost, and Zotero.Utilities.processDocuments.

Zotero.Utilities.doGet(url, callback, onDone, charset) sends a GET request to the specified URL or to each in an array of URLs, and then calls function callback with three arguments: response string, response object, and the URL. This function is frequently used to fetch standard representations of items in formats like RIS and BibTeX. The function onDone is called when the input URLs have all been processed. The optional charset argument forces the response to be interpreted in the specified character set.
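
As a rough sketch of this calling convention, the function below requests one RIS record per selected ID. The export URL pattern is hypothetical, and doGet is passed in as a parameter (standing for Zotero.Utilities.doGet) only so the function can be run outside Zotero.

```javascript
// Fetches RIS for each ID and hands every response to `handleRis`.
// `doGet` stands for Zotero.Utilities.doGet; the URL pattern is hypothetical.
function fetchRis(doGet, ids, handleRis, onAllDone) {
	var urls = ids.map(function (id) {
		return "https://example.org/export/ris?id=" + encodeURIComponent(id);
	});
	doGet(urls, function (text, response, url) {
		handleRis(text, url); // called once per URL
	}, onAllDone); // called after all URLs have been processed
}
```

In a translator this might be called as fetchRis(Zotero.Utilities.doGet, ids, handleRis, onAllDone), with handleRis passing the text on to an import translator.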

Zotero.Utilities.doPost(url, postdata, callback, headers, charset) sends a POST request to the specified URL (not an array), with the POST string given in postdata and, optionally, the request headers given in the headers associative array. It then calls the function callback with two arguments: the response string and the response object. The optional charset argument forces the response to be interpreted in the specified character set.

Zotero.Utilities.processDocuments(url, callback, onDone, charset) sends a GET request to the specified URL or to each in an array of URLs, and then calls the function callback with a single argument, the DOM document object.

Note: The response objects passed to the callbacks above are described in detail in the MDC Documentation.

Zotero.Utilities.processAsync(sets, callbacks, onDone) can be used from translators to make it easier to chain sets of asynchronous callbacks correctly, something that many translators requiring multiple callbacks get wrong.

Import Translators

To read in the input text, call Zotero.read():

var line;
while ((line = Zotero.read()) !== false) {
      // Do something
}

If given an integer argument, the function will provide up to the specified number of bytes. Zotero.read() returns false when it reaches the end of the file.
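
The read loop can be turned into a small line parser, sketched below. The tab-separated input format (title, then an optional date) is made up for illustration, and read is passed in as a parameter standing for Zotero.read so that the function can be run outside Zotero.

```javascript
// Reads "Title<TAB>Date" lines and returns plain records.
// `read` stands for Zotero.read; the tab-separated format is a made-up example.
function readRecords(read) {
	var records = [];
	var line;
	while ((line = read()) !== false) {
		line = line.trim();
		if (!line) continue; // skip blank lines
		var parts = line.split("\t");
		records.push({ title: parts[0], date: parts[1] || "" });
	}
	return records;
}
```

Each record could then be copied into a new Zotero.Item and saved with complete().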

If dataMode in the translator metadata is set to rdf/xml or xml/dom, the input will be parsed accordingly, and the data will be made available through Zotero.RDF and Zotero.getXML(), respectively. Documentation for these input modes is not available, but consult the RDF translators (“RDF.js”, “Bibliontology RDF.js”, “Embedded RDF.js”) and XML-based translators (“MODS.js”, “CTX.js”) to see how these modes can be used.

Creating Collections

To create collections, make a collection object and append objects to its children attribute. As with ordinary Zotero items, you must call collection.complete() to save a collection; otherwise it will be silently discarded.

var item = new Zotero.Item("book");
item.itemID = "my-item-id"; // any string or number
item.complete();
 
var collection = new Zotero.Collection();
collection.name = "Test Collection";
collection.type = "collection";
collection.children = [{type:"item", id:"my-item-id"}];
collection.complete();

The children of a collection can include other collections. In this case, collection.complete() should be called only on the top-level collection.

Export Translators

Export translators use Zotero.nextItem() and optionally Zotero.nextCollection() to iterate through the items selected for export, and generally write their output using Zotero.write( text ). A minimal translator might be:

function doExport() {
    var item;
    while (item = Zotero.nextItem()) {
        Zotero.write(item.title);
    }
}
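
A slightly fuller sketch writes one line per item with a fallback for missing fields. The tab-separated output format is an arbitrary choice for illustration, and nextItem and write are passed in as parameters (standing for Zotero.nextItem and Zotero.write) only so the function can be run outside Zotero.

```javascript
// Writes one "title<TAB>date" line per exported item and returns the item count.
// `nextItem` and `write` stand for Zotero.nextItem and Zotero.write.
function exportTabSeparated(nextItem, write) {
	var item;
	var count = 0;
	while ((item = nextItem())) {
		// Fall back to empty strings so missing fields do not print "undefined"
		write((item.title || "") + "\t" + (item.date || "") + "\n");
		count++;
	}
	return count;
}
```

Inside doExport this could be called as exportTabSeparated(Zotero.nextItem, Zotero.write).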

As with import translators, it is also possible to produce XML and RDF/XML using Zotero.RDF. See, for example, Zotero RDF, an RDF export translator that also deals with collections.

Exporting Collections

If configOptions in the translator metadata has the getCollections attribute set to true, the Zotero.nextCollection() call will be available. It provides collection objects like those created on import.

var collection;
while ((collection = Zotero.nextCollection())) {
        // Do something
}

The function Zotero.nextCollection() returns a collection object:

{
        id : "ABCD1234", // Eight-character hexadecimal key
        children : [ item, item, .. , item ], // Array of Zotero item objects
        name : "Test Collection"
}

The collection ID here is the same thing as the collection key used in API calls.

Search Translators

The detectSearch and doSearch functions of search translators are passed item objects. On any given input detectSearch should return true or false, as in “COinS.js”:

function detectSearch(item) {
        if(item.itemType === "journalArticle" || item.DOI) {
                return true;
        }
        return false;
}

doSearch should augment the provided item with additional information and call item.complete() when done. Since search translators are never called directly, but only by other translators or by the Add Item by Identifier (magic wand) function, it is common for the information to be further processed by an itemDone handler specified in the calling translator.

Further Reference

Utility Functions

Zotero provides several utility functions for translators to use. Some of them are used for asynchronous and synchronous HTTP requests; those are discussed above. In addition to those HTTP functions and the many standard functions provided by JavaScript, Zotero provides:

  • Zotero.Utilities.capitalizeTitle(title, ignorePreference)
    Applies English-style title case to the string, if the capitalizeTitles hidden preference is set. If ignorePreference is true, title case will be applied even if the preference is set to false. This function is often useful for fixing capitalization of personal names, in conjunction with the built-in string method text.toLowerCase().
  • Zotero.Utilities.cleanAuthor(author, creatorType, hasComma)
    Attempts to split the given string into firstName and lastName components, splitting on a comma if desired, and performs some clean-up (e.g. removes unnecessary whitespace and punctuation). The creatorType (see the list of valid creator types for each item type) is simply passed through. Returns a creator object of the form { lastName: , firstName: , creatorType: }, which can, for example, be used directly as the argument to item.creators.push().
  • Zotero.Utilities.getItemArray(doc, node, includeRegex, excludeRegex)
    Given the current DOM document, and a node or nodes in that document, returns an associative array of link ⇒ textContent pairs, suitable for passing to Zotero.selectItems(..). All <a> children of the specified node with HREF attributes that are matched by includeRegex and/or not matched by excludeRegex are included in the array.
    var items = Zotero.Utilities.getItemArray(doc,
                    doc.getElementById("MainColumn")
                      .getElementsByTagName("h1"),
                      '/artikel/.+\\.html');
    Zotero.selectItems(items, processCallback);
  • Zotero.Utilities.trimInternal(text)
    Removes extra internal whitespace from the text and returns it. This is frequently useful for post-processing text extracted using XPath, which frequently has odd internal whitespace.
  • Zotero.Utilities.xpath(elements, xpath, [namespaces])
    Evaluates the specified XPath on the DOM element or array of DOM elements given, with the optionally specified namespaces. If present, the third argument should be an object whose keys represent namespace prefixes, and whose values represent their URIs. Returns an array of matching DOM elements, or null if no match. (Added in Zotero 2.1.9)
  • Zotero.Utilities.xpathText(elements, xpath, [namespaces], [delimiter])
    Generates a string from the content of nodes matching a given XPath, as in Zotero.Utilities.xpath(..). By default, the nodes' content is delimited by commas; a different delimiter symbol or string may be specified. (Added in Zotero 2.1.9)
  • Zotero.Utilities.removeDiacritics(str, lowercaseOnly)
    Removes diacritics from a string, returning the result. The second argument is an optimization that specifies that only lowercase diacritics should be replaced. (Added in Zotero 3.0)
  • Zotero.debug(text)
    Prints the specified message to the debug log at zotero://debug.

Zotero.Utilities can optionally be replaced with the shorthand ZU and Zotero with Z, as in ZU.capitalizeTitle(..) and Z.debug(..).
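
As a usage sketch for two of the functions listed above, the helper below turns a semicolon-separated author string into creators. The input format ("Last, First; Last, First") and the helper name addAuthors are assumptions, and the utilities object is passed in as a parameter (named ZU) only so the function can be run outside Zotero.

```javascript
// Splits a "Last, First; Last, First" string and appends creators to the item.
// `ZU` stands for Zotero.Utilities; the input format is an assumed example.
function addAuthors(ZU, item, authorString) {
	var names = authorString.split(";");
	for (var i = 0; i < names.length; i++) {
		// Collapse odd internal whitespace, then drop leading/trailing spaces
		var name = ZU.trimInternal(names[i]).trim();
		if (!name) continue; // ignore empty segments, e.g. after a trailing ";"
		// Third argument true: the name is split on the comma
		item.creators.push(ZU.cleanAuthor(name, "author", true));
	}
	return item;
}
```

In a translator, the built-in shorthand would be passed in, e.g. addAuthors(ZU, newItem, authorString).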

Function and Object Index

See also the Function and Object Index, which lists (without documentation) all the functions and objects that are accessible to translators.

Calling other translators

Web translators can call other translators to parse metadata provided in a standard format with the help of existing import translators, or to augment incomplete data with the help of search translators. There are several ways of invoking other translators.

Calling a translator by UUID

This is the most common way to use another translator: simply specify the translator type and the UUID of the desired translator. In this case, the RIS translator is being called.

var translator = Zotero.loadTranslator("import");
translator.setTranslator("32d59d2d-b65a-4da4-b0a3-bdd3cfb979e7");
translator.setString(text);
translator.translate();
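
Items produced by the called translator are often adjusted before they are saved, using an itemDone handler. The sketch below assumes that only libraryCatalog needs to be set; the helper name importRis is made up, and the Zotero object is passed in as a parameter (called zotero) only so the function can be run outside Zotero.

```javascript
// Runs the RIS import translator on `text` and post-processes each item.
// `zotero` stands for the global Zotero object; `libraryCatalog` is an
// illustrative value stamped on every imported item.
function importRis(zotero, text, libraryCatalog) {
	var translator = zotero.loadTranslator("import");
	translator.setTranslator("32d59d2d-b65a-4da4-b0a3-bdd3cfb979e7"); // RIS
	translator.setString(text);
	translator.setHandler("itemDone", function (obj, item) {
		item.libraryCatalog = libraryCatalog;
		item.complete(); // the handler is responsible for saving the item
	});
	translator.translate();
}
```

Without the call to item.complete() inside the handler, the imported items would not be saved.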

Calling a translator using getTranslators

This code, based on the “COinS.js” code, calls getTranslators() to identify which search translators can make a complete item out of the basic template information already present. Note that translate() is called from within the event handler. Analogous logic could be used to get the right import translator for incoming metadata in an unknown format.

var search = Zotero.loadTranslator("search");
search.setHandler("translators", function(obj, translators) {
     search.setTranslator(translators);
     search.translate();
});
search.setSearch(item);
// look for translators for given item
search.getTranslators();

Using getTranslatorObject

The MARC translator is one of several translators that provide an interface to their internal logic by exposing several objects, listed in their exports array. Here, it provides an object that encapsulates the MARC logic. The translator can also take binary MARC as input via setString, but the exposed object gives library catalog translators a way to feed human-readable MARC into the translator.

// Load MARC
var translator = Zotero.loadTranslator("import");
translator.setTranslator("a6ee60df-1ddc-4aae-bb25-45e0537be973");
translator.getTranslatorObject( function (obj) {
     var record = obj.record();
     record.leader = "leader goes here";
     record.addField(code, indicator, content);
     var item = new Zotero.Item();
     record.translate(item);
     item.libraryCatalog = "Zotero.org Library Catalog";
     item.complete();
});

Method overview

  • Zotero.loadTranslator(type)
    Type should be one of import or search. Returns an object with the methods below.
  • translator.setSearch(item)
    For search translators. Sets the skeleton item object the translator will use for its search.
  • translator.setString(string)
    For import translators. Sets the string that the translator will import from.
  • translator.setDocument(document)
    For web translators. Sets the document that the translator will use.
  • translator.setTranslator(translator)
    Takes a translator object (returned by getTranslators(..)) or the UUID of a translator.
  • translator.setHandler(event, callback)
    Valid events are itemDone, done, translators, error. The itemDone handler is called on each invocation of item.complete() in the translator, and the specified callback is passed two arguments: the translator object and the item in question. Note: The itemDone callback is responsible for calling item.complete() on the item it receives, otherwise the item will not be saved to the database.
  • translator.getTranslators()
    Sends a translators event to the registered handler (use setHandler, above). The handler will be called with, as its second argument, an array of those translators that return a non-false value for detectImport, detectSearch or detectWeb when passed the input given with setString, setSearch, etc.
  • translator.getTranslatorObject(callback)
    The callback is passed an object that has the variables and functions defined in the translator as attributes and methods. In connectors, only the exports object, if present in the translator, will be passed to the callback. If an exports object is present, other functions and variables in the translator will not be passed to the callback, even when running in Firefox.
    This is typically used when calling import translators that define utility functions, like the MARC and RDF translators. Despite the unfortunate nomenclature, this object is not the same thing as the object returned by getTranslators(..) or by Zotero.loadTranslator().
  • translator.translate()
    Runs the translator on the given input.