Zotero Translators - The Missing Manual

Tools of the Trade

Writing Zotero translators can be made much easier by using the right tools. Here are some suggested downloads:

  • Zotero - writers of Zotero translators naturally can't do without a copy of Zotero.
  • Scaffold - this tool offers an easy and quick way to modify and test translators, but is (currently) only compatible with Zotero 1.0.x. If you want to use Scaffold for writing translators, and Zotero 2.0 for other purposes, consider installing Zotero 1.0.x and Scaffold in a separate Firefox profile or separate Firefox installation.
  • XPath tools - most translators rely on XPath to extract information from HTML web pages or from XML data files. To quickly construct robust XPaths, consider using one of the following tools:
    • Firebug - a popular and very powerful tool. Very useful in inspecting the HTML structure of web pages, and finding XPaths to the elements of interest.
    • Solvent - although no longer maintained, this is the preferred XPath tool of Adam Crymble, who discusses it in detail (including installation instructions) in his guide 'How to Write a Zotero Translator'.
    • XPather/DOM Inspector - XPather, which requires the DOM Inspector extension to be installed, is mostly useful for testing XPaths. Recommended in combination with Firebug.

Translator Metadata

Each translator is described by several metadata fields. For the stand-alone JavaScript translator files in Zotero 2.0, this metadata is included at the beginning of the file in a JSON block, e.g.:

{
	"translatorID":"fcf41bed-0cbc-3704-85c7-8062a0068a7a",
	"translatorType":12,
	"label":"NCBI PubMed",
	"creator":"Simon Kornblith and Michael Berkowitz",
	"target":"http://[^/]*www\\.ncbi\\.nlm\\.nih\\.gov[^/]*/(pubmed|sites/entrez|entrez/query\\.fcgi\\?.*db=PubMed)",
	"minVersion":"1.0.0b3.r1",
	"maxVersion":"",
	"priority":100,
	"inRepository":true,
	"lastUpdated":"2008-12-15 00:25:00"
}

A description of the metadata fields:

  • translatorID
    The internal ID by which Zotero identifies the translator. It is recommended to use a GUID (GUIDs can be automatically generated in Scaffold). As the translatorID is used for automatic updating of translators, and for calling translators within other translators, using stable GUIDs is strongly recommended.
  • translatorType
    Four types of translator exist, web translators being the most common. The four types are: import (1), export (2), web (4) and search (8). The value of translatorType should be the number listed after the relevant type. Some translators belong to multiple types. In those cases, the value of translatorType is the sum of the type numbers (e.g. a web/search translator has a translatorType value of 4 + 8 = 12). In Scaffold the translatorType is set with checkboxes.
  • label
    The name of the translator
  • creator
    The author(s) of the translator
  • target
    For web translators, the target should specify a JavaScript regular expression. Whenever a page is loaded, Zotero tests the target regular expressions of all web translators against the webpage URL. Of the matching translators, the translator with the lowest priority number will be used for that page. That translator's detection code (the detectWeb function) is run, and a Zotero item icon will appear in the address bar if an item is found.
  • minVersion
    The minimum version of Zotero for which the translator works properly.
  • maxVersion
    The maximum version of Zotero for which the translator works properly.
  • priority
    The priority number is used to determine which translator should be used, if multiple translators are found to be able to translate a certain web page. A lower number indicates a higher priority.
  • inRepository
    FIXME This probably indicates whether the translator is shipped with Zotero.
  • lastUpdated
    The date and time when the translator was last modified (format YYYY-MM-DD HH:MM:SS). Scaffold automatically updates this field when the translator is saved (or run).
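
The type arithmetic and the target/priority rule described above can be sketched in plain JavaScript. This is an illustration only, not Zotero's own selection code: decodeTranslatorType and pickWebTranslator are hypothetical helper names, and the "Example fallback" translator is invented for the example.

```javascript
// Type flags as listed in the translatorType description above.
const TRANSLATOR_TYPES = { import: 1, export: 2, web: 4, search: 8 };

// Return the names of all types encoded in a translatorType value.
function decodeTranslatorType(value) {
  return Object.keys(TRANSLATOR_TYPES).filter(
    (name) => (value & TRANSLATOR_TYPES[name]) !== 0
  );
}

// Hypothetical helper: among web translators whose target matches the URL,
// return the one with the lowest priority number (or null if none match).
function pickWebTranslator(translators, url) {
  const matches = translators.filter(
    (t) =>
      (t.translatorType & TRANSLATOR_TYPES.web) !== 0 &&
      new RegExp(t.target).test(url)
  );
  matches.sort((a, b) => a.priority - b.priority);
  return matches.length > 0 ? matches[0] : null;
}

// Metadata taken from the NCBI PubMed example above.
const pubmed = {
  label: "NCBI PubMed",
  translatorType: 12,
  target: "http://[^/]*www\\.ncbi\\.nlm\\.nih\\.gov[^/]*/(pubmed|sites/entrez|entrez/query\\.fcgi\\?.*db=PubMed)",
  priority: 100
};
// Invented catch-all translator, for this example only.
const fallback = { label: "Example fallback", translatorType: 4, target: ".*", priority: 300 };

console.log(decodeTranslatorType(12)); // the web and search flags
console.log(pickWebTranslator([fallback, pubmed], "http://www.ncbi.nlm.nih.gov/pubmed/123").label);
```

Because each type is a distinct power of two, summing the numbers and testing them with a bitwise AND are two views of the same value.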

Zotero.Utilities

To do: include details on all the useful (but hidden) functions in Zotero.Utilities

When writing translator code, you can make use of a number of functions in Zotero.Utilities. Below each function is described, and an example of its use is given.

String manipulation

cleanAuthor

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L40
Zotero.Utilities.prototype.cleanAuthor = function(author, type, useComma)
@param {String} author Creator string
@param {String} type Creator type string (e.g., “author” or “editor”)
@param {Boolean} useComma Whether the creator string is in inverted (Last, First) format
@return {Object} firstName, lastName, and creatorType

Sometimes it is difficult to extract clean author names from webpages. cleanAuthor removes white-space and punctuation (.,/[]:) that precedes or follows the author name, and performs some additional clean-up as well (e.g. removal of double spaces). If the author name is inverted (last name first, separated from the first name by a comma or comma-space), set useComma to true and cleanAuthor will correctly isolate the last and first name (see example).

Example code

var name = " :Doe, John";
Zotero.debug(Zotero.Utilities.cleanAuthor(name, "author",true));

Example code debug output

'firstName' => "John"
'lastName' => "Doe"
'creatorType' => "author"

trim

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L85
Zotero.Utilities.prototype.trim = function(s)
@type String

Removes leading and trailing whitespace from a string

trimInternal

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L98
Zotero.Utilities.prototype.trimInternal = function(s)
@type String

Cleans whitespace off a string and replaces multiple spaces with one
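
As a rough illustration of the difference between trim and trimInternal, here are two plain JavaScript stand-ins. The names trimSketch and trimInternalSketch are hypothetical, and the sketch assumes that any run of whitespace (including newlines) counts as "multiple spaces"; inside a translator you would call Zotero.Utilities.trim and Zotero.Utilities.trimInternal instead.

```javascript
// Stand-in for trim: strip leading and trailing whitespace only.
function trimSketch(s) {
  return s.replace(/^\s+|\s+$/g, "");
}

// Stand-in for trimInternal: collapse every whitespace run to one space,
// then strip what is left at the ends.
function trimInternalSketch(s) {
  return trimSketch(s.replace(/\s+/g, " "));
}

const scraped = "  Journal   of\n  Testing ";
console.log(JSON.stringify(trimSketch(scraped)));
console.log(JSON.stringify(trimInternalSketch(scraped)));
```

Text scraped from HTML often carries line breaks and indentation from the page source, which is why the internal variant tends to be the more useful one for titles and names.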

cleanString

Deprecated function, use trimInternal instead.

superCleanString

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L123
Zotero.Utilities.prototype.superCleanString = function(x)
@type String

Cleans any non-word non-parenthesis characters off the ends of a string
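
A minimal sketch of this kind of clean-up, written as a stand-alone function. superCleanSketch is a hypothetical name, and the exact character set that Zotero strips may differ; the assumption here is that letters, digits, underscores and parentheses are kept, and everything else is removed from both ends.

```javascript
// Stand-in for superCleanString: strip characters that are neither word
// characters (letters, digits, underscore) nor parentheses from both ends.
function superCleanSketch(x) {
  return x.replace(/^[^\p{L}\p{N}_()]+|[^\p{L}\p{N}_()]+$/gu, "");
}

console.log(superCleanSketch(" ; Smith (ed.), "));
console.log(superCleanSketch("--Title.."));
```

Parentheses are preserved so that a trailing "(ed.)" or "(2nd edition)" survives, while stray separators around the value are dropped.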

cleanTags

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L136
Zotero.Utilities.prototype.cleanTags = function(x)
@type String

Eliminates HTML tags, replacing each instance of <br> with a newline
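
The behaviour described above can be approximated with two regular expression replacements. cleanTagsSketch is a hypothetical stand-in, not the Zotero source, and it assumes that <br>, <br/> and <br /> should all become newlines.

```javascript
// Stand-in for cleanTags: turn <br> variants into newlines, then drop
// every remaining tag while keeping the text between tags.
function cleanTagsSketch(x) {
  const withNewlines = x.replace(/<br\s*\/?>/gi, "\n");
  return withNewlines.replace(/<[^>]+>/g, "");
}

console.log(cleanTagsSketch("<p>Line one<br/>Line <b>two</b></p>"));
```

Note that this only removes markup; entities such as &amp; are left in place and need a separate unescaping step.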

htmlSpecialChars

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L153
Zotero.Utilities.prototype.htmlSpecialChars = function(str)
@type String

Escapes several predefined characters:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &#039;
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;

and

  • <ZOTEROBREAK/> becomes <br/>
  • <ZOTEROHELLIP> becomes &#8230;
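
The replacement rules listed above can be sketched as follows. htmlSpecialCharsSketch is a hypothetical stand-in, and the order of operations (escape first, then restore the two placeholders) is an assumption about how the rules combine; the sketch also accepts <ZOTEROHELLIP/> with a closing slash, which the list above does not mention.

```javascript
// Stand-in for htmlSpecialChars, following the rules listed above.
function htmlSpecialCharsSketch(str) {
  const entities = { "&": "&amp;", '"': "&quot;", "'": "&#039;", "<": "&lt;", ">": "&gt;" };
  const escaped = str.replace(/[&"'<>]/g, (ch) => entities[ch]);
  // After escaping, the placeholders look like &lt;ZOTEROBREAK/&gt;,
  // so they are matched in that form and turned into real markup.
  return escaped
    .replace(/&lt;ZOTEROBREAK\/&gt;/g, "<br/>")
    .replace(/&lt;ZOTEROHELLIP\/?&gt;/g, "&#8230;");
}

console.log(htmlSpecialCharsSketch('Tom & "Jerry" <1>'));
console.log(htmlSpecialCharsSketch("first line<ZOTEROBREAK/>second line"));
```

Escaping the ampersand in the same pass as the other characters avoids double-escaping the entities that the function itself produces.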

unescapeHTML

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L189
Zotero.Utilities.prototype.unescapeHTML = function(str)
@type String

Converts all HTML entities in a string into Unicode characters.
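
To show what entity conversion involves, here is a deliberately limited stand-in. unescapeHTMLSketch is a hypothetical name; unlike the real function it only knows numeric references and five named entities, and it leaves anything else untouched.

```javascript
// Limited stand-in for unescapeHTML: handles decimal and hexadecimal
// references plus a handful of named entities.
function unescapeHTMLSketch(str) {
  const named = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  return str.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are returned unchanged.
    return Object.prototype.hasOwnProperty.call(named, body) ? named[body] : match;
  });
}

console.log(unescapeHTMLSketch("Caf&#233; &amp; Bar&#x2026;"));
```

A full implementation needs the complete HTML entity table, which is one reason to rely on the built-in utility rather than a hand-written replacement.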

parseMarkup

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L206
Zotero.Utilities.prototype.parseMarkup = function(str)
@return {Array} An array of objects with the following form:
{ type: 'text'|'link', text: "text content", [ attributes: { key1: val [ , key2: val, …] } ] }

Parses a text string for HTML/XUL markup and returns an array of parts. Currently only finds HTML links (<a> tags)
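
The return format is easier to picture with a concrete input. The following is a simplified stand-in, not the Zotero implementation: parseMarkupSketch is a hypothetical name, and it assumes that attribute values are always double-quoted and that links are not nested.

```javascript
// Stand-in for parseMarkup: split a string into text parts and link parts.
function parseMarkupSketch(str) {
  const parts = [];
  const linkRe = /<a\s+([^>]*)>([\s\S]*?)<\/a>/gi;
  let last = 0;
  let m;
  while ((m = linkRe.exec(str)) !== null) {
    // Plain text between the previous link (or the start) and this link.
    if (m.index > last) {
      parts.push({ type: "text", text: str.slice(last, m.index) });
    }
    // Collect key="value" pairs from the opening tag.
    const attributes = {};
    const attrRe = /([\w-]+)\s*=\s*"([^"]*)"/g;
    let a;
    while ((a = attrRe.exec(m[1])) !== null) {
      attributes[a[1]] = a[2];
    }
    parts.push({ type: "link", text: m[2], attributes: attributes });
    last = linkRe.lastIndex;
  }
  // Trailing text after the last link.
  if (last < str.length) {
    parts.push({ type: "text", text: str.slice(last) });
  }
  return parts;
}

console.log(parseMarkupSketch('See <a href="http://example.org">this page</a> now'));
```

For the input above the result has three parts: a text part, a link part carrying the href attribute, and a final text part.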

isInt

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L247
Zotero.Utilities.prototype.isInt = function(x)
@deprecated Use isNaN(parseInt(x))
@type Boolean

Tests if a string is an integer
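
Since the function is deprecated, a translator may prefer an explicit test. The sketch below is one strict interpretation of "is an integer" (an optional minus sign followed only by digits); isIntSketch is a hypothetical name and its edge-case behaviour is not guaranteed to match the Zotero function.

```javascript
// Strict stand-in for isInt: the whole string must be an integer literal.
function isIntSketch(x) {
  return /^-?\d+$/.test(String(x));
}

console.log(isIntSketch("42"));
console.log(isIntSketch("12abc"));
// parseInt on its own is more lenient: it reads leading digits and stops.
console.log(parseInt("12abc", 10)); // 12
```

The last line shows why a check built only on parseInt accepts strings such as "12abc": parsing succeeds as soon as the string starts with digits.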

getPageRange

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L260
Zotero.Utilities.prototype.getPageRange = function(pages)
@param {String} pages Page range to parse
@return {Integer[]} Start and end pages

Parses a page range
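
A page range parser along these lines might look like the sketch below. getPageRangeSketch is a hypothetical stand-in: returning the same page twice for a single page number, and null when no digits are found, are assumptions made for the example rather than documented behaviour.

```javascript
// Stand-in for getPageRange: extract [start, end] from strings such as
// "pp. 123-145". Hyphens and en dashes are both accepted as separators.
function getPageRangeSketch(pages) {
  const range = /(\d+)\s*[-\u2013]\s*(\d+)/.exec(pages);
  if (range) {
    return [parseInt(range[1], 10), parseInt(range[2], 10)];
  }
  // Assumption: a single page number is reported as a one-page range.
  const single = /\d+/.exec(pages);
  if (single) {
    const page = parseInt(single[0], 10);
    return [page, page];
  }
  // Assumption: no digits at all means there is nothing to report.
  return null;
}

console.log(getPageRangeSketch("pp. 123-145"));
console.log(getPageRangeSketch("17"));
```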

lpad

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L283
Zotero.Utilities.prototype.lpad = function(string, pad, length)
@param {String} string String to pad
@param {String} pad String to use as padding
@param {Integer} length Length of new padded string
@type String

Pads a number or other string with a given string on the left
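
Left-padding is mostly useful for fixed-width values such as dates. The sketch below builds a timestamp in the lastUpdated format (YYYY-MM-DD HH:MM:SS); lpadSketch and formatTimestamp are hypothetical names, and the stand-in relies on the standard String.prototype.padStart rather than on Zotero's own code.

```javascript
// Stand-in for lpad, built on the standard String.prototype.padStart.
function lpadSketch(string, pad, length) {
  return String(string).padStart(length, pad);
}

// Build a lastUpdated-style timestamp from numeric parts.
function formatTimestamp(y, mo, d, h, mi, s) {
  const two = (n) => lpadSketch(n, "0", 2);
  return y + "-" + two(mo) + "-" + two(d) + " " + two(h) + ":" + two(mi) + ":" + two(s);
}

console.log(formatTimestamp(2008, 12, 15, 0, 25, 0)); // 2008-12-15 00:25:00
```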

capitalizeTitle

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L342
Zotero.Utilities.prototype.capitalizeTitle = function(string, force)
@param {String} string
@param {Boolean} force Forces title case conversion, even if the capitalizeTitles pref is off
@type String

Cleans a title, converting it to title case and replacing " :" with ":"

Other functions

itemTypeExists

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L297
Zotero.Utilities.prototype.itemTypeExists = function(type)
@param {String} type Item type
@type Boolean

Tests if an item type exists (FIXME: what is the use case for this?)

getCreatorsForType

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L311
Zotero.Utilities.prototype.getCreatorsForType = function(type)
@param {String} type Item type
@return {String[]} Creator types

Find valid creator types for a given item type (FIXME: what is the use case for this?)

getLocalizedCreatorType

Function description
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L327
Zotero.Utilities.prototype.getLocalizedCreatorType = function(type)
@param {String} type Creator type
@return {String} Localized creator type

Gets a creator type name, localized to the current locale (FIXME: what is the use case for this?)

Zotero.Utilities.processAsync https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/utilities.js#L361

Item properties

Especially for screen scraper translators, knowing which item types (book, journalArticle, etc) and item fields (title, url, etc) exist in Zotero can be very helpful. Fortunately, the possible item properties can be found in the following source code file (types are listed as “itemTypes” entries, fields as “itemFields”):

https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/locale/en-US/zotero/zotero.properties#L154

Note that the different item types make use of different combinations of item fields (e.g. the book item type has the field ISBN, while the journalArticle item type lacks this field).

Translator delegation

To do: describe how translators can call other translators (annotate existing RIS-translator with a bunch of comments?)

Useful translator examples

To do: pick some examples of the different types of translators:

  • XML based: NCBI Pubmed, Google Books
  • RIS translators
  • Pure screen scrapers