Differences

--- dev:translators:coding [2022/03/04 19:27] – Update instructions a little abejellinek
+++ dev:translators:coding [2023/08/04 01:14] (current) – [Search Translators] dstillman
@@ Line 27: / Line 27: @@
 ===== doWeb =====
-''doWeb'' is run when a user, wishing to save one or more items, activates the selected translator. Sidestepping the retrieval of item metadata, we'll first focus on how ''doWeb'' can be used to save retrieved item metadata (as well as attachments and notes) to your Zotero library.
+''doWeb'' is run when a user, wishing to save one or more items, activates the selected translator. It can be seen as the entry point of the translation process.
+The signature of ''doWeb'' should be
+<code javascript>doWeb(doc, url)</code>
+Here, ''doc'' is the DOM document of the web page that the user wants to save as a Zotero item, and ''url'' is the page's URL as a string.
+In this section, we will describe the common tasks in the translation workflow started by ''doWeb()''.
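As a rough illustration of the signature above, here is a sketch of a ''doWeb'' that saves a single item built from ''doc'' and ''url''. It is not a complete translator. ''Zotero.Item'' is replaced by a minimal stub, the ''savedItems'' array is a hypothetical stand-in for the user's library, and ''doc'' is a plain object with a ''title'' property rather than a real DOM document.

```javascript
// Minimal stand-in for the translator sandbox (assumption: only what this sketch needs).
const savedItems = []; // hypothetical: collects items passed to complete()
const Zotero = {
  Item: class {
    constructor(itemType) {
      this.itemType = itemType;
      this.creators = [];
    }
    // In a real translator, complete() hands the finished item to Zotero for saving.
    complete() {
      savedItems.push(this);
    }
  },
};

// Entry point: doc is the page's DOM document, url is its URL as a string.
function doWeb(doc, url) {
  const item = new Zotero.Item("webpage");
  item.title = doc.title; // Document.title on a real page
  item.url = url;
  item.complete();
}

// Usage with a stub document object.
doWeb({ title: "Example Page" }, "https://example.com/article/1");
console.log(savedItems[0].title); // prints: Example Page
```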
 ==== Saving Single Items ====
+=== Scraping for metadata ===
+"Scraping" refers to collecting, from the web page, the information used to populate Zotero item fields. Such information typically includes the title, creators, permanent URL, and source of the work being saved (for example, the journal title, volume, and pages of an article).
+Having identified what information to look for, you need to know where to look. The best way to find out is to use the web inspection tools that come with the browser ([[https://firefox-source-docs.mozilla.org/devtools-user/page_inspector/|Firefox]], [[https://developer.chrome.com/docs/devtools/dom/|Chromium-based]], and [[https://webkit.org/web-inspector/elements-tab/|WebKit/Safari]]). They are indispensable for locating the DOM node (HTML element) that holds each piece of information, whether by visual inspection, by searching, or by browsing the DOM tree.
+To actually retrieve information from those nodes in your translator code, you should be familiar with [[https://developer.mozilla.org/en-US/docs/Web/API/Document_object_model/Locating_DOM_elements_using_selectors|selectors]] as they are used by the JavaScript API function ''[[https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelectorAll|querySelectorAll()]]''.
+Most often, you will do the scraping with the helper functions ''text()'' and ''attr()'', which retrieve a node's text content and an attribute value, respectively. These two actions are performed so often that ''text()'' and ''attr()'' are available to the translator script as top-level functions.
+<code javascript>function text(parentNode, selector[, index])
+function attr(parentNode, selector, attributeName[, index])</code>
+  * ''text()'' finds the descendant of ''parentNode'' (which can also be a document) that matches ''selector'', and returns the text content (i.e. the value of the ''textContent'' property) of the selected node, with leading and trailing whitespace trimmed. If the selector doesn't match, the empty string is returned.
+  * ''attr()'' similarly uses the selector to locate a descendant node, but it returns the value of the HTML attribute ''attributeName'' on that element. If the selector doesn't match, or if the element has no attribute named ''attributeName'', the empty string is returned.
+Optionally, a zero-based number ''index'' can be used to select a specific node when the selector matches multiple nodes. If the index is out of range, both functions return the empty string.
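To make the documented return values concrete, the sketch below re-implements them as ''textSketch()'' and ''attrSketch()''. These names are hypothetical and this is not the code of the real helpers, which are provided by the translator environment. Because Node.js has no DOM, the document is a stub whose ''querySelectorAll()'' only recognizes the exact selector strings it was built with; the selectors and the DOI value are made-up example data.

```javascript
// Hypothetical re-implementations mirroring the documented behavior of text() and attr().
function textSketch(parentNode, selector, index) {
  // Collect every match so that the optional zero-based index can pick one.
  const node = parentNode.querySelectorAll(selector)[index === undefined ? 0 : index];
  // No match, or index out of range: return the empty string.
  return node ? node.textContent.trim() : "";
}

function attrSketch(parentNode, selector, attributeName, index) {
  const node = parentNode.querySelectorAll(selector)[index === undefined ? 0 : index];
  if (!node) return "";
  // getAttribute() returns null when the attribute is missing.
  const value = node.getAttribute(attributeName);
  return value === null ? "" : value;
}

// Stub element: only textContent and getAttribute() are modeled.
function makeNode(textContent, attributes) {
  return {
    textContent,
    getAttribute(name) {
      return Object.prototype.hasOwnProperty.call(attributes, name) ? attributes[name] : null;
    },
  };
}

// Stub document: maps an exact selector string to its list of matching nodes.
const stubDoc = {
  matches: {
    "h1.title": [makeNode("  A Study of Things\n", {})],
    "meta[name='citation_doi']": [makeNode("", { content: "10.1234/example" })],
    ".author": [makeNode(" Ada Lovelace ", {}), makeNode("Charles Babbage", {})],
  },
  querySelectorAll(selector) {
    return this.matches[selector] || [];
  },
};

textSketch(stubDoc, "h1.title");                             // returns "A Study of Things"
attrSketch(stubDoc, "meta[name='citation_doi']", "content"); // returns "10.1234/example"
textSketch(stubDoc, ".author", 1);                           // returns "Charles Babbage"
textSketch(stubDoc, ".author", 5);                           // returns "" (index out of range)
attrSketch(stubDoc, "h1.title", "href");                     // returns "" (no such attribute)
```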
+A less commonly used helper, ''innerText()'', has the same signature as ''text()'' but returns the selected node's ''[[https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText|innerText]]'' value instead, which depends on how the node's content would be rendered.
+You can also call the DOM methods ''querySelector()'' and ''querySelectorAll()'' directly, but the helper functions should be preferred whenever they are adequate for the job.
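One case where the helpers are not adequate is gathering every match at once, since ''text()'' returns the text of a single node. The sketch below iterates over ''querySelectorAll()'' directly to collect all author names. The ''.author'' selector and the ''getAuthorNames()'' function are hypothetical, and the document is a stub that models each node only by its ''textContent''.

```javascript
// Stub document (assumption): maps an exact selector string to matching nodes,
// each modeled only by its textContent.
const stubDoc = {
  matches: {
    ".author": [
      { textContent: " Ada Lovelace " },
      { textContent: "Charles Babbage\n" },
      { textContent: "  " },
    ],
  },
  querySelectorAll(selector) {
    return this.matches[selector] || [];
  },
};

// Hypothetical helper: collect the trimmed text of every node matching ".author".
function getAuthorNames(doc) {
  const names = [];
  for (const node of doc.querySelectorAll(".author")) {
    const name = node.textContent.trim();
    if (name) names.push(name); // skip nodes that contain only whitespace
  }
  return names;
}

getAuthorNames(stubDoc); // returns ["Ada Lovelace", "Charles Babbage"]
```

On a real page, ''querySelectorAll()'' returns a ''NodeList'', which can be iterated with ''for...of'' in the same way.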
+In some older translator code, you are likely to encounter nodes selected with XPath expressions. Although XPath has its uses, selector-based functions should be preferred for the most common kinds of scraping, because selector syntax is simpler.
 === Metadata ===
@@ Line 232: / Line 265: @@
 </code>
-''doSearch'' should augment the provided item with additional information and call ''item.complete()'' when done. Since search translators are never called directly, but only by other translators or by the [[:getting_stuff_into_your_library#add_item_by_identifier|Add Item by Identifier]] (magic wand) function, it is common for the information to be further processed an [[#calling_other_translators|''itemDone'' handler]] specified in the calling translator.
+''doSearch'' should augment the provided item with additional information and call ''item.complete()'' when done. Since search translators are never called directly, but only by other translators or by the [[:adding_items_to_zotero#add_item_by_identifier|Add Item by Identifier]] (magic wand) function, it is common for the information to be further processed by an [[#calling_other_translators|''itemDone'' handler]] specified in the calling translator.
 ====== Further Reference ======
 ===== Utility Functions =====