Differences

This shows you the differences between two versions of the page.

dev:how_to_write_a_zotero_translator_plusplus [2010/07/27 17:14] – format tweak tomrochewikidev:how_to_write_a_zotero_translator_plusplus [2017/11/12 19:53] – external edit 127.0.0.1
Line 1: Line 1:
 +<html><p id="zotero-5-update-warning" style="color: red; font-weight: bold">We’re
 +in the process of updating the documentation for
 +<a href="https://www.zotero.org/blog/zotero-5-0">Zotero 5.0</a>. Some documentation
 +may be outdated in the meantime. Thanks for your understanding.</p></html>
 +
 +
 +**Note: This guide's advice on translator code [[https://github.com/zotero/translators/pull/1282#issuecomment-287574730|may be outdated]].** Instead, model your code on [[https://github.com/zotero/translators/pulls|recently updated translators]], [[https://github.com/zuphilip/translators/wiki/Common-code-blocks-for-translators|common code templates]], [[https://www.zotero.org/support/dev/translators/coding|the translator coding documentation]], and search for answers in the [[https://forums.zotero.org/discussions|Zotero Forums]] and [[https://github.com/zotero/translators/issues|on GitHub]].
 +
 ===== Chapter 0: Introduction ===== ===== Chapter 0: Introduction =====
  
Line 10: Line 18:
 ===== Chapter 1: Introduction to Zotero Translators ===== ===== Chapter 1: Introduction to Zotero Translators =====
  
-//Note:// the following is adapted from [[http://niche-canada.org/member-projects/zotero-guide/chapter1.html|HWZT chapter 1 (Intro)]]+//Note:// the following is adapted from [[http://niche-canada.org/member-projects/zotero-guide/chapter1.html|HWZT chapter 1]]
  
 ==== Zotero ==== ==== Zotero ====
Line 119: Line 127:
 This is NOT a guide detailing how to use Zotero. It is a guide detailing how to write code to extend the usefulness of Zotero. This is NOT a guide detailing how to use Zotero. It is a guide detailing how to write code to extend the usefulness of Zotero.
  
-===== Chapter 2: Troubleshooting =====+===== Chapter 2: General Troubleshooting Guidelines =====
  
-[[http://niche-canada.org/member-projects/zotero-guide/chapter2.html|HWZT chapter 2 (Troubleshooting)]]: information only, appears uptodate, no deltas.+//Note:// the following is adapted from [[http://niche-canada.org/member-projects/zotero-guide/chapter2.html|HWZT chapter 2]]
  
-===== Chapter 3: Required Software =====+Before we start, you should be aware: you will get frustrated, at least once. Here are a few tips to help you solve your problems.
  
-[[http://niche-canada.org/member-projects/zotero-guide/chapter3.html|HWZT chapter 3 (Required Software)]]: deltas are+==== Search Engines ====
  
-  - Scaffold: don't get Scaffold 1.0 from the link in HWZT, get Scaffold 2.0 [[http://bitbucket.org/rmzelle/scaffold/downloads|here]] (temporarily). +If you run into difficulties when writing computer code, the great news is that the answer to almost any problem can be found online. All computer programmers have needed help at one time or another, and given their love for computers, most sought that help online. Lucky for you, that means that most of the questions they asked, and the subsequent answers, are still floating around the internet.
-  - Solvent: also downlevel, so instead get the following uplevel Firefox add-ons (from either the [[https://addons.mozilla.org/en-US/firefox/|official Firefox add-ons repository]] or the project links below): +
-    * XPather: download [[http://xpath.alephzarro.com/download|here]], documentation [[http://xpath.alephzarro.com/documentation|here]]. XPather can stand alone, but works better in combination with ... +
-    * DOM Inspector: download [[https://developer.mozilla.org/en/dom_inspector#Getting_DOM_Inspector|here]], documentation [[https://developer.mozilla.org/en/dom_inspector#Documentation|here]] +
-Install all 3, then restart Firefox.+
  
-===== Chapter 4: DOM & HTML =====+This means the internet is often your best resource for finding help. If you run into a problem, the first thing you should do is type your problem into a search engine. More often than not someone has already asked your exact question, and someone else has provided an answer. You might even find entire websites dedicated to solving your particular problem. As far as coding goes, Zotero translators are quite basic; you will not come across a problem when writing a Zotero translator that no one has encountered before.
  
-[[http://niche-canada.org/member-projects/zotero-guide/chapter4.html|HWZT chapter 4 (DOM & HTML)]]: information only, appears uptodate, no deltas.+Likewise, if you encounter an error message you don't understand, copy and paste that error message into a search engine and surround it with quotation marks. You will likely find dozens of explanations of why this error appeared and how to fix it. 
 + 
 +The more specific you can be about your problem, the better the results you will find. Don't be discouraged if you don't find the answer on your first search. Rephrase the search terms and try again. 
 + 
 +==== Online Tutorials ==== 
 + 
 +Your second best option is [[http://www.w3schools.com/|W3Schools]] tutorials. W3Schools has step-by-step tutorials for nearly every internet-related programming language. Particularly helpful for this project are their: 
 + 
 +  * [[http://www.w3schools.com/html/default.asp|HTML tutorial]] 
 +  * [[http://www.w3schools.com/htmldom/default.asp|HTML DOM tutorial]] 
 +  * [[http://www.w3schools.com/js/default.asp|JavaScript tutorial]] 
 + 
 +At W3Schools, you can find great reference charts that will show you at a glance all the different capabilities of JavaScript and HTML. These will come in handy when you want to do something and can't remember how. 
 + 
 +Apart from W3Schools, you can find many other tutorials online. Try typing in what you want to learn into a search engine and you will likely find a tutorial. Keep in mind that many tutorials teach you how to accomplish a specific task and may not teach you exactly what you're looking for. 
 + 
 +==== Forums ==== 
 + 
 +If you've Googled it, Yahoo'd it, looked it up on the W3Schools reference charts and tried various combinations of teas, coffees, and energy drinks to no avail, you're going to need to ask for help. There are numerous internet forums to which you can turn for this; just find a forum you like. Here are a couple to get you started: 
 + 
 +=== Webdeveloper.com === 
 + 
 +The [[http://www.webdeveloper.com/forum/index.php?|JavaScript forum at WebDeveloper.com]] is excellent, especially for code-related questions. 
 + 
 +If you can't figure out why you are getting a particular error message, or why you can't get information from point A to point B, this is the forum for you. At any given time there are over one hundred people logged into the forum just waiting to answer your question. If you post your problem here in a courteous manner, with a little bit of luck you will have a solution within a couple of hours. 
 + 
 +It may not be the instant gratification we've come to expect, but don't forget, these people are volunteering to help you, and most probably if you're desperate enough to ask for help, you could use a few hours away from the keyboard anyway. 
 + 
 +=== Zotero Forums === 
 + 
 +If your question is something specific to Zotero, such as "why can't I put anything in the field Loc. in archive?" the helpful men and women at WebDeveloper.com will have no idea how to answer your question. Instead, post it to the [[http://forums.zotero.org/categories/|Zotero forums]]. Don't expect an answer as quickly as you would get on a more popular forum — there are only so many Zotero programmers to go around. You should get an answer in a couple of days as long as you're clear in your description of your problem. 
 + 
 +==== Asking Good Questions ==== 
 + 
 +Clarity and specificity are your friends when it comes to asking for help on a forum. The people who read forums and offer their expertise are busy; make it easy for them by carefully thinking out your problem before you ask. Likewise, make sure you are asking a specific question to a narrowly defined problem. 
 + 
 +For example, don't post something like: "Why won't my translator work?" 
 + 
 +Instead, try: "Why am I getting a syntax error when I try to [[http://niche-canada.org/member-projects/zotero-guide/chapter6.html#pushExplanation|Push a value into an Object]]?" 
 + 
 +Always post the relevant section of your code (and only the relevant section of your code) along with your question. This will make it easier for the experts to help you solve your problem. If the answer you get does not do the trick and you are still stuck, be polite and try rephrasing the question. Remember, don't bite the hand that feeds you; these are volunteers and they're trying to help you! 
 + 
 +==== Debugging ==== 
 + 
 +To help you ask good questions you'll learn how to use the ''Zotero.debug()'' method in [[how_to_write_a_zotero_translator_plusplus#chapter_6js_variables|Chapter 6]]. This will let you figure out exactly which part of your code is not working. Until then, keep the following in mind: 
 + 
 +When fixing problem code, only change one thing at a time. If you are working on a section of code that has several issues, fix one problem and retry the code before moving on. Sometimes if you make three or four changes before retrying the program, you will inadvertently cause another unexpected problem. This can make you think your fix was incorrect, when in fact only the last change you made was wrong. Change one thing and make sure it works before moving on and you will prevent a lot of confusion. 
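The one-change-at-a-time habit pairs well with ''Zotero.debug()''. The following is only a minimal sketch: the helper ''cleanTitle'' is hypothetical, and the stand-in object ''Z'' is an assumption that lets the snippet run outside Zotero as well (inside a translator the real ''Zotero'' object is used). Each step logs its intermediate value, so a wrong result can be traced to a single line.

```javascript
// Stand-in so the sketch also runs outside Zotero (assumption for
// illustration); inside a translator the real Zotero object is used.
const Z = (typeof Zotero !== "undefined")
	? Zotero
	: { debug: (msg) => console.log(msg) };

// Hypothetical helper: trims a scraped title and collapses whitespace.
function cleanTitle(raw) {
	Z.debug("cleanTitle input: [" + raw + "]");
	const trimmed = raw.trim();
	Z.debug("after trim: [" + trimmed + "]");
	const collapsed = trimmed.replace(/\s+/g, " ");
	Z.debug("after collapse: [" + collapsed + "]");
	return collapsed;
}

const title = cleanTitle("  NiCHE \n  Homepage ");
```

If the final value is wrong, the debug output shows which of the two steps produced it, so only that step needs changing before the next retry.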
 + 
 +===== Chapter 3: Translator Tools ===== 
 + 
 +//Note:// the following is adapted from [[http://niche-canada.org/member-projects/zotero-guide/chapter3.html|HWZT chapter 3]] 
 + 
 +There are many software packages that can help you to efficiently write a translator. Here we will discuss only a few. [[wp>Firefox|Firefox]] and [[wp>Zotero|Zotero]] are essential for running and testing translators; fortunately they are also free. Scaffold 2.0 is not essential for translator writing, but it automates some tasks that you would otherwise need to know how to do. It therefore makes documentation writing much easier %%:-)%% [[wp>Firebug_(web_development)|Firebug]] is not essential for translator writing, but it makes inspecting your [[wp>Document_Object_Model|DOM]] and acquiring [[wp>XPath|XPath]]s much easier. Finally, there are several [[wp>Source_code_editor|code editors]] which can help you write better JavaScript faster. 
 + 
 +==== Firefox ==== 
 + 
 +[[http://getfirefox.com/|install Firefox]] 
 + 
 +Zotero is a Firefox add-on; therefore, to use Zotero, we must use Firefox. Don't worry, it's free and safe and very user-friendly. In fact, after a few days of using it, most people never look back. 
 + 
 +Zotero is only available with Firefox because Firefox is much more customizable than certain other proprietary browsers. The makers of Firefox released all the code that makes up the program and posted it online. This practice, known as "Open Source," is not particularly important for writing a translator, but it was crucial for the Zotero programmers who needed access to Firefox's code in order to write Zotero. 
 + 
 +It is possible to install more than one web browser on a computer. If you normally use another browser you don't have to worry about losing it or your bookmarks when you install Firefox. 
 + 
 +==== Zotero ==== 
 + 
 +[[http://www.zotero.org/|install Zotero]] 
 + 
 +As a Firefox "add-on," Zotero automatically runs whenever Firefox is in use. Open Zotero by clicking on the Zotero logo icon in the bottom right corner of the browser window. If you have never used Zotero, you can watch a video to help you get started. 
 + 
 +Be sure you have the most up to date version of Zotero. You can find this information on the [[http://www.zotero.org/|Zotero website]]. 
 + 
 +==== Scaffold ==== 
 + 
 +Install [[https://www.zotero.org/support/dev/translators/scaffold|Scaffold]]. Note that it has been updated more recently than most of this wiki, and some features may have changed. 
 + 
 +The makers of Zotero created Scaffold specifically for writing translators. It's a sort of "sandbox," which means you can muck around with the code without worrying about really messing anything up, and it automates some tasks. In this guide, all code will be written, tested, and retested in Scaffold. 
 + 
 +Once installed, launch Scaffold by navigating to the "Tools" menu in your Firefox window and selecting "Scaffold." 
 + 
 +You will learn to use Scaffold in [[how_to_write_a_zotero_translator_plusplus#chapter_6js_variables|Chapter 6]]. 
 + 
 +==== Firebug ==== 
 + 
 +[[http://getfirebug.com/|install Firebug]] 
 + 
 +Firebug helps you understand your %%DOM%% (think: the structure of the page you're scraping) and find XPaths (think: directions to individual page items you want to scrape). Once installed, activate it by clicking on the bug icon in the bottom right corner of your Firefox window. You will learn in detail how to use this program in [[how_to_write_a_zotero_translator_plusplus#chapter_5xpath_directions|Chapter 5]]. 
 + 
 +==== Javascript editors ==== 
 + 
 +Code editors can help you write better code, and JavaScript editors (or, more properly, "JavaScript-aware" editors) can help you write better JavaScript. This is important, since your translator is (in Zotero versions > 2.0) just a JavaScript file. Unlike the software above, these editors do not run in Firefox; you launch them as you would any other desktop application. 
 + 
 +[[http://www.activestate.com/Products/komodo_ide/komodo_edit.mhtml|Komodo Edit]] is one example of such an editor. Once you start coding, you can use this program to show you where "syntax" errors appear in the code. Syntax errors are instances where you have not followed JavaScript's rules (think misspelling). If you get an error message that says something like "syntax error" or "unterminated string literal," you can copy and paste your code into Komodo Edit. Then, under the "View" menu, select "View as Language" > "JavaScript". Much as MS Word flags a grammatical mistake, Komodo Edit should point you to the line in your code where you've made a syntax error, helping to save your eyes and sanity. 
 + 
 +There are many other such editors: feel free to google for one, or check for Javascript extensions for the editor you already use. 
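To see what a syntax error looks like in practice, here is a small sketch. The helper name ''findSyntaxError'' is hypothetical, and the exact wording of the error message varies between JavaScript engines, so the example only checks that a message was produced.

```javascript
// Hypothetical helper: compiles (but never runs) a snippet and returns
// the syntax error message, or null if the snippet parses cleanly.
function findSyntaxError(source) {
	try {
		new Function(source); // parsing happens here; the body is not executed
		return null;
	} catch (e) {
		if (e instanceof SyntaxError) {
			return e.message;
		}
		throw e; // anything else is unexpected, so pass it on
	}
}

// A well-formed statement.
const good = findSyntaxError('var title = "NiCHE Homepage";');

// The same statement with the closing quotation mark missing.
const bad = findSyntaxError('var title = "NiCHE Homepage;');
```

Here ''good'' is ''null'', while ''bad'' holds the engine's description of the problem. An editor does the same kind of check continuously and also points to the offending line.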
 + 
 +===== Chapter 4: DOM and HTML ===== 
 + 
 +//Note:// 
 +  * the following is adapted from [[http://niche-canada.org/member-projects/zotero-guide/chapter4.html|HWZT chapter 4]] 
 +  * Words appearing between ''<'' and ''>'' are HTML elements. 
 +  * Words that appear between ''<!--'' and ''-->'' are HTML comments, intended for human readers, ignored by the browser. HTML comments do not tell your browser to do anything, but can be used to provide valuable information to anyone reading your code. 
 +  * If you do not recognize these markup structures, or find it difficult to follow this chapter, please take the [[http://www.w3schools.com/html/default.asp|W3Schools HTML tutorial]] before continuing. 
 + 
 +==== The Document Object Model ==== 
 + 
 +<html> 
 + 
 +<p class="style1">DOM stands for &quot;Document Object Model.&quot;</p> 
 +<p class="style1">It is not so much a thing as a way of describing how web pages are structured.</p> 
 +<p class="style1">Most people think of a web page much the same way as they think of a newspaper spread: there are words, pictures and headlines on various parts of the page. As far as we can tell, white space appears where nothing else has been placed. However, this is not how websites actually work.</p> 
 +<p class="style1">Web pages are actually comprised of a series of nodes.  These nodes are organized in a particular hierarchy, as defined by the person who wrote the web page, according to how they decided they wanted the page to function. But before we discuss that further, let&#39;s take a look at what a web page really is.</p> 
 + 
 +</html> 
 + 
 +==== Understanding HTML structure ==== 
 + 
 +<html> 
 + 
 +<p class="style1">If you&#39;ve ever written a basic web page, you know that it is really just an HTML document. These documents contain the page&#39;s content &mdash; the words, links, images &mdash; as well as a series of tags that help your browser understand at what it is looking.</p> 
 +<p class="style1">If you&#39;ve never written a website, go up to your &quot;View&quot; menu and click on &quot;Page Source.&quot; A new window will pop up with what is called &quot;source code.&quot;  This is what your browser interprets. The result of this interpretation is what you see in your browser when you go to the website.</p> 
 +<p class="style1">&quot;Don&#39;t worry if you can&#39;t understand most of the things you see in the source code; most major commercial organizations go out of their way to make the source code of their websites confusing so that people cannot copy their style and format.</p> 
 +<p class="style1">You certainly don&#39;t need to understand everything about web pages to write a Zotero translator. However, you will have to have a general understanding of how HTML documents are structured.</p> 
 +<p class="style1">Most newer websites contain many languages and markup styles in a typical page source. These include but are not limited to JavaScript, Java, PHP, Flash, CSS and XML.</p> 
 +<p class="style1">HTML tags always appear between a pair of angle brackets, &lt; and &gt;. Looking for these will often make it easier for you to distinguish HTML from other markups and code. Many browsers, including Firefox, will colour code the source for you to make your job even easier. At this point, we are only interested in looking at the HTML bits, which means you can ignore everything else.</p> 
 +<p class="style1">For the most part, the tags we are interested in start with: &lt;div&gt;, &lt;span&gt;, &lt;table&gt;, &lt;tr&gt;, &lt;td&gt;, &lt;ul&gt;, &lt;li&gt;, &lt;p&gt;, &lt;img&gt;, &lt;a href&gt;, &lt;h1&gt;, &lt;h2&gt;, &lt;h3&gt;, &lt;h4&gt;, etc.</p>  
 +<p class="style1">Every HTML document will contain the same basic structure:</p> 
 +<div class="example1"> 
 + <p class="style3"> 
 + <span class="smallType">Example 4.1</span><br/> 
 + <span class="indent1">&lt;html&gt;</span><br/> 
 + <span class="indent2">&lt;head&gt;</span><br/> 
 + <span class="indent3"><span class="comment1">&lt;!-- metadata. Generally not visible when visiting a webpage (except title). --&gt;</span></span><br/> 
 + <span class="indent2">&lt;/head&gt;</span><br/><br/> 
 + 
 + <span class="indent2">&lt;body&gt;</span><br/> 
 + <span class="indent3"><span class="comment1">&lt;!-- the content of the page. Generally is visible. --&gt;</span></span><br/> 
 + <span class="indent2">&lt;/body&gt;</span><br/> 
 + <span class="indent1">&lt;/html&gt;</span> 
 + </p> 
 +</div> 
 + 
 +<p class="style1">Example 4.1 is a fully functional &mdash; though boring &mdash; HTML document, which would display a blank white page.</p> 
 +<p class="style1">Notice that most HTML tags come in pairs: one to tell the browser &lt;this is beginning&gt; and one to say &lt;/this is finished&gt;. What appears between a pair of tags is the content of that pair. In the example above, the contents of the &lt;head&gt; and &lt;body&gt; tags are just HTML comments that a human reader could see when looking at the source code.</p> 
 +<p class="style1">Each pair of tags represents one node known as an &quot;element.&quot; The &lt;html&gt; node &mdash; part of every HTML document (see the first and last lines of the example) &mdash; is also known as the root node which is the first node in the &quot;document.&quot; All other nodes spring forth from this root node.</p> 
 +<p class="style1">The example above consists of three element nodes: an &lt;html&gt; node, a &lt;head&gt; node and a &lt;body&gt; node.</p> 
 +<p class="style1">Take note that in our standard HTML document, all the tags are properly &quot;nested.&quot; This means that if a node overlaps another node it is always <em>completely</em> contained within that node. You will <strong>never</strong> see a proper HTML document with a structure like this:</p> 
 +<div class="example1"> 
 + <p class="style3"> 
 + <span class="smallType">Example 4.2</span><br/> 
 + <span class="indent1">&lt;html&gt;</span><br/> 
 + <span class="indent2">&lt;head&gt;</span><br/> 
 + <span class="indent3">&lt;body&gt;</span><br/> 
 + <span class="indent2">&lt;/head&gt;</span><br/> 
 + <span class="indent3">&lt;/body&gt;</span><br/> 
 + <span class="indent1">&lt;/html&gt;</span></p> 
 +</div> 
 + 
 +<p class="style1">The tags in Example 4.2 are not properly nested.</p> 
 +<p class="style1">In Example 4.1, notice the &lt;head&gt; node and &lt;body&gt; node are both wholly contained within the &lt;html&gt; node (the &lt;html&gt; and &lt;/html&gt; tags start before and close after both &lt;head&gt; and &lt;body&gt; have closed). The &lt;head&gt; and &lt;body&gt; nodes do not intersect with one another.</p> 
 +<p class="style1">Programmers have jargon to describe the relationships between these nodes:</p> 
 +<table> 
 + <tbody> 
 + <tr> 
 + <td>Parent:</td> 
 + <td>&lt;html&gt; is the parent of &lt;head&gt; and &lt;body&gt;</td> 
 + </tr> 
 + <tr> 
 + <td>Child:</td> 
 + <td>&lt;head&gt; and &lt;body&gt; are the children of &lt;html&gt;</td> 
 + </tr> 
 + <tr> 
 + <td>Sibling:</td> 
 + <td>&lt;head&gt; and &lt;body&gt; are siblings.</td> 
 + </tr> 
 + <tr> 
 + <td>Grandchild:</td> 
 + <td>This example doesn&#39;t have any grandchildren, but if we were to add another node (a link for example), to either &lt;head&gt; or &lt;body&gt;, that node would be the grandchild of &lt;html&gt;.</td> 
 + </tr> 
 + </tbody> 
 +</table>  
 + 
 +<p class="style1">A node can have any number of child nodes, but only ever comes from <strong>one</strong> parent. Most pages are much more complicated than a blank white page. It is not uncommon for an HTML document to contain dozens or perhaps hundreds of element nodes.</p> 
 +<p class="style1">This network of relationships between nodes in an HTML document is the DOM.</p> 
 + 
 +</html> 
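The family jargon above can be made concrete with a few lines of JavaScript. This is a sketch only: it models Example 4.1 with plain objects rather than a real browser DOM, and the names ''makeNode'' and ''siblings'' are hypothetical helpers invented for illustration.

```javascript
// Builds a node and records a parent link on each of its children.
function makeNode(name, children = []) {
	const node = { name, parent: null, children };
	for (const child of children) {
		child.parent = node;
	}
	return node;
}

// Example 4.1 as a tree: <html> contains <head> and <body>.
const head = makeNode("head");
const body = makeNode("body");
const html = makeNode("html", [head, body]);

// Siblings are the other children of a node's parent.
function siblings(node) {
	if (!node.parent) return [];
	return node.parent.children.filter((other) => other !== node);
}

const childNames = html.children.map((n) => n.name);    // children of <html>
const headParent = head.parent.name;                     // parent of <head>
const headSiblings = siblings(head).map((n) => n.name);  // siblings of <head>
const rootSiblings = siblings(html);                     // the root has no parent
```

Each node stores exactly one ''parent'' but a list of ''children'', which mirrors the rule that a node can have many children but only ever one parent.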
 + 
 +==== Why is the DOM important to you? ==== 
 + 
 +<html> 
 + 
 +<p class="style1">The DOM is how you give your computer directions to find a particular piece of information in an HTML document. Computers can&#39;t jump to a desired element node the way our eyes can jump to a particular point on the screen. They need to crawl to that node, visiting all the connected parent nodes along the way, starting at the root.</p> 
 +<p class="style1">Here is a simple Table in an HTML document on which to practice.</p> 
 + 
 +<div class="example1"> 
 + <p class="style3"> 
 + <span class="smallType">Example 4.3</span><br/> 
 + <span class="indent1">&lt;html&gt;</span><br/> 
 + <span class="indent2">&lt;head&gt;</span><br/> 
 + <span class="indent2">&lt;/head&gt;</span><br/><br/> 
 + <span class="indent2">&lt;body&gt;</span><br/> 
 + <span class="indent3">&lt;table&gt;</span><br/> 
 + <span class="indent4">&lt;tbody&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td>Precipitation&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td>Temperature&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td>Rain&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td>Cool&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent4">&lt;/tbody&gt;</span><br/>  
 + <span class="indent3">&lt;/table&gt;</span><br/> 
 + <span class="indent2">&lt;/body&gt;</span><br/> 
 + <span class="indent1">&lt;/html&gt;</span> 
 + </p> 
 +</div> 
 + 
 +<p class="style1">Note: On a well&mdash;written web page, you can get some clues as to what is a child of what by looking at the indentation of the lines of code. For each indent, you have undergone another branching of the tree and therefore have a new child node. Unfortunately, not everyone writes clean, easy to read HTML like this. If the website you are translating does not, don&#39;t worry, there&#39;s a way to work around it that you&#39;ll learn about in the next chapter.</p> 
 +<p class="style1">Now there are 11 nodes.</p> 
 + 
 +<div class="example1"> 
 + <p class="style3"> 
 + <span class="smallType">Example 4.4</span><br/> 
 + </p> 
 + <ul class="indent"> 
 + <li class="style3">&lt;html&gt;</li> 
 + <li class="style3">&lt;head&gt;</li> 
 + <li class="style3">&lt;body&gt;</li> 
 + <li class="style3">&lt;table&gt;</li> 
 + <li class="style3">&lt;tbody&gt;</li> 
 + <li class="style3">&lt;tr&gt;</li> 
 + <li class="style3">&lt;tr&gt;</li> 
 + <li class="style3">&lt;td&gt;</li> 
 + <li class="style3">&lt;td&gt;</li> 
 + <li class="style3">&lt;td&gt;</li> 
 + <li class="style3">&lt;td&gt;</li> 
 + </ul> 
 +</div> 
 + 
 +<p class="style1">Note: &lt;tr&gt; stands for &quot;Table Row&quot; and &lt;td&gt; for &quot;Table Data.&quot; This is standard HTML.</p> 
 +<p class="style1">Your browser displays this HTML document as</p> 
 +<div class="example1"> 
 + <p class="style3"> 
 + <span class="smallType">Example 4.5</span> 
 + </p> 
 + <table> 
 + <tbody> 
 + <tr> 
 + <td class="style3">Precipitation</td> 
 + <td class="style3">Temperature</td> 
 + </tr><tr> 
 + <td class="style3">Rain</td> 
 + <td class="style3">Cool</td> 
 + </tr> 
 + </tbody> 
 + </table> 
 +</div> 
 + 
 +<p class="style1">If you wanted to tell someone where to find &quot;Cool&quot; in this table, you would probably say, &quot;Look in the bottom right corner.&quot;</p> 
 +<p class="style1">To give our computer instructions to find &quot;Cool&quot; you have to tell it where to find that particular node.</p> 
 + 
 +<div class="example1"> 
 + <p class="style3"><span class="smallType">Example 4.6</span><br/>&lt;html&gt; &rarr; &lt;body&gt; &rarr; &lt;table&gt; &rarr; &lt;tbody&gt; &rarr; 2nd &lt;tr&gt; &rarr; 2nd &lt;td&gt;</p> 
 +</div> 
 + 
 +<p class="style1">If you want to visualize this, a tree structure is perhaps easiest to understand.</p> 
 +<div class="example1"> 
 + <p class="style2"><img src="/member-projects/zotero-guide/images/DOM1.jpg" alt="DOM tree" class="middleImg" /><br/><span class="smallType">Fig 4.1: The DOM as a Tree</span></p> 
 +</div>  
 +  
 + 
 +<p class="style1">In plain English, you&#39;re looking for the second table data, contained in the second table row, of the table body, of the table, which is part of the body, which in turn is part of the document.</p> 
 +<p class="style1">Note: Notice that &quot;Cool&quot; is not the 4th &lt;td&gt; node. Rather, it is the 2nd &lt;td&gt; of the 2nd &lt;tr&gt;. This distinction will become important later (and will be addressed in <a href="/member-projects/zotero-guide/chapter5.html">Chapter 5</a>).</p> 
 + 
 +</html> 
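The walk to "Cool" can be sketched in code. The snippet below is an illustration under stated assumptions: it models Example 4.3 with plain objects instead of a real browser DOM, and ''el'', ''countElements'' and ''nthChild'' are hypothetical helper names.

```javascript
// el(name, ...children): a child is either another element or a text string.
function el(name, ...children) {
	return { name, children };
}

// Example 4.3 as a tree.
const doc = el("html",
	el("head"),
	el("body",
		el("table",
			el("tbody",
				el("tr", el("td", "Precipitation"), el("td", "Temperature")),
				el("tr", el("td", "Rain"), el("td", "Cool"))))));

// Counts element nodes only (text strings are skipped).
function countElements(node) {
	if (typeof node === "string") return 0;
	return 1 + node.children.reduce((sum, child) => sum + countElements(child), 0);
}

// One step of the path: the nth (1-based) child element called `name`.
function nthChild(node, name, n) {
	const matches = node.children.filter(
		(c) => typeof c !== "string" && c.name === name);
	return matches[n - 1];
}

// html -> body -> table -> tbody -> 2nd tr -> 2nd td (Example 4.6)
const bodyEl = nthChild(doc, "body", 1);
const tableEl = nthChild(bodyEl, "table", 1);
const tbodyEl = nthChild(tableEl, "tbody", 1);
const secondRow = nthChild(tbodyEl, "tr", 2);
const cell = nthChild(secondRow, "td", 2);

const total = countElements(doc); // number of element nodes in the document
const text = cell.children[0];    // the text inside the cell that was reached
```

Note that ''nthChild'' counts positions among the children of one parent only, which is why "Cool" is the 2nd ''td'' of the 2nd ''tr'' rather than the 4th ''td'' of the document.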
 + 
 +=== A quick note on aunts and uncles === 
 + 
 +<html> 
 + 
 +<p class="style1">We have already learned a little family jargon to help us understand the DOM. We know that &lt;tbody&gt; is the grandparent of &lt;td&gt;. We know that &lt;head&gt; and &lt;body&gt; are siblings. But, not all nodes are related. In the DOM, aunts and uncles count for nothing.</p> 
 +<div class="example1"> 
 + <p class="style2"><img src="/member-projects/zotero-guide/images/DOM2.jpg" alt="Aunts &amp; Uncles" class="middleImg" /><br/><span class="smallType">Fig 4.2: Aunts &amp; Uncles in the DOM</span></p> 
 +</div>  
 +  
 +<p class="style1">These two nodes are essentially unrelated. You cannot tell the computer to get the &quot;Cool&quot; cell of the table by traveling through the first &lt;tr&gt; node. A legitimate path to &quot;Cool&quot; can only contain the first &lt;tr&gt; node if it also contains the second &lt;tr&gt; &mdash; the parent of &quot;Cool.&quot;</p> 
 +<p class="style1">For example:</p> 
 +<div class="example1"> 
 + <p class="style2"><img src="/member-projects/zotero-guide/images/DOM3.jpg" alt="DOM tree 3" class="middleImg" /><br/><span class="smallType">Fig 4.3: DOM Tree w. Multiple Results</span></p> 
 +</div>  
 + 
 +<div class="example1"> 
 + <p class="style3"><span class="smallType">Example 4.7</span><br/> 
 + &lt;html&gt; &rarr; &lt;body&gt; &rarr; &lt;table&gt; &rarr; &lt;tbody&gt; &rarr; &lt;tr&gt; &rarr; 2nd &lt;td&gt;</p> 
 +</div> 
 + 
 +<p class="style1">Notice we have simply been less specific about what we were looking for (removed the 2nd in front of &lt;tr&gt;), but we have now created a path that directs us to both &quot;Temperature&quot; and &quot;Cool.&quot; This may or may not be desirable, as you will see in <a href="/member-projects/zotero-guide/chapter5.html">Chapter 5</a>.</p> 
 +<p class="style1">Nodes need not be tables. Any HTML element is a node. Only certain kinds of nodes contain information that is displayed on web pages:</p> 
 + 
 +<table> 
 + <tbody> 
 + <tr> 
 + <td>&lt;h1&gt;</td> 
 + <td>A headline tag. Generally appears as big and bold (though that can be changed). There are also smaller renditions for less important headlines: &lt;h2&gt;, &lt;h3&gt;, &lt;h4&gt;, etc.</td> 
 + </tr><tr> 
 + <td>&lt;p&gt;</td> 
 + <td>Paragraph tag. Usually contains text, but can also include images and links.</td> 
 + </tr><tr> 
 + <td>&lt;img&gt;</td> 
 + <td>Image tag. A link to the image, which your browser finds so that it can display the actual image.</td> 
 + </tr><tr> 
 + <td>&lt;a href&gt;</td> 
 + <td>Link tag. Allows designers to embed a link in a word or series of words.</td> 
 + </tr><tr> 
 + <td>&lt;li&gt;</td> 
 + <td>List item tag. Appears as an item or series of items, often accompanied by bullets.</td> 
 + </tr><tr> 
 + <td>&lt;td&gt;</td> 
 + <td>Table data tag. The content found in a table like the one we practiced on earlier.</td> 
 + </tr> 
 + </tbody> 
 +</table> 
 + 
 +<p class="style1">The rest of the nodes serve other purposes, often related to how or where a displayed node will appear on the page. Some examples are:</p> 
 + 
 +<table> 
 + <tbody> 
 + <tr> 
 + <td>&lt;div&gt;</td> 
 + <td>Division tag. Used to separate code into manageable chunks that can be moved around and formatted in a certain way.</td> 
 + </tr><tr> 
 + <td>&lt;span&gt;</td> 
 + <td>Span tag. Used to change the look of smaller pieces of code than &lt;div&gt; would be used for. A few words, for example.</td> 
 + </tr><tr> 
 + <td>&lt;ul&gt;</td> 
 + <td>Unordered List. Used in conjunction with a &lt;li&gt; tag.</td> 
 + </tr><tr> 
 + <td>&lt;table&gt;</td> 
 + <td>Table tag. Defines the opening of a table.</td> 
 + </tr><tr> 
 + <td>&lt;tr&gt;</td> 
 + <td>Table row tag. Defines a new line in a table.</td> 
 + </tr> 
 + </tbody> 
 +</table> 
 + 
 +</html> 
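The difference between the specific path of Example 4.6 and the looser path of Example 4.7 can be sketched as follows. This is an illustration, not how Zotero evaluates paths: the page is modelled with plain objects, and ''el'' and ''follow'' are hypothetical helpers.

```javascript
// el(name, ...children): a child is either another element or a text string.
function el(name, ...children) {
	return { name, children };
}

const page = el("html", el("head"), el("body", el("table", el("tbody",
	el("tr", el("td", "Precipitation"), el("td", "Temperature")),
	el("tr", el("td", "Rain"), el("td", "Cool"))))));

// Each step is [name] or [name, position]. Without a position, every
// child element with that name is followed; with one, only the nth
// (1-based) match under each parent is followed.
function follow(root, steps) {
	let current = [root];
	for (const [name, position] of steps) {
		const next = [];
		for (const node of current) {
			const matches = node.children.filter(
				(c) => typeof c !== "string" && c.name === name);
			if (position === undefined) {
				next.push(...matches);
			} else if (matches[position - 1]) {
				next.push(matches[position - 1]);
			}
		}
		current = next;
	}
	// Return the text inside each node that was reached.
	return current.map((node) => node.children[0]);
}

// Example 4.6: 2nd <td> of the 2nd <tr>
const specific = follow(page, [["body"], ["table"], ["tbody"], ["tr", 2], ["td", 2]]);
// Example 4.7: 2nd <td> of every <tr>
const loose = follow(page, [["body"], ["table"], ["tbody"], ["tr"], ["td", 2]]);
// Going through the first <tr> only can never reach "Cool"
const viaFirstRow = follow(page, [["body"], ["table"], ["tbody"], ["tr", 1], ["td", 2]]);
```

With these definitions ''specific'' contains only "Cool", ''loose'' contains both "Temperature" and "Cool", and ''viaFirstRow'' contains only "Temperature", since the first row is an "uncle" of the "Cool" cell and not its parent.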
 + 
 +==== Practice ==== 
 + 
 +<html> 
 + 
 +<p class="style1">If you feel confident that you understand the DOM and how to find various element nodes in it, you can skip ahead to the next section. If you would like some more practice, here is a sample HTML document and some questions to work through.</p> 
 + 
 +<div class="example1"> 
 + <p class="style3"><span class="smallType">Example 4.8</span><br/> 
 + 
 + <span class="indent1">&lt;html&gt;</span><br/> 
 + <span class="indent2">&lt;head&gt;</span><br/> 
 + <span class="indent2">&lt;/head&gt;</span><br/><br/> 
 + 
 + <span class="indent2">&lt;body&gt;</span><br/> 
 + <span class="indent3">&lt;table&gt;</span><br/> 
 + <span class="indent4">&lt;tbody&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Day&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Month&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Wednesday&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;September&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent4">&lt;/tbody&gt;</span><br/> 
 + <span class="indent3">&lt;/table&gt;</span><br/> 
 + <span class="indent3">&lt;a href=&quot;http://niche-canada.org&quot;&gt;NiCHE Homepage&lt;/a&gt;</span><br/><br/> 
 + 
 + <span class="indent3">&lt;table&gt;</span><br/> 
 + <span class="indent4">&lt;tbody&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Title&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Author&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Place&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Publisher&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;&lt;img src=&quot;http://nicheLogo.jpg&quot;&gt;&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;NiCHE Homepage&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Adam Crymble&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;London, ON&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;NiCHE&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;2009&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent5">&lt;tr&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;&lt;a href=&quot;http://www.Zotero.org&quot;&gt;Zotero&lt;/a&gt;&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;&lt;img src=&quot;http://AdamPhoto.jpg&quot;&gt;&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;&lt;img src=&quot;http://ZoteroLogo.jpg&quot;&gt;&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;&lt;a href=&quot;http://niche-canada.org&quot;&gt;NiCHE&lt;/a&gt;&lt;/td&gt;</span><br/> 
 + <span class="indent6">&lt;td&gt;Copyright Crymble&lt;/td&gt;</span><br/> 
 + <span class="indent5">&lt;/tr&gt;</span><br/> 
 + <span class="indent4">&lt;/tbody&gt;</span><br/> 
 + <span class="indent3">&lt;/table&gt;</span><br/> 
 + <span class="indent2">&lt;/body&gt;</span><br/> 
 + <span class="indent1">&lt;/html&gt;</span> 
 + </p> 
 +</div> 
 +<br/> 
 +<ol> 
 + <li>How many element nodes are there in this example?</li> 
 + <li>How many children does the &lt;body&gt; node have?</li> 
 + <li>How many great-great-grandchildren does the &lt;body&gt; node have?</li> 
 + <li>Describe a path to find the &quot;NiCHE Homepage&quot; link.</li> 
 + <li>Describe the path to tell the computer how to find the element node containing the text &quot;Adam Crymble&quot;</li> 
 + <li>Describe <strong>one</strong> path to find &quot;Day&quot; <strong>and</strong> &quot;Wednesday.&quot;</li> 
 + <li>Describe <strong>one</strong> path that will lead to all the data contained in the 2nd &lt;table&gt;.</li> 
 + <li>Describe a path that will lead to all the images contained in the 2nd &lt;table&gt;.</li> 
 + <li>Describe a path that will lead to only the 2nd and 3rd images in the 2nd &lt;table&gt;.</li> 
 +</ol> 
 +<p class="style1"><a href="/member-projects/zotero-guide/chapter4answers.html">View Answers</a></p> 
 + 
 +</html> 
 + 
 +==== What you should understand before moving on ==== 
 + 
 +<html> 
 + 
 +<ul class="indent"> 
 + <li>At this point, you should understand the fundamentals of HTML documents and the HTML tags that make up those documents.</li> 
 + <li>Even if you cannot read the source code of a complicated website, you should understand the basic components, including &lt;html&gt;, &lt;head&gt;, and &lt;body&gt; as well as the common HTML elements used to output information to users: tables, text, images and links.</li> 
 + <li>You should understand nodes and the relationships that they have with other nodes within an HTML document.</li> 
 + <li>You should be able to explain in plain English how to find any element node in a basic HTML document using the DOM.</li> 
 + <li>You might read in other texts that you should &quot;Access the DOM.&quot; For your purposes, you can substitute &quot;Use an XPath&quot; for these words. You&#39;ll learn how to do this in <a href="/member-projects/zotero-guide/chapter5.html">Chapter 5</a> and <a href="/member-projects/zotero-guide/chapter11.html">Chapter 11</a>.</li> 
 +</ul> 
 + 
 +</html> 
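
The idea of describing a path to a node can be sketched in code. The following is only an illustration, not how Zotero or a browser evaluates XPaths: ''el'', ''page'', and ''followPath'' are hypothetical names, the tree is a plain-object stand-in for the first table of Example 4.8, and only steps of the form ''tag'' or ''tag[n]'' are supported.

```javascript
// Minimal stand-in for a DOM tree: each node has a tag, children, and text.
function el(tag, children = [], text = "") {
  return { tag, children, text };
}

// The first table from Example 4.8, reduced to plain objects.
const page = el("html", [
  el("head"),
  el("body", [
    el("table", [
      el("tbody", [
        el("tr", [el("td", [], "Day"), el("td", [], "Month")]),
        el("tr", [el("td", [], "Wednesday"), el("td", [], "September")]),
      ]),
    ]),
  ]),
]);

// Follow a path such as "body/table/tbody/tr[2]/td" starting at `root`.
// A step without an index keeps every matching child; a step with an
// index keeps only the n-th matching child of each parent (1-based).
function followPath(root, path) {
  let current = [root];
  for (const step of path.split("/")) {
    const match = /^(\w+)(?:\[(\d+)\])?$/.exec(step);
    if (!match) throw new Error("Unsupported step: " + step);
    const tag = match[1];
    const index = match[2] === undefined ? null : Number(match[2]);
    const next = [];
    for (const node of current) {
      const sameTag = node.children.filter((child) => child.tag === tag);
      if (index === null) next.push(...sameTag);
      else if (sameTag[index - 1]) next.push(sameTag[index - 1]);
    }
    current = next;
  }
  return current;
}

// Every cell of the second row, then the second cell of every row.
const secondRow = followPath(page, "body/table/tbody/tr[2]/td").map((n) => n.text);
const secondColumn = followPath(page, "body/table/tbody/tr/td[2]").map((n) => n.text);
console.log(secondRow, secondColumn);
```

Leaving the index off a step is what lets a single path reach several nodes at once, which is the trick several of the practice questions rely on.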
 + 
 +==== Further Reading ==== 
 + 
 +<html> 
 + 
 +<ul class="indent"> 
 + <li><a href="http://www.w3schools.com/html/default.asp">W3Schools HTML tutorial</a></li> 
 + <li><a href="http://www.w3schools.com/htmldom/default.asp">W3Schools HTML DOM tutorial</a></li>  
 + <li><a href="http://www.w3schools.com/JS/js_obj_htmldom.asp">W3Schools JS DOM tutorial</a></li> 
 +</ul> 
 + 
 +</html>
  
 ===== Chapter 5: XPath directions ===== ===== Chapter 5: XPath directions =====
  
-[[http:/XPath directions/niche-canada.org/member-projects/zotero-guide/chapter5.html|HWZT chapter 5 (XPath directions)]]:+[[http://niche-canada.org/member-projects/zotero-guide/chapter5.html|HWZT chapter 5 (XPath directions)]]:
  
 The {DOM Inspector + XPather} workflow differs from that of Solvent. After opening the [[http://niche-canada.org/member-projects/zotero-guide/sample1.html|first sample page]], The {DOM Inspector + XPather} workflow differs from that of Solvent. After opening the [[http://niche-canada.org/member-projects/zotero-guide/sample1.html|first sample page]],
-  - Open DOM Inspector (aka //DI//) with C-S-i or from the Firefox main menu with Tools>DOM Inspector. XPather functionality is available from UI within the DI window.+  - Open DOM Inspector (aka //DI//) with CTRL+SHIFT+C or from the Firefox main menu with Tools>DOM Inspector. XPather functionality is available from UI within the DI window.
   - Hit button=Inspect at the upper right of the DI window. This will open pane=Browser in the DI window displaying the contents of the first sample page.   - Hit button=Inspect at the upper right of the DI window. This will open pane=Browser in the DI window displaying the contents of the first sample page.
   - To test the XPath string denoting the heading (text="Method and Meaning in Canadian Environmental History") of the first sample page,   - To test the XPath string denoting the heading (text="Method and Meaning in Canadian Environmental History") of the first sample page,
Line 197: Line 647:
   - The URI of the sample page has changed since HWZT, so you will need to enter Target=<code>http://niche-canada.org/member-projects/zotero-guide/</code>   - The URI of the sample page has changed since HWZT, so you will need to enter Target=<code>http://niche-canada.org/member-projects/zotero-guide/</code>
   - Hit button="Test Regex". You should get a result, in the "Test Frame" on the right of the tab, similar to that described in HWZT.   - Hit button="Test Regex". You should get a result, in the "Test Frame" on the right of the tab, similar to that described in HWZT.
-  - Instead of <code>Click on the "Detect Code" tab</code>, click on tab=Code.  +  - Instead of <code>Click on the "Detect Code" tab</code>, click on tab=Code. This is the tab where code should be entered.
-  - In that tab enter <code>function detectWeb(doc, url) {+  - To execute and debug code, HWZT has you click an "Execute" button with a thunderbolt icon.  In newer versions of Scaffold, the single execute button has been replaced by a "Run doWeb" button (thunderbolt icon) and a "Run detectWeb" button (eye icon).  Starting with HWZT example 11.6, you'll be writing detectWeb functions, so you'll need to click the "Run detectWeb" button to run them.  For example, at example 11.6, clicking "Run detectWeb" should produce results like <code>12:00:00 Title:</code>
 + 
 + 
 +As noted in HWZT example 11.4, certain code must be included at the top of every function that uses an XPath (container).  Putting everything together, the code for example 11.6 should look like this: <code>function detectWeb(doc, url) {
   var namespace = doc.documentElement.namespaceURI;   var namespace = doc.documentElement.namespaceURI;
   var nsResolver = namespace ? function(prefix) {   var nsResolver = namespace ? function(prefix) {
Line 208: Line 661:
   Zotero.debug(myXPathObject);   Zotero.debug(myXPathObject);
 }</code>  }</code> 
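
To see what the namespace lines at the top of ''detectWeb'' do, here is a small sketch that can be run outside Scaffold. It assumes the resolver body follows the conventional legacy pattern of mapping the prefix ''x'' to the page's namespace; ''makeNsResolver'', ''xhtmlDoc'', and ''plainDoc'' are hypothetical names, and the two mock objects merely stand in for the ''doc'' argument that Scaffold passes in.

```javascript
// Assumed legacy pattern: if the page declares a namespace (as XHTML pages
// do), return a function mapping the prefix "x" to it; otherwise return null.
function makeNsResolver(doc) {
  const namespace = doc.documentElement.namespaceURI;
  return namespace
    ? function (prefix) {
        return prefix === "x" ? namespace : null;
      }
    : null;
}

// Hypothetical stand-ins for the `doc` object handed to detectWeb.
const xhtmlDoc = {
  documentElement: { namespaceURI: "http://www.w3.org/1999/xhtml" },
};
const plainDoc = { documentElement: { namespaceURI: null } };

const resolver = makeNsResolver(xhtmlDoc);
console.log(resolver("x"));            // the XHTML namespace URI
console.log(makeNsResolver(plainDoc)); // null: no resolver is needed
```

The resolver is only consulted when an XPath uses a prefix, which is why pages without a namespace can safely pass ''null''.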
-  - Click on icon="Run detectWeb" (the eye): you should get results like <code>12:00:00 Title:</code> 
  
-The code for the second complete Scaffold example (from "Example 11.10") is similarly+The code for the second complete Scaffold example (from "Example 11.9") is similarly
 <code>function detectWeb(doc, url) { <code>function detectWeb(doc, url) {
   var namespace = doc.documentElement.namespaceURI;   var namespace = doc.documentElement.namespaceURI;
dev/how_to_write_a_zotero_translator_plusplus.txt · Last modified: 2017/11/19 19:24 by adamsmith