Differences

This shows you the differences between two versions of the page.

dev:technologies [2011/04/21 14:50] – [XPath] rmzelle
dev:technologies [2017/12/12 04:56] (current) – removed sean
Line 1: Line 1:
-A very brief introduction to some technologies commonly used in translator development.
- 
-====== XPath ====== 
-XPath provides a way to refer to specific parts of HTML or XML documents. It's usually the best way to extract data from webpages when writing a translator. 
- 
-An XPath expression is a chain of pieces that specify the path to a node of the document. The main pieces of expressions are: 
- 
-  * ''/'' separator between parts of the path
-  * ''*'' match any tag
-  * ''%%//%%'' match at any depth (one or more levels deeper)
-  * ''..'' go up one level
-  * ''[]'' match a tag only if it satisfies the condition inside the brackets
-  * ''@key'' an attribute named ''key''
-  * ''text()'' match a text node
-  * ''[2]'' match the second matching node (positions count from 1)
-  * ''[last()]'' match the last matching node
-  * ''div[@class="important"]'' match a ''<div>'' whose ''class'' attribute has the value ''important''
-  * ''td[contains(text(),"Expect")]'' match a ''<td>'' whose text contains "Expect"
-  * Plus much more. See the [[http://www.w3.org/TR/xpath/|XPath specification]] and the [[https://developer.mozilla.org/en/xpath|XPath documentation]] of the Mozilla Developer Network. 
- 
-The best introduction to XPath for use in translators is Mozilla's [[https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript|Introduction to using XPath in JavaScript]], but it may be even easier to model your code on existing translators, which use a wide array of XPath techniques to pick apart fussy sites.
- 
-=== Examples === 
-<code html> 
-<div id="names"> 
-  <span class="editor">George Spelvin</span>, 
-  <span class="translator">Andrea Johnson</span> 
-</div>
-<table>
-  <tr class="odd">
-    <td>Great Expectations</td>
-    <td>Mediocre Plans</td>
-  </tr>
-</table>
-</code> 
-For the sample document above, these expressions would refer to... 
-  * ''%%//tr[@class="odd"]/td%%'': a result set with the nodes ''<td>Great Expectations</td>'' and ''<td>Mediocre Plans</td>'' 
-  * ''%%//table//td%%'': the same result set as the previous expression
-  * ''%%//span[@class="editor"]%%'': a result set with the single node ''<span class="editor">George Spelvin</span>'' 
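The expressions above can be tried outside a translator. The following is a minimal sketch, assuming Python's standard ''xml.etree.ElementTree'' module rather than the browser XPath engine a translator would use. ElementTree implements only a subset of XPath (for example, it has no ''contains()'' or ''text()''), and it requires paths relative to an element, so each expression starts with ''.''. The ''<body>'' wrapper and the helper ''texts'' are hypothetical additions made only so the sample parses as a single well-formed document.

```python
import xml.etree.ElementTree as ET

# The sample document, wrapped in a hypothetical <body> root element
# so that it parses as one well-formed XML document.
SAMPLE = """
<body>
  <div id="names">
    <span class="editor">George Spelvin</span>,
    <span class="translator">Andrea Johnson</span>
  </div>
  <table>
    <tr class="odd">
      <td>Great Expectations</td>
      <td>Mediocre Plans</td>
    </tr>
  </table>
</body>
"""

root = ET.fromstring(SAMPLE)


def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [node.text for node in root.findall(path)]


# ElementTree paths must be relative, hence the leading "." before "//".
odd_cells = texts(".//tr[@class='odd']/td")
table_cells = texts(".//table//td")
editors = texts(".//span[@class='editor']")
second_cell = texts(".//tr/td[2]")

print(odd_cells)
print(table_cells)
print(editors)
print(second_cell)
```

The first two expressions select the same two cells, which is why the list above describes the second as the same result set as the first.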
- 
-====== Regular Expressions ====== 
-  * ''.'' match any single character
-  * ''[a-z01]'' match any one lowercase English letter or the digit 0 or 1
-  * ''()'' group a subexpression and capture what it matches
-  * ''+'' match one or more of the preceding expression
-  * ''*'' match zero or more of the preceding expression
-  * ''?'' match zero or one of the preceding expression
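Each piece in the list can be checked with a short experiment. This is a rough sketch that assumes Python's standard ''re'' module; the pieces shown behave the same way in JavaScript regular expressions, but the sample strings and variable names here are made up for illustration.

```python
import re

# "." matches any single character (except a newline, by default).
dot_matches = re.fullmatch(r"c.t", "cat") is not None

# "[a-z01]" matches one lowercase English letter or the digit 0 or 1.
class_hit = re.fullmatch(r"[a-z01]", "q") is not None
class_miss = re.fullmatch(r"[a-z01]", "7") is not None

# "()" groups a subexpression and captures the text it matched.
volume = re.search(r"vol ([0-9]+)", "vol 12, no 3").group(1)

# "+" requires one or more repetitions of the preceding expression.
plus_hit = re.fullmatch(r"(ab)+", "ababab") is not None
plus_miss = re.fullmatch(r"(ab)+", "") is not None

# "*" allows zero or more repetitions of the preceding expression.
star_zero = re.fullmatch(r"ab*", "a") is not None

# "?" allows zero or one occurrence of the preceding expression.
optional_absent = re.fullmatch(r"colou?r", "color") is not None
optional_present = re.fullmatch(r"colou?r", "colour") is not None

print(dot_matches, class_hit, class_miss, volume)
print(plus_hit, plus_miss, star_zero, optional_absent, optional_present)
```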
  
dev/technologies.1303411809.txt.gz · Last modified: 2011/04/21 14:50 by rmzelle