Differences

--- dev:technologies [2011/04/21 14:50] – [XPath] rmzelle
+++ dev:technologies [2017/12/12 04:56] (current) – removed sean
@@ Line 1: / Line 1: @@
-A very brief introduction to some technologies commonly used in translator development.
-====== XPath ======
-XPath provides a way to refer to specific parts of HTML or XML documents. It's usually the best way to extract data from webpages when writing a translator.
-An XPath expression is a chain of pieces that specify the path to a node of the document. The main pieces of expressions are:
-  * ''/'' separator between parts of the path
-  * ''*'' match any tag
-  * ''%%//%%'' match at any depth, one or more levels deeper
-  * ''..'' go up one level
-  * ''[]'' match a tag only if it satisfies the condition inside the brackets
-  * ''@key'' an attribute named ''key''
-  * ''text()'' match a text node
-  * ''[2]'' match the second matching node
-  * ''[last()]'' match the last matching node
-  * ''div[@class="important"]'' match a ''<div>'' with the attribute ''class'', with the value ''important''.
-  * ''td[contains(text(),"Expect")]'' match a ''<td>'' whose text contains "Expect"
-  * Plus much more. See the [[http://www.w3.org/TR/xpath/|XPath specification]] and the [[https://developer.mozilla.org/en/xpath|XPath documentation]] on the Mozilla Developer Network.
-The best introduction to XPath for use in translators is Mozilla's [[https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript|Introduction to using XPath in JavaScript]]. It may be even easier, though, to model your code on existing translators, which use a wide range of XPath techniques to pick apart fussy sites.
-=== Examples ===
-<code html>
-<div id="names">
-  <span class="editor">George Spelvin</span>,
-  <span class="translator">Andrea Johnson</span>
-</div>
-<table>
-  <tr class="odd">
-    <td>Great Expectations</td>
-    <td>Mediocre Plans</td>
-  </tr>
-</table>
-</code>
-For the sample document above, these expressions would refer to...
-  * ''%%//tr[@class="odd"]/td%%'': a result set with the nodes ''<td>Great Expectations</td>'' and ''<td>Mediocre Plans</td>''
-  * ''%%//table//td%%'': the same result set as the previous expression
-  * ''%%//span[@class="editor"]%%'': a result set with the single node ''<span class="editor">George Spelvin</span>''
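The three expressions above can be checked with a short script. This is a minimal sketch, not translator code: it uses Python's standard-library ''xml.etree.ElementTree'' only so that it runs standalone. ElementTree implements a subset of XPath, and paths evaluated on an element must start with ''.'', so ''%%//td%%'' is written ''%%.//td%%'' here. The ''<body>'' wrapper and the helper name ''texts'' are additions for illustration; the wrapper is needed so the sample parses as XML with a single root.

```python
import xml.etree.ElementTree as ET

# The sample document from above, wrapped in a <body> element (an addition)
# so that it is well-formed XML with a single root.
SAMPLE = """
<body>
  <div id="names">
    <span class="editor">George Spelvin</span>,
    <span class="translator">Andrea Johnson</span>
  </div>
  <table>
    <tr class="odd">
      <td>Great Expectations</td>
      <td>Mediocre Plans</td>
    </tr>
  </table>
</body>
"""

root = ET.fromstring(SAMPLE)


def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [el.text for el in root.findall(path)]


# Both cells of the row whose class attribute is "odd".
odd_row_cells = texts(".//tr[@class='odd']/td")

# Every <td> at any depth below a <table>: the same two cells.
table_cells = texts(".//table//td")

# The single <span> whose class attribute is "editor".
editors = texts(".//span[@class='editor']")

# [last()] keeps only the last <td> among its siblings.
last_cell = texts(".//td[last()]")

print(odd_row_cells)
print(table_cells)
print(editors)
print(last_cell)
```

ElementTree does not support functions such as ''contains()'', so an expression like ''td[contains(text(),"Expect")]'' needs a full XPath engine.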
-====== Regular Expressions ======
-  * ''.'' matches any single character (by default, any except a line break)
-  * ''[a-z01]'' matches any one lowercase English letter, or the digit 0 or 1
-  * ''()'' groups part of an expression and captures the text it matches
-  * ''+'' matches one or more of the preceding expression
-  * ''*'' matches zero or more of the preceding expression
-  * ''?'' matches zero or one of the preceding expression
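Each piece in the list above can be tried out in isolation. The sketch below uses Python's ''re'' module so that it runs standalone; the helper ''full_match'' and all sample strings are made up for illustration. The constructs shown behave the same way in most regular expression dialects, but check the details of whichever engine you are writing for.

```python
import re


def full_match(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# "." stands for exactly one character.
any_char = full_match(r"c.t", "cat")

# A character class matches one character out of the listed set.
in_class = full_match(r"[a-z01]+", "abc01")
not_in_class = full_match(r"[a-z01]+", "abc2")

# "+" needs at least one repetition; "*" also accepts none.
plus_needs_one = full_match(r"ab+", "a")
star_allows_none = full_match(r"ab*", "a")

# "?" makes the preceding expression optional.
optional_u = full_match(r"colou?r", "color")

# "()" groups an expression, so a quantifier applies to the whole group...
repeated_group = full_match(r"(ab)+", "ababab")

# ...and it captures the text that the group matched.
volume = re.search(r"vol ([0-9]+)", "vol 12 no 3").group(1)

print(any_char, in_class, not_in_class)
print(plus_needs_one, star_allows_none, optional_u)
print(repeated_group, volume)
```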