Differences
This shows you the differences between two versions of the page.
pdf_fulltext_indexing [2007/08/16 02:00] – dstillman | pdf_fulltext_indexing [2018/04/29 19:11] (current) – dstillman | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ** This document is out of date. Zotero 1.0.0rc1 can automatically install the Xpdf tools through the Search preference pane. ** | + | ====== PDF Full-Text Indexing ====== |
- | Zotero 1.0 Beta 4 and higher include experimental support for fulltext indexing of PDF documents. | + | Zotero uses tools from the [[https://www.xpdfreader.com/|Xpdf project]] to extract full-text content |
- | + | ||
- | PDF fulltext indexing currently requires an external program to convert embedded text in PDF files into plaintext cache files. By default, | + | |
- | + | ||
- | Precompiled binaries of Xpdf for Windows and Linux can be downloaded from [[http:// | + | |
- | + | ||
- | The pdftotext utility can either be placed directly in the Zotero data directory or linked | + | |
- | + | ||
- | For example, for Intel Mac users who have installed Xpdf via MacPorts, | + | |
- | + | ||
- | < | + | |
- | ln -s / | + | |
- | </ | + | |
- | + | ||
- | After restarting Firefox, new snapshots should automatically be indexed when added to the Library. A future version of Zotero will allow the reindexing of existing files. | + | |
- | + | ||
- | By default, Zotero will index up to the first 100 pages of a PDF or 500,000 characters, whichever comes first. This value can be adjusted by changing the '' | + | |
- | + | ||
- | + | ||
- | Note that PDF fulltext indexing will not work with files that contain only images, though some image-based PDFs also include a hidden layer of searchable text.((As of March 28, 2007, [[http:// | + |
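
The default limit described in the old revision (the first 100 pages or 500,000 characters, whichever comes first) can be illustrated with a short sketch. This is not Zotero's own code. The function name `truncate_extracted_text` and its parameters are hypothetical, and the sketch assumes the extracted text separates pages with form feed characters, as pdftotext output does by default.

```python
def truncate_extracted_text(text, max_pages=100, max_chars=500_000):
    """Keep at most max_pages pages and max_chars characters of extracted text.

    Assumption: pages are delimited by form feed characters ("\f").
    Hypothetical helper for illustration only, not part of Zotero.
    """
    pages = text.split("\f")
    # Apply the page limit first, then the character limit. Whichever
    # limit is reached first determines where the text is cut.
    limited = "\f".join(pages[:max_pages])
    return limited[:max_chars]


if __name__ == "__main__":
    sample = "page one\fpage two\fpage three"
    print(repr(truncate_extracted_text(sample, max_pages=2, max_chars=100)))
```

Applying both cuts in sequence gives the "whichever comes first" behavior, because the shorter of the two limits always determines the final length.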