Differences

--- pdf_fulltext_indexing [2007/08/16 02:00] – dstillman
+++ pdf_fulltext_indexing [2018/04/29 19:11] – dstillman
@@ Line 1: / Line 1: @@
-** This document is out of date. Zotero 1.0.0rc1 can automatically install the Xpdf tools through the Search preference pane. **
+====== PDF Full-Text Indexing ======
-Zotero 1.0 Beta 4 and higher include experimental support for fulltext indexing of PDF documents.
+Zotero uses tools from the [[https://www.xpdfreader.com/|Xpdf project]] to extract full-text content from PDFs for searching. Since Zotero 5.0.36, the PDF tools are bundled with Zotero; earlier versions required downloading them separately.
-PDF fulltext indexing currently requires an external program to convert embedded text in PDF files into plaintext cache files. By default, Zotero uses the pdftotext utility from the [[http://www.foolabs.com/xpdf/|Xpdf]] project, which is an open-source, cross-platform PDF viewer.
-Precompiled binaries of Xpdf for Windows and Linux can be downloaded from [[http://www.foolabs.com/xpdf/download.html|the project site]]. Mac users can install Xpdf from source or use [[http://www.macports.org/|MacPorts]] (recommended). After installing MacPorts, type ''sudo port install xpdf'' to begin the installation.
-The pdftotext utility can either be placed directly in the Zotero data directory or linked to from there. Either way, a platform-specific file must be created in the Zotero data directory, conforming to the format "pdftotext-''{platform}''", where ''{platform}'' is "Win32", "MacIntel", "MacPPC", "Linux-i686", etc. (To determine your current platform, type ''javascript:alert(navigator.platform)'' in the Firefox URL bar and hit Enter.)  The Windows version requires the .exe extension, i.e. "pdftotext-Win32.exe".
-For example, for Intel Mac users who have installed Xpdf via MacPorts, PDF indexing can be enabled by changing to the Zotero data directory via the Terminal and typing:
-<code>
-ln -s /opt/local/bin/pdftotext pdftotext-MacIntel
-</code>
-After restarting Firefox, new snapshots should automatically be indexed when added to the Library. A future version of Zotero will allow the reindexing of existing files.
-By default, Zotero will index up to the first 100 pages of a PDF or 500,000 characters, whichever comes first. These limits can be adjusted by changing the ''extensions.zotero.fulltext.pdfMaxPages'' and ''extensions.zotero.fulltext.textMaxLength'' preferences via about:config. The default behavior may change in future versions.
-Note that PDF fulltext indexing will not work with files that contain only images, though some image-based PDFs also include a hidden layer of searchable text.((As of March 28, 2007, [[http://www.jstor.org|JSTOR]] is [[http://www.jstor.org/about/newfeatures.html|including an embedded text layer in its PDFs]].))
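The manual setup described in the older (2007) revision can be sketched as a few POSIX shell functions. This is a hypothetical helper, not part of Zotero: the names ''pdftotext_name'', ''link_pdftotext'' and ''set_zotero_pref'' are illustrative, the linking step assumes a Unix-like system where ''ln -s'' works, and writing the limits to the Firefox profile's ''user.js'' is an assumed alternative to editing them in about:config. It only applies to those old versions; current Zotero bundles the Xpdf tools.

```shell
#!/bin/sh
# Hypothetical helpers for the old manual pdftotext setup; not part of Zotero.

# pdftotext_name PLATFORM
# Print the file name Zotero looked for in its data directory. PLATFORM is the
# value of navigator.platform (e.g. Win32, MacIntel, Linux-i686).
# Assumption: every Windows platform string needs .exe (the page only names Win32).
pdftotext_name() {
  case "$1" in
    Win*) printf '%s\n' "pdftotext-$1.exe" ;;
    *)    printf '%s\n' "pdftotext-$1" ;;
  esac
}

# link_pdftotext DATA_DIR PLATFORM [PDFTOTEXT_PATH]
# Symlink an installed pdftotext into DATA_DIR under the platform-specific name.
# Defaults to the first pdftotext on PATH. Fails if the link already exists.
link_pdftotext() {
  data_dir=$1
  platform=$2
  src=${3:-$(command -v pdftotext || true)}
  if [ -z "$src" ] || [ ! -x "$src" ]; then
    echo "link_pdftotext: pdftotext not found or not executable" >&2
    return 1
  fi
  ln -s "$src" "$data_dir/$(pdftotext_name "$platform")"
}

# set_zotero_pref PROFILE_DIR NAME VALUE
# Append a user_pref() line to the profile's user.js, which Firefox reads at
# startup (assumed alternative to changing the value in about:config).
set_zotero_pref() {
  printf 'user_pref("%s", %s);\n' "$2" "$3" >> "$1/user.js"
}
```

For example, ''link_pdftotext /path/to/zotero-data-dir MacIntel /opt/local/bin/pdftotext'' is equivalent to the ''ln -s'' command from the older revision, and ''set_zotero_pref /path/to/profile extensions.zotero.fulltext.pdfMaxPages 200'' raises the page limit after the next restart; both directory paths are placeholders.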