

PDF Fulltext Indexing

Zotero supports fulltext indexing of PDF documents. PDF fulltext indexing currently requires an external program to convert the text embedded in PDF files into plaintext cache files. By default, Zotero uses the pdftotext utility from Xpdf, an open-source, cross-platform PDF viewer. Additional functionality is available through another Xpdf utility, pdfinfo, which reports document information such as the page count.
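The conversion step amounts to running pdftotext once per PDF, writing its output to a cache file. The Python below is a minimal sketch of that idea, not Zotero's own code. It relies only on pdftotext's documented command form, `pdftotext [options] <PDF-file> [<text-file>]`, and its standard `-enc` option. Whether Zotero passes `-enc UTF-8`, and the helper names `build_pdftotext_command` and `extract_text`, are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(binary: str, pdf_path: str, cache_path: str) -> list:
    """Return the argument list for one pdftotext conversion.

    Assumption: "-enc UTF-8" (a standard Xpdf pdftotext option) is used here
    for illustration; it is not necessarily the exact flag set Zotero passes.
    """
    return [binary, "-enc", "UTF-8", pdf_path, cache_path]


def extract_text(pdf_path: str, cache_path: str, binary: str = "pdftotext") -> str:
    """Convert a PDF into a plaintext cache file and return the cached text."""
    # Fail early with a clear error if the tool is not installed or not on PATH.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found")
    cmd = build_pdftotext_command(binary, pdf_path, cache_path)
    # check=True raises CalledProcessError if pdftotext exits with an error code.
    subprocess.run(cmd, check=True)
    return Path(cache_path).read_text(encoding="utf-8", errors="replace")
```

For example, `extract_text("paper.pdf", "paper.txt")` would leave the extracted text in `paper.txt`; keeping the command builder separate makes the invocation easy to inspect without running the binary.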

Basic Installation

Customized, platform-specific versions of pdftotext and pdfinfo can be downloaded and installed automatically through the Zotero preferences.

After the tools are installed, new PDF attachments should be indexed automatically when they are added to the library. Existing attachments can be indexed from the Zotero preferences.

Note that PDF fulltext indexing will not work with files that contain only images, though some image-based PDFs also include a hidden layer of searchable text.1)
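A quick way to tell whether a PDF is image-only is to look at what the converter produced. The sketch below is a hypothetical heuristic, not something Zotero does: pdftotext separates pages with form-feed characters, so an image-only file typically yields nothing but form feeds and whitespace, and Python's `str.strip()` removes both. The function name `looks_image_only` is made up for illustration.

```python
def looks_image_only(extracted_text: str) -> bool:
    """Heuristic: True if the extracted text has no visible characters.

    Page-separating form-feed characters count as whitespace in Python,
    so str.strip() removes them along with spaces and newlines.
    """
    return not extracted_text.strip()
```

An image-based PDF that carries a hidden text layer produces real characters, so it would correctly not be flagged.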

Advanced Installation

On Windows, Zotero requires modified binaries of pdftotext and pdfinfo that prevent a command-prompt window from appearing during indexing. On all platforms, it also requires a custom build of pdfinfo that can write its output to a text file (source code available).

Users who want to install the Xpdf tools manually (or who are on platforms for which we haven't built customized binaries) can build the tools themselves and then either place the binaries directly in the Zotero data directory or link to them from there. Either way, the file in the Zotero data directory must be named according to the pattern “pdftotext-{platform}”, where {platform} is “Win32”, “MacIntel”, “MacPPC”, “Linux-i686”, etc. (To determine your current platform, type javascript:alert(navigator.platform) in the Firefox URL bar and press Enter.) On Windows the file also needs the .exe extension, i.e. “pdftotext-Win32.exe”. Optionally, you can add a text file named pdftotext-{platform}.version that contains the installed version number.
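The naming rule can be expressed as a small helper, plus a linking step for Unix-like systems (creating symlinks on Windows may need extra privileges). This is a minimal sketch under stated assumptions: the function names are hypothetical, the data-directory path is supplied by the caller, treating every platform string starting with "Win" as Windows is an assumption, and so is applying the same pattern to tools other than pdftotext.

```python
from pathlib import Path
from typing import Optional, Tuple


def tool_filenames(tool: str, platform: str) -> Tuple[str, str]:
    """Return (binary name, version-file name), e.g. for tool="pdftotext".

    platform is the navigator.platform string, such as "Win32" or "Linux-i686".
    The version file never gets the .exe suffix: pdftotext-{platform}.version.
    """
    binary = f"{tool}-{platform}"
    version_file = f"{binary}.version"
    # Assumption: any "Win..." platform string needs the .exe extension.
    if platform.startswith("Win"):
        binary += ".exe"
    return binary, version_file


def link_into_data_dir(built_binary: Path, data_dir: Path, platform: str,
                       version: Optional[str] = None) -> Path:
    """Symlink a self-built pdftotext into the data directory under the required name."""
    name, version_name = tool_filenames("pdftotext", platform)
    target = data_dir / name
    # Replace any existing file or (possibly dangling) link with the new one.
    if target.exists() or target.is_symlink():
        target.unlink()
    target.symlink_to(built_binary.resolve())
    if version is not None:
        # Optional file holding just the installed version number.
        (data_dir / version_name).write_text(version)
    return target
```

For instance, `tool_filenames("pdftotext", "Win32")` gives `("pdftotext-Win32.exe", "pdftotext-Win32.version")`. Copying the binary instead of linking it satisfies the same naming rule.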

Last modified: 2009/06/11 11:30 by rmzelle