Differences

This shows you the differences between two versions of the page.

pdf_fulltext_indexing [2007/08/16 02:00] dstillman
pdf_fulltext_indexing [2018/04/29 19:11] (current) dstillman
Line 1: Line 1:
-** This document is out of date. Zotero 1.0.0rc1 can automatically install the Xpdf tools through the Search preference pane. **
+====== PDF Full-Text Indexing ======
  
-Zotero 1.0 Beta 4 and higher include experimental support for fulltext indexing of PDF documents.
+Zotero uses tools from the [[https://www.xpdfreader.com/|Xpdf project]] to extract full-text content from PDFs for searching. Since Zotero 5.0.36, the PDF tools are bundled with Zotero and do not need to be downloaded separately as in previous versions.
- +
-PDF fulltext indexing currently requires an external program to convert embedded text in PDF files into plaintext cache files. By default, Zotero uses the pdftotext utility from the [[http://www.foolabs.com/xpdf/|Xpdf]] project, which is an open-source, cross-platform PDF viewer. +
- +
-Precompiled binaries of Xpdf for Windows and Linux can be downloaded from [[http://www.foolabs.com/xpdf/download.html|the project site]]. Mac users can install Xpdf from source or use [[http://www.macports.org/|MacPorts]] (recommended). After installing MacPorts, type ''sudo port install xpdf'' to begin the installation. +
- +
-The pdftotext utility can either be placed directly in the Zotero data directory or linked to from there. Either way, a platform-specific file must be created in the Zotero data directory, conforming to the format "pdftotext-''{platform}''", where ''{platform}'' is "Win32", "MacIntel", "MacPPC", "Linux-i686", etc. (To determine your current platform, type ''javascript:alert(navigator.platform)'' in the Firefox URL bar and hit Enter.) The Windows version requires the .exe extension, i.e. "pdftotext-Win32.exe". +
- +
-For example, for Intel Mac users who have installed Xpdf via MacPorts, PDF indexing can be enabled by changing to the Zotero data directory via the Terminal and typing: +
- +
-<code> +
-ln -s /opt/local/bin/pdftotext pdftotext-MacIntel +
-</code> +
- +
-After restarting Firefox, new snapshots should automatically be indexed when added to the Library. A future version of Zotero will allow the reindexing of existing files. +
- +
-By default, Zotero will index up to the first 100 pages of a PDF or 500,000 characters, whichever comes first. This value can be adjusted by changing the ''extensions.zotero.fulltext.pdfMaxPages'' and ''extensions.zotero.fulltext.textMaxLength'' preferences via about:config. The default behavior may be adjusted in future versions. +
- +
- +
-Note that PDF fulltext indexing will not work with files that contain only images, though some image-based PDFs also include a hidden layer of searchable text.((As of March 28, 2007, [[http://www.jstor.org|JSTOR]] is [[http://www.jstor.org/about/newfeatures.html|including an embedded text layer in its PDFs]].))+
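
The naming rule in the removed revision ("pdftotext-''{platform}''", plus the .exe extension on Windows) can be illustrated with a minimal shell sketch. This is not part of Zotero: the helper names ''pdftotext_name'' and ''link_pdftotext'' are hypothetical, and treating every platform string that starts with "Win" as Windows is an assumption based on the single "Win32" example given in the text.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical and not part of Zotero.

# Print the file name described in the old instructions,
# "pdftotext-{platform}", adding .exe for Windows platforms.
# Assumption: any platform string beginning with "Win" is Windows.
pdftotext_name() {
  case "$1" in
    Win*) printf 'pdftotext-%s.exe\n' "$1" ;;
    *)    printf 'pdftotext-%s\n' "$1" ;;
  esac
}

# Symlink an installed pdftotext binary into the data directory
# under the platform-specific name.
# Arguments: <path to pdftotext> <Zotero data directory> <platform>
link_pdftotext() {
  ln -s "$1" "$2/$(pdftotext_name "$3")"
}
```

For the Intel Mac case described in the removed revision, calling ''link_pdftotext /opt/local/bin/pdftotext . MacIntel'' from inside the Zotero data directory would create the same link as the ''ln -s'' command quoted there. None of this is needed since Zotero 5.0.36, which bundles the PDF tools.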
pdf_fulltext_indexing.txt · Last modified: 2018/04/29 19:11 by dstillman