Differences

This shows you the differences between two versions of the page.

pdf_fulltext_indexing [2007/08/16 02:00] dstillman
pdf_fulltext_indexing [2018/04/29 19:11] dstillman
Line 1: Line 1:
-** This document is out of date. Zotero 1.0.0rc1 can automatically install the Xpdf tools through the Search preference pane. ** +====== PDF Full-Text Indexing ======
  
-Zotero 1.0 Beta 4 and higher include experimental support for fulltext indexing of PDF documents. +Zotero uses tools from the [[https://www.xpdfreader.com/|Xpdf project]] to extract full-text content from PDFs for searching. Since Zotero 5.0.36, the PDF tools are bundled with Zotero and do not need to be downloaded separately as in previous versions.
- +
-PDF fulltext indexing currently requires an external program to convert embedded text in PDF files into plaintext cache files. By default, Zotero uses the pdftotext utility from the [[http://www.foolabs.com/xpdf/|Xpdf]] project, which is an open-source, cross-platform PDF viewer. +
- +
-Precompiled binaries of Xpdf for Windows and Linux can be downloaded from [[http://www.foolabs.com/xpdf/download.html|the project site]]. Mac users can install Xpdf from source or use [[http://www.macports.org/|MacPorts]] (recommended). After installing MacPorts, type ''sudo port install xpdf'' to begin the installation. +
- +
-The pdftotext utility can either be placed directly in the Zotero data directory or linked to from there. Either way, a platform-specific file must be created in the Zotero data directory, conforming to the format "pdftotext-''{platform}''", where ''{platform}'' is "Win32", "MacIntel", "MacPPC", "Linux-i686", etc. (To determine your current platform, type ''javascript:alert(navigator.platform)'' in the Firefox URL bar and hit Enter.) The Windows version requires the .exe extension, i.e., "pdftotext-Win32.exe". +
- +
-For example, for Intel Mac users who have installed Xpdf via MacPorts, PDF indexing can be enabled by changing to the Zotero data directory via the Terminal and typing: +
- +
-<code> +
-ln -s /opt/local/bin/pdftotext pdftotext-MacIntel +
-</code> +
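The naming rule for the platform-specific file can be sketched in Python. This is only an illustration of the convention described in the old revision, not Zotero's own code: the helper name ''pdftotext_filename'' is hypothetical, and treating every platform string that starts with "Win" as Windows is an assumption based on the "Win32" example.

```python
def pdftotext_filename(platform: str) -> str:
    """Build the file name expected in the Zotero data directory.

    `platform` is the string reported by navigator.platform,
    e.g. "Win32", "MacIntel", "MacPPC", or "Linux-i686".
    """
    name = f"pdftotext-{platform}"
    # Assumption: any platform string starting with "Win" is Windows,
    # which needs the .exe extension.
    if platform.startswith("Win"):
        name += ".exe"
    return name


print(pdftotext_filename("MacIntel"))  # pdftotext-MacIntel
print(pdftotext_filename("Win32"))     # pdftotext-Win32.exe
```

The "MacIntel" result matches the link name used in the ''ln -s'' command above.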
- +
-After restarting Firefox, new snapshots should automatically be indexed when added to the Library. A future version of Zotero will allow the reindexing of existing files. +
- +
-By default, Zotero will index up to the first 100 pages of a PDF or 500,000 characters, whichever comes first. This value can be adjusted by changing the ''extensions.zotero.fulltext.pdfMaxPages'' and ''extensions.zotero.fulltext.textMaxLength'' preferences via about:config. The default behavior may be adjusted in future versions. +
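The "whichever comes first" rule can be illustrated with a short Python sketch. It assumes the PDF text is already available as one string per page and that both limits are applied as simple cut-offs; how Zotero enforces them internally is not described here. The function name ''text_to_index'' and the constants are hypothetical, with the defaults taken from the paragraph above.

```python
PDF_MAX_PAGES = 100        # default of extensions.zotero.fulltext.pdfMaxPages
TEXT_MAX_LENGTH = 500_000  # default of extensions.zotero.fulltext.textMaxLength


def text_to_index(pages, max_pages=PDF_MAX_PAGES, max_chars=TEXT_MAX_LENGTH):
    """Return the part of the text that falls within both limits.

    `pages` is a list with one string of extracted text per PDF page.
    """
    # Page limit first: ignore everything after the first max_pages pages.
    text = "".join(pages[:max_pages])
    # Then the character limit: cut whatever is still too long.
    return text[:max_chars]


# 200 short pages: the page limit is reached first (100 pages x 10 chars).
print(len(text_to_index(["x" * 10] * 200)))      # 1000
# 200 long pages: the character limit is reached first.
print(len(text_to_index(["x" * 10_000] * 200)))  # 500000
```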
- +
- +
-Note that PDF fulltext indexing will not work with files that contain only images, though some image-based PDFs also include a hidden layer of searchable text.((As of March 28, 2007, [[http://www.jstor.org|JSTOR]] is [[http://www.jstor.org/about/newfeatures.html|including an embedded text layer in its PDFs]].)) +
pdf_fulltext_indexing.txt · Last modified: 2018/04/29 19:11 by dstillman