Opened 9 years ago
Closed 9 years ago
#766 closed defect (fixed)
Zotero saves text/html URLs with .pdf extensions as PDFs
| Reported by: | dstillman | Owned by: | dstillman |
|---|---|---|---|
| Priority: | major | Milestone: | 1.0 RC 4 |
| Component: | data layer | Version: | 1.0 |
| Keywords: | | Cc: | simon, stakats |
Description
So I think my workaround for #460 may have been unwise. It might be better for Zotero to simply refuse to import "PDF"s that return 'text/html' in the HEAD request in importFromURL() rather than forcing them to application/pdf, since with the current behavior it ends up creating broken fake PDFs that are actually login pages or other intermediate HTML pages.
If we did that, would there be another solution for sites like Oxford Journals? One option might be to hard-code a MIME type override in the translator, with translate.js passing a flag to importFromURL() to ignore the MIME type. But then the problem would still occur if a user tried to save a PDF via a right-click.
Hard-coding some of the bad sites in importFromURL() would be another option, and it would make sense in that this isn't a translator-specific issue. But I'm somewhat loath to start hard-coding sites in the data layer, and the consequence of getting it wrong (downloading the PDF to the desktop or popping up the Firefox save dialog) is probably even worse than creating a broken PDF (which at least has a link back to the HTML page).
The crudest option would be to download the "PDF", inspect it, and delete the created attachment item if the PDF isn't valid (which is easy to test for). Inefficient, but probably the most reliable solution.
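For illustration only, the validity test could be as small as checking the file's leading bytes for the `%PDF-` signature. This is a minimal sketch and not Zotero's actual code: `looksLikePDF` is a hypothetical name, and it assumes the downloaded bytes are already available as a `Uint8Array`.

```javascript
// Hypothetical helper (not part of Zotero): report whether a downloaded
// file begins with the PDF signature "%PDF-".
function looksLikePDF(bytes) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) {
    return false;
  }
  // Compare the first five bytes against the signature.
  return signature.every((byte, i) => bytes[i] === byte);
}

// Usage: a real PDF header versus an intermediate HTML login page.
const encoder = new TextEncoder();
const realPDF = encoder.encode("%PDF-1.4\n...");
const loginPage = encoder.encode("<html><body>Please log in</body></html>");
console.log(looksLikePDF(realPDF));   // true
console.log(looksLikePDF(loginPage)); // false
```

A strict check like this would also reject files that carry stray bytes before the signature, which some PDF readers tolerate, so a real implementation might want to be more lenient.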
Anybody have thoughts?
Change History (2)
comment:1 Changed 9 years ago by simon
comment:2 Changed 9 years ago by dstillman
- Resolution set to fixed
- Status changed from new to closed
(In [1710]) Fixes #766, Zotero saves text/html URLs with .pdf extensions as PDFs
Addresses #460, importFromURL fails when importing PDFs from servers that do not properly support HEAD requests
Now inspects supposed PDFs after download and deletes if not actually PDF format
Also:
- Fixed bug when running importFromDocument() on a PDF on Windows that would result in an incomplete or missing (since r1688) attachment item
- importFromDocument() no longer returns an itemID, since it can be partly asynchronous now
- Added rudimentary 'text/html' support for Zotero.MIME.sniffForMIMEType()
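As a rough sketch of what rudimentary content sniffing of this kind can look like, the function below inspects the start of a file and guesses between PDF and HTML. It is not the implementation of Zotero.MIME.sniffForMIMEType(): the name `sniffLeadingContent`, the 512-character window, and the markers it looks for are all assumptions made for illustration.

```javascript
// Hypothetical sketch; Zotero.MIME.sniffForMIMEType() may work differently.
// Guess a MIME type from the first characters of a file's contents.
function sniffLeadingContent(text) {
  // Only look at the start of the file, ignoring leading whitespace and case.
  const head = text.slice(0, 512).trimStart().toLowerCase();
  if (head.startsWith("%pdf-")) {
    return "application/pdf";
  }
  // Treat a doctype or an opening <html> tag near the top as HTML.
  if (/<!doctype html|<html[\s>]/.test(head)) {
    return "text/html";
  }
  return null; // unknown; the caller falls back to other hints
}

// Usage
console.log(sniffLeadingContent("%PDF-1.4\n..."));              // application/pdf
console.log(sniffLeadingContent("<!DOCTYPE html><html></html>")); // text/html
```

Returning `null` for anything unrecognized leaves the decision to whatever other information is available, such as the server-reported type or the file extension.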
I'd go for the crude option of inspecting the PDF. If it's just a page notifying the user s/he can't get the PDF, it will only be a few kilobytes of wasted download anyway.