Opened 9 years ago
Closed 5 years ago
#827 closed defect (fixed)
Don't save non-canonical URLs to URL field
| Reported by: | dstillman | Owned by: | mcburton |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | translators | Version: | 1.5 |
| Keywords: | helpwanted | Cc: | erazlogo, bdarcus |
Description
Only canonical URLs (so URIs, really--generally full-content pages direct from the publisher, like NYT articles) should be saved to the URL field. Others should be saved as links.
The most notable example is JSTOR, where you currently still get a value in the URL field--in addition to an identical link, if the snapshot pref is on. This means the URL shows up in citations as well, which is a very common complaint in the forums.
Change History (9)
comment:1 Changed 9 years ago by simon
- Cc erazlogo added
comment:2 Changed 9 years ago by dstillman
- Cc bdarcus added
Well, the citation issue is actually secondary to the real problem, which is that we're conflating different types of metadata in one field. The URL of an online NYT article is not equivalent to the database URL at which you retrieve a copy of a paper, since the former is canonical and the latter is quite likely available in other places in exactly the same form. This will become more of a problem when we start using values in the URL field as URIs on the server, but we should fix it now.
Essentially Chicago is saying that the database's version of the document is indeed a unique resource and should therefore be treated as the URL, but since there seems to be disagreement about that even within Chicago itself, I think it's better to be strict about what we mean by "URL" and add the notion of an "access URL". I can give you a data method to retrieve the earliest link attachment URL, and we could have a global pref to use that in the citation if the URL field is blank. That can also be handled at the CSL level, but it might necessitate adding the notion of canonical URLs and access URLs to CSL.
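A minimal sketch of the fallback described above, in Python purely for illustration. `Item`, `LinkAttachment`, `earliest_link_url`, `citation_url` and the `use_access_url_fallback` flag are hypothetical names standing in for the proposed data method and global pref; none of them are Zotero's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkAttachment:
    url: str
    # Assumption: ISO 8601 timestamps in one timezone, so that
    # plain string comparison matches chronological order.
    date_added: str


@dataclass
class Item:
    url: Optional[str] = None  # reserved for the canonical URL
    links: List[LinkAttachment] = field(default_factory=list)


def earliest_link_url(item: Item) -> Optional[str]:
    """Return the URL of the earliest link attachment, if any."""
    if not item.links:
        return None
    return min(item.links, key=lambda link: link.date_added).url


def citation_url(item: Item, use_access_url_fallback: bool) -> Optional[str]:
    """Prefer the canonical URL; fall back to the access URL only if the pref is on."""
    if item.url:
        return item.url
    if use_access_url_fallback:
        return earliest_link_url(item)
    return None
```

With the pref off, an item that has only link attachments yields no URL in the citation; with it on, the earliest link is used, which is only a heuristic for "the URL the item was retrieved from".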
I can also add "Convert to attached link" and "Convert to URL" context menu options to help with fixing existing records.
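The two conversions could behave roughly as follows. This is a sketch under assumptions: `Record`, `convert_to_attached_link` and `convert_to_url` are made-up names, and attached links are modelled as a plain list of URL strings rather than real attachment items.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    url: Optional[str] = None
    link_urls: List[str] = field(default_factory=list)


def convert_to_attached_link(record: Record) -> None:
    """Move the URL field value into the attached links and clear the field."""
    if record.url is None:
        return
    # Avoid a duplicate when an identical link already exists
    # (the JSTOR case from the description).
    if record.url not in record.link_urls:
        record.link_urls.append(record.url)
    record.url = None


def convert_to_url(record: Record, link_url: str) -> None:
    """Promote one attached link to the URL field."""
    if link_url not in record.link_urls:
        raise ValueError("link is not attached to this record")
    record.link_urls.remove(link_url)
    # Demote any existing URL to a link so no data is lost.
    if record.url is not None and record.url not in record.link_urls:
        record.link_urls.append(record.url)
    record.url = link_url
```

Demoting the old URL instead of overwriting it is a design choice of this sketch; the ticket does not say what should happen when both values exist.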
comment:3 Changed 9 years ago by simon
Unfortunately, picking the earliest link attachment URL would pose a problem if one were to attach, for example, a book review or online errata to a book not available online. Is there any way around this that we can implement for 1.0.2? In 1.5, I hope that we can resolve this within the hierarchical data model.
comment:4 Changed 9 years ago by erazlogo
Chicago also gives the author leeway in deciding which URL to cite: 17.7 "When content is available from more than one online source, authors should consider whether, on the basis of the nature and practices of the publisher or sponsoring body, they have consulted the most permanent."
In theory, Zotero could decide which URL is more permanent and add canonical URLs automatically, but if not, it seems the field should be editable by the user in the long run. CMS also doesn't address the issue of citing open access URLs versus gated databases: even though Chicago would consider a gated database URL more "permanent," it would be more ethical to cite a reasonably permanent non-gated URL so everyone could find the source. As an open source project, Zotero should make this an option for authors if possible. See John Willinsky on this issue in relation to Wikipedia, http://www.firstmonday.org/issues/issue12_3/willinsky/index.html
Note also that Chicago gives the author the choice on whether to include the access date--maybe this should be an option in preferences: 17.12 "Access dates in online source citations are of limited value, since previous versions will often be unavailable to readers (not to mention that an author may have consulted several revisions across any number of days in the course of research). Chicago therefore does not generally recommend including them in a published citation. For sources likely to have substantive updates, however, or in time-sensitive fields such as medicine or law where even small corrections may be significant, the date of the author’s last visit to the site may usefully be added."
comment:5 Changed 9 years ago by simon
- Milestone changed from 1.0.2 to 1.5 Alpha 1
Deferring to 1.5 Alpha 1, since all complete resolutions appear to require schema changes. See #841 for the temporary resolution.
comment:6 Changed 9 years ago by simon
- Version changed from 1.0 to 1.5
comment:7 Changed 6 years ago by simon
- Component changed from ingester to translators
- Keywords helpwanted added
- Owner changed from simon to mcburton
comment:8 Changed 6 years ago by dstillman
- Milestone 2.0 Beta 3 deleted
comment:9 Changed 5 years ago by ajlyon
- Resolution set to fixed
- Status changed from new to closed
I don't see what we are debating here. All new translators and translator edits take into account the necessity of not saving non-canonical URLs, and we're making the decision of what's canonical at the translator level; non-canonical URLs are saved as attached links. The question raised by Elena of what to do about styles that require the accessed URL even when it is non-canonical (or non-stable?) is fair, but in the absence of concrete examples of styles that really want non-reliable URLs, it might as well be a counterfactual.
Apart from Dan's suggestion of a context menu option to convert URL fields into attached links (which I don't see happening, especially since we now have a way to add links to arbitrary URLs, #1563), I don't see why we can't close this ticket (as fixed, since we're slowly fixing translators).
If I'm being too hasty in closing this, feel free to re-open.
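For reference, the translator-level rule described in this comment amounts to something like the sketch below. It is written in Python for illustration only; `build_item_fields`, the `is_canonical` flag and the dictionary keys are hypothetical and do not reflect the actual translator framework.

```python
from typing import Any, Dict


def build_item_fields(page_url: str, is_canonical: bool) -> Dict[str, Any]:
    """Decide where a translator stores the page URL.

    Assumption: each translator knows whether its site serves
    canonical URLs and passes that in as `is_canonical`.
    """
    if is_canonical:
        # Canonical URLs go in the URL field and may appear in citations.
        return {"url": page_url, "attachments": []}
    # Non-canonical URLs are kept only as an attached link.
    return {"url": "", "attachments": [{"title": "Link", "url": page_url}]}
```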
Chicago 17.8 says, "In many cases the content of the print and electronic forms of the same publication is identical, but the potential for differences, intentional or otherwise, requires that authors cite the form consulted." Then again, the CMoS Q&A says, "Notwithstanding the advice at 17.357, it can generally be considered unnecessary to cite the name or URL of a third-party database that provides access, typically through library Web sites, to published material. Instead, cite the original publication information of the article." Not sure which to believe in this case, but it seems like dissociating the URL from the metadata may not be the right way to solve this problem, since some styles may require the URL from which the article was accessed, even if it is also available in a published journal.
We could potentially handle the bibliography issues at the CSL level (if volume or issue numbers exist, don't include the URL for styles that state not to), or with a global preference.
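A rough sketch of that rule, assuming an item is a plain dictionary and that a style exposes a single flag saying it omits URLs for print-published material. `include_url_in_bibliography` and `style_omits_url_for_print` are illustrative names, not CSL constructs.

```python
from typing import Any, Dict


def include_url_in_bibliography(
    item: Dict[str, Any], style_omits_url_for_print: bool
) -> bool:
    """Return True if the item's URL should appear in the bibliography entry."""
    if not item.get("url"):
        return False
    # A volume or issue number is taken as a sign of a print publication.
    has_print_locator = bool(item.get("volume") or item.get("issue"))
    return not (style_omits_url_for_print and has_print_locator)
```

Treating volume or issue as the signal for "also published in print" is the heuristic suggested above; it would misfire for online-only journals that still number their issues.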