Studying the Altmetrics of Zotero Data

In April of last year, we announced a partnership with the University of Montreal and Indiana University, funded by a grant from the Alfred P. Sloan Foundation, to examine the readership of reference sources across a range of platforms and to expand the Zotero API to enable bibliometric research on Zotero data.

The first part of this grant involved aggregating anonymized data from Zotero libraries. The initial dataset was limited to items with DOIs, and it included library counts and the months that items were added. For items in public libraries, the data also included titles, creators, and years, as well as links to the public libraries containing the items. We have been analyzing this anonymized, aggregated data with our research partners in Montreal, and we are now starting to make it freely and publicly available, beginning with Impactstory and Altmetric, who have offered to conduct preliminary analysis (we’ll discuss Impactstory’s experience in a future post).

In our correspondence with Altmetric over the years, they have repeatedly shown interest in Zotero data, and we reached out to them to see if they would partner with us in examining the data. The Altmetric team that analyzed the data consists of about twenty people with backgrounds in English literature and computer science, including former researchers and librarians. Altmetric is interested in any communication that involves the use or spread of research outputs, so in addition to analyzing the initial dataset, they’re eager to add the upcoming API to their workflow.

The Altmetric team parsed the aggregated data and checked it against the set of documents in their database known to have been mentioned or saved elsewhere, such as on blogs, social media, or news sites. Their analysis revealed that approximately 60% of the items that had been mentioned in at least one other place had at least one save in Zotero. The Altmetric team was pleased to find such high coverage, which points to the diversity of Zotero usage, though further research will be needed to determine the distribution of items across disciplines.
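For readers curious about how a coverage figure like this is computed, here is a minimal sketch. It is not Altmetric's actual code: the function names, the DOI normalization rules, and the shape of the inputs (a list of mentioned DOIs and a mapping from DOI to Zotero library count) are all assumptions made for illustration, and the toy data is invented.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a few common prefixes so variants compare equal.

    The prefix list is an illustrative assumption, not an exhaustive one.
    """
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def zotero_coverage(mentioned_dois, zotero_counts):
    """Return the share of mentioned DOIs with at least one Zotero save.

    mentioned_dois: iterable of DOIs mentioned in at least one other source.
    zotero_counts:  dict mapping DOI -> number of Zotero libraries holding it.
    """
    saved = {normalize_doi(d) for d, n in zotero_counts.items() if n >= 1}
    mentioned = {normalize_doi(d) for d in mentioned_dois}
    if not mentioned:
        return 0.0
    return len(mentioned & saved) / len(mentioned)


# Invented toy data: five mentioned items, three of which appear in Zotero.
mentioned = ["10.1000/a", "doi:10.1000/B", "10.1000/c", "10.1000/d", "10.1000/e"]
counts = {
    "10.1000/A": 3,
    "10.1000/b": 1,
    "https://doi.org/10.1000/c": 7,
    "10.1000/z": 2,   # saved in Zotero but never mentioned elsewhere
    "10.1000/d": 0,   # present in the data but with no saves
}

coverage = zotero_coverage(mentioned, counts)
print(f"{coverage:.0%}")  # prints 60%
```

Note that the denominator is the set of mentioned items, not the set of Zotero items, so items saved in Zotero but never mentioned elsewhere do not affect the result.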

The next step for the Altmetric team is to apply the data to other projects and tools, such as the Altmetric bookmarklet. The data will be useful in understanding the impact of scholarly communication: conjectures about reference manager data can be confirmed or refuted, and studying the data should give a better sense of what it represents and how best to interpret it.

Based on this initial collaboration, Zotero developers are verifying and refining the aggregation process in preparation for the release of a public API and dataset of anonymized, aggregated data, which will allow bibliometric data to be highlighted across the Zotero ecosystem and enable other researchers to study the readership of Zotero data.