WikiCite: Wikidata as a structured repository of bibliographic data - Talk at WikidataCon 2017
From Ewan McAndrew
WikiCite is an ongoing effort to build an extensive bibliographic database in Wikidata to serve all Wikimedia projects. While the idea has been around for over a decade, the technology needed to sustain this effort is maturing, and the initiative's immediate goal is taking shape: to produce in Wikidata a well-curated, high-quality structured dataset of all sources cited across Wikimedia projects.
The current WikiCite project originated at Wikimania London and was supported by a first dedicated event in Berlin in the spring of 2016. In May 2017, a second, larger event took place in Vienna, co-located with the Wikimedia Hackathon, to further advance this work. The aim of the second event was to showcase progress made so far, identify technical gaps and needs, and strengthen ties with key partners (such as Zotero, the Internet Archive, OCLC, Crossref, and ORCID) as well as funders (the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and the Simons Foundation).
Significant progress has been made to date:
- the modeling of a schema to represent scholarly article metadata is nearly complete. WikiCite participants and other Wikidata users have catalogued over 2 million scientific articles on Wikidata and described over 3.3 million citations between scientific articles. These articles have begun to be used as references for Wikidata statements and could be used to improve the handling of references on other Wikimedia projects.
- tools and platforms to allow a rapid integration of references (such as SourceMD or PMIDTool) and to visualize the relations between scholarly knowledge and the rest of Wikidata (such as Scholia) have seen significant growth and adoption.
- multiple corpora have also been created as part of WikiCite, showcasing the value of linking knowledge to its sources in a machine-readable way. Beyond efforts around the Zika Corpus, collaborations with scientific open data communities, such as the Gene Wiki project, WikiPathways, CIViC, and Reactome, are continuing.
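To make the citation data described above more concrete, here is a minimal Python sketch that builds a SPARQL query listing the scholarly articles that cite a given work. It is an illustration only, not a tool from the talk. The identifiers are assumptions not named in the abstract: Q13442814 (the "scholarly article" class), P31 ("instance of") and P2860 ("cites work"). The function name `build_citing_query` is hypothetical.

```python
from string import Template

# Assumed Wikidata identifiers (not named in the abstract itself):
INSTANCE_OF = "P31"              # property "instance of"
SCHOLARLY_ARTICLE = "Q13442814"  # class "scholarly article"
CITES_WORK = "P2860"             # property "cites work"

_QUERY = Template("""\
SELECT ?citing ?citingLabel WHERE {
  ?citing wdt:$instance_of wd:$article_class ;
          wdt:$cites wd:$cited .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT $limit""")


def build_citing_query(cited_qid: str, limit: int = 100) -> str:
    """Return a SPARQL query for scholarly articles that cite `cited_qid`.

    Hypothetical helper: it only builds the query text and does not
    contact any server.
    """
    # Accept only identifiers of the form Q<digits>.
    if not (cited_qid.startswith("Q") and cited_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {cited_qid!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return _QUERY.substitute(
        instance_of=INSTANCE_OF,
        article_class=SCHOLARLY_ARTICLE,
        cites=CITES_WORK,
        cited=cited_qid,
        limit=limit,
    )
```

Calling `build_citing_query("Q12345", limit=5)` returns the query text, where `Q12345` is a placeholder to be replaced with the item ID of a real article. The resulting string could then be submitted to the Wikidata Query Service. Keeping query construction separate from any network call makes the sketch easy to check offline.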