How data papers present a unique contribution to open research in the humanities and social sciences - Mandy Wigdorowitz, Barbara McGillivray, Marton Ribary
From Neil Coleman
The open research movement and initiatives like the FAIR principles have been critical in establishing the importance of data in research, particularly within the sciences. Attention to openly available data in Humanities and Social Sciences (HSS) research has also grown gradually. This growth is largely attributed to the increased availability of digital collections, the development of new data-intensive methods, increasingly solid infrastructure, growing pressure from funders, the requirement of data management plans for preservation purposes, and the involvement of research libraries in data curation. In this context, how data is produced, how it is openly and transparently shared, and how it can be reused have generated great interest, accompanied by an inevitable need for reputable data sharing outlets. One such outlet is the data paper, a peer-reviewed publication that focuses on describing a curated dataset. Data papers can be published in traditional research journals as one article type or, more recently, in data journals dedicated to the publication of data papers.
This presentation will focus on the work done by the open access Journal of Open Humanities Data (JOHD) in promoting the practice of publishing data papers with their accompanying open access datasets. JOHD was established with Ubiquity Press in 2015 to promote awareness, use, and reuse of humanities data. JOHD data papers promote the comprehensive description of how a dataset was assembled, where it may be accessed, and any crucial context, including the research questions that framed the data gathering and the limitations of the original methods or the scope of sources included. They also suggest potential future reuses of the data, a practice that recent analytics suggest has helped increase the visibility of datasets, and therefore their research impact (Marongiu et al., forthcoming; McGillivray et al., 2022).
In addition, we will present an overview of the three research outputs (datasets, data papers and research papers) that together form the “golden triangle” for assessing the impact of open research efforts, along with proposed initiatives for linking them. In doing so, we aim to (a) find a programmatic way to identify these links by extracting information from the available metadata of datasets and verifying their accuracy, and (b) create a “ground truth”, manually and/or with machine assistance, that would enable the training of more sophisticated NLP-based methods as a next step. We hope to illustrate the importance of including data papers in the research conversation, given that they make a unique contribution to addressing global challenges within the open research arena.
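As an illustration of aim (a), the following is a minimal sketch of how links between a dataset, its data paper and related research papers might be read from dataset metadata and checked against a manually curated set. It is not the authors' implementation. It assumes that records follow a DataCite-like shape (`relatedIdentifiers` entries with `relationType` and `relatedIdentifierType` fields); the mapping of relation types to "data paper" or "research paper", the names `extract_links` and `verify`, and all DOIs are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

# Relation types exist in the DataCite metadata schema; deciding which ones
# signal a data paper versus a research paper is an assumption of this sketch.
DATA_PAPER_RELATIONS = {"IsDescribedBy", "IsDocumentedBy"}
RESEARCH_PAPER_RELATIONS = {"IsCitedBy", "IsSupplementTo", "IsReferencedBy"}


@dataclass(frozen=True)
class Link:
    dataset_doi: str
    target_doi: str
    kind: str  # "data_paper" or "research_paper"


def extract_links(record: dict) -> list[Link]:
    """Collect candidate dataset-to-paper links from one metadata record."""
    dataset_doi = record["doi"].lower()
    links = []
    for rel in record.get("relatedIdentifiers", []):
        # Only DOI targets are considered; URLs and other types are skipped.
        if rel.get("relatedIdentifierType") != "DOI":
            continue
        relation = rel.get("relationType")
        if relation in DATA_PAPER_RELATIONS:
            kind = "data_paper"
        elif relation in RESEARCH_PAPER_RELATIONS:
            kind = "research_paper"
        else:
            continue
        links.append(Link(dataset_doi, rel["relatedIdentifier"].lower(), kind))
    return links


def verify(links, ground_truth):
    """Compare extracted links with a curated set; return (precision, recall)."""
    extracted = set(links)
    true_pos = len(extracted & ground_truth)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical record with placeholder DOIs, for illustration only.
record = {
    "doi": "10.1234/example.dataset",
    "relatedIdentifiers": [
        {"relatedIdentifier": "10.1234/example.datapaper",
         "relatedIdentifierType": "DOI", "relationType": "IsDescribedBy"},
        {"relatedIdentifier": "10.1234/example.article",
         "relatedIdentifierType": "DOI", "relationType": "IsCitedBy"},
        {"relatedIdentifier": "https://example.org/project",
         "relatedIdentifierType": "URL", "relationType": "IsReferencedBy"},
    ],
}

# Hypothetical manually curated "ground truth" for the same dataset.
ground_truth = {
    Link("10.1234/example.dataset", "10.1234/example.datapaper", "data_paper"),
    Link("10.1234/example.dataset", "10.1234/example.other", "research_paper"),
}

links = extract_links(record)
precision, recall = verify(links, ground_truth)
print(links, precision, recall)
```

A curated set of this kind, once large enough, is also what aim (b) would use as training and evaluation data for NLP-based link detection.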