Retractions in arts and humanities: an analysis of the retraction notices
The aim of this work is to understand the retraction phenomenon in the arts and humanities domain through an analysis of the retraction notices - formal documents stating and describing the retraction of a particular publication. The retractions and the corresponding notices are identified using the data provided by Retraction Watch. Our methodology for the analysis combines a metadata analysis and a content analysis (mainly performed using a topic modelling process) of the retraction notices. Considering 343 cases of retraction, we found that many retraction notices are neither identifiable nor findable. In addition, these were not always separated from the original papers, introducing ambiguity in understanding how these notices were perceived by the community (i.e. cited). Also, we noticed that there is no systematic way to write a retraction notice. Indeed, some retraction notices presented a complete discussion of the reasons for retraction, while others tended to be more direct and succinct. We have also reported many notices having similar text while addressing different retractions. We think a further study with a larger collection should be done using the same methodology to confirm and investigate our findings further
A quantitative and qualitative open citation analysis of retracted articles in the humanities
In this article, we show and discuss the results of a quantitative and
qualitative analysis of open citations to retracted publications in the
humanities domain. Our study was conducted by selecting retracted papers in the
humanities domain and marking their main characteristics (e.g., retraction
reason). Then, we gathered the citing entities and annotated their basic
metadata (e.g., title, venue, subject, etc.) and the characteristics of their
in-text citations (e.g., intent, sentiment, etc.). Using these data, we
performed a quantitative and qualitative study of retractions in the
humanities, presenting descriptive statistics and a topic modeling analysis of
the citing entities' abstracts and the in-text citation contexts. As part of
our main findings, we noticed that there was no drop in the overall number of
citations after the year of retraction, with few entities which have either
mentioned the retraction or expressed a negative sentiment toward the cited
publication. In addition, on several occasions, we noticed that citing entities
from the health sciences domain showed a higher concern/awareness about citing
a retracted publication than those from the humanities and the social science
domains. Philosophy, arts, and history are the humanities areas that showed the
highest concern toward the retraction.
OpenCitations Index
A citation index is a bibliographic index recording citations between publications, allowing the user to establish which later documents cite earlier documents. Several citation indexes are already available, some of which are freely accessible but not downloadable (e.g. Google Scholar), while others can be accessed only by paying significant access fees (e.g. Web of Science and Scopus).
OpenCitations, as an infrastructure organization for open scholarship, has built the OpenCitations Index using the data available in particular bibliographic databases
Enabling text search on SPARQL endpoints through OSCAR
In this paper we introduce the latest version (Version 2.0) of OSCAR, the OpenCitations RDF Search Application, which has several improved features and extends the query workflow compared with the previous version (Version 1.0) that we presented at the workshop entitled Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination (SAVE-SD 2018), held in conjunction with The Web Conference 2018. OSCAR is a user-friendly search platform that can be used to search any RDF triplestore providing a SPARQL endpoint, while hiding the complexities of SPARQL, thus making the search operations accessible to those who are not experts in Semantic Web technologies. We present here the basic features and the main extensions of this latest version of OSCAR. In addition, we demonstrate how it can be adapted to work with different SPARQL endpoints containing scholarly data, using as examples the OpenCitations Corpus (OCC) and the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI), both provided by OpenCitations, and also the Wikidata dataset provided by the Wikimedia Foundation. We conclude by reporting the usage statistics of OSCAR, retrieved from the OpenCitations website logs, so as to demonstrate its uptake.
A protocol to gather, characterize and analyze incoming citations of retracted articles
In this article, we present a methodology which takes as input a collection
of retracted articles, gathers the entities citing them, characterizes such
entities according to multiple dimensions (disciplines, year of publication,
sentiment, etc.), and applies a quantitative and qualitative analysis on the
collected values. The methodology is composed of four phases: (1) identifying,
retrieving, and extracting basic metadata of the entities which have cited a
retracted article, (2) extracting and labeling additional features based on the
textual content of the citing entities, (3) building a descriptive statistical
summary based on the collected data, and finally (4) running a topic modeling
analysis. The goal of the methodology is to generate data and visualizations
that help in understanding possible behaviors related to retraction cases. We
present the methodology in a structured step-by-step form following its four
phases, discuss its limits and possible workarounds, and list the planned
future improvements
The OpenCitations Index: description of a database providing open citation data
This article presents the OpenCitations Index, a collection of open citation data maintained by OpenCitations, an independent, not-for-profit infrastructure organisation for open scholarship dedicated to publishing open bibliographic and citation data using Semantic Web and Linked Open Data technologies. The collection involves citation data harvested from multiple sources. To address the possibility of different sources providing citation data for bibliographic entities represented with different identifiers, and therefore potentially representing the same citation, a deduplication mechanism has been implemented. This ensures that citations integrated into the OpenCitations Index are uniquely identified, even when different identifiers are used. This mechanism follows a specific workflow, which encompasses a preprocessing of the original source data, a management of the provided bibliographic metadata, and the generation of new citation data to be integrated into the OpenCitations Index. The process relies on another data collection, OpenCitations Meta, and on the use of a new globally persistent identifier, namely OMID (OpenCitations Meta Identifier). As of July 2024, the OpenCitations Index stores over 2 billion unique citation links, harvested from Crossref, the National Institutes of Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan Link Center (JaLC). The OpenCitations Index can be systematically accessed and queried through several services, including a SPARQL endpoint, REST APIs, and web interfaces. Additionally, dataset dumps are available for free download and reuse (under CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including provenance and change tracking information.
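The deduplication idea described in this abstract can be illustrated with a minimal sketch. It is not the OpenCitations workflow itself: the lookup table, the identifier values, and the function name `deduplicate` are all hypothetical, and the sketch simply assumes that each external identifier (DOI, PMID) has already been mapped to one internal identifier playing the role of an OMID.

```python
from typing import Dict, Iterable, Set, Tuple

Citation = Tuple[str, str]  # (citing identifier, cited identifier)

# Hypothetical lookup table: external identifier -> internal identifier.
# All values are made up for illustration only.
ID_TO_OMID: Dict[str, str] = {
    "doi:10.1000/a": "omid:br/1",
    "pmid:111": "omid:br/1",  # same work as doi:10.1000/a
    "doi:10.1000/b": "omid:br/2",
    "pmid:222": "omid:br/2",  # same work as doi:10.1000/b
}


def deduplicate(citations: Iterable[Citation],
                id_to_omid: Dict[str, str]) -> Set[Citation]:
    """Collapse citations that differ only in the identifiers used."""
    unique: Set[Citation] = set()
    for citing, cited in citations:
        # Fall back to the external identifier when no mapping is known.
        key = (id_to_omid.get(citing, citing), id_to_omid.get(cited, cited))
        unique.add(key)
    return unique


# The first two links describe the same citation through different sources.
raw = [
    ("doi:10.1000/a", "doi:10.1000/b"),
    ("pmid:111", "pmid:222"),
    ("doi:10.1000/b", "doi:10.1000/c"),
]
unique = deduplicate(raw, ID_TO_OMID)
```

Keying each citation on the pair of internal identifiers is what lets links harvested from different sources collapse into a single record in this sketch.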
Creating RESTful APIs over SPARQL endpoints using RAMOSE
Semantic Web technologies are widely used for storing RDF data and making them available on the Web through SPARQL endpoints, queryable using the SPARQL query language. While the use of SPARQL endpoints is strongly supported by Semantic Web experts, it hinders broader use of RDF data by common Web users, engineers and developers unfamiliar with Semantic Web technologies, who normally rely on Web RESTful APIs for querying Web-available data and creating applications over them. To solve this problem, we have developed RAMOSE, a generic tool developed in Python to create REST APIs over SPARQL endpoints. Through the creation of source-specific textual configuration files, RAMOSE enables the querying of SPARQL endpoints via simple Web RESTful API calls that return either JSON or CSV-formatted data, thus hiding all the intrinsic complexities of SPARQL and RDF from common Web users. We provide evidence that the use of RAMOSE to provide REST API access to RDF data within OpenCitations triplestores is beneficial in terms of the number of queries made by external users of such RDF data using the RAMOSE API, compared with the direct access via the SPARQL endpoint. Our findings show the importance for suppliers of RDF data of having an alternative API access service, which enables its use by those with no (or little) experience in Semantic Web technologies and the SPARQL query language. RAMOSE can be used both to query any SPARQL endpoint and to query any other Web API, and thus it represents an easy generic technical solution for service providers who wish to create an API service to access Linked Data stored as RDF in a triplestore
Developing Application Profiles for Enhancing Data and Workflows in Cultural Heritage Digitisation Processes
As a result of the proliferation of 3D digitisation in the context of cultural heritage projects, digital assets and digitisation processes – being considered as proper research objects – must prioritise adherence to FAIR principles. Existing standards and ontologies, such as CIDOC-CRM, play a crucial role in this regard, but they are often over-engineered for the needs of a particular application context, thus making their understanding and adoption difficult. Application profiles of a given standard – defined as sets of ontological entities drawn from one or more semantic artefacts for a particular context or application – are usually proposed as tools for promoting interoperability and reuse while being tied entirely to the particular application context they refer to. In this paper, we present an adaptation and application of an ontology development methodology, i.e. SAMOD, to guide the creation of robust, semantically sound application profiles of large standard models. Using an existing pilot study we have developed in a project dedicated to leveraging virtual technologies to preserve and valorise cultural heritage, we introduce an application profile named CHAD-AP, which we have developed following our customised version of SAMOD. We reflect on the use of SAMOD and similar ontology development methodologies for this purpose, highlighting their strengths and current limitations, future developments, and possible adoption in other similar projects.
OpenCitations Meta
OpenCitations Meta is a new database that contains bibliographic metadata of scholarly publications involved in citations indexed by the OpenCitations infrastructure. It adheres to Open Science principles and provides data under a CC0 license for maximum reuse. The data can be accessed through a SPARQL endpoint, REST APIs, and dumps. OpenCitations Meta serves three important purposes. Firstly, it enables disambiguation of citations between publications described using different identifiers from various sources. For example, it can link publications identified by DOIs in Crossref and PMIDs in PubMed. Secondly, it assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to bibliographic resources without existing external persistent identifiers like DOIs. Lastly, by hosting the bibliographic metadata internally, OpenCitations Meta improves the speed of metadata retrieval for citing and cited documents. The database is populated through automated data curation, including deduplication, error correction, and metadata enrichment. The data is stored in RDF format following the OpenCitations Data Model, and changes and provenance information are tracked. OpenCitations Meta currently incorporates data from Crossref, DataCite, and the NIH Open Citation Collection. In terms of semantic publishing datasets, it is currently the first in data volume. 26 pages, 7 figures
A Proposal for a FAIR Management of 3D Data in Cultural Heritage: The Aldrovandi Digital Twin Case
In this article we analyse 3D models of cultural heritage with the aim of answering three main questions: what processes can be put in place to create a FAIR-by-design digital twin of a temporary exhibition? What are the main challenges in applying FAIR principles to 3D data in cultural heritage studies and how are they different from other types of data (e.g. images) from a data management perspective? We begin with a comprehensive literature review touching on: FAIR principles applied to cultural heritage data; representation models; both Object Provenance Information (OPI) and Metadata Record Provenance Information (MRPI), respectively meant as, on the one hand, the detailed history and origin of an object, and - on the other hand - the detailed history and origin of the metadata itself, which describes the primary object (whether physical or digital); 3D models as cultural heritage research data and their creation, selection, publication, archival and preservation. We then describe the process of creating the Aldrovandi Digital Twin, by collecting, storing and modelling data about cultural heritage objects and processes. We detail the many steps from the acquisition of the Digital Cultural Heritage Objects (DCHO), through to the upload of the optimised DCHO onto a web-based framework (ATON), with a focus on open technologies and standards for interoperability and preservation.
Using the FAIR Principles for Heritage Library, Archive and Museum Collections [1] as a framework, we look in detail at how the Digital Twin implements FAIR principles at the object and metadata level. We then describe the main challenges we encountered and we summarise what seem to be the peculiarities of 3D cultural heritage data and the possible directions for further research in this field
