What does the R in FAIR Data stand for?
Welcome to my 5-minute non-technical explanation of the reusability component of the FAIR Data Principle.
John was delighted to have found, accessed, and integrated the datasets relevant to his analytical work. Despite the time he spent finding, accessing, and combining the datasets, he submitted his dissertation and, to the admiration of all, earned an excellent score. A few months later, by decision of the institution’s governing council, he was certified competent in learning and character and received an award. Researching in such a restrictive field was a bold feat for him, and with his efforts to preserve and advance knowledge, he thought about making his research findings public.
At the behest of his supervisor, John collaborated on and submitted excerpts from his dissertation for publication. The publisher’s conditions required open access, which refers to the practice of making research outputs freely available to the public without restrictions; it aims to promote transparency, collaboration, and the spread of knowledge. Thinking about how best to go about it, they opted for a controlled-access repository so that the data owner could set conditions for access to and use of the digital assets. This can also ease the re-use of the digital assets in future research.
How does FAIR Data help with this problem?
Reusability is essential because it optimises the reuse of digital assets: it prevents duplication of effort, saves the time and cost of gathering data afresh, and encourages transparency, novel insights, and innovation. This article briefly sheds light on the reusability component of the FAIR data principle.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
Once a dataset is richly described, tagged, and characterized, it is easier to find and re-use. Reusability is closely tied to how valuable the data is in a particular context, and that value can only be judged if the metadata is described well enough to be discovered by humans and machines, and if the context under which the data was generated is captured. This can include information such as the research protocol, the methods and instruments used to create the data, and the investigator. In short, this sub-principle discourages being economical with attributes: the metadata creator must be sufficiently generous in providing rich, descriptive information about the data.
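Although this article is non-technical, a small sketch may make "richly described" concrete. Below is a hypothetical metadata record for a dataset like John's; every field name and value is invented for illustration, but the idea is that protocol, methods, instruments, and investigator are captured alongside the basics, in a form both humans and machines can read.

```python
import json

# A hypothetical, richly described metadata record (all values invented).
# Context of creation (protocol, methods, instruments, investigator) is
# captured alongside basic descriptive attributes.
metadata = {
    "title": "Survey responses on data-sharing attitudes",
    "description": "Anonymised survey data collected for a dissertation study.",
    "keywords": ["data sharing", "survey", "open science"],
    "research_protocol": "Cross-sectional online questionnaire",
    "methods": "Convenience sampling; 5-point Likert scales",
    "instruments": "Web survey platform",
    "investigator": "John (principal investigator)",
    "date_collected": "2023-04",
}

# Serialising the record as JSON keeps it readable by humans and machines.
print(json.dumps(metadata, indent=2))
```

The more of these attributes a record carries, the easier it is for a future researcher to judge whether the dataset fits their context.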
R1.1. (Meta)data are released with a clear and accessible data usage license
This brings a legal dimension to reusability. A license specifies the conditions of use of the data and the rights associated with it. Metadata must therefore carry unambiguous guidelines stipulating the rights to the data and how it may be used. Ambiguity is a significant issue that can limit re-use by organizations unable to quickly work through licensing restrictions, so clear and accessible data usage licenses must be used.
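One common way to make a license unambiguous is to record a standard identifier and a resolvable URL alongside the human-readable name. The sketch below assumes a Creative Commons Attribution license; the dataset details and the `license_is_clear` helper are hypothetical, purely for illustration.

```python
# A minimal sketch: attaching an unambiguous usage license to a
# (hypothetical) metadata record. The SPDX identifier and canonical URL
# leave little room for doubt about what reusers may do.
metadata = {
    "title": "Survey responses on data-sharing attitudes",
    "license": {
        "name": "Creative Commons Attribution 4.0 International",
        "spdx_id": "CC-BY-4.0",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
}

def license_is_clear(record):
    """Return True if the record carries an identified, resolvable license."""
    lic = record.get("license") or {}
    return bool(lic.get("spdx_id")) and bool(lic.get("url"))

print(license_is_clear(metadata))
```

A machine can check a field like `spdx_id` automatically, which is exactly what lets organizations decide quickly whether a dataset's terms permit their intended use.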
R1.2. (Meta)data are associated with detailed provenance
In data terms, there is a need to understand the past to gain insight into the future, which is what predictive analytics is all about: datasets from the past are obtained and then used to predict the future. Likewise, for people to understand how to re-use a dataset, they must know what led to the study that produced the data and how the data came to be, that is, a clear record of the dataset's origin. Provenance information documents the processes and activities involved in creating, modifying, and handling the data from its initial creation to its present state. It often covers the origin, history, context, entities and agents, activities, and relationships. Metadata must therefore carry detailed provenance, which aids reproducibility, transparency, accountability, and trust.
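A provenance trail can be as simple as an ordered list of activities, each with an agent and a date, loosely modelled on the W3C PROV idea of entities, agents, and activities. The names, dates, and details below are hypothetical, chosen to mirror John's story.

```python
from datetime import date

# A minimal, hypothetical provenance trail: each entry records an
# activity, the agent responsible, and when it happened.
provenance = [
    {"activity": "collected", "agent": "John", "date": date(2023, 4, 1),
     "detail": "Online survey, 312 responses"},
    {"activity": "cleaned", "agent": "John", "date": date(2023, 5, 10),
     "detail": "Removed incomplete responses; anonymised identifiers"},
    {"activity": "published", "agent": "Institutional repository",
     "date": date(2024, 1, 15), "detail": "Deposited under controlled access"},
]

def history(trail):
    """Render the provenance trail oldest-first as readable lines."""
    ordered = sorted(trail, key=lambda entry: entry["date"])
    return [f"{e['date'].isoformat()}: {e['activity']} by {e['agent']}"
            for e in ordered]

for line in history(provenance):
    print(line)
```

Even a simple trail like this lets a future reuser see who did what to the data and when, which is the foundation of the trust and reproducibility the sub-principle is after.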
R1.3. (Meta)data meet domain-relevant community standards
Community standards are established guidelines, conventions, and practices relevant to a specific field or community, and they can guide how data is collected, described, and shared. Following such standards aids the re-use of data assets: when data are organized in a standardized manner, with well-documented information and defined field formats, re-use of those datasets is encouraged. In generic data terms, a common data format standard is CSV (comma-separated values), while a common metadata standard is Dublin Core.
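To tie the two examples in the text together, here is a sketch of packaging a small dataset as CSV alongside a Dublin Core-style metadata record. The rows and values are hypothetical; only a few of Dublin Core's fifteen core elements are shown.

```python
import csv
import io

# Hypothetical survey rows, written in the common CSV data format.
rows = [
    {"respondent_id": "001", "age_group": "25-34", "shares_data": "yes"},
    {"respondent_id": "002", "age_group": "35-44", "shares_data": "no"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["respondent_id", "age_group", "shares_data"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# A Dublin Core-style record describing the file; "dc:" is the
# conventional prefix for Dublin Core elements.
dublin_core = {
    "dc:title": "Survey responses on data-sharing attitudes",
    "dc:creator": "John",
    "dc:date": "2023-04",
    "dc:format": "text/csv",
    "dc:rights": "CC-BY-4.0",
}

print(csv_text)
print(dublin_core["dc:title"])
```

Because both the data format and the metadata elements follow widely known standards, another researcher's tools can open and interpret the package without bespoke effort.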
Summary
This article offered a non-technical explanation of the reusability component of the FAIR data principle. Datasets are valuable resources with real potential for re-use if they are richly described with relevant attributes, released with a clear and accessible data usage license, associated with detailed provenance, and compliant with domain-relevant community standards.