What does the A in FAIR data stand for?
Welcome to my 5-minute non-technical explanation of the accessibility component of the FAIR Data Principle.
When data is not accessible, a familiar case study
John had been searching for the datasets he needed for his dissertation analysis, combing through both digital and physical resources. After a long and thorough search, he paused and looked away from the library catalogue. His resolve paid off when he came across a piece of information that could point him to the dataset. He jumped up, heaving a sigh of relief, a radiant smile on his face. His joy was short-lived, though. Finding the dataset was not enough; to access it, he needed a protocol for access, authentication, and authorization.
How does FAIR data help with this problem?
The World Wide Web is a massive repository of information, but its value can only be fully tapped when people know how to access data once they have found it. The FAIR principles were originally published by FORCE11 as a response to this limitation. They define FAIR as findable, accessible, interoperable, and reusable, and set out clear guidelines for good data management. Knowing that a digital resource exists at a particular location on the internet is not enough; the conditions for accessing it must also be specified. The accessibility principle seeks to ensure easy access to data for anyone with the authorization or rights to access it. It does not mean that unauthorized people, or those without the necessary rights, can access it. This article briefly looks at the accessibility component of the FAIR data principles.
What does the A in FAIR stand for?
The accessibility component of the FAIR data principle specifies clear access procedures, authorization requirements, consistent and reliable long-term access, and secure storage and transfer of data and other digital assets. The accessibility principle is not limited to humans alone; digital objects must be accessible by both humans and machines. The sub-principles of FAIR accessibility are explained below in simple terms.
A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
In the context of FAIR, data retrieval should be made possible using a standardized communications protocol. The internet is a vast repository of information, and John and people like him can retrieve information from it with just a click. Behind each click, however, the computer runs a standardized communications protocol to fetch the data and return it to the browser. A protocol is a defined set of rules and conventions for transmitting information between devices on a network. Examples of standardized communications protocols are HTTP/HTTPS (Hypertext Transfer Protocol and its secure variant), TCP (Transmission Control Protocol), IP (Internet Protocol), FTP (File Transfer Protocol), Bluetooth, Ethernet, and Wi-Fi.
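For readers curious about what "retrievable by identifier over a standardized protocol" can look like in practice, here is a minimal Python sketch. It only prepares an HTTPS request and does not send it. The identifier is hypothetical, and the function name build_metadata_request is made up for this illustration; the example assumes the dataset has a DOI that can be looked up through the public doi.org resolver.

```python
from urllib.request import Request

# Public DOI resolver, reached over HTTPS (a standardized, open protocol).
DOI_RESOLVER = "https://doi.org/"


def build_metadata_request(identifier: str) -> Request:
    """Prepare (but do not send) an HTTPS request for a dataset's metadata."""
    url = DOI_RESOLVER + identifier
    # The Accept header asks for machine-readable JSON metadata
    # instead of a web page meant for human readers.
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}
    return Request(url, headers=headers)


# Hypothetical identifier, used only for illustration.
request = build_metadata_request("10.1234/example-dataset")
print(request.get_method(), request.full_url)
```

The same identifier serves both John, who pastes the address into a browser, and a program, which sends the request and reads the structured reply.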
A1.1 The protocol is open, free, and universally implementable
The FAIR data principles seek to maximize the utility and reuse of data. To ease data retrieval, the standardized communications protocol should be open, free, and universally implementable. For the protocol to be open, its specification must be publicly available for anyone to read and implement. It should also come at no cost to people and be universal. In this context, ‘universal’ gives a global perspective: people from all geographies should be able to use the protocol. With an open, free, and universally implementable protocol like HTTPS, John can access the needed resources with a computer and an internet connection.
A1.2 The protocol allows for an authentication and authorization procedure, where necessary
Though the accessibility principle seeks to ease the retrieval of digital resources, accessible does not mean open. Some data contain sensitive information that should be protected; hence, the exact conditions under which the data are accessible must be specified, which calls for an authentication and authorization procedure. Data with sensitive information can still be FAIR as long as these procedures are specified where necessary. A protocol such as HTTPS supports authentication (confirming who John is) and authorization (deciding what he is allowed to do), so user-specific rights can be set that guide John on what can be done with the data.
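The difference between authentication and authorization can be sketched in a few lines of Python. This is a toy illustration, not how any particular repository works: the tokens, user names, and dataset names are all hypothetical, and a real service would rely on an established login or token system rather than a dictionary. The numbers returned are standard HTTP status codes.

```python
# Hypothetical users and permissions, for illustration only.
TOKENS = {"token-john": "john"}          # access token -> user
PERMISSIONS = {"john": {"survey-2020"}}  # user -> datasets the user may read


def check_access(token: str, dataset_id: str) -> int:
    """Return an HTTP-style status code for a request to read a dataset."""
    # Authentication: who is asking?
    user = TOKENS.get(token)
    if user is None:
        return 401  # Unauthorized: the requester could not be identified

    # Authorization: is this user allowed to read this dataset?
    if dataset_id not in PERMISSIONS.get(user, set()):
        return 403  # Forbidden: identified, but lacking the rights

    return 200  # OK: access granted


print(check_access("token-john", "survey-2020"))
print(check_access("unknown-token", "survey-2020"))
print(check_access("token-john", "health-records"))
```

In this sketch, John gets through for the dataset he has rights to, an unrecognized requester is stopped at the authentication step, and John is refused at the authorization step for a dataset he has no rights to.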
A2. Metadata are accessible, even when the data are no longer available
The cost of maintaining repositories might not be sustainable in some contexts, and there are instances of repositories disappearing over time. As a result, datasets can no longer be found in online portals after some time. Because of this gap, it becomes imperative to keep metadata accessible even when the data is no longer available. Think of keeping the label on a jar of jam even after the jam inside is finished. The label carries information on the producer, address, etc., which can be used to find out where to get the jam again. Storing metadata is also cheaper than storing data, and simple catalogues built on CKAN can be created to store it. Metadata is rich information about the data: it describes the data and how to get it, e.g., the author’s email, institution, date of publication, and other necessary information. When metadata is accessible, it provides a historical reference to the data, contextual understanding, and the opportunity for future use. For John, this means that when he uses or cites a dataset, he can still check what it was, where it came from, and whom to contact about it, even if the data itself is no longer online.
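The jam-jar idea can also be shown as a small Python sketch. It assumes a made-up, in-memory catalogue; the dataset identifier, field names, and the retire_data and lookup_metadata functions are hypothetical and are not taken from CKAN or any other real catalogue software.

```python
# A hypothetical catalogue with one entry, for illustration only.
catalogue = {
    "dataset-001": {
        "metadata": {
            "title": "Example household survey",
            "author_email": "author@example.org",
            "published": "2020",
        },
        "data_available": True,
    }
}


def retire_data(catalogue: dict, dataset_id: str) -> None:
    """Mark the data as gone while keeping the metadata record in place."""
    record = catalogue[dataset_id]
    record["data_available"] = False
    record["metadata"]["status"] = "data no longer available"


def lookup_metadata(catalogue: dict, dataset_id: str) -> dict:
    """Return the metadata, whether or not the data can still be downloaded."""
    return catalogue[dataset_id]["metadata"]


retire_data(catalogue, "dataset-001")
print(lookup_metadata(catalogue, "dataset-001"))
```

Even after the data is retired, the lookup still returns the title, contact address, and publication date, which is the "label on the jar" that A2 asks repositories to keep.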
Summary
This article has offered a non-technical, jargon-free explanation of the accessibility component of the FAIR data principles. Knowing that data is available on a portal is not enough; there should be a means to access the datasets, hence the accessibility principle in the FAIR principles. Metadata is valuable as a pointer to data, and keeping it accessible preserves a record of what a dataset was and how it could be obtained, even when the dataset itself is no longer available.