What does the F in FAIR data stand for?

Abegunrin Gideon
5 min read · Sep 16, 2024

Welcome to my 5-minute, non-technical explanation of findability, the fulcrum of the FAIR data principles.

When data is not findable — a familiar case study

To earn his doctorate, John must demonstrate competence in both learning and character, so in addition to passing his courses, he must carry out rigorous research. After long hours in the library and on the internet looking up resources for his dissertation, an exhausted John cannot come to terms with how difficult it has been to find appropriate datasets for his analysis. With the submission deadline fast approaching, he must up his game and find datasets that fit his research idea. He knows they are out there. But his research area is restrictive: privacy, ethical, and proprietary concerns limit how data can be accessed, used, and distributed, and those limits could jeopardize his research. Still, how will he produce accurate, data-informed research if he cannot even find data he knows exists?

How does FAIR data help with this problem?

This has been the fate of many who need a particular dataset for their analysis. When data is locked up and inaccessible for various reasons, the reuse of these valuable digital assets is limited before it begins. The FAIR principles, first published through FORCE11, were proposed as a response to this limitation. FAIR stands for findable, accessible, interoperable, and reusable, and the principles set out clear guidelines for good data management, designed to improve the findability, accessibility, interoperability, and reuse of digital objects. These principles have immense potential for scientific advancement, and various efforts have been made to promote them globally. This post looks at the findable component of the FAIR data principles and offers a non-technical, jargon-free explanation.

Why are some datasets so hard to find?

Among digital objects, data takes a prominent position. Data is essential for all scientific endeavours, and we are in an era of rapidly expanding data sources. This growth comes with drawbacks: poor structure, inconsistent formats and standards, and varying levels of data literacy among the people who create databases in the first place. Datasets exist in various forms (structured, semi-structured, and unstructured); some are poorly described, not standardized, or stored in formats that software cannot easily read and act on. This limits the chance that humans and machines will find the datasets, and with it the opportunity to exploit this wealth of data.

What does the F in FAIR stand for?

The first component of the FAIR principles is findability, which is the fulcrum for the other three. Because John had difficulty finding appropriate datasets for his analysis, he could not access them, combine them with other data, or reuse them to support his research with empirical evidence. Before he can unlock the power that data offers, John must first find datasets that meet his needs. It does not matter whether a dataset is high or low quality, accessible, interoperable, or reusable if John cannot find it in the first place.

The full definition of FAIR also breaks each principle into sub-components. The sub-components of the findability principle are explained below.

F1. (Meta)data are assigned a globally unique and persistent identifier

‘Metadata’ gives data context: it is information about data. Take a digital photograph saved on a computer. Properties such as file type, location, size, date created, and date modified all describe the photograph. When it is sufficiently detailed, metadata acts like a compass, helping machines and humans find the data. In that sense, metadata is as important as the data it describes.

In many countries, residents have a unique identifier that distinguishes them across different systems. One example is the National Insurance number in the United Kingdom. This combination of letters and numbers is unique to each individual and is used for purposes such as National Insurance contributions (NICs) and tax. The number stays the same for life and is recorded against one specific individual only. In the same way, ‘globally unique’ means that no two entities (data or metadata) share the same identifier anywhere in the world, while ‘persistent’ means the identifier remains valid and reliable over the long term. Applied to FAIR, this means registering (meta)data with a globally unique and persistent identifier, so that the data can still be found even if the location of the dataset changes. So, if the dataset John needs has such an identifier, he can still reach it after it moves.

F2. Data are described with rich metadata

When a dataset is described with ‘rich metadata’, that is, comprehensive information about the dataset, it becomes easier to discover, understand, and use. Rich metadata also allows machines to perform routine and challenging tasks automatically, saving researchers valuable time. Data with rich metadata will be easier for John to find. Once the data has been found, rich metadata increases its potential for use, reuse, and sharing.

F3. Metadata clearly and explicitly include the identifier of the data they describe

Because metadata and data are distinct, there needs to be a link between the two. That link is established by recording the data’s globally unique and persistent ‘identifier’ in the metadata. Once the identifier is recorded in the metadata, it acts as a pointer to the dataset. Examples of identifiers include the Digital Object Identifier (DOI) and the Uniform Resource Identifier (URI); an ORCID iD plays the same role for researchers rather than datasets. When John searches for datasets, one whose metadata includes an identifier like a DOI will be more findable.
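For readers who would like to see the idea in concrete form, here is a minimal Python sketch of F1 and F3 working together. Everything in it is made up for illustration: the `MetadataRecord` structure, the `locate` helper, the `doi:10.1234/example-dataset` identifier, and the URLs are assumptions, not part of any real registry. It only shows that when the metadata carries the identifier, the dataset can still be located after it moves.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Descriptive information about a dataset (hypothetical structure)."""
    identifier: str  # persistent ID of the data this record describes (F1, F3)
    title: str
    keywords: list[str] = field(default_factory=list)


def locate(record: MetadataRecord, resolver: dict[str, str]) -> str:
    """Look up where the dataset currently lives, using only its identifier."""
    return resolver[record.identifier]


# Made-up identifier and URLs, for illustration only.
record = MetadataRecord(
    identifier="doi:10.1234/example-dataset",
    title="Example survey responses",
    keywords=["survey", "example"],
)
resolver = {record.identifier: "https://old-host.example/data.csv"}
old_location = locate(record, resolver)

# The dataset moves to a new host; only the resolver entry is updated.
resolver[record.identifier] = "https://new-host.example/archive/data.csv"
new_location = locate(record, resolver)
```

The metadata record never changes in this sketch. Because it stores the identifier and not the location, moving the dataset does not break the link between the two.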

F4. (Meta)data are registered or indexed in a searchable resource

To be findable, digital objects must ‘exist’ on the internet. Even a globally unique and persistent identifier and rich metadata may not be enough to ensure that a dataset can be found online. To exist online in this sense, it must be ‘registered’ and visible through searches. Once that is done, people like John, and the machines they use, can discover it. One way of making digital objects discoverable online is ‘indexing’. Google, for example, uses programs called crawlers (or spiders) that automatically find and read websites, index them, and make them available through Google search.
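To make indexing a little more tangible, the sketch below builds a toy searchable index in Python. It is a simplified illustration under stated assumptions, not how Google or any real data repository works: the `build_index` and `search` functions, the record layout, and the identifiers are all hypothetical. It maps each word in a record's title and keywords to the identifiers of the records that contain it.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word in a record's title and keywords
    to the set of identifiers whose metadata contains that word."""
    index = defaultdict(set)
    for identifier, meta in records.items():
        text = " ".join([meta["title"], *meta["keywords"]])
        for word in text.lower().split():
            index[word].add(identifier)
    return index


def search(index, query):
    """Return the identifiers whose metadata contains every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Made-up records, keyed by made-up identifiers (illustration only).
records = {
    "doi:10.1234/rainfall-2020": {
        "title": "Daily rainfall measurements",
        "keywords": ["climate", "precipitation"],
    },
    "doi:10.1234/clinic-visits": {
        "title": "Clinic visit counts",
        "keywords": ["health", "survey"],
    },
}
index = build_index(records)
rainfall_hits = search(index, "rainfall climate")
missing_hits = search(index, "genomics")
```

Only registered records can come back from a search. A dataset that was never added to `records` stays invisible here, however good its identifier and metadata are, which is the point of F4.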

Summary

We are in an era of massive information, and we all face a deluge of data from across the globe. With information in disparate forms, we need to ensure that machines and humans can find these digital resources easily. The elements of the findability principle make this possible: globally unique and persistent identifiers, rich metadata that points to the data it describes, and indexing in a searchable resource. The findability principle is also important because it is the basis for the other FAIR principles. People cannot get full value from datasets that nobody can discover, even when they know the datasets exist.

Abegunrin Gideon

A researcher with a forte for scholarly writing. He is adept at designing and implementing trans-disciplinary research procedures and data management.