As NASA works to practice open science by freely sharing data, information, and knowledge with the scientific community and public, a key element of this effort is determining and providing the best means to easily find and access these data.
“To me, achieving the goals of open science includes offering data, software, and other resources that are findable, accessible, interoperable, and reusable. It’s just part and parcel,” said research scientist Mark Parsons from NASA’s Chief Science Data Office within the agency’s Science Mission Directorate (SMD).
With this in mind, Parsons is leading a NASA working group to make all of the agency’s data findable, accessible, interoperable, and reusable (FAIR) as part of the official science community-wide FAIR Data Principles movement. FAIR is a set of 15 guidelines created by experts from academia, industry, research funding agencies, and scholarly publishers to help data creators and managers make their assets easily used, particularly by computers. Data are increasingly complex and are produced faster and in higher volumes, making machine access and processing essential for working with them efficiently.
What is FAIR?
In 2016, the “FAIR Guiding Principles for scientific data management and stewardship” were published in the journal Scientific Data. The authors originally created the principles in response to the increasing requirement by science funders, publishers, and governmental agencies for researchers to have data management and stewardship plans for data generated in publicly funded experiments. Following the guidelines aids researchers in producing high-quality digital holdings that facilitate and simplify the ongoing process of discovery, evaluation, and reuse of data in future studies.
FAIR places great importance on providing well-written and complete metadata. Metadata include essential information for accessing and using data, such as their authors, format, unique identifier, archive location, and more.
The FAIR Data Principles are:
Findable: Machines need to be able to find data in order to use them. FAIR principles state that metadata and data should be machine-readable and include persistent, globally unique identifiers, such as digital object identifiers (DOI), so they are easy to find for both humans and computers.
Accessible: Machines and humans need to know how to access data. DOIs should be used to fetch the metadata for the data resource from the repository interface. Metadata should include information on how assets can be accessed. If interfaces require human input, data practitioners should create application programming interfaces (APIs) to increase asset accessibility.
Interoperable: Data need to be integrable or interoperable with other data, applications, and workflows for analysis, storage, and processing. Practitioners should provide metadata in machine-readable formats and use standard vocabularies that are programmatically actionable.
Reusable: Ultimately, FAIR is about enabling the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated or combined in different settings. Practitioners should indicate within metadata whether the data being described are in the worldwide public domain. If not, they should provide both copyright and license information in the metadata. When possible, practitioners should use standard digital licenses with formal references.
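As a rough illustration of how the four principles above might translate into a machine-checkable metadata record, here is a minimal Python sketch. The field names (identifier, access_url, format, license), the example values, and the check_record helper are all hypothetical and chosen only for illustration. They do not come from the FAIR paper or from any NASA metadata standard, and the DOI pattern is a simple approximation rather than a full validator.

```python
import re

# Hypothetical required fields, each loosely tied to one FAIR principle.
REQUIRED_FIELDS = {
    "identifier": "Findable",
    "access_url": "Accessible",
    "format": "Interoperable",
    "license": "Reusable",
}

# Approximate DOI shape: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_record(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field, principle in REQUIRED_FIELDS.items():
        if not record.get(field):
            problems.append(f"{principle}: missing '{field}'")
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append("Findable: identifier does not look like a DOI")
    return problems


# A made-up record; the DOI and URL are placeholders, not real resources.
example = {
    "identifier": "10.1234/example-dataset",
    "title": "Example dataset",
    "access_url": "https://example.org/data/example-dataset",
    "format": "netCDF-4",
    "license": "CC0-1.0",
}

if __name__ == "__main__":
    for problem in check_record(example) or ["no problems found"]:
        print(problem)
```

A check like this only confirms that certain fields are present. As the article goes on to note, whether data are FAIR in practice depends on the discipline and on whether users can actually find and use them.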
FAIR Guidance for Everyone at NASA
For the past two years, Parsons and the SMD Working Group have been discussing how to implement FAIR practices across the directorate’s five scientific divisions. The work aims to meet the requirements of the Scientific Information Policy for NASA’s SMD (SPD-41a), which stipulates that NASA’s data should be FAIR.
“Data should be FAIR. But what exactly does that mean for the different divisions and data repositories?” asked Parsons. “We’re asking what are some of the basic steps the repositories can do to make their data a bit more FAIR. We’re also looking at it from the direction of human and machine users because it’s ultimately the users who define what is FAIR. You might be technically meeting all those principles, but if users can’t find the data, it’s not doing any good.”
Creating and formatting rich metadata is one focus of implementing FAIR at NASA because metadata details can vary across scientific disciplines. For example, the location and time parameters used for data about Earth differ from those used for data about the Sun.
“Complying with FAIR should be viewed on a spectrum that is defined by the scientific discipline,” said Parsons. “Therefore, we’re collaborating with the individual divisions and repositories to determine what are the appropriate guidance and standards to meet both their needs and the requirement.”
Ingrained in Earthdata
Many NASA groups have readily embraced the concepts of ensuring data are FAIR. The Earth Science Data Systems (ESDS) Program is one of them.
“Our practitioners of data management and stewardship have been following these principles for a long time, and I would argue that the ESDS Program has been a front-runner in upholding these principles since long before they were called FAIR,” said Dr. Rahul Ramachandran, project manager of the ESDS Interagency Implementation and Advanced Concepts Team (IMPACT). “What has changed over time is how to implement these principles and the policies that govern them—that’s what we have to keep up with.”
IMPACT has developed a number of projects to address FAIR principles and their components, including the Visualization, Exploration, and Data Analysis (VEDA) project and the Analysis and Review of CMR (ARC) project.
“VEDA looks at non-traditional ways of accessing data to make these data available to computers and people,” said Ramachandran.
The VEDA project provides visualizations, trend analyses, and comparative analyses of datasets, allowing users to explore the applications and implications of those data. The VEDA Dashboard makes it easier to access and use Earth science data, which helps enable faster science—one of the goals of the NASA open-source science initiative (OSSI).
ARC addresses metadata quality by conducting assessments of NASA’s metadata records in the NASA Earth Observing System Data and Information System (EOSDIS) Common Metadata Repository (CMR). These records correspond to approximately 8,000 datasets collected from Earth-observing satellites and from airborne and in situ instruments. Having high-quality metadata records is important because the content of these records is indexed for search on the web, including through data portals such as NASA’s Earthdata Search.
NASA’s Earth Science Data and Information System (ESDIS) Project is a proponent of FAIR principles as well.
“We work with all of the Distributed Active Archive Centers [DAACs] to make their data open and accessible to researchers and the general public,” said Dana Ostrenga, ESDIS assistant project manager.
To foster greater interoperability of data, ESDIS creates many tools and services, such as visualization, analysis, and data-transformation services, to assemble data in ways that aid rapid analysis.
“For users, we developed the Harmony system in the cloud, which is integrating all of these tools and services to produce data that are inherently more interoperable and make the services agnostic to the user,” said Ostrenga.
Harmony is NASA’s Earthdata Cloud Services System and allows users to produce analysis-ready data by subsetting, reprojecting, and converting data to a cloud-optimized format. The system ultimately allows users to simply and efficiently access and download only the FAIR data they need.
Spreading the Word
To increase the awareness and implementation of FAIR across NASA, the agency presented FAIR for NASA Data at the Open Source Science Data Repositories Workshop in September 2023. Pre-workshop webinars were held in July and August. NASA intends to hold this workshop annually as a way of addressing questions and issues and advancing the implementation of FAIR across the agency.