NASA’s Earth System Observatory (ESO) is a coordinated series of missions designed to obtain measurements of multiple Earth processes. These missions will generate more data than any previous NASA mission. For example, the NASA/Indian Space Research Organisation Synthetic Aperture Radar mission (NISAR; scheduled for launch in 2024) is expected to generate up to 50 petabytes (PB) of data per year. To put this in perspective, 1 PB is equivalent to approximately 500 billion pages of printed text.
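To make that scale concrete, the short sketch below works through the arithmetic behind the figures quoted above. Only the 50 PB per year estimate and the 500 billion pages per PB equivalence come from the text. The decimal definition of a petabyte (10^15 bytes), the 365-day year, and all function and constant names are assumptions made for illustration.

```python
# Back-of-the-envelope arithmetic for the NISAR data volume quoted above.
PAGES_PER_PB = 500_000_000_000   # from the text: ~500 billion pages per PB
NISAR_PB_PER_YEAR = 50           # from the text: upper estimate for NISAR

# Assumptions for illustration (not stated in the article):
BYTES_PER_PB = 10**15            # decimal petabyte
TB_PER_PB = 1000                 # decimal terabytes per petabyte
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 86_400


def yearly_pages(pb_per_year: int) -> int:
    """Printed-page equivalent of one year of data."""
    return pb_per_year * PAGES_PER_PB


def daily_terabytes(pb_per_year: float) -> float:
    """Average daily volume in terabytes."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


def sustained_gb_per_second(pb_per_year: float) -> float:
    """Average data rate in gigabytes per second, spread over a full year."""
    seconds_per_year = DAYS_PER_YEAR * SECONDS_PER_DAY
    return pb_per_year * BYTES_PER_PB / seconds_per_year / 1e9


if __name__ == "__main__":
    print(f"Pages per year: {yearly_pages(NISAR_PB_PER_YEAR):,}")
    print(f"TB per day:     {daily_terabytes(NISAR_PB_PER_YEAR):.0f}")
    print(f"GB per second:  {sustained_gb_per_second(NISAR_PB_PER_YEAR):.2f}")
```

Under these assumptions, 50 PB per year corresponds to roughly 25 trillion printed pages, or an average of about 137 TB of new data every day.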
In keeping with NASA’s open data policies, ESO data will be available to global research teams—or anyone with an internet connection—as early in the mission process as feasible. But these data must be processed before they can be used. This is where a Mission Data Processing System (MDPS) comes in.
The MDPS is the set of algorithms, software, compute infrastructure, operational procedures, documentation, and teams that process raw instrument data into science-quality data products. The MDPS also includes the software tools that support the development of processing algorithms and the validation and analysis of the processed data.
The ESO Challenge
In the current NASA Earth science data processing architecture, each mission receives its raw instrument data through the multi-mission Near Space Network (NSN) and processes those data using a mission-specific MDPS. The resulting data products are then delivered to NASA’s Earth Observing System Data and Information System (EOSDIS) to be archived and distributed.
This siloed data processing architecture creates several difficulties for the ESO. It is not conducive to the coordinated nature of the ESO missions, it creates barriers to broad and early access to science data and related software, and it complicates science that spans multiple missions and instruments.
To address these issues, NASA Science Mission Directorate (SMD) Chief Science Data Officer Kevin Murphy issued a challenge to the broad mission processing community to identify the best model for an open MDPS to support ESO missions. Specifically, Murphy challenged the community to identify a data processing architecture that not only meets ESO mission science processing objectives and supports Earth system science, but also promotes open science principles and enables data system efficiencies.
As Murphy observes, this type of MDPS offers many benefits to data users. “Open-source science data processing systems enable interoperability through greater accessibility and collaboration among researchers, leading to deeper insights into our planet’s complex systems and informing better decision-making,” he says.
The Open Source Science for ESO Mission Data Processing Architecture Study was conducted to explore data processing options through the broadest possible collaborative, open process.
Open Results from an Open Process
Open science, a foundational objective of NASA’s SMD, is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.
In keeping with these guidelines, the ESO MDPS study was fully open and encouraged participation by the broader science and data processing community. It was guided by open science practices and constraints:
- The study was conducted through public workshops and open requests for information, and included broad social media and outreach efforts
- Public workshops were designed to enable community participation, engage key stakeholders, and promote diverse and inclusive discussions
- The study solicited input from a broad and diverse set of flight project teams, industry partners, open science experts, and stakeholders across a wide spectrum of the science mission data systems community
- Representatives from all ESO missions participated as members of the study team
- The recommended MDPS is aligned with agency open data, software, algorithm, and publication policies (such as NASA’s SPD-41a)
The goal of the study was to determine whether ESO data processing systems can be architected to allow greater interaction across individual mission data processing systems, enable efficiency and Earth system science, allow greater interaction with a broader set of users, and enable open-source science.
Two public workshops were conducted. Workshop #1 focused on collecting NASA stakeholder objectives and ESO mission requirements. Workshop #2 studied practices across NASA and other agencies for developing science data processing systems. The information from these workshops was used to inform and identify potential architectures that could meet the study objectives. A technical trade study was performed along with a programmatic trade study of different architectures. The results from these studies were then combined to establish a final recommendation.
The recommendation of the study team is for each ESO mission to develop an MDPS using a common architecture and services that are provided and managed by an overarching multi-mission organization. This governing organization will establish standards across ESO missions and develop and deliver infrastructure, data catalog, analysis, and (potentially) processing services.
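The shape of this recommendation can be illustrated with a minimal sketch: each mission keeps its own processing algorithm, but every mission registers its products in one shared catalog service, so a single query spans missions. This is not the study's design. The class names, the mission names, the granule identifiers, and the use of an "L2" processing-level label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Product:
    """A processed data product (hypothetical, minimal metadata)."""
    mission: str
    name: str
    level: str


@dataclass
class SharedCatalog:
    """Hypothetical catalog service run by the multi-mission organization."""
    _products: List[Product] = field(default_factory=list)

    def register(self, product: Product) -> None:
        self._products.append(product)

    def search(self, level: str) -> List[Product]:
        # One query covers every mission that uses the shared service.
        return [p for p in self._products if p.level == level]


@dataclass
class MissionMDPS:
    """A mission-specific processing system built on the shared services."""
    mission: str
    catalog: SharedCatalog
    algorithm: Callable[[str], str]  # mission-specific processing step

    def process(self, raw_granule: str, level: str = "L2") -> Product:
        product = Product(self.mission, self.algorithm(raw_granule), level)
        self.catalog.register(product)  # common service, not a mission silo
        return product


# Two hypothetical missions share one catalog instance.
catalog = SharedCatalog()
mission_a = MissionMDPS("MISSION_A", catalog, lambda g: f"{g}.a_product")
mission_b = MissionMDPS("MISSION_B", catalog, lambda g: f"{g}.b_product")

mission_a.process("granule_001")
mission_b.process("granule_002")

# A cross-mission search returns products from both missions.
cross_mission = [p.mission for p in catalog.search("L2")]
print(cross_mission)
```

The design point is that the mission-specific part is limited to the `algorithm` callable, while cataloging is handled once, in common, which is what makes the cross-mission query possible.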
The study team recommends a follow-on study to establish a preliminary data processing architecture and an approach to implementing this design, along with an additional study to look at specific use cases for cross-mission data analysis. These future studies will continue to be open and encourage input from the scientific community. More information about these next steps will be published on the SMD, Earthdata, and other relevant public websites.