"Data! Data! Data!" he cried impatiently. "I can’t make bricks without clay." ― Arthur Conan Doyle, The Adventure of the Copper Beeches.
As Doyle's fictional detective Sherlock Holmes knew, data are the foundation of scientific investigation. For NASA's Earth Science Data Systems (ESDS) Program, providing unrestricted access to one of the largest archives of scientific data on the planet—the building blocks of scientific discovery—is an ever-evolving challenge of matching data user needs with technology and mission requirements.
New technology is enabling the development of satellite-based instruments capable of the most detailed observations of our planet in history. These next-generation observations come with a price, however: higher data volumes than any previous instruments or missions. This, in turn, requires new ways of thinking about how these data can be processed and distributed to ensure they are available rapidly and openly, as well as in the many formats data users need.
"Two of our program's top priorities are ensuring stability—providing the data you need when you need it—and driving innovation," says NASA Earth Science Data Officer Katie Baynes. "My goal within the Earth Science Data Systems Program is to provide continued free open access data stewardship and support at a world class level. At the same time, we want to expand support for new users; we want to reduce the amount of time that users spend fiddling with their data to get it to play nice with other data."
An ongoing ESDS initiative, the Multi-Mission Data Processing System Study, is one component of the next phase in how the world works with NASA Earth science data. Other components include the evolution of NASA's Earth science data collection to the commercial cloud (Earthdata Cloud Evolution) and the consolidation of ESDS web properties into the Earthdata website (Web Unification).
Having a foundational system capable of processing data from multiple, disparate missions will help data users work more efficiently with higher volumes of data, apply these data more rapidly to global issues, and further the ESDS objective of providing NASA Earth science data that are findable, accessible, interoperable, and reusable (FAIR).
The Need for a Foundational Data Processing System
Mission data depend on algorithms, software, compute infrastructure, operational procedures, documentation, and data management teams for processing raw data into a variety of data products. Collectively, these elements make up the Mission Data Processing System (MDPS), which also includes software tools that support the development of processing algorithms and the validation and analysis of processed data.
For more than 40 years, individual missions have created their own data processing systems. But there are problems with this approach.
"Right now, for every new mission, we create common capabilities to process and deliver science data products. We make it stove-piped. Even if we try to share some core processing capabilities, each mission wants to start anew. This creates inefficiencies in time and cost," says Irina Strickland, Instrument Software and Science Data Systems Section Manager at NASA’s Jet Propulsion Laboratory (JPL) in Southern California.
Baynes agrees, observing that developing complex processing infrastructures for each mission takes time away from science teams creating and refining algorithms for their specific user communities. "We really don’t want or expect that our mission teams should have to go through reinventing the wheel every time," she says.
But what if there were a common system that could be used as the foundation for processing data from multiple missions? This would enable science teams to spend more time developing interoperable data products and looking at how users want to use these data products and less time developing mission-specific processing architectures. This was the genesis of the Multi-Mission Data Processing System Study (MDPS Study), which began in 2021 and is being coordinated by Strickland and her team at NASA JPL.
"The initial study goal was to figure out what type of system, what kind of architecture could satisfy the needs of both projects and missions and at the same time enable open science and support science discovery," Strickland says.
The Study
"The MDPS Study is our attempt to promote open science principles, enable efficiencies, reduce time to science, and increase the amount of time that the science team can spend on developing and collaborating on innovative data products," says Baynes.
Baynes' use of the word "our" refers to the large team guiding the MDPS Study. Along with members of NASA's ESDS Program, representatives from NASA Headquarters (including Baynes), and Strickland's NASA JPL team, the effort involves participants from NASA's Ames Research Center in California's Silicon Valley; NASA's Goddard Space Flight Center in Greenbelt, MD; NASA's Langley Research Center in Hampton, VA; and NASA's Marshall Space Flight Center in Huntsville, AL.
But "our" also refers to the larger scientific community, which has been involved in the study from the start. "Everything is being done openly," says Strickland. "Even commercial companies [are participating]."
The study comprises four phases, each of which builds on the work of the previous phase:
- Phase 1: Concept study to identify a recommended MDPS architecture (October 2021 to March 2023)
- Phase 2: Design review and architecture study of the recommended MDPS (March 2023 to September 2024)
- Phase 3: System development
- Phase 4: System implementation
All aspects of the study adhere to agency requirements for the open development of software and systems, and the Phase 1 and Phase 2 workshop recordings and reports are available on their respective Earthdata website pages. The MDPS Study team continues to hold numerous open meetings and public workshops and encourage participation by representatives from upcoming missions. The team is also soliciting input from a broad and diverse set of flight project teams, industry partners, open science experts, and stakeholders across a wide spectrum of the science mission data systems community.
Earth Science to Action
Data are meant to be used, and the development and implementation of a foundational data processing system for multiple missions will enable data to be processed and delivered to global users more efficiently. Having these data readily available in the formats needed by data users for their applied work is a primary goal of the MDPS Study.
"We want to provide a system that allows people to hit the ground running and reduce the time it takes to develop science [data] products," Baynes says. "It frees us up to focus on higher level processing and enables users to [apply] these products in innovative and fused ways. It's fostering a community that will lead to action."
Putting Earth science data into action to address societal needs is a cornerstone of the agency's Earth Science to Action initiative. In keeping with their foundational role in scientific work, NASA data form the broad base of the Earth Science to Action pyramid developed by NASA’s Earth Science Division.
"It's not just science for science's sake," observes Strickland. "We anticipate NASA Earth observation data becoming even more integrated into many aspects of scientific research, policymaking, and, eventually, everyday life. This evolution of Earth observation data will strengthen NASA's Earth Science to Action capability by providing more detailed, timely, and actionable scientific insights."
The Future of NASA Earth Science Data
The move from data to information to knowledge is a natural progression, one that is made easier by providing data openly along with the tools and resources for using the data. Having a foundational MDPS for upcoming high-data-volume missions will streamline this progression and increase the number of communities able to use mission data.
Strickland also sees machine learning as an important component of this process, one that could provide an initial check on data quality for the tremendous volumes of data with which users will be presented.
"We need to be smarter in the way data are processed, and maybe not all data need to be processed," Strickland says. "The hope is to integrate machine learning and other techniques to [help make decisions] on which data are not the best or the correct quality for processing. I think technology is advancing in leaps and bounds, and we’re here to reap the benefits."
Strickland and Baynes agree that the next five years will see not only continued exponential growth in the volume of NASA Earth observation data, but also the use of these data by an ever-growing community of global data users. The MDPS Study, combined with Web Unification and the continued evolution of NASA Earth science data to the cloud, will constitute a system that enables these communities to get the NASA Earth science data they need, when they need them, and in the formats that work for them. For Baynes, the next five years will see many positive changes to how NASA user communities interact with data.
"If we can create a place where we have containers that can be run in any environment and that can be portable across whether you need to do your work on your laptop or on a high-end computing system or in the cloud or across the planet, we have created a system that just works," Baynes says. "You’re going to be able to leverage things that you're comfortable with using, whether that be GIS Stack or Jupyter Notebooks or even downloading [the data] and processing them with tools you've historically used. Hopefully it becomes less of a burden for you as a researcher. That's where I want us to go over the next five years."