Between the end of September 2018 and November 2019, the volume of data in NASA’s Earth Observing System Data and Information System (EOSDIS) collection grew from 27.4 petabytes (PB) to more than 33 PB. This growth is expected not only to continue but to accelerate as several upcoming Earth observing missions add large volumes of new data to the EOSDIS collection over the next five years.
To meet NASA’s charge to provide these data freely to users worldwide, NASA’s Earth Science Data and Information System (ESDIS) Project launched Cumulus, a multi-year effort to develop a cloud-based framework for data ingest, archive, distribution, and management. For more information about overall efforts to host EOSDIS data in the commercial cloud, please see the EOSDIS Cloud Evolution page.
Cumulus is an open source workflow system specific to the Earth science archive domain. It is intended for use by the EOSDIS Distributed Active Archive Centers (DAACs) as they migrate their archived data to the Amazon Web Services (AWS) commercial cloud, which has been approved for use by NASA's Office of the Chief Information Officer. The Cumulus core team comprises several contributing members, including developers from the Land Processes DAAC (LP DAAC) and the National Snow and Ice Data Center DAAC (NSIDC DAAC). This diverse composition not only provides different perspectives but also brings decades of archival experience to bear on the challenges of the cloud migration effort.
Significant accomplishments by the ESDIS Cumulus Core team and EOSDIS DAACs in 2019 moved this effort closer to fruition. These included new features and capabilities along with enhancements that make Cumulus more robust and secure.
Throughout 2019, the Cumulus Core team focused on meeting the needs of two specific user communities: integrators and operators.
Integrators are typically software developers who are beginning to deploy the system and use it to develop product workflows. They drive the technical capabilities of the system through their focus on scalability, robustness, and security. These highly technical users depend on well-documented, well-tested, and easily extended interfaces and Application Programming Interfaces (APIs). The Cumulus Core team continues to refine the system based on the experience and feedback received from integrators.
Operators, on the other hand, are the day-to-day users working with the Cumulus ingest, archive, and distribution system to ensure that NASA’s EOSDIS data are processed properly into the system as they arrive and are archived and maintained via vigilant data stewardship. Operators require streamlined and intuitive tools that provide dashboards, metrics, and alerts that are relevant and actionable. Throughout 2019, the Cumulus Core team worked to develop new systems and enhance existing systems to provide these necessary tools and metrics.
Along with supporting integrators and operators, the Cumulus Core team continued preparations for upcoming data-intensive missions. One of these is the Surface Water and Ocean Topography (SWOT) mission, scheduled for launch in 2021. The mission will make the first global survey of Earth’s surface water, observe the fine details of ocean surface topography, and measure how water bodies change over time.
Over its three-year planned mission, SWOT is expected to generate as much as 23 PB of data. These data will be archived at and distributed by NASA’s Physical Oceanography DAAC (PO.DAAC). During 2019, PO.DAAC, as the SWOT Systems Integrator, completed all SWOT technical qualification requirements for hosting and distributing this high volume of data in the commercial cloud.
Existing mission data also continued their evolution to the cloud in 2019. NASA’s Global Hydrology Resource Center DAAC (GHRC DAAC) and Alaska Satellite Facility DAAC (ASF DAAC) supported the operational transition of beta data products from the Spaceborne Imaging Radar C (SIR-C, operational April to October 1994) project to the commercial cloud. SIR-C was part of the joint U.S./German/Italian SIR-C/X-Band Synthetic Aperture Radar (SIR-C/X-SAR) project, which used a highly sophisticated imaging radar carried aboard the Space Shuttle Endeavour to capture images of Earth.
The ASF DAAC efforts are paving the way for another large mission on the horizon: the joint NASA-Indian Space Research Organisation Synthetic Aperture Radar (NISAR) mission. NISAR will use a dual-frequency Synthetic Aperture Radar (SAR) to study natural hazards and environmental change. Over its planned three-year mission, NISAR is expected to produce an unprecedented volume of data—more than 128 PB, or almost four times the current size of the entire EOSDIS data collection.
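The volume comparisons quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the inputs are simply the figures stated in this article. Several of them are bounds ("more than" or "as much as"), so the results are approximate.

```python
# Data volumes in petabytes (PB), as quoted in this article.
EOSDIS_SEPT_2018_PB = 27.4
EOSDIS_NOV_2019_PB = 33.0   # stated as "more than 33 PB" (lower bound)
SWOT_EXPECTED_PB = 23.0     # stated as "as much as 23 PB" (upper estimate)
NISAR_EXPECTED_PB = 128.0   # stated as "more than 128 PB" (lower bound)


def ratio_to_collection(mission_pb, collection_pb=EOSDIS_NOV_2019_PB):
    """Return a mission's expected volume as a multiple of the collection size."""
    return mission_pb / collection_pb


def percent_growth(start_pb, end_pb):
    """Return the percentage increase from start_pb to end_pb."""
    return (end_pb - start_pb) / start_pb * 100.0


if __name__ == "__main__":
    nisar_ratio = ratio_to_collection(NISAR_EXPECTED_PB)
    swot_ratio = ratio_to_collection(SWOT_EXPECTED_PB)
    growth = percent_growth(EOSDIS_SEPT_2018_PB, EOSDIS_NOV_2019_PB)
    print(f"NISAR vs. November 2019 collection: {nisar_ratio:.2f}x")
    print(f"SWOT vs. November 2019 collection: {swot_ratio:.2f}x")
    print(f"Collection growth, Sept. 2018 to Nov. 2019: {growth:.1f}%")
```

With these inputs, the NISAR ratio comes to roughly 3.9, which matches the "almost four times" figure, and the collection grew by about 20 percent over the roughly 14-month period.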
Along with these high-level achievements, numerous new features and capabilities were added to Cumulus in 2019 that will facilitate a better user experience (UX) and metrics collection. These include:
- Enhanced Cumulus dashboard capabilities based on feedback from operational users and UX designers
- Implementation of auto-scaling functionality to support large ingest loads
- Development of support for bulk re-ingest and ingest prioritization
- Integration of automated Disaster Recovery backup and recovery procedures into system workflows and the Cumulus dashboard
- Implementation of a cloud-based metrics system
In addition, the team implemented enhanced security measures designed to comply with NASA’s General Application Platform (NGAP), which provides a cloud-based Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS) for ESDIS applications. This work was complemented by the integration of the Cumulus dashboard and APIs with NASA Access Launchpad authentication. Access Launchpad, or Launchpad, is an internal NASA system that enables secure access to NASA applications.
Of course, a complex undertaking like Cumulus also requires training. New training materials were developed during 2019 to support the onboarding of new developers and integrators into the Cumulus ecosystem, and sessions during the annual ESDIS Systems Engineering Technical Interchange Meeting (SE-TIM) were devoted to introducing DAAC staff to Cumulus. Finally, the Cumulus Core team continued to engage integrators and operators through working groups designed to solicit feedback and smooth the transition from on-premises systems to the cloud-based Cumulus system. You can see our work in progress by looking at the Cumulus documentation and source code.
The work and accomplishments of the Cumulus Core team and EOSDIS DAACs during 2019 laid a firm foundation for moving Cumulus forward in 2020. Feedback and input from integrators and operators will continue to play a large role as the Cumulus Core team further refines requirements, systems, and tools for migrating EOSDIS data to the commercial cloud and efficiently providing these data to users worldwide.