When users interact with data in NASA’s Earth Observing System Data and Information System (EOSDIS) collection, their primary point of contact is with an EOSDIS Distributed Active Archive Center (DAAC). The discipline-specific DAACs play integral roles in archiving and distributing EOSDIS data, providing support for using these data, and developing tools and applications for working with these data.
NASA’s Earth Science Data and Information System (ESDIS) Project is responsible for overall management of EOSDIS operations, including management of the DAACs. As the manager of the ESDIS Project’s Science Operations Office, Drew Kittel oversees DAAC operations and the work of DAAC engineers. As he points out, upcoming high-volume data missions will add a tremendous amount of data to the EOSDIS collection over the next five years. He and his team are working with the DAACs to prepare not just for these new data, but also for new ways of working with them.
What is the role of the EOSDIS DAACs?
The primary function of the EOSDIS DAACs is to be stewards of NASA EOS [Earth Observing System] mission data, whether it’s satellite, airborne, or field campaign data. DAACs provide free, full, and open access to these data and provide tools and ancillary information to global data users. They also make sure that these data are interoperable and accessible, both for research and for societal benefit.
The DAACs perform a wide variety of functions and services in accomplishing these goals. Along with data ingest, archive, distribution, and preservation, they also handle data provenance, metadata curation, data science operations, development, and maintenance. They provide tools and services for working with data. They provide search and discovery capabilities and help prepare for new missions and new data onboarding.
On top of all of this, the DAACs also provide a wide variety of user services. These include providing discipline-specific expertise, assisting with data acquisition and manipulation, facilitating access to data handling and visualization tools, developing data recipes for manipulating data, presenting informational webinars, and keeping their communities updated on data news and technical issues.
What are some of the responsibilities of a DAAC engineer? What does this role entail?
ESDIS is responsible for managing 12 discipline-specific DAACs that are spread out across the country at different institutions and at different government facilities. This is a lot of ground to cover! ESDIS assigns a lead DAAC engineer to each DAAC with the task of providing technical leadership and direction in developing, implementing, and operating complex data systems at the DAAC to which they are assigned.
It’s a multi-faceted role, and it’s not just technical. DAAC engineers evaluate the DAAC’s yearly work and budget plan and ensure that appropriate levels of effort and personnel resources are applied to hardware and software to satisfy data systems development and operations. They ensure that the DAAC’s engineering is being done to appropriate levels and specifications, and that its operations and mission science data management requirements are being fulfilled.
Additionally, DAAC engineers oversee and coordinate new mission efforts. We have a lot of this going on right now. This includes the collection of new data, development of interfaces with science data systems, and coordination of science data product archive and distribution efforts. DAAC engineers also are involved with finance and contractual-type functions. They fulfill the project management activities of monitoring cost, schedule, and performance for everything that occurs at the DAAC.
One really important role is their work with a broad variety of stakeholders. They constantly interact with folks at NASA Headquarters, all levels of DAAC personnel, contractors across the EOSDIS, DAAC UWGs [User Working Groups], other government agencies and foreign partners, and with all the people who are carrying out ESDIS evolutionary activities that are being implemented at the DAACs.
You mentioned the DAAC User Working Groups, or UWGs. Tell me more about their role at the DAACs.
DAAC User Working Groups typically consist of 12 to 15 recognized data users from the discipline represented by the DAAC they serve, and bring a depth of experience in specific areas of research germane to that DAAC. UWGs perform an extremely valuable service by providing input and feedback to the DAACs, ESDIS, and NASA Headquarters.
UWGs provide oversight and guidance on the totality of DAAC activities, including dataset acquisition. In fact, they often bring to ESDIS and the DAACs datasets that they feel should be incorporated into the EOSDIS collection and made available through the DAAC they serve. We have a formal ESDIS process for the accession of new datasets, tools, and services. UWGs assist us in various steps of this process by providing recommendations and underlying justifications for why a dataset should be considered for accession into the collection.
The volume of data in the EOSDIS collection is expected to grow exponentially over the next few years thanks to upcoming high-volume data missions. How are you and your fellow DAAC engineers working with the DAACs to prepare for this surge in data?
From our current ingest volumes of about 23 terabytes a day, we’re preparing for ingest volumes in excess of 125 terabytes a day over the next five years. The sheer physics of that much data requires a new way of thinking about how you’re going to work with it.
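To put those daily rates in perspective, a back-of-the-envelope calculation shows what they imply for annual archive growth. The 23 and 125 terabytes-per-day figures come from the interview; the conversion itself is simple arithmetic, sketched here for illustration only.

```python
# Back-of-the-envelope sketch of annual ingest at the daily rates
# mentioned above (23 TB/day today, over 125 TB/day projected).
# Only the daily rates come from the interview; the rest is
# illustrative arithmetic using binary prefixes (1 PB = 1024 TB).

TB_PER_PB = 1024

def annual_ingest_pb(tb_per_day: float) -> float:
    """Return petabytes ingested per year at a constant daily rate."""
    return tb_per_day * 365 / TB_PER_PB

current = annual_ingest_pb(23)      # roughly 8 PB per year
projected = annual_ingest_pb(125)   # roughly 45 PB per year

print(f"current:   {current:.1f} PB/year")
print(f"projected: {projected:.1f} PB/year")
```

At the projected rate, the archive grows by more than five times its current annual increase, which is the scale driving the move away from a download-everything model.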
For many [research] problems that people are trying to solve, the old way of downloading data and processing the data on-premise simply won’t work. This requires a rethinking about how to work with these large volumes of data in the cloud.
Helping the DAACs and our user community prepare for these upcoming large-volume data missions has been a collective and a highly collaborative effort across the entire ESDIS organization. A number of collaborative efforts over the past few years have incrementally proved the concept of using the commercial cloud for data ingest, archive, and distribution, including interfacing with science data systems and identifying new processes and security needs that will be required to do operations in the cloud.
Over the last three or four years, a lot of pathfinding efforts have been undertaken to prove the concept of large-volume ingest flows and the distribution of large volumes of data. For example, a long effort with the NISAR [NASA-Indian Space Research Organization Synthetic Aperture Radar] mission science data systems folks proved the capability of being able to generate NISAR products in a cloud-native data processing system and then transfer these products in-region to a cloud-native data ingest system. Through this work we’ve been able to standardize cloud data transfer and interface mechanisms and learned a lot of lessons about what you can and can’t do within the cloud environment.
We also are seeing increased collaborative efforts between DAACs and across missions. To help facilitate the onboarding of new users of data in the cloud, a broad group of folks at the DAACs are developing a number of cloud primers. These cover everything from the basics of getting started in the cloud to doing meaningful work there.
As EOSDIS data evolve to the commercial cloud, how will this change how the DAACs archive and distribute EOSDIS data?
This will necessitate changes in the underlying technology and processes that the DAACs currently employ, including developing and deploying systems that are within a common secure cloud platform framework. This includes all of the ingest, archive, and distribution functions that the DAACs use, including storage, backup, and the operational monitoring of tools and services. It will also include developing and deploying new mission archives using a common framework for data ingest, archive, and distribution, back-up, metrics collection, metadata curation, and cataloging functions.
Additionally, they’ll need to develop and employ common frameworks for data services, tools, and operations like performing data transformations, subsetting, mosaicking, reformatting, and regridding. This will include not just common services, but also what we used to call DAAC Unique Extensions; that is, the tools and services that will plug into those frameworks and serve the unique needs of each DAAC’s users.
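Of the transformations named above, subsetting is the simplest to illustrate: given a gridded dataset, return only the cells inside a requested bounding box. The sketch below is a minimal, library-free illustration of the idea; real DAAC services operate on archival formats such as HDF5, NetCDF, or cloud-optimized stores, and the tiny in-memory grid and coordinate values here are purely hypothetical.

```python
# Minimal sketch of spatial subsetting: keep only the rows and
# columns of a 2-D grid whose lat/lon coordinates fall inside a
# requested bounding box. All values here are illustrative.

def subset(grid, lats, lons, lat_bounds, lon_bounds):
    """Return the sub-grid inside the given (min, max) lat/lon bounds."""
    lat_idx = [i for i, lat in enumerate(lats)
               if lat_bounds[0] <= lat <= lat_bounds[1]]
    lon_idx = [j for j, lon in enumerate(lons)
               if lon_bounds[0] <= lon <= lon_bounds[1]]
    return [[grid[i][j] for j in lon_idx] for i in lat_idx]

# A 3x4 grid with a coordinate vector for each axis.
lats = [10.0, 20.0, 30.0]
lons = [-100.0, -90.0, -80.0, -70.0]
grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]

# Keep only cells between 15-35 degrees N and 95-75 degrees W.
print(subset(grid, lats, lons, (15, 35), (-95, -75)))  # [[6, 7], [10, 11]]
```

Running a service like this next to the archive means users receive only the region they asked for rather than the full granule.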
Will the role of the DAACs change with this cloud evolution?
Fundamentally, the role of the DAACs won’t change. DAACs will remain the stewards of our data products and of the products in their catalogs. They will continue to ensure that NASA Earth science data remain free, open, and accessible to all who need and want these data.
Now, the location of the archive and the data may change, but the development activities, the engineering, the operations support, and user support activities will remain. The DAACs will continue to be responsible for archiving and distributing the data they always have been responsible for. They will continue to develop and provide the tools and applications for using the data they distribute. They certainly will continue to provide user services and support their data user communities in the ways their community has come to expect. The DAACs will always serve as centers for discipline-specific expertise.
Some of the user support services that the DAACs provide will necessarily need to change. Our objective is to smooth the transition to use of the cloud by our communities of data users and help them determine how cloud capabilities can best be brought to bear in solving their specific needs. A lot of this initially entails working with our data users to change their mindsets a little bit about how they use and interact with data. We’ll have to assist their transition to the cloud by providing scenarios and information about how they can perform significant processing, data reduction, and analysis close to the data that are in the cloud.
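The "processing close to the data" pattern described above can be sketched in miniature: instead of downloading every granule and analyzing locally, a cloud-side job streams through the granules and returns only the reduced result, such as a single statistic. The granules here are simulated as small lists of floats; this is an assumption-laden illustration of the pattern, not any actual EOSDIS service.

```python
# Sketch of moving the computation to the data: reduce a stream of
# granules to one summary value (a mean) without ever holding more
# than one granule in memory, so only the result crosses the network.
# Granules are simulated as lists of floats for illustration.

def running_mean(granules):
    """Reduce an iterable of granules to a single mean value."""
    total, count = 0.0, 0
    for granule in granules:   # in practice: objects read in-region
        total += sum(granule)
        count += len(granule)
    return total / count

granules = ([1.0, 2.0], [3.0, 4.0], [5.0])
print(running_mean(granules))  # 3.0
```

The user downloads one number instead of every granule, which is the mindset shift the DAACs are helping their communities make.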
Looking ahead, what does the future hold for EOSDIS data users and their interactions with the DAACs?
As more NASA data evolve to the cloud, the biggest immediate changes that we’ll see are in the types of support that users will be expecting and requesting from the DAACs.
Future cloud-resident data lakes holding multi-platform, multi-instrument, multi-discipline data will present opportunities that did not exist previously. They will allow newer, faster, more powerful ways to work with these data and open new opportunities for science.
The DAACs will be providing expert user support in their respective disciplines and facilitating support in other DAAC disciplines where it’s necessary. They’ll be serving as facilitators to help users bring their analysis and processing into the cloud and take advantage of all the benefits that the cloud provides. Naturally, the DAACs will be sharing and leveraging the growing body of knowledge and lessons-learned from all of our preparatory work across EOSDIS organizations.
What are you most excited about over the next five years in terms of EOSDIS data and how data users will interact with these data?
Early in my career I worked with people who had very big ideas about what they wanted to do with remotely sensed data. Technology was always a limiting factor, whether this was processing power or storage capabilities or network bandwidth.
In our transition to the cloud, this concept has been totally reversed. The limiting factor now is only people’s imagination. What can they do with all of this power and these vast data resources in the cloud? There are certainly things our users want to do at large-scale, but there are plenty of investigations that have not even been thought of. I’m really excited to see where this goes.
When we look historically at how our users have related to EOSDIS data, they have spent a lot of time downloading and preparing data before working with them. We now have an opportunity to flip this strategy of using data – less time getting the data ready and more time utilizing the information derived from the data. What this means, in the end, is more time for science and more time for applying the data to a wider range of applications.
We’ll have the ability to do more complex science with larger datasets or with multiple datasets across multiple disciplines spanning larger geographic areas and longer periods of time. We also have the potential for the user community to develop value-added products and community-contributed code and functionality that can be utilized by others in the cloud.
There’s also the potential for influencing the thinking of future missions and sensors. We’re proving that we can deal with volumes and densities of data that are orders of magnitude greater than we once were able to handle even just a few years ago. Maybe this helps inform how science teams look at the development of science products and higher-level informational products.
Over the last five years it has been amazing to see not just the evolution of the technology, but also how our community has evolved to become closer, stronger, and more collaborative. This collaborative functionality is built into the DNA of everything we do. It’s gratifying to see that you can have a large, diverse community of developers and stewards of data all pulling in the same direction. This has been a wonderful thing to be a part of.