When a bird develops the feathers needed to fly, it becomes a fledgling and is ready to leave the nest to explore the broader world. For participants in NASA Openscapes who have learned the skills necessary to work with NASA’s open archive of Earth observation data in the cloud, fledging refers to taking their new knowledge back to their organizations and setting up their own cloud-based environment for conducting scientific research.
“Fledging to me is like spreading your wings and soaring off,” says Dr. Julia Stewart Lowndes, the Openscapes founding director. “Fledging is really trying to answer the question of where researchers go after they’ve learned how to use the NASA Earthdata Cloud in our cloud environment designed for training and first experimentation.”
Two Openscapes participants who are fledging are Dr. Aronne Merrelli from the University of Michigan and Dr. Elizabeth (Eli) Holmes from NOAA’s National Marine Fisheries Service (NMFS). The two researchers and their organizations are benefitting from their work using NASA Openscapes resources (including the NASA Earthdata Cloud Cookbook and the earthaccess Python library) and interacting with mentors and cloud data experts from NASA’s Distributed Active Archive Centers (DAACs). Their fledging experiences provide a glimpse into NASA Openscapes along with the benefits and the challenges of working with NASA and other scientific data in the cloud.
An Open Invitation to Use NASA Earth Science Data
Thousands of data collections can be freely explored and downloaded using NASA Earthdata Search. These data from satellite, airborne, and ground-based observations have a volume of more than 116 petabytes (PB) as of the end of August 2024 and are one of the largest open Earth science data collections on the planet. Moving data to the Earthdata Cloud provides greater efficiencies for using these data collaboratively, working with large data volumes, and analyzing multiple data collections simultaneously. Cloud-based data also can help further open science and efforts to make data findable, accessible, interoperable, and reusable (FAIR).
NASA Openscapes is an initiative co-led by Lowndes of Openscapes and Erin Robinson of Metadata Game Changers and is funded by NASA’s Earth Science Data and Information System (ESDIS) Project. Openscapes’ work with NASA began in 2021 as a three-year effort to grow a mentor community of data experts from across the 12 NASA DAACs to create common resources and teaching approaches to support scientific researchers using NASA Earth science data in the cloud.
The first phase of Openscapes is called onboarding. During this phase, NASA Openscapes mentors guide scientists in their first hands-on experience working with cloud-based NASA data in an open-source JupyterHub environment managed by the International Interactive Computing Collaboration (2i2c). Onboarding also includes workshops, hackathons, and learning events such as the Openscapes Champions program.
After the onboarding mentorship phase, scientists move to the fledging phase, where they set up their own cloud environment for scientific investigations and share their lessons learned with their colleagues and in their organizations. “How do they reuse the computing environments that we developed? How do they think about storage and costs? How do they get funding [for cloud computing]? There are a lot of parts to fledging,” says Lowndes.
Leaving the Nest—Two Fledging Experiences
Dr. Aronne Merrelli, Associate Research Scientist, College of Engineering, University of Michigan
Merrelli describes himself as a “Level 1 and Level 2 algorithm scientist” who works with data from instruments managed by NASA, NOAA, ESA (European Space Agency), and other organizations, primarily in the Python coding ecosystem. He went through the NASA Openscapes Champions program in 2023 and says the cloud enables him to look at new science questions.
“I see the cloud as a new capability that’s allowing me to do analyses on big datasets that would have been hard to do on [non-cloud-based] machines,” he says. While Merrelli observes that cloud computing likely will not replace any of his existing computing environments (including his personal laptop, a research group server, and a university-based shared computing cluster), the cloud is his destination of choice for processing large datasets.
Before going through the NASA Openscapes Champions program, Merrelli says he had effectively “zero useful experience” working with data in the commercial cloud, and the benefits of working in a cloud-based environment were unclear. In addition, the costs of working in the cloud appeared high and difficult to grasp. Finally, existing packages and training materials for working in the cloud either focused on model data or required specialized Python packages or formats, and they were not related to his work with Level 1 and Level 2 data. “There was kind of a disconnect between what was out there and what I needed,” he says.
During his Openscapes onboarding, Merrelli learned how to work with cloud-based data in the Openscapes 2i2c JupyterHub environment and experimented with setting up cloud-based workflows under the guidance of DAAC mentors. Following onboarding, Merrelli took his scientific workflows to an outside cloud-based system developed by a company called Coiled, which simplifies deployment of Python workflows to cloud computing environments. “I didn’t need to modify my workflow, I just had to add Coiled to deploy [my workflow] to the cloud when I needed it,” he says.
Merrelli observes that the cloud may not be the best choice for every task. When identifying a near-term task that may be appropriate for cloud-based processing, he notes that the task should have at least three basic characteristics:
- The task is something you were going to do anyway (e.g., you’re not creating a new task simply to do work in the cloud or experiment in the cloud)
- The task requires the use of data already in the cloud, such as a NASA dataset in the Earthdata Cloud
- The task is large enough so that doing it in the cloud will end up saving research and analysis time
Regarding the costs of cloud-based workflows, one take-home message for Merrelli following his Openscapes work is that a few hundred dollars per year in cloud computing expenses will go a long way. He emphasizes that optimizing general workflows for the cloud can take significant effort. However, many satellite data analyses are time-consuming primarily because of the large size of the data records, and most of the processing time is consumed by data input/output (I/O). These types of workflows can be very efficient in cloud computing environments because large numbers of parallel processes can access a dataset at the same time.
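To illustrate why I/O-bound work benefits from parallel access, here is a minimal Python sketch. It does not use any NASA, earthaccess, or Coiled API: `read_granule` is a hypothetical stand-in that simulates network latency with `time.sleep`, and the latency and worker count are arbitrary assumptions chosen only for demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_granule(granule_id: int, latency_s: float = 0.05) -> int:
    """Hypothetical stand-in for reading one data granule.

    The sleep mimics time spent waiting on network I/O rather than computing.
    """
    time.sleep(latency_s)
    return granule_id


def run_serial(granule_ids):
    """Read granules one after another; total time grows with the count."""
    start = time.perf_counter()
    results = [read_granule(g) for g in granule_ids]
    return results, time.perf_counter() - start


def run_parallel(granule_ids, workers: int = 8):
    """Read granules concurrently; waits on I/O overlap across threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        results = list(pool.map(read_granule, granule_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(16))
    _, t_serial = run_serial(ids)
    _, t_parallel = run_parallel(ids)
    print(f"serial: {t_serial:.2f} s, parallel: {t_parallel:.2f} s")
```

Because each simulated read spends its time waiting rather than computing, eight threads finish the 16 reads in roughly the time of two serial reads. Real speedups depend on network bandwidth, the data store, and how a workflow is distributed across machines.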
Overall, Merrelli’s Openscapes cloud experience has been a game-changer for how he now approaches his science. “[The cloud environment] feels like a superpower for processing these satellite datasets,” he says. “It’s a new capability that’s allowing new analyses and the fusion of multiple satellite data records.”
Dr. Elizabeth (Eli) Holmes, Statistician, Mathematical Biology and Systems Monitoring Program, NOAA National Marine Fisheries Service (NMFS)
Holmes currently leads the NMFS Open Science initiative. The overarching vision of NMFS Open Science is to support scientists, developers, resource managers, and policy analysts within NMFS in fulfilling NOAA’s Open Science mandates.
Holmes was an early NASA Openscapes participant, first becoming involved during the Openscapes 2021 Cloud Hackathon. This was also her first introduction to JupyterHub. “For me to be able to drop right into this environment and have it all set up to do cool science, I was like, this is the way,” Holmes said during a session at the 2024 Earth Science Information Partners (ESIP) Summer Meeting.
After her onboarding and work with NASA Openscapes DAAC mentors, Holmes took on a leadership position in her NOAA Fisheries office to oversee training in open science and NOAA efforts toward open data, open collaboration, and open workflows in the cloud. “My focus is on helping staff—particularly the scientific staff—adopt new analysis workflows based on reproducible environments,” she says.
Through her ongoing work in cloud-based environments, Holmes observes that one of the barriers staff in large organizations face in adopting reproducible workflows is simply setting up these workflow environments. She notes that while most people can set up environments that require one or two simple installations of software or processing systems, this becomes more difficult when complex environments need to be installed. Setting up environments for the analysis of large cloud-based data collections often requires IT assistance, as well as time from people with the requisite expertise to work on each individual system.
One of Holmes’ realizations is that putting the software, processing systems, and other supporting elements into a container makes it much easier to accelerate innovation: a staff scientist needs only to install the container to have their entire cloud-based processing environment ready to go. “You set up the environment once and then thousands of people can use it,” says Holmes.
Another realization for Holmes through her Openscapes and NOAA work is that the cloud is not one-size-fits-all. Individuals from different organizations and environments have different needs that must be considered.
“When you’re thinking about fledging, we’re in this JupyterHub way up high in this Openscapes environment and it’s easy to think that the only people who need to fledge are birds,” says Holmes. “But some of those people [who need to fledge] are zebras, and zebras have very different needs, and they still need to get down from the nest. It’s important to remember that it’s not all just birds in the nest.”
Learn More
- Openscapes Blog: Onboarding and “fledging”: How NASA Openscapes supports NASA Earthdata users in the Cloud (August 30, 2024)
- Openscapes Blog: First Forays into the Cloud (Aronne Merrelli; July 22, 2024)
- Openscapes YouTube Channel
- Information about upcoming Openscapes professional development opportunities