Big Data Meet Open Science

A NASA-funded project is making synthetic aperture radar (SAR) data easier to work with—and preparing NASA for a SAR data explosion.

NASA’s Jet Propulsion Laboratory (JPL) in Southern California is a long way from the Alaska Satellite Facility (ASF) at the University of Alaska Fairbanks. When it comes to synthetic aperture radar (SAR) technology, however, the two couldn’t be closer.

JPL is the location of the Advanced Rapid Imaging and Analysis (ARIA) Project. ARIA, a collaboration between JPL and Caltech, uses SAR and global positioning system (GPS) observations to develop state-of-the-art methods for detecting and measuring ground change and for developing physical models. ASF is the home of NASA’s ASF Distributed Active Archive Center (ASF DAAC), which archives and distributes NASA’s collection of SAR data, including SAR data produced by ARIA.

An ARIA-led open science initiative involving teams from ASF and scientists from across the U.S. is making SAR products rapidly available and easier to use, without the need for complex processing. This project is part of NASA’s Advancing Collaborative Connections for Earth System Science (ACCESS) program. In keeping with NASA policy, all data, products, and code developed from this work are fully open and available. The result is an open science success story.

The Project

SAR is a technology of choice for analyzing landform change over time or assessing terrain change in the aftermath of natural events such as earthquakes and volcanic eruptions. When two SAR images acquired over the same location at different times are combined to create an interferogram, researchers can visualize elevation change as colored fringes and measure these changes precisely.

Image
ARIA-produced InSAR image showing surface displacements caused by an eruption of Hawaii's Mauna Loa volcano between November 22 and December 4, 2022. Each color cycle (blue-green-yellow-orange-red) represents about 2.8 centimeters of surface motion toward or away from the satellite. Sentinel-1 data provided by ESA (European Space Agency). The image contains modified Copernicus 2022 data, processed by ESA and analyzed by NASA-JPL. Credit: NASA/JPL-Caltech/Grace Bato and Paul Lundgren.
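
To make the fringe idea concrete, here is a minimal sketch of how a phase difference between two radar images maps to surface motion. It is not the ARIA or ISCE2 implementation: real processing also involves co-registration, removal of topographic phase, and phase unwrapping. The function names and the toy pixel values are hypothetical, and the sketch assumes Sentinel-1's C-band center frequency of 5.405 GHz.

```python
import cmath
import math

# Assumption: Sentinel-1 C-band radar, center frequency 5.405 GHz.
SPEED_OF_LIGHT_M_S = 299_792_458.0
RADAR_FREQUENCY_HZ = 5.405e9
WAVELENGTH_M = SPEED_OF_LIGHT_M_S / RADAR_FREQUENCY_HZ  # about 0.0555 m


def interferogram_phase(reference, secondary):
    """Wrapped phase difference (radians) of two co-registered complex images."""
    return [
        [cmath.phase(r * s.conjugate()) for r, s in zip(ref_row, sec_row)]
        for ref_row, sec_row in zip(reference, secondary)
    ]


def phase_to_los_displacement(phase_rad, wavelength_m=WAVELENGTH_M):
    """Convert phase to displacement along the radar line of sight, in meters.

    The signal travels to the ground and back, so one full 2*pi cycle
    corresponds to half a wavelength of motion. The sign convention
    (toward or away from the satellite) varies between processors.
    """
    return phase_rad * wavelength_m / (4.0 * math.pi)


# Toy 1x2 "images" (hypothetical values): the first pixel is unchanged,
# the second pixel shifts by a quarter of a phase cycle between acquisitions.
reference = [[cmath.rect(1.0, 0.3), cmath.rect(1.0, 1.0)]]
secondary = [[cmath.rect(1.0, 0.3), cmath.rect(1.0, 1.0 - math.pi / 2)]]

phase = interferogram_phase(reference, secondary)

# Motion represented by one complete color cycle (one fringe), in centimeters.
fringe_cm = phase_to_los_displacement(2.0 * math.pi) * 100.0
print(f"One fringe is roughly {fringe_cm:.2f} cm of line-of-sight motion")
```

Under these assumptions, one fringe works out to half the radar wavelength, which is consistent with the "about 2.8 centimeters" per color cycle quoted in the caption above.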

One downside to SAR and interferometric SAR (InSAR) data, though, is that the data volumes are very large. In addition, processing SAR data is computationally intensive and the software tools for working with these data are often difficult to install and use. The ARIA-led team is mitigating these challenges through a NASA-funded project called Enabling Cloud-Based InSAR Science for an Exploding NASA InSAR Data Archive. And NASA’s InSAR data archive is, indeed, on the verge of explosive growth.

The upcoming NASA/Indian Space Research Organisation SAR mission (NISAR; scheduled for launch in 2024) is expected to generate more than 50 petabytes (PB) of data each year over its three-year mission (in comparison, NASA’s Earth Observing System Data and Information System, EOSDIS, had a total archive volume of approximately 83 PB at the end of April 2023). In addition, the JPL-led Observational Products for End-Users from Remote Sensing Analysis (OPERA) project is generating SAR-based products for surface water extent, surface displacement, and surface disturbance to meet inter-agency needs identified in the 2018 Satellite Needs Working Group biennial assessment.
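
For a sense of scale, the figures in the paragraph above can be combined in a quick back-of-the-envelope calculation. This is only a rough sketch: it treats "more than 50 PB" as a lower bound, assumes decimal petabytes (10^15 bytes), and assumes data arrive at a steady rate over a 365-day year. The variable names are illustrative.

```python
# Figures quoted in the text; the NISAR rate is a lower bound ("more than 50 PB").
NISAR_PB_PER_YEAR = 50
NISAR_MISSION_YEARS = 3
EOSDIS_ARCHIVE_PB = 83  # approximate total at the end of April 2023

# Assumptions for the unit conversion: decimal petabytes and a 365-day year.
BYTES_PER_PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

nisar_total_pb = NISAR_PB_PER_YEAR * NISAR_MISSION_YEARS
ratio_to_eosdis = nisar_total_pb / EOSDIS_ARCHIVE_PB
average_gb_per_second = NISAR_PB_PER_YEAR * BYTES_PER_PB / SECONDS_PER_YEAR / 10**9

print(f"NISAR mission total (lower bound): {nisar_total_pb} PB")
print(f"Ratio to the April 2023 EOSDIS archive: {ratio_to_eosdis:.1f}x")
print(f"Average sustained rate: {average_gb_per_second:.1f} GB/s")
```

On these numbers, NISAR alone would add at least 150 PB over three years, close to twice the entire EOSDIS archive as of April 2023.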

Image
Examples of OPERA products. Inset images (left to right): Firth River, Yukon, Canada, with data showing drainage basin. Credit: USGS/John Jones; Lava boiling out of the Kilauea Volcano, Hawaii, USA, with interferogram. Credit: ASI/NASA/JPL-Caltech; Firefighting helicopter carrying water bucket to extinguish a forest fire with data image of burned area. Credit: Hansen/UMD/Google/USGS/NASA. These higher-level OPERA products are helping alleviate the need for extensive SAR data processing. Credit: OPERA/NASA JPL-Caltech.

Hosting these vast volumes of SAR data in the cloud enables global teams to collaboratively work with the data using an internet connection and allows this collection to scale easily as its volume increases. While putting these data in the cloud sounds straightforward, the project team is working to overcome some large hurdles.

“The InSAR community is not yet used to the idea of cloud-based processing,” says Dr. David Bekaert, the project’s principal investigator (PI) as well as the ARIA PI. “We’re focusing on getting the community used to processing alongside the data and having the ability to scale processing to cover very large areas quickly. We’re not improving the imagery; we’re improving the tools for using the imagery.”

Bekaert notes that the project already has:

  • Improved DAAC utilities for data discovery, in-place access, and cloud-based pre-processing
  • Enabled virtual data access and handling in open-source SAR software tools
  • Prepared the community for cloud-based processing through training workshops and interactive Jupyter Notebook tutorials
  • Created the ARIA Sentinel-1 Geocoded Unwrapped Interferogram (ARIA-S1-GUNW) product, which is now one of the largest publicly available InSAR archives

Raw SAR data are processed by workflows that build JPL's InSAR Scientific Computing Environment version 2 (ISCE2) into ASF's Hybrid Pluggable Processing Pipeline (HyP3). ISCE2 is open-source, modular software that performs InSAR processing of raw SAR data, and HyP3 enables the project to run ISCE2 at scale to generate ARIA-S1-GUNW interferograms. These interferograms are then delivered to the ASF DAAC for archiving and distribution. Currently, more than 1 million ARIA-S1-GUNW InSAR products are available through the DAAC.

“The [ARIA-S1-GUNW] products provide a low barrier to entry into InSAR analysis, and the open-source ARIA-developed tools get users in and moving quickly,” says Dr. Joseph H. Kennedy, a senior research software engineer at ASF and a co-investigator on the InSAR cloud project. “It’s just a really good product and it’s awesome to be able to grow this archive so substantially to prove new science capabilities and create better products.”

The Open Solution

A key project element is that all the code, software, and tools developed by the project team are fully and openly available. “We developed everything in the open-source domain; everything is reproducible with documentation and community members are encouraged to actively engage with us, see the development of products, and contribute to product development,” Bekaert says. “This is important since we are developing capabilities that can benefit others. We’re enabling people to spend more time doing science.”

Bekaert is quick to note that open source is not the same as open science. “Reproducibility and documentation and the ability for people to take whatever you have produced, make their own modifications, and apply these to their own applications—this has been a significant focus of our work,” he says, stressing the open science aspect of this project.

Making this work open provides better transparency, allows for greater oversight, and enables users to improve the products and find issues. “By making these products available to everybody, they start using them in ways you may have never considered,” Bekaert says. “The more eyes you have on the product, the more scrutiny and feedback you receive, which leads to a stronger product.”

Kennedy at ASF agrees. “The story of open science is open collaboration,” he says. “Just making repositories open source is not quite enough. You really needed to develop openly, and you need to work with the community and build a community. That open collaboration, the ability to work across institutions, is really, really powerful.”

Community Use

Project products are already being used in investigations into land change, such as studies of plate tectonic motion and rock physics in Tibet and the mapping of vertical land motion in California. Bekaert notes that the ability to provide cloud-based scalability for low-latency, large-volume InSAR processing is invaluable for disaster response. Capabilities developed by the research team have been used to respond to volcanic events in Portugal and Hawaii along with seismic events in Mexico, California, and Turkey/Syria.

InSAR imagery plays a key role in Dr. Zhong Lu’s studies of volcanic activity along the Aleutian Islands in far western Alaska. Lu, a professor of geophysics at Southern Methodist University in Dallas, TX, and his research team are using nearly 5,000 InSAR images to better understand how volcanoes work in this far northern part of the Ring of Fire. Lu is also a member of the InSAR in the cloud project team and is testing the efficacy of the project’s capabilities in his research.

“InSAR processing is time-consuming and can take a lot of disk space, particularly over a large region such as the Aleutian volcanic arc,” says Lu. “Cloud-based processing speeds up work with these data. Add open science, and this makes it even more powerful.”

Lu notes that many volcanoes in the Aleutian Arc are not instrumented with ground-based GPS networks, so satellite-acquired InSAR imagery is necessary for mapping volcanic deformation and forecasting eruptions. He also observes that the project’s openness is beneficial.

“Open science enhances science because we can take advantage of open algorithms and improve or adopt them for our study,” Lu says. “Additionally, many open science codes have been used and validated by various users; open science facilitates collaborative efforts by various groups, which further improves the algorithms. Finally, open science allows results to be easily reproducible and comparable.”

Findings and papers resulting from Lu’s research will be published in open journals to further enable work by the global community. “By making these data open, people can add their own knowledge to [them],” he adds.

The (Open) Road Ahead

As the InSAR in the cloud ACCESS project comes to the end of its funding, Bekaert notes that all project-related code and documentation are available through public GitHub repositories. ARIA-S1-GUNW products have been vetted by NASA and delivered to the ASF DAAC, where they are available for download without restriction. Bekaert anticipates that more workflows will be added to the cloud.

Kennedy at ASF mentions a future possibility of making the ARIA-S1-GUNW archive a user-driven archive. “We have an open-source, on-demand processing pipeline that enables users to request products for where they want them and where they will be most useful to them, and we can then make these products available to everybody,” he says. “This will truly make the archive represent what the science community is interested in. This is a very open, collaborative effort to transform the way [these] data are viewed and used.”

As a data user, Lu is still getting used to working with cloud-based InSAR data, but acknowledges that this is the future of how he and his colleagues will need to work with these data. “NASA processes data for you; all you need to do is go to the cloud to get these data,” he says. “Now we can take giant steps forward. This is the way to do it.”
