Data Tool in Focus: OpenScienceLab

OpenScienceLab promotes open science by reducing barriers to working with synthetic aperture radar (SAR) data in the cloud.
An interferogram of Sentinel-1 data showing the location of an earthquake
This interferogram, created with Sentinel-1 SAR data acquired on February 5 and February 18, 2018, shows an earthquake fault slip on a subduction thrust fault in Mexico that caused as much as 40 centimeters of uplift. The motion from the earthquake's epicenter has been contoured with 9-centimeter color contours, also known as fringes. Credit: NASA Applied Sciences Disasters Program.

Synthetic aperture radar (SAR) satellite data have some distinct advantages over the more commonplace optical satellite observations. Because SAR uses the microwave region of the electromagnetic spectrum, it can penetrate thick cloud cover (including that associated with severe weather) and “see” in the dark, thereby allowing a unique view of flood inundation, changes in land cover, and disturbances on Earth’s surface resulting from landslides and earthquakes.

Yet, working with SAR data can be a challenge, as there are several common obstacles users may encounter as they learn to access and work with the data. SAR datasets consist of large files, which can impact users’ data download and storage capabilities. Further, working with SAR data may require the installation of complicated computing environments.

OpenScienceLab (OSL), a service managed by the Alaska Satellite Facility (ASF) at the University of Alaska Fairbanks (UAF), helps users address these challenges. ASF is the location of NASA's ASF Distributed Active Archive Center (ASF DAAC), which archives and distributes NASA's collection of SAR data. OSL provides free, limited access to a cloud-hosted JupyterHub that sits alongside the ASF data archives in Amazon Web Services (AWS), making the transfer of SAR data to users’ persistent storage volumes fast and free.

“OSL is a portal and a framework that provides several JupyterHub deployments, including OpenSARLab, which is supported by the ASF DAAC,” said OSL Developer Alex Lewandowski. “It is open to any users who come our way, and it offers access to a variety of [cloud] computing and storage resources.”

OpenSARLab is just one of several labs within the OSL ecosystem; the other OSL deployments are designed to host short-lived university classes or training sessions and long-term research initiatives funded by other organizations. Even so, OpenSARLab lies at the center of OSL thanks to its mission of opening the door to SAR processing in the cloud via a JupyterLab environment for algorithm development and interactive data exploration.

OSL offers a range of features designed to facilitate the use of SAR data and encourage collaboration. These include:

  • Free limited access to a cloud-hosted JupyterHub
  • Free fast data transfer to users’ storage from ASF AWS archives
  • Identical, fully configured, persistent computing software and hardware environments that multiple users can share
  • An open library of data recipes
  • Use of a JupyterHub in AWS and a JupyterLab development environment, with authenticated accounts and persistent storage
  • A collaborative environment ideal for scientific work requiring large datasets, complicated development environments, and repeatability
  • Deployments tailored to specific use cases and the offer of the exact computational resources required to prevent unnecessary AWS costs
  • Custom deployments of the lab for research teams and classes

OSL’s use of JupyterHub is significant, as it gives users access to computational environments in the cloud without requiring them to perform installation and maintenance tasks. Among those environments is JupyterLab, which is software that runs Jupyter Notebooks, an interface that allows users to write Python code alongside visual and interactive diagrams and see incremental output during the development process.

This graphic shows the four main steps of working with SAR data in ASF's OpenScienceLab. Step 1 is ordering on-demand Level-2 products from HyP3 on Vertex. Step 2 is downloading HyP3 products into OpenSARLab. Step 3 is analyzing data and developing new algorithms in Jupyter Notebooks. Step 4 is sharing reproducible results with colleagues. Credit: NASA's ASF DAAC.

OSL developers have created a set of SAR-related notebooks available via GitHub (link below in Additional Resources), which Lewandowski says are valuable learning tools because they’re annotated, allowing users to learn about coding while they use them. “This format intersperses documentation with code, so users can learn how to code or work with a particular algorithm in such a way that they can read about what they’re doing as they’re doing it,” he said. “They can also go in and make changes to it because that code is editable and can be re-run. It’s sort of like a playground for learning [code].”

OSL has also supplemented its JupyterHub deployment with enhanced storage options that let users save their work in the cloud instead of having to download it.

“There are tools [for working in the cloud] that are similar, like Binder, except [it] provides no persistent storage. So, when you use Binder and run it through a notebook, anything you create or any changes you make must be downloaded to your computer,” Lewandowski said. “We’ve provided a volume that users can save things to. So, they can work on their project, log off, come back later, and everything is still there as they left it.”

In addition, since data volume has a direct impact on the cost of working in the cloud, OSL’s developers have included volume management capabilities.

“We have a storage lifecycle [that] deletes volumes after four days, but every day those volumes are backed up as snapshots; two weeks down the line, when users log back in, those snapshots can be used to re-create their data volume,” said Lewandowski. “So, not only have users not lost anything, they still have all their data. These tools help keep costs as low as possible by not having users’ volumes just sticking around in the cloud for an extended period of time.”
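The lifecycle Lewandowski describes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not OSL's actual implementation: the names `UserStorage`, `nightly_job`, and `login` are hypothetical, the four-day window is assumed to count from the user's last login, and only the most recent daily snapshot is kept.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# From the article: volumes are deleted after four days.
# Assumption: the four days are counted from the user's last login.
IDLE_DAYS_BEFORE_DELETE = 4


@dataclass
class UserStorage:
    """Hypothetical model of one user's volume and its latest snapshot."""
    last_login: date
    volume: Optional[dict] = field(default_factory=dict)  # None once deleted
    snapshot: Optional[dict] = None


def nightly_job(storage: UserStorage, today: date) -> None:
    """Back up a live volume, then delete it if the user has been idle too long."""
    if storage.volume is None:
        return  # nothing to back up or delete
    storage.snapshot = dict(storage.volume)  # daily snapshot (latest only)
    if (today - storage.last_login).days >= IDLE_DAYS_BEFORE_DELETE:
        storage.volume = None  # stop paying for an idle volume


def login(storage: UserStorage, today: date) -> dict:
    """Re-create the volume from the latest snapshot if it was deleted."""
    if storage.volume is None:
        storage.volume = dict(storage.snapshot or {})
    storage.last_login = today
    return storage.volume


# A user saves a notebook, stays away for two weeks, then logs back in.
start = date(2024, 1, 1)
user = UserStorage(last_login=start)
user.volume["notebook.ipynb"] = "insar analysis"

for day in range(1, 15):
    nightly_job(user, start + timedelta(days=day))

volume_deleted = user.volume is None
restored = login(user, start + timedelta(days=14))
```

In this toy model the volume is gone by the fourth night, yet the user's files reappear at the next login, which is the cost-saving behavior the quote describes.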

These tools also make OSL a useful service for classes, training, and collaborative research, as its ease of use helps users learn at the same pace and from their peers.

“OSL is used to teach classes on the use of SAR and for applications like seismology. One of the classes that use it is a Massive Open Online Course (or MOOC) that last year had about 400 participants,” said Sargent Shriver, the OSL Product Owner. “So, it’s simple enough that it can be used for a setting like that, but it’s powerful enough that it can be used by researchers who are performing analyses for peer-reviewed journal articles. It’s very flexible.”

OSL is also integrated with other ASF user-oriented services designed to facilitate data discovery and on-demand processing, thereby providing an end-to-end pipeline for SAR data analysis. For example, OSL interfaces with Vertex, the ASF graphical search interface for finding SAR data, and ASF's Hybrid Pluggable Processing Pipeline (HyP3), which allows users to request SAR processing on demand by submitting input data and a few optional parameters. These services further help users avoid the complexity of SAR processing.

Minimizing complexity and generally making it easier for data users around the globe to capitalize on the benefits of SAR is what OSL is all about.

“SAR is hard. It’s always been hard,” said Eric Lundell, the OSL Lead Developer. “It used to be that you had a few hundred people around the world who spent months trying to download and process data. Now, we can have people who’ve seen very little SAR data go in and do something that used to take days [and] be done within an hour or two, which is a game changer.”

According to Shriver, lowering the barriers to working with SAR also facilitates open science.

“OSL enables open science and it enables SAR science. It facilitates the analysis of SAR data and it enables people who do not have expertise in fields like computer science,” he said. “We’re in this moment where OSL is opening doors to accessing [SAR] data, and there’s this new mission—[the NASA/Indian Space Research Organisation] SAR (NISAR)—that’s going to provide a lot of exciting new data. There’s something really interesting and topical about this combination.”

Lundell concurred, saying that interest in open science is what propelled ASF to expand OpenSARLab, OSL’s predecessor, into the broader OpenScienceLab ecosystem.

“NASA is making a push toward open science and we want to be part of it, so this service is a great way to do that,” he said. “The purpose is to find new ways to help people do what they need to do in their science so they can create new and better products.”

Additional Resources

Tutorials and Webinars
