Data Management and Machine Learning at #AMS2022

Check out previews of two IMPACT presentations scheduled for the American Meteorological Society’s 2022 annual meeting.

We are back from the holiday break and more than ready for the American Meteorological Society’s 2022 annual meeting. Several IMPACT team members will be presenting at #AMS2022. Below are previews of two of the presentations.

Deborah Smith will be spotlighting the recent work of the Airborne Data Management Group (ADMG) in the presentation titled “Improving the Discoverability of NASA Airborne and Field Investigation Data.” Airborne and field investigations produce complex and highly heterogeneous data products that play an important role in NASA’s Earth Science research efforts. However, these data are lesser-known and more difficult to discover because of varied approaches to data stewardship, limited metadata content and quality, differing data archival organization, and distributed access. As a result, airborne and field data are largely under-utilized by research communities beyond the original science teams.

This presentation will describe ADMG’s efforts to develop an inventory of all current and historical NASA airborne data, including the current status and content of that inventory. In addition, ADMG facilitates the transition of hidden data to NASA data centers. This data archeology work vastly improves discovery of and access to valuable historical data and information, some of which would otherwise be lost. Through these efforts, ADMG improves data reuse and enhances NASA’s return on investment.

Screenshot of the Catalog of Archived Suborbital Earth Science Investigations (CASEI) website, showing the home page of the CASEI tool

In June, ADMG released the web-based user interface of CASEI, the Catalog of Archived Suborbital Earth Science Investigations. CASEI serves as a data discovery portal for users wishing to learn more about NASA airborne and field investigations and their associated data products. Its highly linked and detailed metadata allows for complex search queries and provides the background information needed to support appropriate data reuse. A demonstration of CASEI’s latest features and capabilities will be shared in the presentation.

.  .  .

Machine learning has risen to the forefront as a way to solve a variety of problems in scientific research. The machine learning approach differs from traditional problem-solving methods by modeling the presented data using stochastic processes (i.e., random probability distributions) as opposed to deterministic processes. This approach presents unique challenges for successful implementation, namely data parallelization and scalable computation. Muthukumaran R will be demonstrating how IMPACT is working to address these challenges in the presentation “Enabling Scalable Machine Learning Pipelines for Earth Science using Cloud Based Services.”

While machine learning algorithms are being widely adopted across the scientific community, setting up scalable data and computation environments is a difficult barrier to overcome. Modern cloud providers offer services that address this challenge by automatically provisioning the environment, which enables scientists to focus solely on the algorithm details. This presentation will show how SageMaker, an Amazon Web Services (AWS) offering that aims to accelerate ML research, can be used for an Earth science use case.

The presentation will also showcase ImageLabeler, a cloud-native tool for labeling Earth science events. The tool simplifies importing labeled datasets into cloud environments such as SageMaker.
