Spread throughout the United States, NASA’s 12 Distributed Active Archive Centers (DAACs) process, archive, document, and distribute NASA Earth science data, services, and tools to support scientists and researchers working in specific disciplines. At the same time, the DAACs also work in concert with one another to provide reliable, robust services to data users whose needs cross the traditional boundaries of a particular field of study. It’s a big and multifaceted job that requires a skilled and technologically savvy staff to complete. To help the DAACs achieve their missions and to provide real-world skills and experience for the next generation of data scientists, managers, and archivists, NASA’s DAACs enlist the assistance of interns eager to expand their abilities and expertise.
During the summer of 2023, five DAACs hosted interns who worked on everything from developing new datasets and application programming interfaces (APIs) to building machine learning applications and data visualization tools. The following is a snapshot of the young men and women working around the country to help DAACs advance Earth science discoveries.
Alaska Satellite Facility Distributed Active Archive Center (ASF DAAC)
Kaleb Burris, Sophomore, Computer Science, University of Alaska Fairbanks (UAF)
Samuel Gallagher, Senior, Computer Science, UAF
Burris and Gallagher were part of an ASF DAAC development team tasked with building a new Sentinel-1 Superseding Search API that allows users to easily search for an updated granule ID when the ID they have has been superseded. Burris, Gallagher, and their fellow team members worked on both cloud-based and on-premises architectures for the API. They also built an automated build-and-deploy system to facilitate rapid and reproducible prototyping. The techniques used to develop the API make it a flexible solution that can be integrated with several popular applications, such as ASF's Vertex Data Search.
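The idea behind a superseding search can be illustrated with a minimal sketch. This is not the ASF API: the mapping, the granule IDs, and the function name `resolve_current_id` are all hypothetical, and the sketch only assumes that each superseded ID points to the ID that replaced it.

```python
from typing import Dict

# Hypothetical lookup table: superseded granule ID -> the ID that replaced it.
# These IDs are placeholders, not real Sentinel-1 granule names.
SUPERSEDED_BY: Dict[str, str] = {
    "GRANULE_A_v1": "GRANULE_A_v2",
    "GRANULE_A_v2": "GRANULE_A_v3",
}


def resolve_current_id(granule_id: str, superseded_by: Dict[str, str]) -> str:
    """Follow replacements until reaching an ID that has not been superseded."""
    seen = set()
    current = granule_id
    while current in superseded_by:
        if current in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        current = superseded_by[current]
    return current


if __name__ == "__main__":
    print(resolve_current_id("GRANULE_A_v1", SUPERSEDED_BY))
```

An ID that has never been superseded is returned unchanged, so a caller can pass any ID without checking first.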
Jacob Jakiemiec, Graduate, B.S. in Computer Science, UAF
Andrew Kozak, Graduate, B.S. in Computer Science, UAF
Lily Larsen, Sophomore, Political Science and Philosophy, Reed College (Portland, Oregon)
Jakiemiec, Kozak, and Larsen were part of the ASF DAAC’s artificial intelligence (AI) team, which is building a neural network to determine whether a synthetic aperture radar (SAR) interferogram contains deformation. The application initially focused on detecting earthquakes and volcanoes; over the summer, the team improved its Python code, training data, documentation, software packaging, and user interface. The team also led the ASF initiative to learn about and develop cutting-edge AI and machine learning (ML) applications.
Sumana Sahoo, Ph.D. candidate, Department of Natural Resources and Environment, UAF
Sahoo’s research focuses on the development of products to monitor forest health in Alaska via remote sensing. This summer, she investigated the potential of SAR products made available by the ASF DAAC to monitor forest disturbances such as insect damage and moisture stress. For this work, she used the capabilities of OpenScienceLab to perform SAR change-detection and time-series analyses. ASF is dedicated to working with Ph.D. students to help them incorporate SAR data into their studies. Sahoo's research has helped ASF expand the functionality of its applications to suit new use cases, specifically using OpenScienceLab with multiple data sources.
Victor Devaux-Chupin, Ph.D. candidate, Department of Geosciences, UAF
Devaux-Chupin’s research focuses on coding an operational flood forecasting system based on previous work from NASA's HydroSAR project. This summer, he worked on integrating deep learning methods to assist flood forecasting and explored the variables that are the primary drivers of floods in specific areas (e.g., rain and river discharge). The project’s ultimate goal is to propose various algorithms to help monitor hydrology-related hazards (floods, landslides) from SAR satellite imagery. Devaux-Chupin’s research aligns well with ASF’s AI efforts and its desire to use SAR processing to predict potential flood events.
Goddard Earth Sciences Data and Information Services Center (GES DISC)
Lisette Kamper-Hinson, Senior, Computer Science, University of Mississippi (Oxford, Mississippi)
Armin Rezaiyan-Nojani, Sophomore, Computer Science, Montgomery College (Rockville, Maryland)
During their internship with NASA's GES DISC, Kamper-Hinson and Rezaiyan-Nojani worked on a collaborative effort designed to harness the capabilities of open-source language models combined with vector databases and knowledge graphs. Specifically, their aim was to enhance the extraction and discoverability of information from unstructured sources like Portable Document Format (PDF) documents that include items such as README files, data product user guides, Algorithm Theoretical Basis Documents (ATBDs), and similar content; integrate this information into a knowledge graph; and combine the information with structured data sources like collection metadata. This approach demonstrated the potential of merging modern computational tools with both structured and unstructured data reservoirs to produce enriched, comprehensive insights.
Zachariah Abueg, Graduate, B.S. in Mathematics, University of Central Florida (Orlando, Florida)
Abueg worked to create visualizations using the emerging technology of virtual and extended reality to display geospatial data from GES DISC in a new way. This work resulted in novel visualizations that were fun to interact with and learn about, and that demonstrated the potential for new ways of using geospatial data to share information.
Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
Meba Tadesse, Junior, Computer Science, University of Maryland-College Park
Tadesse’s work examined how downloading varies as a function of time-since-publication for ORNL DAAC datasets. Her work identified different types of download patterns and demonstrated that the week-to-week variability of individual dataset downloads is substantial. This work will be useful in determining how the ORNL DAAC applies different storage tiers for its data in the Earthdata Cloud.
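As a rough illustration of what "week-to-week variability" can mean, the sketch below computes two simple measures from a list of weekly download counts indexed by weeks since publication. The counts are made up for illustration, the function names are hypothetical, and this is not the method used in the ORNL DAAC study.

```python
from statistics import mean, pstdev
from typing import List


def coefficient_of_variation(weekly_downloads: List[int]) -> float:
    """Population standard deviation divided by the mean (0.0 if no downloads)."""
    avg = mean(weekly_downloads)
    if avg == 0:
        return 0.0
    return pstdev(weekly_downloads) / avg


def week_to_week_changes(weekly_downloads: List[int]) -> List[int]:
    """Difference between each week's count and the previous week's count."""
    return [b - a for a, b in zip(weekly_downloads, weekly_downloads[1:])]


# Made-up counts for illustration only; index 0 is the first week after publication.
example_counts = [10, 30, 10, 30]

if __name__ == "__main__":
    print(coefficient_of_variation(example_counts))
    print(week_to_week_changes(example_counts))
```

A dataset whose coefficient of variation stays high long after publication would be a poor candidate for a slow-access storage tier under this simple view, though a real tiering decision would weigh more factors.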
Physical Oceanography Distributed Active Archive Center (PO.DAAC)
Eric Pham, Junior, Computer Science, University of California-San Diego
Using simulated Surface Water and Ocean Topography (SWOT) mission data from NASA's PO.DAAC, Pham performed a cost-benefit analysis of various cloud storage formats (e.g., NetCDF, ZARR-Python, and Kerchunk) in both local and cloud environments and benchmarked metrics of performance, such as loading/reading time, statistical operations, and time series analysis. The results of this project will lead to recommendations to help SWOT users access and work with SWOT data in a cost-effective manner.
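A benchmark of loading time across formats can be organized along the lines of the sketch below. It is only a timing harness under stated assumptions: the helper names `time_call` and `benchmark` are hypothetical, and the two stand-in workloads take the place of real loaders that would open the same data as NetCDF, Zarr, or Kerchunk references.

```python
import time
from statistics import median
from typing import Callable, Dict


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def benchmark(loaders: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Time every named loader and return {name: median seconds}."""
    return {name: time_call(fn, repeats) for name, fn in loaders.items()}


# Stand-in workloads (assumptions): replace with functions that open and read
# the same dataset in each storage format being compared.
def fake_small_read() -> int:
    return sum(range(1_000))


def fake_large_read() -> int:
    return sum(range(200_000))


results = benchmark({"small": fake_small_read, "large": fake_large_read})

if __name__ == "__main__":
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name}: {seconds:.6f} s")
```

Using the median rather than the mean keeps a single slow run, such as a cold cache or a network hiccup, from dominating the result.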
Zoë Walschots, Senior, Information Systems and Business Analytics and Marketing, Loyola Marymount University (Los Angeles, California)
Walschots worked on a project to integrate the earthaccess Python library for NASA Earthdata into existing PO.DAAC Cookbook tutorials and expand existing tutorials to show additional use cases. earthaccess was created to provide streamlined programmatic search and access to NASA Earth science data regardless of storage location. This library reduces processes that once required multiple blocks of code to just a few lines of code, expanding the accessibility of NASA Earth science data to more users. According to PO.DAAC staff, Walschots’ work has been pivotal in bringing PO.DAAC tutorials up to date with the latest DAAC innovations and standardized access patterns presented across DAAC tutorials.
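The "few lines of code" pattern looks roughly like the sketch below, which uses the earthaccess functions `login`, `search_data`, and `download`. The helper names, the collection short name, the dates, and the bounding box are illustrative assumptions rather than values from the PO.DAAC Cookbook, and running the search requires the earthaccess package, network access, and an Earthdata Login account.

```python
from typing import Any, Dict, Tuple


def build_query(short_name: str, start: str, end: str,
                bbox: Tuple[float, float, float, float]) -> Dict[str, Any]:
    """Assemble search keywords; bbox is (west, south, east, north) in degrees."""
    return {"short_name": short_name, "temporal": (start, end), "bounding_box": bbox}


def search_and_download(query: Dict[str, Any], out_dir: str = "./data", count: int = 5):
    """Log in, search for granules, and download them to out_dir."""
    import earthaccess  # imported here so build_query works without the library

    earthaccess.login()  # uses stored credentials or prompts for Earthdata Login
    results = earthaccess.search_data(count=count, **query)
    return earthaccess.download(results, out_dir)


# Illustrative values only (assumptions, not taken from the tutorials).
example_query = build_query(
    "MUR-JPL-L4-GLOB-v4.1", "2023-06-01", "2023-06-02", (-125.0, 32.0, -117.0, 42.0)
)

if __name__ == "__main__":
    files = search_and_download(example_query)
    print(files)
```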
Ayush Nag, Senior, Computer Science, University of Washington-Seattle
Nag worked on a project to build a cloud-based dashboard to visualize and analyze SWOT mission sea surface height data. This dashboard will help researchers, including members of the SWOT science team, perform analyses on the full, high-resolution dataset. As a result of Nag’s work, users can simply download the notebook, connect to PO.DAAC data in the cloud, and begin manipulating the data. The creation of this user-friendly dashboard is noteworthy, as the SWOT mission is expected to generate as much as 20 terabytes (TB) of data each day. Nag's efforts address the challenge of working with this large amount of data by enabling users to construct a data analysis pipeline encompassing data exploration, subsetting, and higher-level analysis within a single notebook.
Socioeconomic Data and Applications Center (SEDAC)
Ljupcho Atanasov, Senior, Computer Science, City University of New York-Lehman College (Bronx, New York)
Christina Deodatis, Master’s candidate, Climate and Society, Columbia University (New York, New York)
Djakaridia Diabagate, Master’s candidate, Mathematics, City University of New York-Lehman College
Hieu Tran, Master's candidate, Computer Science, City University of New York-Lehman College
Atanasov, Deodatis, Diabagate, and Tran spent their summer working at an intersection of geospatial analysis and social science with SEDAC staff, and helped in the development of Version 5 of the DAAC’s popular Gridded Population of the World (GPW) dataset. Using data at the most detailed spatial resolution available from Population and Housing Censuses conducted in recent years, the GPW dataset models the distribution of the human population on a continuous global raster surface and provides a spatially disaggregated population layer compatible with datasets from social, economic, and Earth science disciplines as well as remote sensing. These attributes make the GPW dataset an important source of globally consistent and spatially explicit data that researchers and policymakers can use to make informed decisions for communities around the world.
Ryan Huber, Master’s candidate, Sustainability Science, Columbia University
Huber's summer internship project involved collecting, organizing, and processing data, and assessing methods of predicting regional economic factors to create a worldwide model that will result in a gridded economic map with contributions by sector for each pixel. Such a map will help fill gaps in knowledge about regional economic sectors around the globe and help with disaster risk reduction planning efforts by highlighting exposed assets by sector.
NASA’s highly competitive Internship Program brings together college and graduate school students (along with recent graduates and qualified high school students) to work on projects at NASA centers and facilities across the nation. Internships are available throughout the year, with summer internships lasting a minimum of 10 weeks and fall and spring internships lasting a minimum of 16 weeks. Detailed information and an electronic application can be found on the NASA Internships and Fellowships website.