In a typical summer, NASA's Goddard Space Flight Center in Greenbelt, Maryland, would be home to a group of NASA’s Earth Science Data and Information System (ESDIS) Project interns tasked with advancing initiatives to support ESDIS’s ongoing efforts to process, archive, and distribute NASA Earth science data. Summer 2021 has not been typical, though. ESDIS staff are telecommuting, as they have been since mid-March of last year, and the same is true for personnel at all of NASA’s facilities and at the universities and other locations that host or support NASA Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Centers (DAACs). Yet, even with pandemic restrictions in place, NASA Earth observing data are still getting processed, archived, and distributed. And, thanks to interns working from home across the country this summer, important ESDIS and DAAC projects are moving ahead.
This summer, ESDIS has two interns: Vanessa Chatman, a graduate student in computer science at Northeastern University, and Andrew Cramer, a graduate student studying computer science with a focus on artificial intelligence at Texas A&M University. Under the guidance of their mentors, EOSDIS System Architect Dr. Christopher Lynnes and EOSDIS computer scientist and former ESDIS intern Vincent Inverso, Chatman and Cramer have spent their 10-week internships enhancing ESDIS’s Usage-based Discovery Tool (UBD), which helps users navigate NASA datasets based on how they have been used in applications and research articles. Their task was to construct a machine learning pipeline to automate the classification of research articles by geophysical topics. The project is also intended to serve as a roadmap for adding machine learning to an ongoing project using existing machine learning tools and models. Once fully integrated, this machine learning solution will create a wider representation of how NASA Earth science data products are used and provide the end-user with a set of usage-relevant datasets based on an article’s topic classification.
Both interns have found their experience of working on the machine learning pipeline academically and personally rewarding.
“[This internship has] given me a lot more experience in the other side of data science that I hadn’t really gotten a chance to address in my classes in terms of data curation and standard processing techniques, and also getting the chance to build this machine learning pipeline from the ground-up,” said Cramer.
Chatman agreed.
“My program is in computer science with sub-concentrations in machine learning and software engineering. I’m very early in the machine-learning track, so I’ve been getting a lot of hands-on experience with machine learning algorithms and data science, and a lot of exposure to well modularized Python code,” she said. “It’s definitely helping me in the sense that, this coming semester, I’ll be taking my first full-fledged machine learning course.”
Chatman and Cramer readily admit their virtual internship experience has presented them with a few challenges in terms of logistics, networking with peers, learning from NASA, and experiencing everything that being on Goddard's campus has to offer, but both have been able to adjust.
“It seemed like the start was a little slower,” said Cramer. “This is how it was with classes as well, just getting everything coordinated. I think we got over that pretty quickly and now that we’ve got everything coordinated and gotten to know one another and get used to one another’s schedules, it’s gotten easier.”
Chatman reported similar bumps along the way, but noted at least one advantage too.
“There have been a lot of concept discussions that I would have probably had a quicker grasp of if I’d been in a room with someone and we were able to look at the same resources, but it hasn’t been anything major,” said Chatman. “And there have been a couple of benefits to being virtual. For instance, Andrew and I have been trying to figure out how to coordinate this really huge document share. If we were in the same space, we would have found some quick workaround, but it’s been helpful to find what online tools I can use to get a very large file where it needs to go, so there’s been a greater reliance on the cloud.”
The mentors experienced a few challenges with the shift to virtual internships too.
“I won’t deny it’s a little more difficult to make those personal connections because we can’t meet each other face-to-face,” Lynnes said. “I think we’re doing as well as can be done in the virtual environment, but I know from past years we’d be dropping in all the time on the interns. So, we do miss out on some of the contact, but I’m surprised how well some of these virtual internships have been going once we got the logistics straightened out.”
At the same time, the prospect of having virtual interns has allowed ESDIS to open the opportunity to a larger pool of applicants.
“A lot of the interns that we’ve had recently we would not have even been able to include in the internship program because of where they live and the constraints that introduces,” Inverso said. “So, we’ve had a little bit better access to talented people like Vanessa and Andrew because of this new paradigm.”
Clearly, ESDIS has profited by putting Chatman’s and Cramer’s talents to good use.
“One of the major goals of ESDIS is to be a steward of scientific data and a major part of that is making sure scientists have easy access to that data. The broader project that Andrew and Vanessa are working on, called Usage Based Discovery, is a big component of that data discovery process,” said Inverso. “What they’re doing with this machine learning — automating this process of creating content within the UBD tool — is a critical part of the discovery process and will help scientists discover data.”
At the time we spoke, Chatman and Cramer had a few weeks left before their internships came to a close. Yet, neither hesitated when I asked if they would recommend an ESDIS internship to their peers.
“I would definitely recommend a NASA summer internship,” said Cramer. “My reasons for wanting to get into a NASA internship to begin with was the nature of the work. It seemed like there was a great opportunity for very interesting projects and I think that’s what I found here.”
Chatman said she already had.
“Who doesn’t want to intern with NASA?” she asked.
Both Chatman and Cramer attribute their positive experiences to the commitment and dedication of their mentors.
“At a most basic level, internships are a learning opportunity, so we make sure that we’re setting them up for success in terms of learning and, vice versa, to ensure that we’re learning from them,” said Inverso.
In addition to being a learning opportunity for Chatman and Cramer, Lynnes hopes the work they produce will result in a learning experience for the entire organization.
“We’re looking for this project to help ESDIS, and the wider EOSDIS community, get a better idea of not only what machine learning can do for us as data systems professionals, but also to give the organization a little bit more of an idea of what’s involved in going from end to end in a machine learning project,” he said. “The machine learning classification part of [this project] is brand new, so that has been something of a challenge, but the benefit of that is now they will be able to teach the organization — this is how you start a machine learning project, this is how you build a machine learning pipeline.”
Chatman and Cramer will have the chance to do that teaching, and hone their presentation skills, when they present their work to the program at NASA Headquarters and to members of the EOSDIS community.
CSDA Interns Build Resource for Commercial Data User Community
Nia Asemota, a rising senior studying computer science and biomolecular science at New York University, and Damian Ugalde, a rising senior at California State Polytechnic University, Pomona pursuing a double major in computer science and aerospace engineering, were the 2021 summer interns with NASA’s Commercial SmallSat Data Acquisition (CSDA) program, and they enjoyed their NASA internship experiences as well.
“Would I recommend this? Yes,” said Asemota. “Through this experience I’ve learned what it means to be a software engineer. I know everything was virtual, but just to get a sense of all the meetings that go on, all the back-end stuff, all the planning, making the requirement sheets, what happens in a scrum — it was really insightful.”
Ugalde appreciated the opportunity to make connections and learn new technologies.
“I learned a lot about new technologies that I had never tried before, which was very helpful,” he said. “Most importantly, I made connections and talked to people. Seeing everyone on the team and seeing everyone in the scrums, it was very helpful.”
A component of NASA’s Earth Science Data Systems (ESDS) Program, CSDA was established to identify, evaluate, and acquire data from commercial sources that support NASA's Earth science research and application goals.
Asemota and Ugalde supported CSDA by working on what their mentor Aaron Kaulfus, applications data management research assistant at NASA’s Marshall Space Flight Center in Huntsville, Alabama, refers to as a “data-in-action concept” that will make a significant contribution to the user community.
“They’ve been developing a content management system and a web interface that will allow not only our team, but potentially outside users as well, to write up short articles, write snippets of code, and then publish them, with review, out to the public,” he said. “They’re really contributing to the building of a resource based around the commercial datasets that we have and that’s really going to benefit the end user community and our ability to communicate and bring more people into this commercial data use world.”
“What they’re working on, we have plans to pull it in operationally, so they are not only learning, but they are building a solution that we will be able to use,” said Alfreda Hall, a computer engineer and project manager with CSDA who also serves as a mentor to Asemota and Ugalde. “They should feel very proud.”
Like Lynnes and Inverso, Kaulfus and Hall strove to help their mentees learn about NASA, learn about the larger effort they were supporting, learn what it means to work as part of a team, and, of course, grow their skills both as coders and computer scientists.
“Everything we do is very much a team effort,” said Kaulfus. “There’s no one siloed off, so helping them get integrated in that process was a big part of my job.”
Working virtually made that more challenging than usual, but, like their ESDIS-intern peers, Asemota and Ugalde worked through it.
“I missed just being able to see people and getting instant feedback,” Ugalde said. “It’s different, you don’t get the same feel when you do it by text. But something that I really appreciate from both of our mentors is that we meet every day and we meet face-to-face, virtually. That was very helpful.”
Asemota was equally grateful.
“I’m a people person, I love collaborating and working on teams, so having that aspect removed and having to communicate with everyone through a screen was challenging at first,” she said. “I appreciate that we were able to work in this capacity and still contribute to the team effort.”
And contribute they did. As the ESDIS and CSDA virtual summer internships come to an end, the interns can return to their studies with the knowledge that, after a summer of hard work, virtual meetings, and daily check-ins with their mentors, they were able to make the best of a challenging situation to advance ESDIS and CSDA, and make the Earth observation data in NASA’s holdings more accessible to users around the globe.
2021 EOSDIS DAAC Summer Interns
Asemota, Chatman, Cramer, and Ugalde weren’t the only interns whose work supported the mission and goals of the ESDIS project. Scattered across the United States were several other summer interns supporting a wide range of projects at EOSDIS DAACs.
The DAACs process, archive, document, and distribute data from NASA's past and current Earth-observing satellites and field measurement programs and, acting in concert, provide reliable, robust services to users whose needs may cross the traditional boundaries of a science discipline — all while continuing to support the particular needs of users within the discipline communities.
Five of NASA’s 12 DAACs hosted interns this summer. The following is a snapshot of the young men and women working around the country to help NASA’s DAACs provide users with the data, tools, and resources they need to advance Earth science discoveries.
Atmospheric Science Data Center (ASDC)
Located in the Science Directorate at NASA's Langley Research Center in Hampton, Virginia, ASDC supports more than 50 projects and provides access to more than 1,000 archived datasets. These datasets were created from satellite measurements, field experiments, and modeled data products. ASDC projects focus on the Earth science disciplines of radiation budget, clouds, aerosols, and tropospheric composition.
This year, ASDC had two interns: Landon Clime, a rising senior studying computer science and mathematics at William and Mary, and Kathy LaMarsh, who graduated from D’Youville College in May with a master’s degree in nutrition and dietetics. Both Clime and LaMarsh worked with the ASDC’s Earth Venture Sub-Orbital Support Team.
Clime explored the possibility of automating the assignment of standard variable names to non-standard variable names in data files, a task that generally requires subject matter expert intervention. He also supported NASA’s In-situ Net Flux within the Atmosphere of the Earth (INFLAME) program by providing Python scripts for instrument ground calibration analyses. INFLAME deploys instruments on small aircraft that provide direct measurements of the net radiative flux, flux divergences, and heating rates in clear sky, in the presence of aerosols, and in the presence of cloud particles.
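To illustrate the kind of name matching described above, here is a minimal sketch that uses fuzzy string matching from Python's standard library to suggest a standard name for a non-standard one. The article does not describe the method Clime actually explored, so this is only one possible approach. The `STANDARD_NAMES` list, the `normalize` and `suggest_standard_name` helpers, and the 0.6 similarity cutoff are all hypothetical.

```python
import difflib

# Hypothetical list of standard variable names (illustrative only).
STANDARD_NAMES = [
    "air_temperature",
    "air_pressure",
    "relative_humidity",
    "wind_speed",
]


def normalize(name):
    """Lowercase a name and replace common separators with underscores."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def suggest_standard_name(raw_name, cutoff=0.6):
    """Return the closest standard name, or None if nothing is similar enough.

    The cutoff is an assumed similarity threshold between 0 and 1.
    """
    matches = difflib.get_close_matches(
        normalize(raw_name), STANDARD_NAMES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

In a sketch like this, a name with no sufficiently close match returns `None`, which is where a subject matter expert would still need to step in.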
LaMarsh supported quality assurance and quality control efforts to ensure that the platform and variable information associated with the North Atlantic Aerosols and Marine Ecosystems Study (NAAMES) campaign, which investigated the potential connection between plankton life cycles and the formation of clouds, was entered correctly in ASDC’s Sub-Orbital Order Tool (SOOT). SOOT supports data discovery for users within the sub-orbital community and aims to promote sub-orbital research and analysis within the Earth science disciplines of radiation budget, clouds, aerosols, and tropospheric composition.
According to mentors Lindsay Parker, ASDC group manager, and Makhan Virdi, ASDC scientist, both interns added tremendous value to the ASDC’s Earth Venture Sub-Orbital Support Team.
“From metadata quality control to streamlining the ingest prep processes that encourage enhanced data, search and discovery, they provided the team with solutions to meticulous problems that will have long-term benefits for suborbital data archival and distribution,” the mentors said.
Crustal Dynamics Data Information System (CDDIS)
CDDIS was established at Goddard in Greenbelt, Maryland, in 1982 as a dedicated data bank to archive and distribute space geodesy-related datasets. Today, CDDIS archives and distributes mainly Global Navigation Satellite Systems (GNSS), satellite laser ranging (SLR), Very Long Baseline Interferometry (VLBI), and Doppler Orbitography and Radio-positioning Integrated by Satellite (DORIS) data for an ever-increasing user community of geophysicists.
CDDIS had one summer intern this year, Egan R. Jett-Parmer, a rising senior at Virginia Polytechnic Institute studying aerospace engineering. Jett-Parmer worked on the development of a CDDIS dashboard designed to present data and metrics for GNSS, SLR, DORIS, and VLBI, as well as error reporting, rates of ingest, and other key performance indicators (KPIs). The dashboard also provides real-time updates to the CDDIS team, letting them know if any essential servers are down, if file ingest rates are lower than expected, the number of duplicate file uploads, and more. Further, the project is expected to grow as the dashboard is connected to the MariaDB database, a community-developed, commercially supported fork of the MySQL relational database management system.
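Two of the checks mentioned above, a lower-than-expected ingest rate and a count of duplicate uploads, can be sketched in a few lines of Python. This is an illustrative sketch only, not the CDDIS dashboard code. The function names, the 80 percent tolerance, and the sample file names are hypothetical.

```python
from collections import Counter


def low_ingest_alert(files_ingested, hours, expected_per_hour, tolerance=0.8):
    """Return True when the observed rate falls below tolerance * expected.

    The tolerance value is an assumption chosen for illustration.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    observed_per_hour = files_ingested / hours
    return observed_per_hour < tolerance * expected_per_hour


def count_duplicate_uploads(filenames):
    """Count every upload beyond the first for each file name."""
    counts = Counter(filenames)
    return sum(n - 1 for n in counts.values() if n > 1)
```

For example, 700 files over 10 hours against an expected 100 per hour would raise the alert under these assumptions, while 900 files would not.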
“Egan has done an incredible amount of work in a very short time for NASA’s CDDIS team,” said Justine Woo, CDDIS lead software developer, and Taylor Yates, CDDIS Metrics, DORIS, and VLBI operations lead and software engineer. “The team wished to develop an internal dashboard to report archive statistics in real-time and system performance for years, but never had the time or bandwidth to do so until this summer. The framework, video walk-through, and documentation that he will deliver at the end of his SSAI internship will be valuable for all CDDIS team members (including Operations, Development, Metrics, and System Administrators) to streamline their diverse reporting to management in a sleek, comprehensive, cohesive, qualitative, and quantifiable manner.”
Goddard Earth Sciences Data and Information Services Center (GES DISC)
Located at Goddard in Greenbelt, Maryland, GES DISC archives and supports datasets applicable to several NASA Earth Science Focus Areas, including atmospheric composition, water and energy cycles, and climate variability. GES DISC had four interns this summer: Nathaniel Crosby, a rising junior studying computer science at Amherst College; Rohan Dayal, a rising junior majoring in computer science at Columbia University; Sam Smith, a rising senior studying computer science and math at Clemson University; and Kristina Stoyanova, a rising junior majoring in computer science at the California Institute of Technology (Caltech). Each intern led a project designed to enhance users’ experience of the GES DISC archive and then presented the results of that effort in a poster presentation.
Crosby’s project, Creating a Knowledge Graph to Connect Scientific Publications and Datasets for Improving Discovery of GES DISC’s Data and Services, discussed the creation of a Knowledge Graph prototype connecting research paper citations and data collection metadata. These relationships could serve as the basis for web applications that use this information to connect published research to GES DISC datasets and services.
Dayal’s project, Automated Classification of Scientific Publications Linked to GES DISC Datasets, focused on the development and application of machine learning classifiers to research papers that discuss the collections, algorithms, validations, and applications of GES DISC data. The DAAC collects citations for these papers and makes them available to users, categorizing them based on how they present or evaluate GES DISC datasets. Currently, this process is performed manually. To determine if the process could be automated with machine learning, Dayal and his colleagues trained and monitored machine learning algorithms and then evaluated their performance. They achieved classification accuracy substantially better than the baseline accuracy, thus greatly improving the efficiency of the publication categorization.
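The comparison against a baseline can be made concrete with a small sketch. The article does not say how the baseline was defined, so the code below assumes a common choice: always predicting the most frequent category. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def majority_baseline_accuracy(true_labels):
    """Accuracy of always predicting the most frequent label (assumed baseline)."""
    if not true_labels:
        raise ValueError("true_labels must be non-empty")
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    return most_common_count / len(true_labels)
```

A classifier is only useful for this kind of task if `accuracy` on held-out papers clearly exceeds `majority_baseline_accuracy` for the same papers.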
Smith’s project, Automated Classification of GES DISC User Support Tickets (ACOUSTICS), sought to develop a machine learning model to classify user-submitted GES DISC help desk tickets as belonging to one of four categories: findability, accessibility, interoperability, or reusability. The goal of this work was to categorize submitted tickets to gain a broader understanding of the GES DISC user needs and challenges. Through experiments with machine learning and natural language processing practices, efforts that entailed the pre-processing of textual data, extracting features, and evaluating classification algorithms, Smith and his team developed a model to classify user tickets. Although their preliminary results showed their model did not possess the accuracy they were looking for, they concluded that more data would improve the model’s performance.
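The three steps named above (pre-processing text, extracting features, and classifying) can be sketched with a tiny bag-of-words naive Bayes classifier written in plain Python. The article does not say which algorithms Smith's team evaluated, so naive Bayes is used here only as a stand-in. The function names and any ticket text used with them are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Feature extraction: count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Classification: pick the label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    tokens = [t for t in tokenize(text) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With only a handful of training tickets per category, a model like this tends to be unreliable, which is consistent with the team's conclusion about needing more data.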
Lastly, Stoyanova’s project, Improving Earth Science Dataset Search with Publications Content Via Knowledge Graph Linkage, leveraged the content of the thousands of papers based on GES DISC datasets published each year to facilitate the discovery of the datasets based on applied research. By linking these publications and the datasets they used in a Knowledge Graph, Stoyanova and her colleagues retrieved phenomena and domain information using the Semantic Web for Earth and Environmental Terminology (SWEET) ontology and produced a set of keywords linked to the datasets. Then they evaluated the strength of these links according to the frequency of dataset usage in the papers mentioning the keywords. By comparing the search results obtained from a Common Metadata Repository (CMR) search with the publication data, they found that these relationships can improve dataset search outcomes.
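One way to read the link-strength step is as a simple frequency calculation: for each keyword, the share of papers mentioning it that used a given dataset. The sketch below implements that reading. The exact formula used in the project is not given in the article, and the function name, the paper records, and the dataset labels are hypothetical.

```python
from collections import Counter, defaultdict


def keyword_dataset_strengths(papers):
    """Return {keyword: {dataset: share of the keyword's papers using it}}.

    Each paper is a dict with 'keywords' and 'datasets' lists.
    """
    usage = defaultdict(Counter)
    papers_per_keyword = Counter()
    for paper in papers:
        datasets = set(paper["datasets"])
        for keyword in set(paper["keywords"]):
            papers_per_keyword[keyword] += 1
            for dataset in datasets:
                usage[keyword][dataset] += 1
    return {
        keyword: {ds: n / papers_per_keyword[keyword] for ds, n in counts.items()}
        for keyword, counts in usage.items()
    }
```

Under this reading, a search for a keyword could rank datasets by their strength value, so that datasets used most often in papers on that topic appear first.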
According to Dr. Jennifer Wei, lead scientist at GES DISC, the interns’ trailblazing work in artificial intelligence and machine learning applications to enhance access to GES DISC Earth science data and user services was nothing short of impressive.
“Not only have our interns demonstrated their enthusiasm, by contributing their own projects, and their team spirit, by working together, but we—their mentors—have learned from them as well,” she said. “We plan to extend these interns’ work to the next building block. Surely, they have raised the bar to the next level!”
Physical Oceanography DAAC (PO.DAAC)
Operated by NASA's Jet Propulsion Laboratory (JPL) in Pasadena, California, PO.DAAC is the official archive for NASA’s physical oceanography measurements and it manages and distributes data, tools, and resources focused on sea surface topography, ocean temperature, ocean winds, salinity, gravity, and ocean circulation. PO.DAAC had two interns this summer, Matthew Thompson, who will be starting graduate study in computer science at the University of Vermont in the fall, and Joshua Garde, a rising senior and computer science major at the California State Polytechnic University in Pomona. Both are supporting PO.DAAC’s ongoing effort to migrate its tools and services to the cloud, where users will have an easier time analyzing, manipulating, and storing data.
Thompson has been working on the application of cloud technology to support scientific use cases, such as using the cloud to extract and apply large volumes of Multi-scale Ultra-high Resolution Sea Surface Temperature (MUR SST) data from the Group for High-Resolution Sea Surface Temperature to examine thermal stress off the Hawaiian Islands.
“Matthew has been looking at use case applications of data extraction and processing in the cloud,” said Thompson’s mentor Dr. Jorge Vazquez. “The science focus has been on application of high-resolution sea surface temperature data to examine thermal stress on coral reefs off Hawaii.”
Although thermal stress has resulted in coral reef decline, not all corals respond to stress in the same way. Observations have established that during a single bleaching event, different coral species on the same reef may respond to thermal stress differently. Scientists’ ability to analyze SST data in the cloud will help them better understand the factors affecting this relationship.
Garde has been involved in similarly important work: modernizing a legacy Level 2 data transformation service that merges multiple granules into a single granule for use in the DAAC’s High-level Tool for Interactive Data Extraction (HiTIDE), which allows users to subset and download popular PO.DAAC Level 2 datasets.
“Joshua rewrote our legacy L2 granule concatenation service using Python,” said Garde’s mentor Stepheny Perez. “This new service will be integrated with Harmony so it can be used across other DAACs in addition to PO.DAAC. Joshua has done a fantastic job writing, testing, benchmarking, and optimizing this service. This service is required as part of the effort to migrate the HiTIDE tool to the cloud, so his work has been tremendously impactful to our team.”
Socioeconomic Data and Applications Center (SEDAC)
SEDAC is operated by the Center for International Earth Science Information Network (CIESIN), a unit of the Earth Institute at Columbia University based at the Lamont-Doherty Earth Observatory in Palisades, New York. As part of its mission to synthesize Earth science and socioeconomic data and information for policymakers and applied science users, SEDAC archives, manages, and distributes the data and tools that pertain to both the Earth and social sciences.
This summer, SEDAC had one intern, Kerri Anne Hoolihan, a graduate student at Columbia University’s School of International and Public Affairs pursuing a master of public administration in environmental science and policy. During her time with SEDAC, Hoolihan worked with Senior Systems Analyst and GIS Developer Kytt MacManus in support of the fifth iteration of the Gridded Population of the World (GPW) dataset. She also supported SEDAC’s collaboration with the United Nations’ Second Administrative Level Boundaries (SALB) Program, which promotes accessible, interoperable, and global data and information on subnational units and boundaries, for better decisions, stronger support to people and planet, and monitoring progress toward Sustainable Development Goals.
“Kerri's work in support of the fifth iteration of SEDAC’s flagship data collection Gridded Population of the World is essential for integrating information from the 2020 round of global censuses,” said MacManus. “For the UNSALB data set, she is helping to provide officially recognized sub-national boundaries and information on historic changes to boundaries for use in a wide variety of applications, including disaster preparedness and response, and natural resources management.”
About NASA Internships
NASA’s highly competitive Internship Program brings together college and graduate school students (along with recent graduates and qualified high school students) to work on projects at NASA centers and facilities across the nation. Internships are available throughout the year, with summer internships lasting a minimum of 10 weeks and fall and spring internships lasting a minimum of 16 weeks. (Detailed information and an electronic application can be found on the NASA Internships and Fellowships website.)