Principal Investigator (PI): Michael Rilee, Rilee Systems Technologies, LLC.
Co-Investigators: Kwo-sen Kuo, Bayesics, LLC; James Frew, University of California, Santa Barbara (UCSB); James Gallagher, Open-source Project for a Network Data Access Protocol (OPeNDAP)
Current Earth science data processing relies on large, centralized archives with exceptional browse and search capabilities. Researchers use these archives to identify data files, then download the files to local compute/storage resources for preprocessing and integration prior to analysis. This data flow forces end users to devote scarce resources and considerable time to transferring, storing, and managing archived data, and to acquire specialist expertise in the various datasets used in their research domain.
SpatioTemporal Adaptive-Resolution Encoding (STARE) simplifies this flow by moving preprocessing activities to the archived data. This eliminates the costs of transferring, creating, and maintaining redundant, idiosyncratic local archives, which are often developed by researchers who are generally neither archivists nor the expert producers of the original data. With STARE providing a unifying platform for diverse data models (swath, point, grid), Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Centers (DAACs), which archive and distribute EOSDIS data, will be able to produce higher-level products made to order for end-user researchers.
STARE addresses ACCESS17 focus area 2.1.3: Cloud Optimized Preprocessing and Data Transformation.
STARE’s spatial component (SC) descends from the Hierarchical Triangular Mesh (HTM) spherical indexing originally developed for the Sloan Digital Sky Survey (SDSS), in which storage and computational efficiency were key. The STARE/SC recursively divides Earth’s surface into a set of quad-trees, allowing any point on Earth to be identified with a single number.
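The recursive subdivision described above can be sketched in a few lines of Python. This is a minimal, illustrative HTM-style indexer, not the STARE library's implementation: the function names (`trixel_index`, `index_level`), the ordering of the eight root triangles, and the integer packing (a leading marker bit followed by two bits per level) are all assumptions made for this example.

```python
import math

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _mid(a, b):
    # Midpoint of an edge, pushed back onto the unit sphere.
    return _unit(tuple(x + y for x, y in zip(a, b)))

def _score(tri, p):
    # Non-negative only when p lies inside the spherical triangle `tri`
    # (vertices ordered counterclockwise as seen from outside the sphere).
    a, b, c = tri
    return min(_dot(_cross(a, b), p), _dot(_cross(b, c), p), _dot(_cross(c, a), p))

def _roots():
    # Eight octahedron faces, one per octant (ordering is an assumption).
    roots = []
    for sz in (1, -1):
        for sy in (1, -1):
            for sx in (1, -1):
                a, b, c = (sx, 0, 0), (0, sy, 0), (0, 0, sz)
                if sx * sy * sz < 0:      # keep counterclockwise orientation
                    b, c = c, b
                roots.append((a, b, c))
    return roots

def trixel_index(lat_deg, lon_deg, level):
    """Return one integer naming the triangle at `level` that contains the point."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    p = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
    roots = _roots()
    r = max(range(8), key=lambda k: _score(roots[k], p))
    tri, idx = roots[r], 8 + r            # 8 = leading marker bit (0b1000)
    for _ in range(level):
        a, b, c = tri
        w0, w1, w2 = _mid(b, c), _mid(a, c), _mid(a, b)
        kids = [(a, w2, w1), (b, w0, w2), (c, w1, w0), (w0, w1, w2)]
        k = max(range(4), key=lambda j: _score(kids[j], p))
        tri, idx = kids[k], idx * 4 + k   # append two bits per level
    return idx

def index_level(idx):
    """Recover the subdivision level from the index alone."""
    return (idx.bit_length() - 4) // 2

if __name__ == "__main__":
    idx = trixel_index(34.4, -119.8, 10)
    print(idx, index_level(idx))
```

Because each level appends two bits, dropping the last two bits of an index yields the index of the enclosing triangle one level coarser. That prefix property is what lets a single number carry both a location and a resolution in this sketch.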
STARE’s temporal component (TC) has similar properties. For observations, these STARE indices contain both location and resolution information. This, in turn, promotes efficient data placement on distributed cloud resources and minimizes costly data transport between nodes for operations such as joining, intersecting, (conditional) subsetting, and re-gridding diverse datasets.
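To illustrate how a hierarchical index can drive data placement, the sketch below assigns records to nodes by a coarse prefix of their index. It assumes a toy integer encoding (a four-bit root code followed by two bits per subdivision level), which is not STARE's actual bit layout; `level_of`, `coarsen`, and `place` are hypothetical helper names, and the modulo assignment stands in for whatever partitioning scheme a real deployment would use.

```python
from collections import defaultdict

def level_of(idx):
    # Toy encoding: 4 leading bits for the root cell, then 2 bits per level.
    return (idx.bit_length() - 4) // 2

def coarsen(idx, level):
    """Truncate an index to its ancestor cell at a coarser level."""
    shift = 2 * (level_of(idx) - level)
    if shift < 0:
        raise ValueError("index is already coarser than the requested level")
    return idx >> shift

def place(records, n_nodes, level):
    """Group (index, payload) records by node, keyed on the coarse ancestor cell."""
    nodes = defaultdict(list)
    for idx, payload in records:
        nodes[coarsen(idx, level) % n_nodes].append(payload)
    return dict(nodes)

# Records from different data models and resolutions (values are made up).
records = [
    (0b1011011011, "swath pixel"),   # level 3
    (0b10110110,   "grid cell"),     # level 2, same level-1 ancestor as above
    (0b11000001,   "station"),       # level 2, different root cell
]
placement = place(records, n_nodes=4, level=1)
print(placement)
```

The swath pixel and the grid cell share a level-1 ancestor, so they land on the same node even though they were indexed at different resolutions, while the station record goes elsewhere.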
STARE automatically co-aligns, or harmonizes, diverse data in the cloud, placing spatiotemporally close data on the same compute/storage node for a relatively small cost in metadata. STARE thus allows diverse data to be efficiently integrated for analysis in the cloud without requiring homogenization by interpolation, and provides a foundation on which existing tools and processing methods can be placed (see illustration above).
Much capability (e.g., preprocessing, searching, and visualization) has been developed to support researchers’ use of Earth science data. In this project, existing tools and methods benefit from the STARE-enabled platform in two ways. The first is tight integration, as has been done, for example, by incorporating STARE into the distributed array database SciDB, along with re-gridding functions and fast, parallel geographic intersections (as illustrated in the image on the right). The second is applying current tools to the results of STARE-enabled distributed processing (e.g., fast granule intersection) in a more conventional, but cloud-based, data processing flow. When needed, sidecar files with STARE indexes can be used to bring the benefits of STARE to legacy file formats.
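As a rough illustration of why index-based intersection can be cheap, the sketch below intersects two granule "covers" expressed as lists of hierarchical cell indices. It assumes a toy encoding (a four-bit root code followed by two bits per level) rather than STARE's real format, and the names `overlaps` and `intersect` are hypothetical. The brute-force pairwise loop is kept for clarity; a production system would more likely sort or index the covers.

```python
def level_of(idx):
    # Toy encoding: 4 leading bits for the root cell, then 2 bits per level.
    return (idx.bit_length() - 4) // 2

def overlaps(a, b):
    """True if one cell equals or contains the other (prefix comparison)."""
    la, lb = level_of(a), level_of(b)
    if la > lb:
        a >>= 2 * (la - lb)   # coarsen the finer cell to the other's level
    else:
        b >>= 2 * (lb - la)
    return a == b

def intersect(cover_a, cover_b):
    """Return the finer cell of every overlapping pair, without duplicates."""
    # In this encoding the finer cell of an overlapping pair is the larger integer.
    return sorted({max(a, b) for a in cover_a for b in cover_b if overlaps(a, b)})

# Two made-up granule covers at different resolutions.
granule_a = [0b101101, 0b110000]          # two level-1 cells
granule_b = [0b10110110, 0b10010011]      # two level-2 cells
common = intersect(granule_a, granule_b)
print([bin(c) for c in common])
```

No geometry is evaluated at intersection time: containment reduces to integer shifts and comparisons, which is the kind of work that a precomputed sidecar index can make available to files in legacy formats.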
As a unifying platform, STARE supports conventional processing, analysis, and visualization tools, creating the opportunity to massively increase the amount of data researchers can use. In the longer term, as tools evolve to take greater advantage of STARE’s integrative capabilities, researchers can move from the current focus on the expensive low-level manipulation of data files to an ability to interact with Earth science data at a higher level, with query-based declarative tools and user interfaces that favor scientific inquiry rather than data management. STARE helps automate critical spatiotemporal functions while making efficient use of cloud computing, which will help eliminate the need for researchers to devote time, money, and expertise to the redundant transfer of archived data to local systems. The time and effort saved improve scientific quality and researcher productivity and reduce the cost of entry for others using EOSDIS data resources.
STARE has demonstrated its potential to address challenges associated with the variety and volume of Big Data. It is also adaptive to different compute-storage architectures. The technology will reduce processing time and is poised to flip the notorious 80/20 dilemma plaguing data science endeavors – where 80% of a researcher’s time is spent finding, cleaning, and reorganizing huge amounts of data and 20% is spent on actual data analysis.
- Core STARE library and API functions established.
- Many science usability functions, mostly spatial, implemented.
- PySTARE functional for experimental scientific work.
- OPeNDAP Hyrax integration started.
- UCSB snow cover science use case in progress.
- Basic cloud services in place. The STARE library and PySTARE API are usable and in relatively stable development.
- OPeNDAP added an initial set of STARE-aware functions and is ready for testing in an infusion environment, as is the STAREmaster georeferencing file (sidecar) tool, which is a key component for deployment.
- STAREPandas is ready for relatively modest datasets and will be improved for better scalability, including using STAREmaster sidecars.
- A basic set of STARE tutorials created. The project provisioned a JupyterHub with STARE-indexed data and tools to aid the development, education, and training of STARE-based integrative analysis techniques.
- STARE tested in science use cases (ongoing).
Publications & Presentations (listed alphabetically)
Project Year One
Bauer, M., Kuo, K.S., Oloso, A. & Rilee, M.L. (2018). “Exploring the Spatio-temporal Connectivity of Blizzard Conditions and Mid-latitude Cyclones: A Template for a Process-based Workflow.” American Geophysical Union (AGU) Fall Meeting, Washington, D.C. Session IN24A-08, 11 December 2018.
Kuo, K.S. et al. (2019). “Best-value Data-intensive Analysis Architecture Deduced Using ‘Geo-lly’ Beans.” Earth Science Information Partners (ESIP) Summer Meeting, Tacoma, WA. 15-19 July 2019.
Kuo, K.S. et al. (2019). “STARE and data packaging.” ESIP Summer Meeting, Tacoma, WA. 15-19 July 2019.
Kuo, K.S., Yu, H., Pan, Y. & Rilee, M. (2019). “Leveraging STARE for Co-aligned Data Locality with netCDF and Python MPI.” IEEE Geoscience and Remote Sensing Society (IGARSS) Symposium, Yokohama, Japan. Session THP1.PT: Big Data and Machine Learning - New Trends in Remote Sensing I, 1 August 2019.
Rilee, M.L. & Kuo, K.S. (2018). “The Impact on Quality and Uncertainty of Regridding Diverse Earth Science Data for Integrative Analysis.” AGU Fall Meeting, Washington, D.C. Session IN43C-0916, 13 December 2018.
Rilee, M., Kuo, K.S., Frew, J., Griessbaum, N., Gallagher, J. & Neumiller, K. (2019). “STARE Compatibility.” ESIP Summer Meeting, Tacoma, WA. 15-19 July 2019.
Project Year Two
Gallagher, J., Hartnett, E., Rilee, M. & Kuo, K.S. (2020). “STARE Companion Files for NASA Earth Science Data (Vision Paper).” International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2020), Seattle, WA, 3-6 November 2020.
Griessbaum, N., Frew, J., Gallagher, J., Rilee, M. & Kuo, K.S. (2020). “Solving Science Use Cases with STARE (Demo Paper).” ACM SIGSPATIAL 2020, Seattle, WA, 3-6 November 2020.
Griessbaum, N., Frew, J., Rilee, M., Kuo, K.S., Gallagher, J. & Neumiller, K. (2020). “STARE data frames for geospatial analysis - a high level STARE interface.” ESIP Winter Meeting, Bethesda, MD, 2-7 January 2020.
Kuo, K.S. & Rilee, M. (2020). “Analytics Optimized Geoscience Data Store with STARE-based Packaging.” 22nd EGU General Assembly, held online 4-8 May 2020.
Kuo, K.S. & Rilee, M.L. (2019). “Supporting Efficient Parallel Processing for Integrative Analysis in Cloud with STARE-based Hierarchical Packaging.” AGU Fall Meeting 2019, San Francisco, CA. Poster: IN11D-0691.
Kuo, K.S., Yu, H., Rilee, M.L., Pan, Y. & Wang, J. (2019). “STARE-based Interactive Analytics for Earth Science Big Data.” AGU Fall Meeting 2019, San Francisco, CA. Poster: IN13B-0717.
Rilee, M., Griessbaum, N., Kuo, K.S., Frew, J. & Wolfe, R. (2020). “STARE-based integrative analysis of diverse data using DASK Parallel Programming (Demo Paper).” ACM SIGSPATIAL 2020, Seattle, WA, 3-6 November 2020 [doi:10.1145/3397536.3422346].
Rilee, M., Kuo, K.S., Frew, J., Gallagher, J., Griessbaum, N., Neumiller, K. & Wolfe, R. (2020). “STARE into the future of geodata integrative analysis.” Earth Science Informatics, accepted.
Rilee, M., Kuo, K.S., Frew, J., Griessbaum, N. & Gallagher, J. (2020). “STARE towards integrative analysis with minimized data wrangling hassle.” IGARSS 2020, virtual symposium. Paper TU2.R7.8, 29 September 2020.
Rilee, M., Kuo, K.S., Gallagher, J., Frew, J., Griessbaum, N., Hartnett, E., Wolfe, R., Heber, G. & Khalsa, S.J. (2020). “STARE-PODS: A Versatile data store leveraging the hdf virtual object layer for compatibility.” ESIP Summer Meeting (virtual), 14-24 July 2020.
Rilee, M., Kuo, K.S., Gallagher, J., Frew, J., Griessbaum, N., Neumiller, K., Wolfe, R., Yu, H. & Clark, P. (2019). “STARE for scalable unification of diverse data within Earth, Space, and Planetary Science.” 2019 AGU Fall Meeting, San Francisco, CA. Poster: IN31B-0791.