Introducing PODPAC: the Easy Way to Analyze NASA and Non-NASA Earth Science Data Via the AWS Cloud
Observational and modeled data products from agencies such as NASA and ESA, and from academic institutions, encompass petabytes of scientific data available for analysis, analytics, and exploitation. Unfortunately, these data sets are highly underutilized by the scientific community due to: (1) vast computational resource requirements; (2) disparate formats, projections, and resolutions that hinder data fusion and integrated analyses across different data sets; (3) complex and disjoint data access and retrieval protocols; and (4) task-specific and non-reusable code development processes that hinder algorithm sharing and collaboration. In response, NASA programs such as the Earth Observing System Data and Information System (EOSDIS) are actively investigating migration of their vast data archives to storage on commercial cloud services such as Amazon Web Services (AWS). However, to maximize the benefit of cloud-based data storage, cloud-based data analysis and analytics are needed to process data “close” to where it is stored. Recognizing that migrating workflows to the cloud requires a high degree of cloud computing expertise, we have developed the Pipeline for Observational Data Analysis and Collaboration (PODPAC). PODPAC is a Python library designed to automatically harmonize disparate data sources, seamlessly access NASA earth science data, and analyze data in the AWS cloud. PODPAC is built around the tools of the Python data ecosystem (NumPy, SciPy, xarray) and aims to bridge the gap between data sources, analysis, and the cloud. Short-course participants will receive hands-on experience working with PODPAC to analyze earth science data on the AWS cloud.
The primary goal of this course is to enable participants to work efficiently with multiple earth science datasets in the cloud using PODPAC. Participants will work through a series of hands-on demonstrations using Python-based Jupyter notebooks to understand: (1) basic PODPAC concepts such as coordinates, nodes, and pipelines; (2) data retrieval of NASA SMAP and NOAA GFS data; (3) data harmonization and analysis; (4) how to transition an application to the cloud; and (5) how to customize PODPAC for specialized applications. These examples will build towards deploying a SMAP-based drought-monitor application on AWS Lambda, where the data is retrieved and processed on demand. The science behind this drought monitor will also be briefly covered by the NASA SMAP science team leader. At the conclusion of the course, participants will understand how PODPAC can accelerate their analysis of multiple disparate data sources, and how it can help them transition their workflows to the AWS cloud.
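To give a flavor of what "data harmonization" means in practice, the following is a minimal, library-free sketch and not PODPAC's API: two fields sampled on different latitude grids are brought onto a common grid by linear interpolation so they can be combined point by point. The function name `interp_linear`, the grid values, and the field values are all hypothetical and chosen only for illustration; a real workflow would operate on multi-dimensional data and delegate this step to the library.

```python
from bisect import bisect_right


def interp_linear(src_coords, src_values, target_coords):
    """Linearly interpolate values from an ascending source grid onto target points.

    Targets that fall outside the source range are returned as None
    rather than extrapolated.
    """
    if len(src_coords) != len(src_values):
        raise ValueError("src_coords and src_values must have the same length")

    out = []
    last = len(src_coords) - 1
    for x in target_coords:
        if x < src_coords[0] or x > src_coords[last]:
            out.append(None)  # no extrapolation
            continue
        # Index of the source point at or immediately below x.
        i = bisect_right(src_coords, x) - 1
        if i == last:
            out.append(float(src_values[last]))  # x sits on the upper edge
            continue
        x0, x1 = src_coords[i], src_coords[i + 1]
        w = (x - x0) / (x1 - x0)  # fractional distance between neighbors
        out.append((1 - w) * src_values[i] + w * src_values[i + 1])
    return out


# Hypothetical coarse-resolution field (e.g. a 2-degree product).
coarse_lat = [30.0, 32.0, 34.0]
coarse_field = [0.10, 0.20, 0.40]

# Hypothetical fine-resolution field (e.g. a 1-degree product).
fine_lat = [30.0, 31.0, 32.0, 33.0, 34.0]
fine_field = [1.0, 2.0, 3.0, 4.0, 5.0]

# Harmonize: resample the coarse field onto the fine grid, then combine.
coarse_on_fine = interp_linear(coarse_lat, coarse_field, fine_lat)
combined = [a * b for a, b in zip(coarse_on_fine, fine_field)]
```

Once both fields share the same coordinates, any element-wise analysis (here a simple product) becomes a one-line operation; the course exercises apply the same idea to real SMAP and GFS data.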
Instructors include the main developers of the PODPAC library, and the science team leader of the SMAP program, Dr. Dara Entekhabi.
Participants are expected to bring their own laptops to take part in the hands-on exercises.
For more information please contact Matt Ueckermann ([email protected]).
All short course/workshop attendees must register and wear a badge/ribbon. Short course/workshop registration is not included in the 99th AMS Annual Meeting registration, and short course/workshop registration does not include registration for the 99th AMS Annual Meeting.