Syllabus | Module 2
Day 1 | Monday, March 4, 2024
Module 2 Introduction (45 min)
- Welcome Back
- Interim Check In
- Where we are in Earth Systems Data Science in the Cloud
- Course Goals and Objectives
- Module Goals and Objectives
- Course Logistics
Going to Cloud9 (60 min)
- Introduction to Cloud9
- AWS Credentials Management
- Connecting Git
- Installing Packages
To the Cloud | Introduction to AWS (60 min)
- Intro to AWS
- AWS Services:
  - EC2
  - S3
  - SageMaker
  - Lambda
Lunch and Learn
1200-1300 March 4, 2024
- Individual and Team Progress Check In
Setting up a Team Project Repo (60 min)
- Initializing GitLab Repos
- Collaborations
- Issues & Branching
- SSH Setup
Day 2 | Tuesday, March 5, 2024
Input/Output (I/O) (60 min)
- Data Formats
- I/O on the Cloud
- Foundations for Performant Data Science
Team Project Check In (30 min)
- Team Name
- Project Ideation
- Project Idea Curation
Containers: Reproducible Computing Environments (60 min)
- Containers & Containerization
- Dependency Management
- Deployment, Use, and Sharing
- Using Containers on Cloud9
Rich Signell | Visiting Speaker (60 min)
- Pangeo: A community platform for open, reproducible, and scalable geoscience
Lunch and Learn
1230-1300 March 5, 2024
- Individual and Team Progress Check In
Programmatic Cloud Access (60 min)
- AWS CLI
- Boto3
- Other tools
Team Project Play (60 min)
- Intro to Exploratory Data Analysis
- Finding Data
- Getting Data on the Cloud
Day 3 | Wednesday, March 6, 2024
Team Project Work (120 min)
- Review from Yesterday
- Finding Data
- Getting Data on the Cloud
- Exploratory Data Analysis
Managing Containers (60 min)
- Finding Running Containers
- Entering Running Containers
- Removing Running Containers
Lunch and Learn
1200-1300 March 6, 2024
- EDA Questions and Check In
I/O in Python (60 min)
- Lazy Loading Constructs
- Tabular Data
- Gridded Data
Overleaf & LaTeX: Production Publishing (60 min)
- Overleaf Introduction
- Overleaf Configuration
- Accelerating collaboration & publication
Day 4 | Thursday, March 7, 2024
Introduction to Data Cleaning (60 min)
- Grammar of Data
- Order of Operations
- Time Complexity (Big O Notation)
- Functional Programming (Mapping)
- Building a Pipeline
- Troubleshooting
- Performance
Parallel Computing in Python | Single Machine (120 min)
- Paradigms
- Templates
- Multiprocessing, Polars, Dask
Lunch and Learn
1200-1300 March 7, 2024
- Team Project Check In
Team Presentation Training (120 min)
- Presentation Goals
- Clear Communication
- Primacy, Frequency, and Recency
- Body Language
- START Method
- Training
Day 5 | Friday, March 8, 2024
Team Presentations | EDA (120 min)
- Capstone Presentations and Feedback
Module Wrap Up (30 min)
- Closing
- Architecting Data Product Development Based on EDA
- Interim Period
- Next Steps