Course Description:
"Data Science 3" is currently offered through the Department of Statistics and Applied Probability as "PSTAT 100."
Overview and use of data science tools in Python for data retrieval, analysis, visualization, reproducible research, and automated report generation. This new course focuses on concepts relevant to data science, taught through some of the popular software tools in this area. Doing data science is more than applying isolated methods: the course emphasizes creatively combining concepts and domain knowledge to clean, transform, analyze, and present data. Concepts in data ethics and privacy are also discussed. Case studies illustrate practical use of these tools in real usage scenarios.
Prerequisites:
- Probability and Statistics I (PSTAT 120A)
- Linear Algebra (MATH 4A)
- Prior experience with Python or another programming language (CMPSC 9 or CMPSC 16).
Learning Outcomes:
- Simulate, retrieve, organize, summarize, visualize, and model data using scientific computing tools in Python.
- Practice critical thinking about the relationship between data collection and scope of inference, and assess the plausibility of assumptions required to meaningfully model real data.
- Use appropriate programming style, conventions, and practices to write readable, organized, and reproducible code.
- Demonstrate good data science workflow and communication practices through completing a collaborative data analysis project and preparing a written summary of results.
Additional Information:
Week | Lecture Topic(s) | Lab | Assessment |
1 | Data science lifecycle | Jupyter Notebooks | |
2 | Sampling and inference | Summary statistics and simulation | |
3 | Data wrangling and tidy data | Pandas | HW 1 due |
4 | Elements of data visualization | Plot types and aesthetics | |
5 | Exploratory analysis I | Density estimation | HW 2 due |
6 | Exploratory analysis II | Dimension reduction | |
7 | Statistical models I | Simple linear regression | HW 3 due |
8 | Statistical models II | Multiple regression | |
9 | Classification | Project workflow | HW 4 due |
10 | TBD | TBD | |
11 | None (finals week) | None | Project due |
Course Level:
- Undergraduate
Course Number:
Course Time:
Winter 2022
Mon/Wed: 8:00 - 9:15 am
Labs: Tues at 2 pm, 3 pm, and 4 pm
Spring 2021
Tues/Thurs: 3:30 - 4:45 pm
Labs: Wed at 10 am, 2 pm, and 6 pm