New for Fall 2021: Find this course in the General Catalog as CMPSC 5A.
Instructor approval no longer required.
Overview and Learning Objectives
Data Science is a rapidly evolving field without a uniformly agreed-upon definition. It is an interdisciplinary field that uses scientific methods, together with concepts and processes from statistics and computer science, to extract and communicate knowledge and insights from data. What differentiates Data Science from Computer Science or Statistics is its connection to other domains, such as Biology, Environmental Science, and Economics, and the need to understand the context those domains provide.
In this course, we will introduce and use critical concepts and skills in computer programming and statistical inference to analyze real-world datasets and interpret real-world phenomena. The purpose of this course is to develop some of the foundational skills needed to consume data and create information. The main theme of the course is understanding the sources of data, the variability inherent in data, biases and fallacies, and the uncertainty associated with conclusions drawn from data.
By the end of the course, students will have learned and practiced the following concepts:
- Foundational programming concepts:
  - Memory concepts: variables, name, type, value, assignment statements, scope of variables
  - Control structures: for loops, if/else, while loops
  - Basic data structures: lists
  - Functions: function call vs. function definition, formal vs. actual parameters (arguments)
  - Input/output concepts: printing output, reading from files
- Problem solving strategies and code design:
  - breaking down a problem into a sequence of steps
  - abstracting specific problems into general ones, and
  - finding general solutions
- Debugging programs by:
  - reading code and predicting the output of programs or parts of code
  - using print statements to localize program bugs
- Data Science concepts:
  - Basic operations on tables: loading data, extracting columns, extracting rows
  - Selecting rows that match certain criteria, composite queries, join / aggregate
  - Computing summary statistics
  - Exploratory Data Analysis (EDA)
  - Simulating experiments
  - Probability basics
  - Analysis and critical thinking with special attention to biases and fallacies, ethics and fairness
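To give a feel for how several of these concepts fit together, here is a minimal Python sketch. It is an illustration only, not taken from the course materials: the function names `simulate_coin_flips` and `proportion_heads`, the coin-flip experiment, and the seed values are all hypothetical choices made for this example. It touches on function definitions vs. function calls, formal vs. actual parameters, lists, for loops, if/else, print statements, simulating an experiment, and computing a summary statistic.

```python
import random
import statistics


def simulate_coin_flips(num_flips, seed=None):
    """Function definition: num_flips and seed are the formal parameters."""
    rng = random.Random(seed)       # a seeded generator makes the run repeatable
    flips = []                      # a list to collect the outcomes
    for _ in range(num_flips):      # for loop: repeat the flip num_flips times
        if rng.random() < 0.5:      # if/else: decide the outcome of one flip
            flips.append("heads")
        else:
            flips.append("tails")
    return flips


def proportion_heads(flips):
    """Return the fraction of outcomes in flips that are 'heads'."""
    count = 0
    for outcome in flips:
        if outcome == "heads":
            count += 1
    return count / len(flips)


# Function call: 100 and 42 are the actual parameters (arguments).
flips = simulate_coin_flips(100, seed=42)
print("Number of flips:", len(flips))
print("Proportion of heads:", proportion_heads(flips))

# Simulating the experiment many times shows the variability in the result.
proportions = []
for trial_seed in range(50):        # hypothetical choice: 50 repetitions
    trial = simulate_coin_flips(100, seed=trial_seed)
    proportions.append(proportion_heads(trial))

# Summary statistics over the repeated experiments.
print("Mean proportion:", statistics.mean(proportions))
print("Standard deviation:", statistics.stdev(proportions))
```

Printing intermediate values, as done here, is also the basic technique behind using print statements to localize bugs: if a printed proportion fell outside the range 0 to 1, the problem would have to be in `proportion_heads`.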
CMPSC 5A does not have any prerequisites beyond high-school algebra (and a desire to learn). The curriculum and format are designed specifically for students who have not previously taken statistics or computer science courses. The course was created for first- and second-year students.
CMPSC 5A is unavailable to students who have taken CMPSC 8, ENGR 3, or ECE 3.
Students who have taken several statistics or computer science courses should instead take a more advanced course, such as Data Science 2, CMPSC 9 (aka Intermediate Python Programming), or PSTAT 100 (Principles and Techniques of Data Science).
Prior to Fall 2019, this course was taught as INT 5. In 2020-2021, it was offered as CMPSC 90DA.
Description (from the General Catalog): Introduction to data science methods and the Python programming language for students with little to no experience in the subjects. Topics include foundational programming concepts, problem solving strategies and code design, and such data science concepts as table operations, exploratory data analysis, basic probability, and more.
Offered every quarter.
Winter 2022: CMPSC 5A
Tue/Thur: 11:00-12:15 pm, BUCHN 1930
Lab 1: Weds at 10:00 am, GIRV 2127
Lab 2: Weds at 11:00 am, ARTS 1356
Fall 2021: CMPSC 5A
Tue/Thur: 11:00-12:15 pm, Phelps 1260
Lab 1: Weds at 10:00 am, Bld 387, Rm. 1011
Lab 2: Weds at 11:00 am, Girvetz 2116
Summer 2021: CMPSC 90DA
Session C (June 21 - Aug 10)
Tue/Wed/Thur: 9:30-10:20 am
Lab: Thur at 11:00 am
Mon/Wed: 11:00 - 12:15 pm
Labs: Wed at 3:00 and 4:00 pm
Mon/Wed: 11:00 - 12:15 pm
Labs: Wed at 5:00 and 6:00 pm