Core Data Science Curriculum
DS 1: Introduction to Data Science 1 (aka CMPSC 5A)
This course introduces students to inferential thinking and computational thinking in the context of real-world problems. How does one analyze data resulting from a real-world process in order to understand the process? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership. This course has no prerequisite requirements.
DS 2: Introduction to Data Science 2 (aka CMPSC 5B)
New for Spring 2021, this course continues the themes of Data Science 1. Students explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. The course focuses on languages for transforming and analyzing data; machine learning methods including regression, classification, and clustering; principles behind data visualizations; concepts of measurement error and prediction; and techniques for scalable data processing. Prerequisite: DS 1.
CS 9: Intermediate Python
Intermediate topics in Computer Science using the Python programming language. Topics include object-oriented programming, runtime analysis, data structures, and software testing methodologies. Prerequisite: CS 8 or Eng 3 or DS 2.
This course explores the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. It focuses on quantitative critical thinking and the key principles and techniques that are needed. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing. Prerequisites: PSTAT 120A, Math 4A, and prior experience with Python or another programming language.
Elective Courses for Domain Knowledge
Introduction to Electrical Engineering
Introduction to fundamental design problems in Electrical Engineering through programming in Python. This course is open to EE majors only, but is listed here for comparison to Data Science 1.
Principles of Data Science with R
An overview of data analytic thinking through examples. Introduction to descriptive statistics and linear regression. Fundamentals of programming using R. Basic graphics in R. Relational database management systems and simple data manipulation using SQL.
Introduction to Statistical Machine Learning
This course explores Statistical Machine Learning to discover patterns and relationships in large data sets. Topics will include: data exploration, classification and regression trees, random forests, clustering and association rules. Building predictive models focusing on model selection, model comparison and performance evaluation. Emphasis will be on concepts, methods and data analysis using R. Students complete a significant class project, individual or team based, using real-world data. Prerequisite: PSTAT 120A-B and PSTAT 126.
Statistical Data Science
Overview and use of data science tools in R and Python for data retrieval, analysis, visualization, reproducible research and automated report generation. Case studies will illustrate practical use of these tools. Prerequisite: PSTAT 120B, PSTAT 10, and one course from Computer Science 8, Computer Science 16, or Engineering 3.
Big Data Analytics
This course introduces concepts of distributed data storage, retrieval, processing and cloud computing. Overview of methods for analyzing big data from both high-dimensional statistics and machine learning, with topics chosen from penalized regression, classification/clustering, dimension reduction, random projections, kernel methods, network clustering, graph analytics, and supervised and unsupervised learning, among others.
Principles of Environmental Data Analysis
This course will provide an introduction to the principles of environmental physics and their application to ecological sciences, with a focus on programming and data analysis in Python. Course activities will use data analysis to quantify environmental patterns and processes. Emphasis will be placed on developing coding skills in Python and applying these skills to environmental and biophysical problems. Course goals:
- To develop expertise in the Python programming language and the use of Python’s data science stack to effectively store, manipulate, and gain insight into environmental data.
- To be able to apply this understanding to characterize data on environmental patterns and processes at varying spatial and temporal scales.
- To use data to model environmental processes of energy and mass transfer.
Data Wrangling for Economics
Using R, students develop skills in organizing economic data, learning how to summarize and display data to answer substantive economic questions. Emphasis is placed on communication of results.
Data Stories: Theory and Practice of Data-driven Narratives in the Digital Age
ENGL 146DS (New in Winter 2021)
“Data Stories” introduces students to an increasingly important genre of discourse in today’s society: data-driven narrative, e.g., as it appears in journalism; science, medical, and political reporting; business or government writing; and even some literary and artistic forms. The course brings humanities approaches such as narrative theory, genre theory, and media theory into conjunction with readings about data journalism and data visualization to ask this central question: how do you make a good story out of data? Among other assignments, students will create a project in which they take a dataset and make a narrative about it.
Data Science Use in Communication Research
This special topics course in the Department of Communication begins by asking, “What is Data Science and why should we care?” The course provides an introduction to Computational Social Sciences and an overview of the societal and ethical impacts of Big Data research. Core subject matter includes Quantitative Research Methods in Communication, Data Visualization and Interpretation, and Computational Techniques such as basic programming, network analysis, and APIs. Students will discuss case studies from actual social science research. Professor Matni will reserve 20 add codes for undergraduate data science students, available on a first-come, first-served basis. Please contact him directly for a code.
Programming for Linguists
(Course number to be assigned) - Fall 2020 Syllabus
This course is for absolute beginners, with no prior programming experience. It will teach you how to program in the Python language, as well as best practices for programming more generally. The emphasis is on ways to manage and interact with textual data, with applications drawn from areas of broad interest within linguistics.
Foundations of Computational Linguistics
LING 110 - Winter 2021 Syllabus
Computational linguistics concerns the question of how computers can represent and process human language. Computational linguists develop, analyze, and apply computational models of language, both to create language technologies, and to inform the scientific understanding of language structure and usage. In this course, you will learn about, practice, and critically evaluate some of the most foundational ideas in computational linguistics.
Advanced Computational Linguistics: Text Processing
LING 111 - Spring 2021 Syllabus
Introduction to Deep Learning
Introduction to multilayered neural networks, early models of perceptrons and associative memory; back-propagation learning; convolutional neural networks; recurrent neural networks; attention models; application to natural language processing and computer vision. Prerequisite: open to EE, Computer Engineering, and Computer Science majors with upper-division standing.
Data Science Capstone
Fall: CMPSC 190DD / PSTAT 197A
Winter: CMPSC 190DE / PSTAT 197B
Spring: CMPSC 190DF / PSTAT 197C
In the first quarter of this three-course sequence, students study data science from the systems engineering perspective, address a variety of ethical issues that arise in data science projects, and engage in project-based learning through a series of carefully selected and curated data science studies. A major overarching goal is to prepare students to make a positive impact on the world with data-intensive methodologies. In line with this, we will study fairness, ethics, and responsible data practice. Another major focus will be on correctly interpreting, explaining, and communicating the results of analyses. This component of the course will focus on decision making under uncertainty, the role of correlation and causation, and common statistical traps and paradoxes that drive erroneous conclusions.
Prerequisites: Students should have a background in computing and statistics before enrolling in the capstone. Suggested courses include the following:
PSTAT students: 120A, 100, 126
CMPSC students: 16, 130A, 165A, 165B
In the second and third quarters, students form teams and collaborate with industry partners and research labs. Current course and project information is here.
Projects in Visualizing Information
This is a ten-week comprehensive overview of visualization for data science, covering data queries, knowledge discovery, and algorithms, and resulting in projects in 2D and interactive 3D. Enrollment is limited to 15, and the course usually includes participants from Bren, COE, Geography, Statistics, Physics, Art, Political Science, etc. Students who have taken the course or served as its teaching assistants have since organized and curated paper and exhibition sessions at VISAP, SIGGRAPH, etc. Past project results.
The Humanities and Data Science
This course explores today’s quickening mutation of the “liberal arts” into “data science,” a new universal mode of knowledge touching all fields. The course focuses on the join, but also split, between how the humanities and data science find meaning (scientific, epistemological, sociopolitical, and cultural) in patterns. Topics to be probed include: the history and present state of the humanities, the concept of “data science” (including the shape of today’s new programs and majors in the field), the idea and structures of “data,” the idea and infrastructures of “big data,” humanities corpora and datasets (including the social and ethical problem of “representative” datasets), narrativizing data, visualizing data, and interpreting data. The course includes but is not limited to approaches related to the digital humanities.