Examples of Ongoing Data Science Projects
Researchers across campus have been engaged in data science in the context of diverse application domains. The examples below illustrate both the breadth of data science-related work occurring at UC Santa Barbara and the interdisciplinary nature of that work. These include applications in which text and graph analytics, multimodal big data, and human-computer interfaces define the basis of inquiry.
In today’s computerized and information-based society, people are inundated with vast amounts of text data, ranging from news articles, social media posts, and scientific publications to textual information from various vertical domains (e.g., corporate reports, advertisements, legal acts, medical reports). Turning such massive, unstructured text data into structured, actionable knowledge, and enabling effective, user-friendly access to that knowledge, is a grand challenge.
Networked data arises in a number of application domains, including IoT, cloud computing, software analysis, neuroscience, biology, geography, and the social sciences. Accordingly, network/graph analysis has emerged as a major paradigm for exploring complex processes behind observed data. Compared to high-dimensional data, analysis of network data is more challenging due to interdependencies between entities, the presence of attributes, and the natural evolution of networks over time.
Multimodal Big Data
Unstructured multimodal data (1D time sequences, including audio, and 2D/3D/4D/5D images) are routinely collected but are hard to analyze. Spanning fields from the social sciences to biology and from remote sensing to materials research, such data come from a variety of sources: web pages, camera networks, mobile sensors, smartphones, microscopes, satellite and aerial imagery, and medical instruments, to name a few. These data are typically dynamic, spatial, and heterogeneous; in other words, they are disparate and obtained with non-uniform sampling in space and time. Researchers often collate multiple datasets, collected in a variety of different ways, which quantify different aspects of the same scientific process of interest. Thus, it is increasingly rare that hard questions can be answered with experimental data of a single type. To address these challenges, we need new techniques that combine multiple data sources effectively in order to deliver new scientific insights.
Environmental Data Science
Environmental problems are becoming increasingly complex, requiring multidisciplinary approaches to address them. These problems can no longer be solved solely with a disciplinary focus, and they increasingly demand data-driven solutions. With the rise of big data and of new technologies for observing Earth systems and the human actions that rapidly change and respond to the environment, resource management and conservation decisions must be informed by data in a fully transparent and repeatable way. These demands are shifting the landscape of the skills that are needed to tackle environmental problems.
Affiliated Projects and Centers
- Broom Demography Center
- National Center for Ecological Analysis and Synthesis (NCEAS)
- Earth Research Institute (ERI)
- Brain Initiative
- Quantitative Biology Initiative and BMSE
- IGERT on Network Science
- Center for Spatial Studies
- Center for Information Technology and Society (CITS)
- Institute for Energy Efficiency (IEE)
- Center for Bioengineering
- Center for Bioimage Informatics
- Mellon-funded WhatEvery1Says project (WE1S)
- UCSB Smart Farm
- Where's the Bear? (WTB)
- Aristotle Project
- Center for Responsible Machine Learning