Syllabus

Term 2025 Winter 2: January - April 2026

Course info

Instructors:
Katie Burak
Website: https://katieburak.github.io/
Email:
Slack: @Katie Burak

Gabriela V. Cohen Freue
Website: https://gcohenfr.github.io/
Email:
Slack: @gcohenfr

Office hours:
See Canvas for times and locations.

Course webpage:
WWW: https://ubc-stat.github.io/dsci-200/
Canvas: https://canvas.ubc.ca/courses/

Lectures/Labs:
See Canvas for times and locations.

Prerequisite:
DSCI 100

Course objectives

The course trains students to navigate the many aspects of data needed to work successfully through the data stage of the data analysis lifecycle. Students will broaden the applied statistical knowledge and skills required for statistical and computational modelling in subsequent courses. Students will learn how to explore different layers of the data structure, consider the limitations introduced by the study design, acknowledge security and ownership features related to data, and address problems encountered in the data at hand. The course also aims to demonstrate how simulation studies can be carried out to examine properties of estimators and algorithms (i.e., beyond the data stage). The course reinforces and refines the computational skills (i.e., writing computer scripts), tools, and resources learned in DSCI_V 100 to analyze data, as well as to draw and communicate appropriate conclusions.

Learning outcomes

  • Constructively reflect on how the data were collected for the statistical question being asked. Identify possible improvements, recognize where improvements cannot be made, and clearly discuss the limitations of the study design with regard to the conclusions that can be drawn for the question being asked.

  • Determine the data acquisition strategy needed for a given data source and use common data science tools to write reproducible computer scripts to read the data into the chosen programming language from that data source.

  • Identify when data simulation is a useful technique for assessing an analysis method, and plan and carry out appropriate simulations based on the task at hand.

  • Identify outliers and data anomalies, justify and apply strategies for managing them, and reflect on the consequences with regard to the conclusions of the chosen method.

  • Identify when and why data are missing, justify and apply strategies for managing the missing data, and reflect on the consequences with regard to the conclusions of the chosen method.

  • Plan and carry out exploratory data analysis using statistical and visualization techniques for the purpose of generating hypotheses and planning a further valid statistical analysis.

  • Appraise the impact of measurement scales, units, significant digits, and measurement error on data analysis, and use simple methods to mitigate these effects on the results of the analysis.

  • Evaluate and justify the data privacy needs for a given data analysis and, where needed, choose and apply an appropriate data privacy technique. Reflect on the consequences with regard to the conclusions of the chosen method.

  • Evaluate and justify who owns the data for a given case.