This repository contains the course materials for DS 2026 (Computational Probability).

UVADS/DS-2026

Computational Probability, Fall 2024

Overview

This course is all about variation, uncertainty, and randomness. Students will learn the vocabulary of uncertainty and the mathematical and computational tools to understand and describe it.

Instructors

Section 001: Thomas Stewart
1919 Ivy Rm 348
thomas.stewart@virginia.edu
Github: thomasgstewart

Section 002: Gianluca Guadagni
1919 Ivy Rm 431
gg5d@virginia.edu
Github: gg5d

Teaching assistants

Ethan Nelson
Graduate student in Data Science
ean8fr@virginia.edu
Github: eanelson01

Lathan Gregg
Graduate student in Data Science
uua9gw@virginia.edu
Github: lathangregg

Instruction & Office hours

Format of the class: In-class time will be a combination of lectures, group assignments, live coding, and student presentations. Please note: Circumstances may require the face-to-face portion of the class to be online.

Time & Location: Tues & Thurs, Data Science building Rm 206

| Section | Time |
|---|---|
| 1 | 9:30 - 10:45am |
| 2 | 11:00am - 12:15pm |

Instructor Office Hours:

| Time | Location |
|---|---|
| Tuesdays @ 2pm | SDS Rm 431 |
| Tuesdays @ 4pm | SDS Rm 348 |

TA Office Hours:

| Time | Primary location | Alternate location* |
|---|---|---|
| Mondays @ noon | SDS Hub | SDS 4th floor puzzle space |
| Mondays @ 1pm | SDS Hub | SDS 4th floor puzzle space |
| Wednesdays @ 2pm | SDS Hub | SDS 4th floor puzzle space |
| Thursdays @ 4pm | SDS Hub | SDS 4th floor puzzle space |

*If the Hub is being used for an event, office hours will be on the 4th floor landing of the SDS building.

Textbooks

The following textbooks are freely available online via the UVA library.

Understanding Uncertainty
by Dennis V. Lindley

Understanding Probability, 3rd edition
by Henk Tijms

Introduction to Probability: Models and Applications
by N. Balakrishnan, Markos V. Koutras, Konstadinos G. Politis

The following textbooks may also be helpful.

Probability and Statistics for Data Science
by Norman Matloff

Introduction to Probability Models
by Sheldon M. Ross

Course notes

Course notes (link)

Computing

The course will be taught using R.

Big ideas & Learning Outcomes

The following are the four ideas that I hope will persist with students after the minutiae of the Poisson distribution have faded from memory. The associated learning outcomes and topics are listed under each idea.

Probability is a framework for organizing beliefs; it is not a statement of what your beliefs should be.
Learning outcomes Topics
compare and contrast different definitions of probability, illustrating differences with simple examples
  • long-run proportion
  • personal beliefs
  • combination of beliefs and data
express the rules of probability verbally, mathematically, and computationally
  • AND, OR, complement, total probability
  • simulation error (relative and absolute)
illustrate the rules of probability with examples
using the long-run proportion definition of probability, derive the univariate rules of probability
organize/express bivariate random variables in cross tables
define joint, conditional, and marginal probabilities
identify joint, conditional, and marginal probabilities in cross tables
identify when a research question calls for a joint, conditional, or marginal probability
describe the connection between conditional probabilities and prediction
derive Bayes' rule from cross tables
apply Bayes' rule to answer research questions
determine if joint outcomes are independent
calculate a measure of association between joint outcomes
apply cross table framework to the special case of binary outcomes
  • Sensitivity
  • Specificity
  • Positive predictive value
  • Negative predictive value
  • Prevalence
  • Incidence
define/describe confounding variables
  • Simpson's paradox
  • DAGs
  • causal pathway
list approaches for avoiding confounding
  • stratification
  • randomization
Probability models are a powerful framework for describing and simplifying real world phenomena as a means of answering research questions.
Learning outcomes Topics
list various data types
match each data type with probability models that may describe it
  • Bernoulli
  • binomial
  • negative binomial
  • Poisson
  • Gaussian
  • gamma
  • mixture
discuss the degree to which models describe the underlying data
tease apart model fit and model utility
express probability models mathematically, computationally, and graphically
  • PMF/PDF
  • CMF/CDF
  • quantile function
  • histogram/eCDF
employ probability models (computationally and analytically) to answer research questions
explain and implement different approaches for fitting probability models from data
  • Tuning
  • Method of Moments
  • Maximum likelihood
  • Bayesian posterior
  • kernel density estimation
visualize the uncertainty inherent in fitting probability models from data
  • sampling distribution
  • posterior distribution
  • bootstrap distribution
explore how to communicate uncertainty when constructing models and answering research questions
  • confidence intervals
  • support intervals
  • credible intervals
  • bootstrap intervals
propagate uncertainty in simulations
explore the trade-offs of model complexity and generalizability
Probability is a framework for coherently updating beliefs based on new information and data.
Learning outcomes Topics
select prior distributions which reflect personal belief
  • informative vs weakly informative priors
implement Bayesian updating
manipulate the posterior distribution to answer research questions
Probability models can be expressed and applied mathematically and computationally.
Learning outcomes Topics
use probability models to build simulations of complex real world processes to answer research questions

Grading

Courses carrying a Data Science subject area use the following grading system: A, A-; B+, B, B-; C+, C, C-; D+, D, D-; F. The symbol W is used when a student officially drops a course before its completion or if the student withdraws from an academic program of the University.

Grading Scale:

  • 93-100 A
  • 90-92 A-
  • 87-89 B+
  • 83-86 B
  • 80-82 B-
  • 77-79 C+
  • 73-76 C
  • 70-72 C-
  • <70 F

Grades will be a weighted average of the final exam score (30%), the midterm exams (each 15%), the deliverables (20%) and homeworks (20%).
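As a rough sketch of the weighting described above, the snippet below combines component scores into a single course percentage. It is written in Python purely for illustration (the course itself uses R). It assumes every component has already been converted to a 0-100 scale; the syllabus does not say how the 0-2 homework and deliverable scores are converted, so that step and the name `course_percentage` are hypothetical.

```python
# Weights as stated in the syllabus: final 30%, two midterms at 15% each,
# deliverables 20%, homeworks 20%.
WEIGHTS = {
    "final": 0.30,
    "midterm1": 0.15,
    "midterm2": 0.15,
    "deliverables": 0.20,
    "homeworks": 0.20,
}


def course_percentage(scores):
    """Weighted average of component scores.

    Assumption (not from the syllabus): each score is already on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example = {
    "final": 85,
    "midterm1": 88,
    "midterm2": 88,
    "deliverables": 90,
    "homeworks": 100,
}
print(round(course_percentage(example), 2))
```

The result can then be compared against the grading scale above to read off a letter grade.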

Individual homeworks are graded with a score of 0, 1, or 2. After the initial grading, students may resubmit homework within one week of feedback for an additional point. That is, an initial score of 1 can be bumped up to a 2. Likewise, a 0 can be bumped up to a 1.

Deliverables are larger assignments than homework. To complete the deliverables, you will use probability models to build simulations of complex real world processes to answer questions. Deliverables are graded like homeworks, including the opportunity to resubmit for an additional point.

Midterm exams are graded on a 100 point scale. For midterm 1, if your grade on midterm 2 or the final is higher, the higher score will replace the score for midterm 1. Likewise, for midterm 2, if your grade on the final exam is higher, the higher score will replace the score for midterm 2. For example, suppose your exam scores for midterm 1, midterm 2, and the final were 72, 88, and 85. For the purposes of the final grade, your exam scores would be 88, 88, and 85.
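The replacement rule amounts to taking a maximum over each exam and the exams that come after it. A minimal sketch follows, in Python for illustration only (the course uses R); the function name `effective_exam_scores` is hypothetical.

```python
def effective_exam_scores(midterm1, midterm2, final):
    """Apply the replacement rule: a higher score on a later exam
    replaces the score on an earlier midterm."""
    # Midterm 1 can be replaced by midterm 2 or the final, whichever is highest.
    effective_m1 = max(midterm1, midterm2, final)
    # Midterm 2 can only be replaced by the final.
    effective_m2 = max(midterm2, final)
    # The final exam score is never replaced.
    return effective_m1, effective_m2, final


# The example from the syllabus: 72, 88, 85 becomes 88, 88, 85.
print(effective_exam_scores(72, 88, 85))
```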

Grading Homework

Homework assignments will be submitted on Gradescope. Each question on a homework will be graded as a 0, 0.5, or 1. A score of 0 means the question was left blank or there was not a good faith effort. A 0.5 means the answer was a good faith effort, but not fully correct. A 1 means the answer is correct. The total grade for the assignment will be the ratio of total points earned to total possible points. The final score for the assignment will be determined by the following rule:

$$ \begin{aligned} x &= \frac{\text{Total Points Earned}}{\text{Total Possible Points}} \\ \text{Final Score} &= \begin{cases} 0, & 0 \leq x < 0.5 \\ 1, & 0.5 \leq x < 0.8 \\ 2, & 0.8 \leq x \leq 1 \end{cases} \end{aligned} $$

Example Scenario: Imagine an assignment has 3 questions. A student receives a 0.5, 1, and 1 on Questions 1, 2, and 3, respectively. Their total points are $0.5 + 1 + 1 = 2.5$. The total possible points, at 1 point per question, are $3 \times 1 = 3$. The fraction earned on the assignment is therefore $x = \frac{2.5}{3} = 0.8\bar{3}$. Because this value is at least $0.8$, the student would receive a 2 on the assignment.
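The scoring rule can be sketched as a small function. This is an illustration in Python (the course uses R), not the graders' actual tooling; the name `homework_score` is hypothetical, and an assignment with no points earned ($x = 0$) is assumed to score 0.

```python
from fractions import Fraction

ALLOWED_QUESTION_SCORES = (0, 0.5, 1)


def homework_score(question_scores):
    """Map per-question scores (each 0, 0.5, or 1) to the 0/1/2 homework score."""
    if not question_scores:
        raise ValueError("an assignment needs at least one question")
    for s in question_scores:
        if s not in ALLOWED_QUESTION_SCORES:
            raise ValueError(f"invalid question score: {s}")
    # Exact rational arithmetic avoids floating-point trouble at the
    # 0.5 and 0.8 cutoffs (0, 0.5, and 1 are all exactly representable).
    earned = sum(Fraction(s) for s in question_scores)
    x = earned / len(question_scores)  # one possible point per question
    if x < Fraction(1, 2):
        return 0
    if x < Fraction(4, 5):
        return 1
    return 2


# The example scenario: 0.5 + 1 + 1 out of 3 gives x = 5/6, so the score is 2.
print(homework_score([0.5, 1, 1]))
```

Because every question is worth one point and the result depends only on the fraction earned, longer assignments do not carry more weight than shorter ones.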

Summary: Individual questions are graded as 0, 0.5, 1. Entire homework assignments are graded as 0, 1, 2.

Note: Homework assignments with additional questions are NOT worth more than homework assignments with fewer. All homework assignments are graded on the 0, 1, or 2 scale and are worth an equal amount.

Resubmitting Homework

If a student receives a grade less than a 2 on a homework assignment, they have the opportunity to resubmit the assignment for additional credit. There will be another assignment page on Gradescope where the new attempt can be submitted. Resubmissions must include the original answer to the question followed by the updated response. An example resubmission is provided here.

If your answer for a question received a 0, the most points you can receive for that question in the resubmission is 0.5. If the original answer received a 0.5, it can be increased to a 1 for full credit. This incentivizes a good faith effort on the original attempt.
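One way to read the per-question cap is that a resubmission can raise a question's score by at most 0.5. The sketch below is in Python for illustration (the course uses R). The function name `resubmitted_question_score` is hypothetical, and it assumes a resubmission never lowers a score, which the syllabus does not state.

```python
def resubmitted_question_score(original, resubmitted):
    """Score for one question after resubmission.

    `original` and `resubmitted` are each 0, 0.5, or 1.
    Assumption (not stated in the syllabus): a resubmission never lowers a score.
    """
    cap = min(original + 0.5, 1)  # a 0 can reach at most 0.5; a 0.5 can reach 1
    return min(max(original, resubmitted), cap)


print(resubmitted_question_score(0, 1))    # capped at 0.5
print(resubmitted_question_score(0.5, 1))  # full credit, 1
```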

Resubmissions will be due on the Friday following the release of the grades. For example, if grades are released on Monday the resubmission will be due the Friday of the same week.

Final exam schedule

The final exam for both sections is Monday, December 16, 2024 from 9AM to noon.

2024 Calendar

Homeworks, deliverables, reading assignments, and exams will be posted on the course calendar below. Homeworks are due before the start of class.

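| Date | Items |
|---|---|
| Tue, Aug 27 | |
| Thu, Aug 29 | SLIDES: Tools; Intro to R; Reports |
| Tue, Sep 3 | DUE: HW 1; In class: Working dir, Intro R; Optional videos: First 5 videos of Learn R Programming |
| Thu, Sep 5 | |
| Tue, Sep 10 | DUE: HW 2; Add deadline |
| Wed, Sep 11 | Drop deadline |
| Thu, Sep 12 | |
| Tue, Sep 17 | DUE: HW 3 |
| Thu, Sep 19 | |
| Tue, Sep 24 | DUE: HW 4; In class: Exam Prep |
| Thu, Sep 26 | Exam 1 |
| Tue, Oct 1 | slides; slides |
| Thu, Oct 3 | |
| Tue, Oct 8 | DUE: HW 5 |
| Thu, Oct 10 | Medical Diagnosis; CH 6 slides |
| Tue, Oct 15 | Fall reading day; No class |
| Thu, Oct 17 | In class: Deliverable 1 |
| Fri, Oct 18 | |
| Tue, Oct 22 | Drop (with W) deadline; DUE (by 9:30am): Deliverable 1 |
| Thu, Oct 24 | Exam review |
| Tue, Oct 29 | Exam 2 |
| Thu, Oct 31 | Nevada Day |
| Tue, Nov 5 | Election day; No class |
| Thu, Nov 7 | DUE (by 9:30am): HW 6; Hands/Sequences |
| Tue, Nov 12 | Discrete RVs |
| Thu, Nov 14 | DUE (by 9:30am): HW 7; Continuous RVs; Pen Drop |
| Tue, Nov 19 | KDE, MM |
| Thu, Nov 21 | In class: In class; DUE: HW 8; MLE/Bayes |
| Fri, Nov 22 | DUE: HW 6 Resubmission, HW 8 Resubmission |
| Tue, Nov 26 | DUE: HW 9, HW 7 Resubmission; Thanksgiving; No class |
| Thu, Nov 28 | Thanksgiving; No class |
| Tue, Dec 3 | |
| Thu, Dec 5 | [Final prep](https://tgstewart.cloud/final-exam-prep.html); DUE: Deliverable 2, HW 9 Resubmission; Last day of class |
| Tue, Dec 10 | |
| Thu, Dec 12 | |
| Mon, Dec 16 | Final Exam |
| Tue, Dec 17 | |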
