This course is all about variation, uncertainty, and randomness. Students will learn the vocabulary of uncertainty and the mathematical and computational tools to understand and describe it.
Section 001: Thomas Stewart
1919 Ivy Rm 348
thomas.stewart@virginia.edu
Github: thomasgstewart
Section 002: Gianluca Guadagni
1919 Ivy Rm 431
gg5d@virginia.edu
Github: gg5d
Ethan Nelson
Graduate student in Data Science
ean8fr@virginia.edu
Github: eanelson01
Lathan Gregg
Graduate student in Data Science
uua9gw@virginia.edu
Github: lathangregg
Format of the class: In-class time will be a combination of lectures, group assignments, live coding, and student presentations. Please note: Circumstances may require the face-to-face portion of the class to be online.
Time & Location: Tues & Thurs, Data Science building Rm 206
Section | Time |
---|---|
1 | 9:30 - 10:45am |
2 | 11:00am - 12:15pm |
Instructor Office Hours:
Time | Location |
---|---|
Tuesdays @ 2pm | SDS Rm 431 |
Tuesdays @ 4pm | SDS Rm 348 |
TA Office Hours:
Time | Primary location | Alternate location* |
---|---|---|
Mondays @ noon | SDS Hub | SDS 4th floor puzzle space |
Mondays @ 1pm | SDS Hub | SDS 4th floor puzzle space |
Wednesdays @ 2pm | SDS Hub | SDS 4th floor puzzle space |
Thursdays @ 4pm | SDS Hub | SDS 4th floor puzzle space |
*If the Hub is being used for an event, office hours will be on the 4th floor landing of the SDS building.
The following textbooks are freely available online via the UVA library.
- Understanding Uncertainty by Dennis V. Lindley
- Understanding Probability, 3rd edition, by Henk Tijms
- Introduction to Probability: Models and Applications by N. Balakrishnan, Markos V. Koutras, Konstadinos G. Politis
The following textbooks may also be helpful.
- Probability and Statistics for Data Science by Norman Matloff
- Introduction to Probability Models by Sheldon M. Ross
The course will be taught using R.
The following are the four ideas that I hope will persist with students after the minutiae of the Poisson distribution have faded from memory. Each idea is listed with its associated learning outcomes and topics.
Probability is a framework for organizing beliefs; it is not a statement of what your beliefs should be.
Learning outcomes | Topics |
---|---|
compare and contrast different definitions of probability, illustrating differences with simple examples | |
express the rules of probability verbally, mathematically, and computationally | |
illustrate the rules of probability with examples | |
using the long-run proportion definition of probability, derive the univariate rules of probability | |
organize/express bivariate random variables in cross tables | |
define joint, conditional, and marginal probabilities | |
identify joint, conditional, and marginal probabilities in cross tables | |
identify when a research question calls for a joint, conditional, or marginal probability | |
describe the connection between conditional probabilities and prediction | |
derive Bayes' rule from cross tables | |
apply Bayes' rule to answer research questions | |
determine if joint outcomes are independent | |
calculate a measure of association between joint outcomes | |
apply the cross table framework to the special case of binary outcomes | |
define/describe confounding variables | |
list approaches for avoiding confounding | |
Probability models are a powerful framework for describing and simplifying real world phenomena as a means of answering research questions.
Learning outcomes | Topics |
---|---|
list various data types | |
match each data type with probability models that may describe it | |
discuss the degree to which models describe the underlying data | |
tease apart model fit and model utility | |
express probability models mathematically, computationally, and graphically | |
employ probability models (computationally and analytically) to answer research questions | |
explain and implement different approaches for fitting probability models from data | |
visualize the uncertainty inherent in fitting probability models from data | |
explore how to communicate uncertainty when constructing models and answering research questions | |
propagate uncertainty in simulations | |
explore the trade-offs of model complexity and generalizability | |
Probability is a framework for coherently updating beliefs based on new information and data.
Learning outcomes | Topics |
---|---|
select prior distributions which reflect personal belief | |
implement Bayesian updating | |
manipulate the posterior distribution to answer research questions | |
Probability models can be expressed and applied mathematically and computationally.
Learning outcomes | Topics |
---|---|
use probability models to build simulations of complex real world processes to answer research questions | |
Courses carrying a Data Science subject area use the following grading system: A, A-; B+, B, B-; C+, C, C-; D+, D, D-; F. The symbol W is used when a student officially drops a course before its completion or if the student withdraws from an academic program of the University.
Grading Scale:
- 93-100 A
- 90-92 A-
- 87-89 B+
- 83-86 B
- 80-82 B-
- 77-79 C+
- 73-76 C
- 70-72 C-
- <70 F
Grades will be a weighted average of the final exam score (30%), the midterm exams (each 15%), the deliverables (20%) and homeworks (20%).
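As a rough illustration of the weighting, the R sketch below computes a final percentage from the five components. It assumes every component has already been expressed as a percentage from 0 to 100; how the 0, 1, 2 homework and deliverable scores convert to percentages is not specified here, so that step is left to the caller. The names `course_weights`, `final_percentage`, and the example scores are hypothetical.

```r
# Weights as stated in the syllabus: final 30%, each midterm 15%,
# deliverables 20%, homeworks 20%.
course_weights <- c(final = 0.30, midterm1 = 0.15, midterm2 = 0.15,
                    deliverables = 0.20, homeworks = 0.20)

# Hypothetical helper: `scores` is a named vector of percentages (0-100),
# one per component. Converting 0/1/2 scores to percentages is an
# assumption left to the caller.
final_percentage <- function(scores, weights = course_weights) {
  stopifnot(setequal(names(scores), names(weights)))
  sum(weights[names(scores)] * scores)
}

# Made-up scores, for illustration only.
example_scores <- c(final = 85, midterm1 = 88, midterm2 = 88,
                    deliverables = 90, homeworks = 95)
final_percentage(example_scores)
```

With these made-up scores the weighted average works out to 88.9. The midterm scores passed in should be the ones in effect after any replacement by a later, higher exam score.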
Individual homeworks are graded with a score of 0, 1, or 2. After the initial grading, students may resubmit homework within one week of feedback for an additional point. That is, an initial score of 1 can be bumped up to a 2. Likewise, a 0 can be bumped up to a 1.
Deliverables are larger assignments than homework. To complete the deliverables, you will use probability models to build simulations of complex real world processes to answer questions. Deliverables are graded like homeworks, including the opportunity to resubmit for an additional point.
Midterm exams are graded on a 100 point scale. For midterm 1, if your grade on midterm 2 or the final is higher, the higher score will replace the score for midterm 1. Likewise, for midterm 2, if your grade on the final exam is higher, the higher score will replace the score for midterm 2. For example, suppose your exam scores for the two midterms and the final were 72, 88, and 85. For the purposes of the final grade, your exam scores would be 88, 88, and 85.
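The replacement rule above can be sketched in a few lines of R. This is only an illustration of the rule as worded, not the official grade calculation, and the function name `effective_exam_scores` is hypothetical.

```r
# Hypothetical helper: apply the midterm replacement rule.
# Midterm 1 is replaced by the highest of midterm 1, midterm 2, and the final;
# midterm 2 is replaced by the higher of midterm 2 and the final.
effective_exam_scores <- function(midterm1, midterm2, final) {
  c(
    midterm1 = max(midterm1, midterm2, final),
    midterm2 = max(midterm2, final),
    final    = final
  )
}

# The example from the syllabus: scores of 72, 88, and 85.
effective_exam_scores(72, 88, 85)
```

For the example scores this returns 88, 88, and 85, matching the worked example.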
Homework assignments will be submitted on Gradescope. Each question on a homework will be graded as a 0, 0.5, or 1. A score of 0 means the question was left blank or there was not a good faith effort. A 0.5 means the answer was a good faith effort, but not fully correct. A 1 means the answer is correct. The total grade for the assignment will be the ratio of total points earned to total possible points. The final score for the assignment will be determined by the following rule:
Example Scenario: Imagine an assignment has 3 questions. A student receives a 0.5, 1, and 1 on Questions 1, 2, and 3 respectively. Their total points are 0.5 + 1 + 1 = 2.5 out of a possible 3.
Summary: Individual questions are graded as 0, 0.5, 1. Entire homework assignments are graded as 0, 1, 2.
Note: Homework assignments with additional questions are NOT worth more than homework assignments with fewer. All homework assignments are graded on the 0, 1, or 2 scale and are worth an equal amount.
If a student receives a grade less than a 2 on a homework assignment, they have the opportunity to resubmit the assignment for additional credit. There will be another assignment page on Gradescope where the new attempt can be submitted. Resubmissions must include the original answer to the question followed by the updated response. An example resubmission is provided here.
If your answer for a question received a 0, the most points you can receive for that question in the resubmission is 0.5. If the original answer received a 0.5, it can be increased to a 1 for full credit. This incentivizes a good faith effort on the original attempt.
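The per-question cap can be written as a one-line R function. This is a minimal sketch of the rule as described; the function name `max_resubmission_score` is hypothetical, and it assumes a question that already earned a 1 simply stays at 1.

```r
# Hypothetical helper: highest score each question can reach after a
# resubmission, given its original score (0, 0.5, or 1).
# Assumption: a question that already earned a 1 stays at 1.
max_resubmission_score <- function(original) {
  stopifnot(all(original %in% c(0, 0.5, 1)))
  pmin(original + 0.5, 1)
}

# Original scores of 0, 0.5, and 1 on three questions.
max_resubmission_score(c(0, 0.5, 1))
```

Under this sketch, a question originally scored 0 can reach at most 0.5, while questions scored 0.5 or 1 can end at 1.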
Resubmissions will be due on the Friday following the release of the grades. For example, if grades are released on Monday the resubmission will be due the Friday of the same week.
The final exam for both sections is Monday, December 16, 2024 from 9AM to noon.
Homeworks, deliverables, reading assignments, and exams will be posted on the course calendar below. Homeworks are due before the start of class.
| Mon | Tue | Wed | Thu | Fri |
|---|---|---|---|---|
| Aug | | | | |
| | 27 | | 29 SLIDES: Tools Intro to R Reports | |
| Sep | | | | |
| | 3 DUE: HW 1 In class: Working dir, Intro R Optional videos: First 5 videos of Learn R Programming | | 5 | |
| | 10 DUE: HW 2 Add deadline | 11 Drop deadline | 12 | |
| | 17 DUE: HW 3 | | 19 | |
| | 24 DUE: HW 4 In class: Exam Prep | | 26 Exam 1 | |
| Oct | | | | |
| | 1 slides slides | | 3 | |
| | 8 DUE: HW 5 | | 10 Medical Diagnosis CH 6 slides | |
| | 15 Fall reading day No class | | 17 In class: Deliverable 1 | 18 |
| | 22 Drop (with W) deadline DUE (by 9:30am): Deliverable 1 | | 24 Exam review | |
| | 29 Exam 2 | | 31 Nevada Day | |
| Nov | | | | |
| | 5 Election day No class | | 7 DUE (by 9:30am): HW 6 Hands/Sequences | |
| | 12 Discrete RVs | | 14 DUE (by 9:30am): HW 7 Continuous RVs Pen Drop | |
| | 19 KDE, MM | | 21 In class: In class DUE: HW 8 MLE/Bayes | 22 DUE: HW 6 Resubmission HW 8 Resubmission |
| | 26 DUE: HW 9 HW 7 Resubmission Thanksgiving No class | | 28 Thanksgiving No class | |
| Dec | | | | |
| | 3 | | 5 [Final prep](https://tgstewart.cloud/final-exam-prep.html) DUE: Deliverable 2 HW 9 Resubmission Last day of class | |
| | 10 | | 12 | |
| 16 Final Exam | 17 | | | |