notes.txt
IGNORE. LOOK AT README.MD. THIS IS JUST A DUMP FILE.
Links:
https://blogs.sas.com/content/iml/2018/03/07/fit-distribution-matching-quantile.html
https://arxiv.org/abs/2008.06423
https://files.eric.ed.gov/fulltext/ED599204.pdf
https://chrispiech.github.io/probabilityForComputerScientists/en/examples/grades_not_normal/
https://github.com/rsnirwan/bqme
https://discourse.mc-stan.org/t/bayesian-quantile-matching-estimation-using-order-statistics/19187/4
Note: Just a warning that since I don't have much of a background in statistics, my explanations may be oversimplified or inaccurate, so please excuse any mistakes and let me know XD
Stanford Computer Science and GradeScope https://files.eric.ed.gov/fulltext/ED599204.pdf
Why the logit normal?
Bounded above and below. Skewed. The research paper linked above shows it is the best fit. The logit-normal is the distribution obtained by applying the inverse logit function (also known as expit or the logistic function) to a normally distributed variable, which turns the range of the distribution from (-infinity, infinity) into (0, 1).
One can intuitively think of the grade distribution as a scaled logit-normal distribution: a function similar to the logistic function is applied to the distribution of students' preparedness (or competency, effort, or smartness), which is assumed to be approximately normal, to convert it into grades restricted to the range [min_possible_score, max_possible_score]. Near the top of the range it gets progressively harder to move up: think of the effort it takes to go from a 70 average on exams to an 80 average compared to going from an 80 average to a 90 average (diminishing returns). Likewise, the gap between the underlying normal values for 0.8 and 0.9 in a logit-normal (about 0.81) is much larger than the gap for 0.5 and 0.6 (about 0.41) or for 0.6 and 0.7 (about 0.44).
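To make the diminishing-returns point concrete, here is a minimal sketch that maps a few scores (as fractions of the maximum score) back to the underlying normal scale with the logit function and looks at how much latent "ability" each 10-point step costs. The score grid and the variable names are illustrative choices, not something taken from the paper.

```python
import math


def logit(p: float) -> float:
    """Inverse of the logistic function: maps (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


# Scores expressed as fractions of the maximum possible score (illustrative grid).
scores = [0.5, 0.6, 0.7, 0.8, 0.9]
latent = [logit(s) for s in scores]

# Distance on the latent (normal) scale needed for each 10-point step in score.
gaps = [b - a for a, b in zip(latent, latent[1:])]

for (low, high), gap in zip(zip(scores, scores[1:]), gaps):
    print(f"{low:.1f} -> {high:.1f}: latent gap {gap:.3f}")
```

Each step above 0.5 needs a larger move on the latent scale than the previous one. By the symmetry of the logistic function the same thing happens below 0.5 as scores approach 0.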
How should I fit the logit normal? Don't use the 90th percentile or above unless your sample size is at least 100, and never use the maximum or minimum score obtained.
Apply the logistic function (the inverse logit) to mu to get the median of the logit-normal; the mean has no closed form and has to be computed numerically. One can also think of the logistic function as a function representing diminishing returns.
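As a rough illustration of what applying the logistic function to mu gives you, the sketch below compares expit(mu) (the median of the logit-normal) with the mean obtained by numerical integration. The values of mu and sigma are made up for the example, and integrating over mu plus or minus 10 sigma is just a convenient assumption for where the density is non-negligible.

```python
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

# Hypothetical latent-scale parameters, chosen only for illustration.
mu, sigma = 1.0, 0.8

# The logistic function is monotone, so it maps the normal median (mu)
# to the logit-normal median.
median = expit(mu)


def integrand(x: float) -> float:
    """expit(x) weighted by the normal density of the latent variable."""
    return expit(x) * norm.pdf(x, loc=mu, scale=sigma)


# The mean E[expit(X)] has no closed form, so integrate numerically.
mean, _abs_err = quad(integrand, mu - 10 * sigma, mu + 10 * sigma)

print(f"median = {median:.4f}, mean = {mean:.4f}")
```

With mu above 0 the mean comes out below the median, which is the left skew one would expect for grades bunched toward the top of the scale.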
Short answer: The logit-normal is bounded, allows for skewness, and takes diminishing returns into account. A research paper from Stanford Computer Science, in collaboration with GradeScope, showed that the logit-normal outperforms the normal, truncated normal, and beta distributions.
Long answer:
There are many distributions that may come to mind when people think about the supposed distribution of grades, the most common being the normal. However, the normal is unbounded, with range (-infinity, infinity), while grades are bounded, with range [min_possible_score, max_possible_score]. One must then narrow down to bounded distributions, some of which are the truncated normal, the beta distribution, the binomial distribution, the continuous analog of the binomial, and the logit-normal. The truncated normal is eliminated due to its lack of skewness, whereas grade distributions are often skewed and asymmetric. The binomial is also eliminated due to its discrete nature, which lacks flexibility. This leaves the beta distribution, the continuous analog of the binomial, and the logit-normal. The logit-normal is, reasoning-wise, the best among the three, considering that it is characterized by the logistic function, which takes diminishing returns into account. This is reinforced by a research paper from Stanford Computer Science, in collaboration with GradeScope, [insert link] showing that the logit-normal outperforms the normal, truncated normal, and beta distributions. Anecdotally, I have also tested the logit-normal against the distributions mentioned, including the continuous binomial analog, and found that it is the superior distribution in terms of goodness of fit.
Intuitively, one can think of the distribution of students' abilities as approximately normal but with range [0, infinity), which is converted into the grade distribution through an unknown function. This function takes diminishing returns into account (think about the amount of effort it would take to go from a 50 to a 70 on an exam, as opposed to from a 70 to a 90) and compresses the infinite range [0, infinity) into the bounded range [min_possible_score, max_possible_score]. The logistic function similarly exhibits diminishing returns and transforms an infinite range into a bounded one, albeit from (-infinity, infinity) to (0, 1). One can thus see the similarity between the grade distribution and the logit-normal distribution, which arises from applying the logistic function (resembling the unknown function characterized by diminishing returns and a bounded range) to a normal distribution (resembling the distribution of students' abilities).
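The intuition above can be sketched as a small simulation: draw a normal "ability" sample, push it through the logistic function, and rescale to the score range. All of the numbers here (score bounds, mu, sigma, sample size, seed) are assumptions for illustration, and the ability is drawn on the whole real line, as the logit-normal requires, rather than on [0, infinity).

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)

# Hypothetical score bounds and latent parameters, for illustration only.
min_score, max_score = 0.0, 100.0
mu, sigma = 1.2, 0.9

# Latent ability: approximately normal on the whole real line.
ability = rng.normal(loc=mu, scale=sigma, size=10_000)

# Scaled logit-normal grades: the logistic function squeezes the latent
# values into (0, 1), which is then stretched to the score range.
grades = min_score + (max_score - min_score) * expit(ability)

print(f"min = {grades.min():.1f}, max = {grades.max():.1f}")
print(f"mean = {grades.mean():.1f}, median = {np.median(grades):.1f}")
```

The simulated grades stay inside the score bounds, and with mu above 0 the mean sits below the median, giving the left-skewed shape typical of exam results.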
Parameter estimation
The result is way off?!?! Please message me
TODO: graph the loss function (contour map and loss landscape visualization)
https://blogs.sas.com/content/iml/2018/03/07/fit-distribution-matching-quantile.html
# Doing weighted least squares instead of ordinary least squares to combat heteroskedasticity,
# since the variance of a sample quantile depends on which quantile is examined. We use the
# "raw" sample quantile variance, i.e. the variance with the sample size factor left out: we
# don't know the sample size, but it doesn't matter since weighted least squares only needs
# the relative variance (weight) of each quantile. Interestingly, the normal quantile function
# is mu + sigma * sqrt(2) * erfinv(2p - 1), which has the same form as a linear model
# y = mx + b, where mu is b (the intercept), sigma is m (the slope), and x is
# sqrt(2) * erfinv(2p - 1), the quantile function of a standard normal. This means we can
# transform the probabilities into standard normal quantiles and then find mu and sigma by
# linear regression instead of expensive numerical optimization. The weight is 1 / variance
# because the weighted least squares estimator is the best linear unbiased estimator when the
# weighted sum of squared residuals is minimized with each weight equal to the reciprocal of
# the variance of the measurement.
from scipy.stats import norm

standard_norm_quantiles = norm.ppf(cumulative_probs)
# The logit-normal distribution has been shown to be one of the best fits, if not the best,
# to exam score distributions:
# https://stanford.edu/~cpiech/bio/papers/gradesAreNotNormal.pdf
# https://files.eric.ed.gov/fulltext/ED599204.pdf (duplicate link in case)
# This matches with item response theory. We want to do QME on the normal distribution
# (i.e. on the logit-transformed scores), since the loss function is convex and the sample
# quantile variance is more homogeneous and normal (lol) in the normal than in the logit-normal.
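
The comments above can be put together into a minimal sketch of the whole fit, under some assumptions of mine: the function name fit_logit_normal_wls and its arguments are hypothetical, the relative variance of each sample quantile is taken as p(1 - p) / phi(z)^2 (the usual large-sample formula with the sample size and sigma^2 factors dropped), and correlation between sample quantiles is ignored. It is not necessarily the exact procedure used elsewhere in this repo.

```python
import numpy as np
from scipy.special import expit, logit
from scipy.stats import norm


def fit_logit_normal_wls(cumulative_probs, score_quantiles, min_score, max_score):
    """Estimate (mu, sigma) of a scaled logit-normal from a few quantiles."""
    p = np.asarray(cumulative_probs, dtype=float)
    q = np.asarray(score_quantiles, dtype=float)

    # Rescale scores to (0, 1) and move to the latent normal scale.
    y = logit((q - min_score) / (max_score - min_score))

    # x values of the linear model y = mu + sigma * x.
    z = norm.ppf(p)

    # Relative variance of each sample quantile (common factors dropped),
    # and the corresponding inverse-variance weights.
    rel_var = p * (1.0 - p) / norm.pdf(z) ** 2
    weights = 1.0 / rel_var

    # Weighted least squares: scale rows by sqrt(weight), then solve OLS.
    design = np.column_stack([np.ones_like(z), z])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    mu_hat, sigma_hat = coef
    return mu_hat, sigma_hat


# Round-trip check with made-up parameters: build exact quantiles, then refit.
true_mu, true_sigma = 0.8, 0.6
probs = np.array([0.10, 0.25, 0.50, 0.75, 0.90])
quantiles = 100.0 * expit(true_mu + true_sigma * norm.ppf(probs))

mu_hat, sigma_hat = fit_logit_normal_wls(probs, quantiles, 0.0, 100.0)
print(f"mu_hat = {mu_hat:.4f}, sigma_hat = {sigma_hat:.4f}")
```

Because the quantiles in the round trip are exact, the fit should recover the made-up parameters up to floating-point error; with real, noisy quantiles the weights decide how much each quantile is trusted, with the tails counting for less than the middle.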