p-values: precision #21

Open
jgmbenoit opened this issue Mar 19, 2018 · 3 comments
Comments

@jgmbenoit
Contributor

I fitted some (discrete) data with both the plfit provided here and the MATLAB code provided by the authors of [1], and the resulting p-values differ significantly: roughly speaking, the p-values obtained with plfit are 10 times smaller. For the attached data file sample_deglist.txt:
$ plfit -b -p exact sample_deglist.txt
gives
$ sample_deglist.txt: D 2.32465 3 -6150.54 0.0155189 0.028
So p is 0.028
With the matlab code, I get
$ sample_deglist.txt: D 2.32000 3 -6150.56 0.126800
[I ran the MATLAB code with Octave 4.2.1.]
Any idea? Otherwise, have you implemented formula (3.11) in [1], or something else?

@ntamas
Owner

ntamas commented Mar 19, 2018

I haven't implemented the reweighting in (3.11) so that's one possible source of the discrepancy. The calculation of the D value of the KS test is here -- feel free to poke around and let me know if you find something suspicious. The p-value is then simply calculated by generating artificial samples from the fitted power-law distribution, and comparing the D values obtained from the artificial samples with the D value of the real sample.
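To make the resampling step concrete, here is a minimal sketch in Python rather than the C used by plfit. It is not the plfit implementation: `bootstrap_p_value`, `draw_sample` and `statistic` are hypothetical names, and it assumes the usual convention that the p-value is the fraction of artificial samples whose D is at least as large as the D of the real sample. Whether each artificial sample is refitted before computing its D is left to the caller-supplied `statistic`.

```python
import random


def bootstrap_p_value(observed_d, draw_sample, statistic, trials=1000, seed=0):
    """Estimate a p-value by comparing D values of artificial samples.

    observed_d  -- D value of the real sample
    draw_sample -- hypothetical callable: takes a random.Random instance and
                   returns one artificial sample drawn from the fitted model
    statistic   -- hypothetical callable: maps a sample to its D value
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        synthetic = draw_sample(rng)
        # Assumed convention: count artificial samples that fit at least
        # as badly as the real one.
        if statistic(synthetic) >= observed_d:
            exceed += 1
    return exceed / trials
```

With this scheme the estimate can only take values that are multiples of `1 / trials`, so the number of trials bounds the precision of the reported p-value.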

@jgmbenoit
Contributor Author

Okay, where does the implemented formula come from: fabs( 1 - hzeta(alpha, x) / hzeta(alpha, xmin) - m / n)?

@ntamas
Owner

ntamas commented Apr 2, 2018

Sorry for the late reply - lots of things to be done at work. Anyway, the test statistic of the one-sample KS test is simply the maximum of the absolute value of the difference between the "theoretical" CDF and the observed CDF. In the formula above, m / n is the observed CDF (n is the number of samples, m counts the number of samples less than x, while x iterates over the sorted list of samples). The remaining part (i.e. 1 - hzeta(alpha, x) / hzeta(alpha, xmin)) should then be the value of the CDF of the power-law function at x if the power-law behaviour starts at xmin and has an exponent alpha.
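The description above can be sketched as follows. This is an illustration in Python, not the plfit source: `hzeta` here is a small stand-in for the Hurwitz zeta function (direct summation plus an Euler-Maclaurin tail), `ks_statistic` is a hypothetical name, and evaluating the difference once per distinct sample value is an assumption about how ties are handled.

```python
def hzeta(s, q, terms=50):
    """Hurwitz zeta: sum over k >= 0 of (q + k)**(-s), for s > 1 and q > 0."""
    head = sum((q + k) ** -s for k in range(terms))
    a = q + terms
    # Euler-Maclaurin approximation of the remaining tail of the series.
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12.0
    return head + tail


def ks_statistic(samples, alpha, xmin):
    """Max of |model CDF - observed CDF| over the samples >= xmin."""
    tail = sorted(x for x in samples if x >= xmin)
    n = len(tail)
    norm = hzeta(alpha, xmin)
    d = 0.0
    i = 0
    while i < n:
        x = tail[i]
        m = i  # number of samples strictly less than x
        # 1 - hzeta(alpha, x) / hzeta(alpha, xmin) is P(X < x) under the model.
        model_cdf = 1.0 - hzeta(alpha, x) / norm
        d = max(d, abs(model_cdf - m / n))
        # Assumption: skip repeated values so each distinct x is used once.
        while i < n and tail[i] == x:
            i += 1
    return d
```

For example, `ks_statistic([1, 1, 2], alpha=2.0, xmin=1)` compares the model value `1 - hzeta(2, 2) / hzeta(2, 1)` with the observed fraction 2/3 at x = 2.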