Low percentile values do not seem to be calculated correctly #20

homiak · 2019-05-24T09:41:58Z

Here are the percentile calculated using this library (first column) and percentiles calculated using github.com/montanaflynn/stats library (second column)

Target percentiles:

calcQuantiles = []float64{0.001, 0.01, 0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.50, 0.90}

time="2019-05-24T08:53:00Z" level=info msg="Total: 62748 samples"
time="2019-05-24T08:53:00Z" level=info msg="0.1th: 182, 187"
time="2019-05-24T08:53:00Z" level=info msg="1.0th: 4494, 2826"
time="2019-05-24T08:53:00Z" level=info msg="3.0th: 7982, 8024"
time="2019-05-24T08:53:00Z" level=info msg="5.0th: 11528, 11689"
time="2019-05-24T08:53:00Z" level=info msg="10.0th: 15766, 15519"
time="2019-05-24T08:53:00Z" level=info msg="20.0th: 19606, 19680"
time="2019-05-24T08:53:00Z" level=info msg="30.0th: 20909, 20915"
time="2019-05-24T08:53:00Z" level=info msg="40.0th: 21205, 21206"
time="2019-05-24T08:53:00Z" level=info msg="50.0th: 21347, 21350"
time="2019-05-24T08:53:00Z" level=info msg="90.0th: 21470, 21470"

time="2019-05-24T08:54:00Z" level=info msg="Total: 64905 samples"
time="2019-05-24T08:54:00Z" level=info msg="0.1th: 318, 187"
time="2019-05-24T08:54:00Z" level=info msg="1.0th: 4106, 2838"
time="2019-05-24T08:54:00Z" level=info msg="3.0th: 9910, 8039"
time="2019-05-24T08:54:00Z" level=info msg="5.0th: 11709, 11720"
time="2019-05-24T08:54:00Z" level=info msg="10.0th: 15785, 15536"
time="2019-05-24T08:54:00Z" level=info msg="20.0th: 19635, 19685"
time="2019-05-24T08:54:00Z" level=info msg="30.0th: 20962, 20917"
time="2019-05-24T08:54:00Z" level=info msg="40.0th: 21221, 21207"
time="2019-05-24T08:54:00Z" level=info msg="50.0th: 21351, 21351"
time="2019-05-24T08:54:00Z" level=info msg="90.0th: 21466, 21471"

time="2019-05-24T08:54:52Z" level=info msg="Total: 66775 samples"
time="2019-05-24T08:54:52Z" level=info msg="0.1th: 823, 187"
time="2019-05-24T08:54:52Z" level=info msg="1.0th: 3790, 2844"
time="2019-05-24T08:54:52Z" level=info msg="3.0th: 9398, 8046"
time="2019-05-24T08:54:52Z" level=info msg="5.0th: 11577, 11730"
time="2019-05-24T08:54:52Z" level=info msg="10.0th: 15679, 15538"
time="2019-05-24T08:54:52Z" level=info msg="20.0th: 19635, 19683"
time="2019-05-24T08:54:52Z" level=info msg="30.0th: 20907, 20917"
time="2019-05-24T08:54:52Z" level=info msg="40.0th: 21201, 21208"
time="2019-05-24T08:54:52Z" level=info msg="50.0th: 21346, 21352"
time="2019-05-24T08:54:52Z" level=info msg="90.0th: 21466, 21471"

Method used to log the above

   	log.Printf("Total: %d samples", quantiles.Count())
  	for _, quantile := range calcQuantiles {
		percentile, err := stats.Percentile(samples, quantile*100)
		if err != nil {
			log.Errorf("failed to calc percentile - %s", err)
		} else {
			log.Infof("%.1fth: %.0f", quantile*100, quantiles.Query(quantile), percentile)
		}
	}
   }

Please note a big difference in low percentile numbers between the 3 print outs above. Even though one would expect some differences at say 0.1th percentile because 3 print outs above contained different sample numbers. Still there is unexpectedly large variability with only a limited sample number change. Also notice that percentile calculated with github.com/montanaflynn/stats does not display the same variability, which rules out the possibility that the samples added were somehow very different to the previous lot.

So overall I think this library calculates low percentiles incorrectly or the method is not really suitable for this.

I have the data that was used for calculating the above and happy to share it. Just let me know if you want it. (it's less than 1Mb file)

The text was updated successfully, but these errors were encountered:

frioux · 2019-11-18T00:28:22Z

The whole magic of this algorithm is that instead of calculating exact percentiles it adds an error (called epsilon) to allow more efficient recording of the summaries. The default in this library is 0.01, so a requested percentile of 0.01 is actually somewhere between zero and 0.02. You can request an epsilon of zero but I suspect the cost for that would be too high eventually.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Low percentile values do not seem to be calculated correctly #20

Low percentile values do not seem to be calculated correctly #20

homiak commented May 24, 2019 •

edited

Loading

frioux commented Nov 18, 2019

Low percentile values do not seem to be calculated correctly #20

Low percentile values do not seem to be calculated correctly #20

Comments

homiak commented May 24, 2019 • edited Loading

frioux commented Nov 18, 2019

homiak commented May 24, 2019 •

edited

Loading