Notes
Sampling Samples
If you had a set of p90 samples, how would you get a p90 of the overall original data? Would you take the p90 of your p90 samples? Would you take the p50 median of your set of p90 samples? It turns out that neither are correct and it’s actually impossible to reliably recover original percentiles from derived percentile data:
If your original data is grouped into a [10, 10, 10, 10]
sample and many [0]
samples, your p90 samples
sould be [10, 0, 0, 0, 0...]
and your p90(p90(data))
would turn out to be 0
(your p50(p90(data))
would
turn out to be 0 too).
Even with equal sized samples, p90 samples aren’t useful for finding an overall. If we had 10
sets of samples, [1->10]
, [11->20]
, etc. until [91->100]
, our overall p90 would be 90, but our
p90(p90(data))
would be 89.