Quantile-Quantile Plots

Visualization builds intuitions – math makes them concrete

Author

Sushovan Majhi

Published

September 20, 2022

A quantile-quantile plot (also known as a Q-Q plot) is an extremely useful visual tool for exploratory data analysis (EDA). A Q-Q plot is not so much a summary of the data as an informal assessment of goodness of fit: it helps discern how two distributions differ. Quantiles of one distribution (usually from data) are plotted against the corresponding quantiles of another distribution (usually a known theoretical model). For more examples and discussion, see (Wilk and Gnanadesikan 1968).

Quantiles

Since the concept of a Q-Q plot is based on quantiles, we begin with the definition of the quantiles of a probability distribution.

Theoretical Quantiles

Definition 1 (Theoretical Quantiles)

For any \(p\in[0,1]\), a \(p\)th quantile of a random variable \(X\) is any value \(x_p\in\mathbb R\) such that \(\mathbb P(X\leq x_p)=p\).

In other words, the probability that \(X\) takes a value not greater than a \(p\)th quantile is \(p\). For \(p=\frac{1}{2}\), \(x_p\) is commonly known as the median of \(X\). If \(F(x)\) denotes the CDF of \(X\), then \(x_p=F^{-1}(p)\), provided the CDF \(F(x)\) is invertible near \(x_p\) (see footnote 1).
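To make \(x_p=F^{-1}(p)\) concrete, here is a minimal Python sketch for the exponential distribution with rate \(\lambda\). We picked this distribution only because its CDF \(F(x)=1-e^{-\lambda x}\) for \(x\geq 0\) inverts in closed form to \(F^{-1}(p)=-\ln(1-p)/\lambda\); the function names are our own, chosen for illustration.

```python
import math

def exponential_cdf(x, rate=1.0):
    """CDF of Exp(rate): F(x) = 1 - exp(-rate * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exponential_quantile(p, rate=1.0):
    """Inverse CDF x_p = F^{-1}(p) = -ln(1 - p) / rate, valid for 0 <= p < 1."""
    # log1p(-p) computes ln(1 - p) accurately even when p is tiny.
    return -math.log1p(-p) / rate

median = exponential_quantile(0.5, rate=2.0)
print(median, math.log(2) / 2)            # both are ln(2) / rate, up to rounding
print(exponential_cdf(median, rate=2.0))  # F(x_p) recovers p = 0.5, up to rounding
```

Plugging the quantile back into the CDF is a cheap sanity check that \(F(x_p)=p\), which is exactly Definition 1.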

Quantiles are not unique

In general, quantiles are not unique. Easy examples arise when \(X\) is discrete. For a continuous example, take \(X\sim\mathrm{unif}([0,1])\): every number not less than \(1\) is a \(p\)th quantile for \(p=1\). See the CDF of \(X\) below to convince yourself.

Figure 1: (a) PDF (left) and (b) CDF (right) of uniform \([0,1]\), shown by the blue lines. The CDF is only invertible on the support.
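The non-uniqueness can also be checked directly against Definition 1. The minimal sketch below (helper names are ours) codes the CDF of \(\mathrm{unif}([0,1])\) and, as a discrete example of our own choosing, the CDF of a Bernoulli\((1/2)\) variable, then tests several candidate values.

```python
def uniform_cdf(x):
    """CDF F(x) = P(X <= x) of X ~ unif([0, 1])."""
    return min(max(x, 0.0), 1.0)

def bernoulli_cdf(x, q=0.5):
    """CDF of a Bernoulli(q) variable: P(X = 1) = q, P(X = 0) = 1 - q."""
    if x < 0:
        return 0.0
    return 1.0 - q if x < 1 else 1.0

def is_quantile(cdf, x, p):
    """Definition 1: x is a pth quantile when P(X <= x) = p.
    Exact comparison is safe here because these CDF values are exact."""
    return cdf(x) == p

# Every x >= 1 is a pth quantile of unif([0, 1]) for p = 1 ...
print([is_quantile(uniform_cdf, x, 1.0) for x in (1.0, 1.5, 10.0)])   # [True, True, True]
# ... and every x in [0, 1) is a median (p = 1/2) of Bernoulli(1/2).
print([is_quantile(bernoulli_cdf, x, 0.5) for x in (0.0, 0.3, 0.99)])  # [True, True, True]
```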

Moving forward, we assume that \(X\) is a continuous random variable and that its CDF \(F\) is a strictly increasing, continuous function on an interval (possibly unbounded) of the real line that carries all of the probability. As a consequence, \(F\) is invertible on that interval, and for every \(p\in(0,1)\) the \(p\)th quantile \(x_p=F^{-1}(p)\) is uniquely defined. Examples of such distributions include the exponential, \(\chi^2\)- and F-distributions on \((0,\infty)\), the normal and Student’s t-distributions on \(\mathbb R\), the Beta distribution on \((0,1)\), etc.
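For the standard normal, which fits these assumptions on all of \(\mathbb R\), Python's standard library already ships the inverse CDF as `statistics.NormalDist.inv_cdf`. The sketch below (our own illustration, not part of the original tutorial) evaluates a few unique quantiles and shows that the library rejects \(p=1\), where no finite quantile exists.

```python
from statistics import NormalDist, StatisticsError

std_normal = NormalDist(mu=0.0, sigma=1.0)

# F is strictly increasing on R, so each p in (0, 1) has exactly one quantile.
for p in (0.025, 0.5, 0.975):
    x_p = std_normal.inv_cdf(p)  # x_p = F^{-1}(p)
    print(f"p = {p}: x_p = {x_p:.4f}, F(x_p) = {std_normal.cdf(x_p):.4f}")

# At p = 0 or p = 1 no finite quantile exists, and inv_cdf raises an error.
try:
    std_normal.inv_cdf(1.0)
except StatisticsError as err:
    print("p = 1:", err)
```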

Exercise 1

Find a continuous distribution with expected value \(0\) whose CDF is invertible only on a bounded interval of the real line.

Observed Quantiles

While the quantiles of a probability distribution are concretely defined (Definition 1), there are quite a few conventions for assigning quantiles to a batch of observations, i.e., a dataset. For a large sample, however, these conventions make little to no difference in a descriptive analysis. We use the following convention:

For a random sample \(X_1, X_2, \ldots, X_n\) of size \(n\), denote the order statistics by \(X_{(1)}\leq X_{(2)}\leq\ldots\leq X_{(n)}\). The \(k/(n+1)\) quantile of the data is then assigned to \(X_{(k)}\), the \(k\)th order statistic.
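A minimal Python sketch of this convention (the function name `observed_quantiles` is ours): sort the sample and attach the level \(k/(n+1)\) to the \(k\)th order statistic.

```python
def observed_quantiles(sample):
    """Pair each order statistic X_(k) with its level k/(n+1)."""
    ordered = sorted(sample)  # X_(1) <= X_(2) <= ... <= X_(n)
    n = len(ordered)
    return [(k / (n + 1), x) for k, x in enumerate(ordered, start=1)]

print(observed_quantiles([3.1, 0.4, 2.2, 1.7]))
# [(0.2, 0.4), (0.4, 1.7), (0.6, 2.2), (0.8, 3.1)]
```

Dividing by \(n+1\) rather than \(n\) keeps every level strictly inside \((0,1)\), so the matching theoretical quantiles \(F^{-1}(k/(n+1))\) stay finite even for distributions with unbounded support, such as the normal.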

Plotting and Studying Q-Q Plots

Observed vs Theoretical Quantiles

Let us consider a sample of size \(n\) from the uniform distribution on \([0,1]\). As proved in
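As a rough sketch of what this comparison computes, assuming Python's standard `random` module as the sampler (our choice, with an arbitrary seed): for \(\mathrm{unif}([0,1])\) the quantile function is \(F^{-1}(p)=p\), so each Q-Q point pairs \(k/(n+1)\) with \(X_{(k)}\), and for a good fit the points should fall close to the diagonal \(y=x\). The helper names are hypothetical.

```python
import random

def qq_points(sample, quantile_fn):
    """Pair theoretical quantiles F^{-1}(k/(n+1)) with order statistics X_(k)."""
    ordered = sorted(sample)  # X_(1) <= ... <= X_(n)
    n = len(ordered)
    return [(quantile_fn(k / (n + 1)), x) for k, x in enumerate(ordered, start=1)]

def uniform_quantile(p):
    """Quantile function of unif([0, 1]): F^{-1}(p) = p."""
    return p

rng = random.Random(2022)                       # fixed seed, chosen arbitrarily
sample = [rng.random() for _ in range(500)]     # n = 500 draws from unif([0, 1))
points = qq_points(sample, uniform_quantile)

# Largest vertical distance from the diagonal y = x; small values suggest a good fit.
max_gap = max(abs(x - t) for t, x in points)
print(f"max |X_(k) - k/(n+1)| = {max_gap:.3f}")
```

To draw the plot itself, the pairs can be handed to any scatter-plot routine, for example matplotlib's `plt.scatter(*zip(*points))`.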

Testing Uniform Random Generator

Two Observed Batches

Conclusion

References

Wilk, M. B., and R. Gnanadesikan. 1968. “Probability Plotting Methods for the Analysis of Data.” Biometrika 55 (1): 1–17. http://www.jstor.org/stable/2334448.

Footnotes

  1. A function \(f:A\to B\) is called invertible near a point \(x_0\in A\) if there is an interval \(I\) containing \(x_0\) such that the restriction of \(f\) to \(I\) is a bijection onto its image \(f(I)\).

Citation

BibTeX citation:
@online{majhi,
  author = {Sushovan Majhi},
  editor = {},
  title = {Quantile-Quantile {Plots}},
  date = {},
  url = {https://smajhi.com/tutorials/data-science/qq},
  langid = {en}
}
For attribution, please cite this work as:
Sushovan Majhi. n.d. “Quantile-Quantile Plots.” https://smajhi.com/tutorials/data-science/qq.