```
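// Slider controlling the sample size n.
viewof n = Inputs.range([30, 100], { step: 1, label: 'sample size' })

// Q-Q plot: sorted uniform draws against the uniform quantiles k/(n+1).
Plot.plot({
  style: { },
  grid: true,
  x: { label: `uniform quantiles →`, line: true },
  y: { label: `↑ observed quantiles`, line: true },
  marks: [
    // Reference line y = x from (0, 0) to (1, 1).
    Plot.link({length: 1}, {
      x1: 0,
      x2: 1,
      y1: 0,
      y2: 1,
    }),
    Plot.dot({length: n}, {
      // Quantile levels k/(n+1) for k = 1, ..., n.
      x: d3.range(n).map(i => (i + 1) / (n + 1)),
      // Order statistics of n draws from the uniform distribution on [0, 1).
      y: d3.sort(Array.from({length: n}, d3.randomUniform()))
    }),
  ]
})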
```

A quantile-quantile plot (also known as a Q-Q plot) is an extremely useful visual tool for exploratory data analysis (EDA). A Q-Q plot is not so much a summary of data as an *informal* assessment of goodness of fit, used to discern the disparity between two distributions. Quantiles from one distribution (usually from data) are plotted against those of another distribution (usually a theoretical, known model). For more examples and discussion, see Wilk and Gnanadesikan (1968).

# Quantiles

Since the concept of a Q-Q plot is based on quantiles, we begin with the definition of the quantiles of a probability distribution.

## Theoretical Quantiles

**Definition 1 (Theoretical Quantiles)**

For any \(p\in[0,1]\), a \(p\)th **quantile** of a random variable \(X\) is defined to be that value \(x_p\in\mathbb R\) such that \(\mathbb P(X\leq x_p)=p\).

In other words, the probability that \(X\) realizes a value not greater than a \(p\)th quantile is \(p\). For \(p=\frac{1}{2}\), \(x_p\) is commonly known as the **median** of \(X\). If \(F\) denotes the CDF of \(X\), then \(x_p=F^{-1}(p)\), provided \(F\) is *invertible*¹ near \(x_p\).

Moving forward, we assume that \(X\) is a continuous random variable and that its CDF \(F\) is a strictly increasing, continuous function, at least on an interval of the real line. As a consequence, the CDF is invertible on that interval, and the \(p\)th quantile \(x_p=F^{-1}(p)\) is uniquely defined for every \(p\in(0,1)\). Examples of such distributions include the exponential, \(\chi^2\), and \(F\) distributions on \((0,\infty)\), the normal and Student’s \(t\) distributions on \(\mathbb R\), the Beta distribution on \((0,1)\), etc.
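As a small illustration of \(x_p=F^{-1}(p)\), the sketch below works out the quantile function of an exponential distribution in closed form. The rate `lambda = 2` and the function names `exponentialCdf` and `exponentialQuantile` are assumptions made for this example only; they do not come from the text above.

```javascript
// Exponential distribution with rate lambda (illustrative choice).
// CDF: F(x) = 1 - exp(-lambda * x) for x >= 0, and 0 otherwise.
function exponentialCdf(x, lambda) {
  return x < 0 ? 0 : 1 - Math.exp(-lambda * x);
}

// Solving F(x_p) = p for x_p gives x_p = -ln(1 - p) / lambda, for 0 <= p < 1.
function exponentialQuantile(p, lambda) {
  if (!(p >= 0 && p < 1)) {
    throw new RangeError("p must lie in [0, 1)");
  }
  // log1p(-p) computes ln(1 - p) accurately for small p.
  return -Math.log1p(-p) / lambda;
}

const lambda = 2;
const median = exponentialQuantile(0.5, lambda); // equals ln(2) / lambda

// Plugging the quantile back into the CDF should recover p (up to rounding).
console.log(median, exponentialCdf(median, lambda));
```

Because this CDF is strictly increasing and continuous on \((0,\infty)\), each \(p\in(0,1)\) yields exactly one quantile, which is why the inverse can be written down explicitly here.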

**Exercise 1**

Find a continuous distribution with expected value \(0\) whose CDF is invertible only on a **bounded** interval of the real line.

## Observed Quantiles

While the quantiles of a probability distribution can be concretely defined (Definition 1), there are several conventions for assigning quantiles to a batch of observations or a dataset. For a large sample, however, the choice makes little to no difference in a descriptive analysis. We use the following convention:

For a random sample \(X_1, X_2, \ldots, X_n\) of size \(n\), the order statistics are denoted by \(X_{(1)}\leq X_{(2)}\leq\ldots\leq X_{(n)}\). The \(k\)th order statistic \(X_{(k)}\) is assigned as the \(k/(n+1)\) quantile of the data, for \(k=1,\ldots,n\).
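The convention above can be sketched in a few lines of plain JavaScript. The helper name `observedQuantiles` and the four sample values are hypothetical, chosen only to make the assignment of levels concrete.

```javascript
// Pair each order statistic X_(k) with its quantile level k / (n + 1).
function observedQuantiles(sample) {
  // Copy before sorting so the caller's array is left untouched;
  // the comparator forces numeric (not lexicographic) ordering.
  const sorted = [...sample].sort((a, b) => a - b);
  const n = sorted.length;
  return sorted.map((value, i) => ({ level: (i + 1) / (n + 1), value }));
}

// Made-up sample of size n = 4, so the levels are 1/5, 2/5, 3/5, 4/5.
const sample = [0.42, 0.07, 0.91, 0.55];
const pairs = observedQuantiles(sample);
for (const { level, value } of pairs) {
  console.log(level.toFixed(2), value);
}
```

With this convention the levels never reach \(0\) or \(1\), so the matching theoretical quantiles \(F^{-1}(k/(n+1))\) stay finite even for distributions supported on all of \(\mathbb R\). For the uniform distribution on \([0,1]\), the theoretical quantile at level \(p\) is simply \(p\).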

# Plotting and Studying Q-Q Plots

## Observed vs Theoretical Quantiles

Let us consider a sample of size \(n\) from the uniform distribution on \([0,1]\). As proved in

### Testing Uniform Random Generator

## Two Observed Batches

# Conclusion

## References

Wilk, M. B., and R. Gnanadesikan. 1968. “Probability Plotting Methods for the Analysis of Data.” *Biometrika* 55 (1): 1–17. http://www.jstor.org/stable/2334448.

## Footnotes

1. A function \(f:A\to B\) is called invertible near a point \(x_0\in A\) if there is an interval \(I\) containing \(x_0\) such that \(f\), restricted to \(I\), is a bijection onto its image.

## Citation

BibTeX citation:

```
@online{majhi,
author = {Sushovan Majhi},
editor = {},
title = {Quantile-Quantile {Plots}},
date = {},
url = {https://smajhi.com/tutorials/data-science/qq},
langid = {en}
}
```

For attribution, please cite this work as:

Sushovan Majhi. n.d. “Quantile-Quantile Plots.” https://smajhi.com/tutorials/data-science/qq.