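```
// Slider controlling the sample size n
viewof n = Inputs.range([30, 100], { step: 1, label: "sample size" })

// Uniform Q-Q plot: sorted uniform draws (observed quantiles)
// against the levels k/(n+1) (uniform quantiles)
Plot.plot({
  grid: true,
  x: { label: "uniform quantiles →", line: true },
  y: { label: "↑ observed quantiles", line: true },
  marks: [
    // Reference line y = x: dots close to it indicate agreement with unif([0, 1])
    Plot.link({ length: 1 }, { x1: 0, y1: 0, x2: 1, y2: 1 }),
    // One dot per order statistic: (k/(n+1), X_(k)) for k = 1, ..., n
    Plot.dot({ length: n }, {
      x: d3.range(n).map((i) => (i + 1) / (n + 1)),
      y: d3.sort(Array.from({ length: n }, d3.randomUniform()))
    })
  ]
})
```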

A quantile-quantile plot (Q-Q plot) is an extremely useful visual tool for exploratory data analysis (EDA). A Q-Q plot is not so much a summary of the data as an *informal* assessment of goodness of fit, used to discern how two distributions differ. The quantiles of one distribution (usually from data) are plotted against those of another distribution (usually a known theoretical model). For more examples and discussion, see [1].

# Quantiles

Since a Q-Q plot is built on quantiles, we begin with the definition of the quantiles of a probability distribution.

## Theoretical Quantiles

**Definition 1 (Theoretical Quantiles)** For any $p\in[0,1]$, a $p$th **quantile** of a random variable $X$ is any value $x_p\in\mathbb R$ such that $\mathbb P(X\leq x_p)=p$.

In other words, the probability that $X$ takes a value not greater than a $p$th quantile is $p$. For $p=\frac{1}{2}$, $x_p$ is commonly known as the **median** of $X$. If $F(x)$ denotes the CDF of $X$, then $x_p=F^{-1}(p)$, provided that $F$ is *invertible*¹ near $x_p$.

**Quantiles are not unique**

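In general, quantiles are not unique. Easy examples arise when $X$ is discrete: if $X$ takes the values $0$ and $1$ with probability $\frac{1}{2}$ each, then $\mathbb P(X\leq x)=\frac{1}{2}$ for every $x\in[0,1)$, so every such $x$ is a median. For an example on the continuous side, take $X\sim\mathrm{unif}([0,1])$: its CDF satisfies $F(x)=1$ for all $x\geq 1$, so any number not less than $1$ is a $p$th quantile for $p=1$.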

Moving forward, we assume that $X$ is a continuous random variable and that its CDF $F$ is continuous and strictly increasing, at least on the interval of the real line where $X$ takes its values. As a consequence, $F$ restricted to that interval is invertible, and for every $p\in(0,1)$ the $p$th quantile $x_p=F^{-1}(p)$ is uniquely defined. Examples of such distributions include the exponential, $\chi^2$- and $F$-distributions on $(0,\infty)$, the normal and Student’s $t$-distributions on $\mathbb R$, and the Beta distribution on $(0,1)$.
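When $F$ is continuous and strictly increasing, $x_p=F^{-1}(p)$ can always be approximated numerically, even without a closed form. The sketch below uses bisection, a generic root-finding technique (not necessarily how any particular library computes quantiles), and checks it against the exponential distribution, whose inverse CDF is known exactly: $F(x)=1-e^{-\lambda x}$ gives $x_p=-\ln(1-p)/\lambda$. The helper name `quantileByBisection`, the rate $\lambda=2$, and the bracket $[0, 50]$ are illustrative assumptions.

```javascript
// Sketch: approximate the pth quantile x_p = F^{-1}(p) of a CDF F that is
// continuous and strictly increasing on the bracket [lo, hi], using bisection.
// quantileByBisection is a hypothetical helper, not a library function.
function quantileByBisection(cdf, p, lo, hi, tol = 1e-12) {
  if (!(cdf(lo) <= p && p <= cdf(hi))) {
    throw new RangeError("[lo, hi] must satisfy F(lo) <= p <= F(hi)");
  }
  // Invariant: F(lo) <= p <= F(hi). The iteration cap guards against a tol
  // finer than the floating-point spacing near the answer.
  for (let i = 0; i < 200 && hi - lo > tol; i++) {
    const mid = lo + (hi - lo) / 2;
    if (cdf(mid) < p) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return lo + (hi - lo) / 2;
}

// Exponential distribution with a hypothetical rate λ = 2:
// F(x) = 1 - exp(-λx) for x >= 0, with closed-form inverse F^{-1}(p) = -ln(1 - p) / λ.
const rate = 2;
const expCdf = (x) => (x < 0 ? 0 : 1 - Math.exp(-rate * x));
const expQuantile = (p) => -Math.log1p(-p) / rate;

for (const p of [0.1, 0.5, 0.9]) {
  const numeric = quantileByBisection(expCdf, p, 0, 50);
  console.log(`p = ${p}: bisection ${numeric.toFixed(6)}, closed form ${expQuantile(p).toFixed(6)}`);
}
```

Both columns should agree; for instance, the median ($p=0.5$) is $\ln 2/2\approx 0.3466$ either way. Bisection only needs $F$ to be monotone on the bracket, which is exactly the assumption made above.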

**Exercise 1** Find a continuous distribution with expected value $0$ whose CDF is invertible only on a **bounded** interval of the real line.

## Observed Quantiles

While the quantiles of a probability distribution are concretely defined (Definition 1), there are quite a few conventions for assigning quantiles to a batch of observations or a dataset, although for large samples they make little to no difference in a descriptive analysis. We use the following convention:

For a random sample $X_1, X_2, \ldots, X_n$ of size $n$, the order statistics are denoted by $X_{(1)}\leq X_{(2)}\leq\ldots\leq X_{(n)}$. The $k/(n+1)$ quantile of the data is assigned to $X_{(k)}$, the $k$th order statistic.
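The convention amounts to sorting the data and attaching the level $k/(n+1)$ to the $k$th sorted value. A minimal sketch follows; the helper name `observedQuantiles` and the toy data are hypothetical.

```javascript
// Sketch of the k/(n+1) convention: the kth order statistic X_(k) is
// reported as the k/(n+1) quantile of the data.
// observedQuantiles is a hypothetical helper name.
function observedQuantiles(sample) {
  // Copy before sorting so the caller's array is left untouched; the numeric
  // comparator matters because Array.prototype.sort compares strings by default.
  const sorted = [...sample].sort((a, b) => a - b);
  const n = sorted.length;
  // Index i (0-based) holds X_(i+1), so its probability level is (i + 1) / (n + 1).
  return sorted.map((x, i) => ({ p: (i + 1) / (n + 1), x }));
}

const data = [2.3, 0.7, 1.9, 3.1]; // hypothetical toy data
console.log(observedQuantiles(data));
```

With $n=4$ the levels are $1/5, 2/5, 3/5, 4/5$, paired with $0.7, 1.9, 2.3, 3.1$. Mapping each level through $F^{-1}$ of a reference distribution (the identity for $\mathrm{unif}([0,1])$) and plotting it against the sorted value gives one dot of a Q-Q plot, as in the plot at the top of the page.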

# Plotting and Studying Q-Q Plots

## Observed vs Theoretical Quantiles

Let us consider a sample of size $n$ from the uniform distribution on $[0,1]$. As proved in
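A standard result for a sample of size $n$ from $\mathrm{unif}([0,1])$ is that $X_{(k)}$ follows a $\mathrm{Beta}(k,\,n-k+1)$ distribution with mean $k/(n+1)$. This is one reason the $k/(n+1)$ convention pairs naturally with uniform quantiles, and why the dots in the plot at the top of the page scatter around the line $y=x$. The following Monte Carlo sketch checks the mean numerically; the helper `meanOrderStatistics`, the use of `Math.random` as the uniform generator, and the choices $n=5$ and 100,000 replications are illustrative assumptions.

```javascript
// Monte Carlo sketch: average each order statistic X_(k) of a uniform sample
// over many replications and compare it with k/(n+1).
// meanOrderStatistics is a hypothetical helper; Math.random stands in for the
// uniform generator on [0, 1).
function meanOrderStatistics(n, reps, rand = Math.random) {
  const sums = new Array(n).fill(0);
  for (let r = 0; r < reps; r++) {
    // Draw a fresh sample of size n and sort it into order statistics.
    const sorted = Array.from({ length: n }, () => rand()).sort((a, b) => a - b);
    sorted.forEach((x, k) => {
      sums[k] += x;
    });
  }
  return sums.map((s) => s / reps);
}

const n = 5;
const means = meanOrderStatistics(n, 100000);
means.forEach((m, i) => {
  const k = i + 1;
  console.log(`k = ${k}: simulated mean ${m.toFixed(3)}, k/(n+1) = ${(k / (n + 1)).toFixed(3)}`);
});
```

With this many replications the simulated means typically match $k/(n+1)$ to about the third decimal place. Passing a different `rand` function is one informal way to exercise another uniform generator.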

### Testing Uniform Random Generator

## Two Observed Batches

# Conclusion

# References

[1] M. B. Wilk and R. Gnanadesikan, “Probability plotting methods for the analysis of data,” *Biometrika*, vol. 55, no. 1, pp. 1–17, 1968. Available: http://www.jstor.org/stable/2334448. [Accessed: Aug. 07, 2022]

## Footnotes

¹ A function $f\colon A\to B$ is called invertible near a point $x_0\in A$ if there is an interval $I$ containing $x_0$ such that the restriction of $f$ to $I$ is a bijection onto $f(I)$.

## Citation

BibTeX citation:

```
@online{majhi2024,
author = {Majhi, Sushovan},
title = {Quantile-Quantile {Plots}},
date = {2024-06-30},
url = {https://smajhi.com/tutorials/data-science/qq},
langid = {en}
}
```

For attribution, please cite this work as:

S. Majhi, “Quantile-Quantile Plots,” Jun. 30, 2024. Available: https://smajhi.com/tutorials/data-science/qq