# Ancillary statistic

An **ancillary statistic** is a measure of a sample whose distribution does not depend on the parameters of the model. An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals.

This concept was introduced by the statistical geneticist Sir Ronald Fisher.

## Example

Suppose *X*_{1}, ..., *X*_{n} are independent and identically distributed, and are normally distributed with unknown expected value *μ* and known variance 1. Let

$$\overline{X} = \frac{X_1 + \cdots + X_n}{n}$$

be the sample mean.

The following statistical measures of dispersion of the sample

- Range: max(*X*_{1}, ..., *X*_{n}) − min(*X*_{1}, ..., *X*_{n})
- Interquartile range: *Q*_{3} − *Q*_{1}
- Sample variance: $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2$

are all *ancillary statistics*, because their sampling distributions do not change as *μ* changes. Computationally, this is because in the formulas, the *μ* terms cancel – adding a constant number to a distribution (and all samples) changes its sample maximum and minimum by the same amount, so it does not change their difference, and likewise for others: these measures of dispersion do not depend on location.
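The cancellation can be checked numerically. The following is a minimal sketch, not a proof: it reuses one set of standard normal draws, shifts them by two arbitrarily chosen values of *μ* (0 and 7), and compares the three dispersion measures. The helper name `dispersion_stats`, the seed, the sample size, and the use of the divide-by-*n* variance are all assumptions made for illustration.

```python
import random
import statistics


def dispersion_stats(sample):
    """Return (range, interquartile range, variance) of a sample."""
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartile cut points
    return (
        max(sample) - min(sample),
        q3 - q1,
        statistics.pvariance(sample),  # variance with divisor n
    )


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(50)]  # shared N(0, 1) draws

# Shifting the same noise by mu gives a sample from N(mu, 1).
stats_mu0 = dispersion_stats([0.0 + z for z in noise])
stats_mu7 = dispersion_stats([7.0 + z for z in noise])

for a, b in zip(stats_mu0, stats_mu7):
    # Each pair agrees up to floating-point rounding: the shift cancels.
    print(round(a, 6), round(b, 6))
```

Because every sample from *N*(*μ*, 1) can be written as *μ* plus standard normal noise, agreement for a fixed noise vector is what makes the sampling distributions identical for all *μ*.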

Conversely, given i.i.d. normal variables with known mean 1 and unknown variance *σ*^{2}, the sample mean is *not* an ancillary statistic of the variance, as the sampling distribution of the sample mean is *N*(1, *σ*^{2}/*n*), which does depend on *σ*^{2}: this measure of location (specifically, its standard error) depends on dispersion.
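A small simulation can illustrate this dependence. The sketch below is only an illustration under assumed settings: the function name `simulate_sample_means`, the sample size *n* = 25, the number of repetitions, the seed, and the two values of *σ* are arbitrary choices, not part of the example above.

```python
import random
import statistics


def simulate_sample_means(sigma, n=25, reps=4000, seed=1):
    """Draw `reps` samples of size n from N(1, sigma^2); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(1.0, sigma) for _ in range(n))
        for _ in range(reps)
    ]


for sigma in (1.0, 3.0):
    means = simulate_sample_means(sigma)
    observed = statistics.pvariance(means)  # spread of the simulated means
    predicted = sigma**2 / 25               # sigma^2 / n, from N(1, sigma^2 / n)
    print(f"sigma={sigma}: observed {observed:.4f}, predicted {predicted:.4f}")
```

The simulated means stay centred near 1 in both runs, but their spread grows with *σ*, so the distribution of the sample mean changes with the unknown parameter.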

## Ancillary complement

Given a statistic *T* that is not sufficient, an **ancillary complement** is a statistic *U* that is ancillary and such that (*T*, *U*) is sufficient.[1] Intuitively, an ancillary complement "adds the missing information" (without duplicating any).

The notion is particularly useful when *T* is a maximum likelihood estimator, which in general will not be sufficient; one can then ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine information content: the Fisher information content of *T* should be taken not from the marginal distribution of *T*, but from the conditional distribution of *T* given *U*. In other words, how much information does *T* *add*? This is not possible in general, as no ancillary complement need exist, and if one exists, it need not be unique, nor does a maximum ancillary complement exist.

### Example

In baseball, suppose a scout observes a batter in *N* at-bats. Suppose (unrealistically) that the number *N* is chosen by some random process that is independent of the batter's ability; say, a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number *N* of at-bats and the number *X* of hits: the data (*X*, *N*) are a sufficient statistic. The observed batting average *X*/*N* fails to convey all of the information available in the data because it fails to report the number *N* of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats). The number *N* of at-bats is an ancillary statistic because

- It is a part of the observable data (it is a *statistic*), and
- Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an **ancillary complement** to the observed batting average *X*/*N*: on its own, the batting average *X*/*N* is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with *N*, it becomes sufficient.
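The scouting scenario can be simulated to see the ancillarity of *N* directly. This is a rough sketch under stated assumptions: the coin is taken to be fair, the hit probabilities 0.2 and 0.4 are arbitrary, and the names `scout` and `simulate` are hypothetical helpers. Separate random streams drive the coin and the hits, so the stopping rule cannot see the batter's ability.

```python
import random


def scout(p_hit, coin_rng, hit_rng):
    """Watch at-bats until the coin says stop; return (hits X, at-bats N)."""
    hits = at_bats = 0
    while True:
        at_bats += 1
        hits += hit_rng.random() < p_hit  # outcome depends on ability
        if coin_rng.random() < 0.5:       # assumed fair coin, ignores ability
            return hits, at_bats


def simulate(p_hit, trips=2000, seed=2):
    """Simulate `trips` scouting visits for a batter with ability p_hit."""
    coin_rng = random.Random(seed)      # drives N only
    hit_rng = random.Random(seed + 1)   # drives X only
    return [scout(p_hit, coin_rng, hit_rng) for _ in range(trips)]


weak, strong = simulate(0.2), simulate(0.4)

# N has the same distribution (here, the very same draws) for both batters.
print([n for _, n in weak] == [n for _, n in strong])

# The hit counts X, and hence X/N, do respond to ability.
print(sum(x for x, _ in weak), sum(x for x, _ in strong))
```

The values of *N* carry no information about ability by themselves, yet each (*X*/*N*, *N*) pair tells the reader how many at-bats stand behind a given average.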

## Notes

1. M. Ghosh, N. Reid and D.A.S. Fraser, "Ancillary Statistics: A Review".