# Characterization of probability distributions

In mathematics in general, a characterization theorem says that a particular object (a function, a space, etc.) is the only one that possesses the properties specified in the theorem. A characterization of a probability distribution accordingly states that it is the only probability distribution that satisfies specified conditions.

More precisely, V. M. Zolotarev described the model of characterization of probability distributions as follows. On a probability space we define the space ${\mathcal {X}}=\{X\}$ of random variables with values in a measurable metric space $(U,d_{u})$ and the space ${\mathcal {Y}}=\{Y\}$ of random variables with values in a measurable metric space $(V,d_{v})$ . By characterizations of probability distributions we understand general problems of describing some set ${\mathcal {C}}$ in the space ${\mathcal {X}}$ by extracting the sets ${\mathcal {A}}\subseteq {\mathcal {X}}$ and ${\mathcal {B}}\subseteq {\mathcal {Y}}$ which describe the properties of the random variables $X\in {\mathcal {A}}$ and of their images $Y=\mathbf {F} X\in {\mathcal {B}}$ , obtained by means of a specially chosen mapping $\mathbf {F} :{\mathcal {X}}\to {\mathcal {Y}}$ .
Describing the properties of the random variables $X$ and of their images $Y=\mathbf {F} X$ is equivalent to indicating the set ${\mathcal {A}}\subseteq {\mathcal {X}}$ from which $X$ must be taken and the set ${\mathcal {B}}\subseteq {\mathcal {Y}}$ into which its image must fall. The set of interest therefore takes the following form:

$X\in {\mathcal {A}},\ \mathbf {F} X\in {\mathcal {B}}\Leftrightarrow X\in {\mathcal {C}},\quad {\text{i.e.}}\quad {\mathcal {C}}=\mathbf {F} ^{-1}{\mathcal {B}},$

where $\mathbf {F} ^{-1}{\mathcal {B}}$ denotes the complete inverse image of ${\mathcal {B}}$ in ${\mathcal {A}}$ . This is the general model of characterization of probability distributions. Some examples of characterization theorems:

• The assumption that two linear (or non-linear) statistics are identically distributed (or independent, or have constant regression, and so on) can be used to characterize various populations. For example, according to George Pólya's characterization theorem, if $X_{1}$ and $X_{2}$ are independent identically distributed random variables with finite variance, then the statistics $S_{1}=X_{1}$ and $S_{2}={\cfrac {X_{1}+X_{2}}{\sqrt {2}}}$ are identically distributed if and only if $X_{1}$ and $X_{2}$ have a normal distribution with zero mean. In this case
$\mathbf {F} ={\begin{bmatrix}1&0\\1/{\sqrt {2}}&1/{\sqrt {2}}\end{bmatrix}}$ ,
${\mathcal {A}}$ is a set of random two-dimensional column-vectors with independent identically distributed components, ${\mathcal {B}}$ is a set of random two-dimensional column-vectors with identically distributed components and ${\mathcal {C}}$ is a set of two-dimensional column-vectors with independent identically distributed normal components.
• According to a generalization of George Pólya's characterization theorem (without the condition of finite variance), if $X_{1},X_{2},\dots ,X_{n}$ are non-degenerate independent identically distributed random variables, the statistics $X_{1}$ and $a_{1}X_{1}+a_{2}X_{2}+\dots +a_{n}X_{n}$ are identically distributed, and $\left|a_{j}\right|<1$ , $a_{1}^{2}+a_{2}^{2}+\dots +a_{n}^{2}=1$ , then $X_{j}$ is a normal random variable for every $j=1,2,\dots ,n$ . In this case
$\mathbf {F} ={\begin{bmatrix}1&0&\dots &0\\a_{1}&a_{2}&\dots &a_{n}\end{bmatrix}}$ ,
${\mathcal {A}}$ is a set of random n-dimensional column-vectors with independent identically distributed components, ${\mathcal {B}}$ is a set of random two-dimensional column-vectors with identically distributed components and ${\mathcal {C}}$ is a set of n-dimensional column-vectors with independent identically distributed normal components.
• All probability distributions on the half-line $\left[0,\infty \right)$ that are memoryless are exponential distributions. "Memoryless" means that if $X$ is a random variable with such a distribution, then for any numbers $0<y<x$ ,
$\Pr(X>x\mid X>y)=\Pr(X>x-y)$ .
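
The memoryless condition in the last example can be checked numerically from a survival function $S(t)=\Pr(X>t)$, since $\Pr(X>x\mid X>y)=S(x)/S(y)$ for $0<y<x$. The following is a minimal sketch, not part of the original text: the function names, the rate 1.5, the Weibull comparison distribution and the test pairs are all illustrative assumptions. It shows that the gap vanishes (up to rounding) for an exponential distribution but not for a Weibull distribution with shape 2.

```python
import math

def exponential_survival(t, rate=1.5):
    # S(t) = Pr(X > t) for an exponential distribution (rate is an arbitrary choice).
    return math.exp(-rate * t)

def weibull_survival(t, shape=2.0, scale=1.0):
    # S(t) for a Weibull distribution; shape != 1 makes it non-exponential.
    return math.exp(-((t / scale) ** shape))

def memoryless_gap(survival, x, y):
    # |Pr(X > x | X > y) - Pr(X > x - y)| for 0 < y < x.
    conditional = survival(x) / survival(y)
    return abs(conditional - survival(x - y))

# Hypothetical test pairs (x, y) with 0 < y < x.
pairs = [(1.0, 0.5), (2.0, 0.3), (3.0, 2.5)]

exp_gap = max(memoryless_gap(exponential_survival, x, y) for x, y in pairs)
weibull_gap = max(memoryless_gap(weibull_survival, x, y) for x, y in pairs)

print("exponential gap:", exp_gap)
print("Weibull gap:", weibull_gap)
```

A finite set of pairs can only refute memorylessness, never prove it; the characterization theorem itself concerns all $0<y<x$.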

In practice, the conditions of a characterization theorem can be verified only up to some error $\epsilon$ , i.e., only to a certain degree of accuracy. This is the case, for instance, when only a sample of finite size is available. A natural question therefore arises: if the conditions of a characterization theorem are fulfilled not exactly but only approximately, can we assert that the conclusion of the theorem is also fulfilled approximately? Theorems that address problems of this kind are called stability characterizations of probability distributions.
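
To illustrate what checking a condition "only with some error" can look like on a finite sample, the sketch below tests the condition of Pólya's theorem empirically. It is an illustration under stated assumptions, not a method taken from the text: the error $\epsilon$ is measured here by the two-sample Kolmogorov–Smirnov statistic (one possible choice of distance), and the sample size, the seed and the helper names `ks_statistic` and `polya_error` are hypothetical.

```python
import math
import random

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: sup |F_a(t) - F_b(t)|
    # over the empirical distribution functions of the two samples.
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        while i < len(a) and a[i] <= t:
            i += 1
        while j < len(b) and b[j] <= t:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def polya_error(draw, n=20000):
    # Empirical distance between the laws of S1 = X1 and S2 = (X1 + X2) / sqrt(2),
    # estimated from independent draws of the same distribution.
    s1 = [draw() for _ in range(n)]
    s2 = [(draw() + draw()) / math.sqrt(2) for _ in range(n)]
    return ks_statistic(s1, s2)

rng = random.Random(0)  # fixed seed (arbitrary) so the run is reproducible

# Standard normal: the condition of the theorem holds exactly.
eps_normal = polya_error(lambda: rng.gauss(0.0, 1.0))
# Centered exponential (mean 0, variance 1): the condition fails.
eps_exponential = polya_error(lambda: rng.expovariate(1.0) - 1.0)

print("normal:", eps_normal)
print("centered exponential:", eps_exponential)
```

For the normal sample the measured error is small but not zero, which is exactly the situation a stability theorem addresses: it would bound how far the underlying distribution can be from normal, in a suitable metric, given that the condition holds up to $\epsilon$.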