Generalized Pareto distribution

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location ${\displaystyle \mu }$, scale ${\displaystyle \sigma }$, and shape ${\displaystyle \xi }$.[1][2] Sometimes it is specified by only scale and shape[3] and sometimes only by its shape parameter. Some references give the shape parameter as ${\displaystyle \kappa =-\xi \,}$.[4]

Figures: probability density function and cumulative distribution function of the GPD for ${\displaystyle \mu =0}$ and different values of ${\displaystyle \sigma }$ and ${\displaystyle \xi }$.

• Parameters: ${\displaystyle \mu \in (-\infty ,\infty )\,}$ location (real); ${\displaystyle \sigma \in (0,\infty )\,}$ scale (real); ${\displaystyle \xi \in (-\infty ,\infty )\,}$ shape (real)
• Support: ${\displaystyle x\geqslant \mu \,\;(\xi \geqslant 0)}$; ${\displaystyle \mu \leqslant x\leqslant \mu -\sigma /\xi \,\;(\xi <0)}$
• PDF: ${\displaystyle {\frac {1}{\sigma }}(1+\xi z)^{-(1/\xi +1)}}$ where ${\displaystyle z={\frac {x-\mu }{\sigma }}}$
• CDF: ${\displaystyle 1-(1+\xi z)^{-1/\xi }\,}$
• Mean: ${\displaystyle \mu +{\frac {\sigma }{1-\xi }}\,\;(\xi <1)}$
• Median: ${\displaystyle \mu +{\frac {\sigma (2^{\xi }-1)}{\xi }}}$
• Variance: ${\displaystyle {\frac {\sigma ^{2}}{(1-\xi )^{2}(1-2\xi )}}\,\;(\xi <1/2)}$
• Skewness: ${\displaystyle {\frac {2(1+\xi ){\sqrt {1-2\xi }}}{(1-3\xi )}}\,\;(\xi <1/3)}$
• Excess kurtosis: ${\displaystyle {\frac {3(1-2\xi )(2\xi ^{2}+\xi +3)}{(1-3\xi )(1-4\xi )}}-3\,\;(\xi <1/4)}$
• Entropy: ${\displaystyle \log(\sigma )+\xi +1}$
• Moment-generating function: ${\displaystyle e^{\theta \mu }\,\sum _{j=0}^{\infty }\left[{\frac {(\theta \sigma )^{j}}{\prod _{k=0}^{j}(1-k\xi )}}\right],\;(k\xi <1)}$
• Characteristic function: ${\displaystyle e^{it\mu }\,\sum _{j=0}^{\infty }\left[{\frac {(it\sigma )^{j}}{\prod _{k=0}^{j}(1-k\xi )}}\right],\;(k\xi <1)}$

Definition

The standard cumulative distribution function (cdf) of the GPD is defined by[5]

${\displaystyle F_{\xi }(z)={\begin{cases}1-\left(1+\xi z\right)^{-1/\xi }&{\text{for }}\xi \neq 0,\\1-e^{-z}&{\text{for }}\xi =0.\end{cases}}}$

where the support is ${\displaystyle z\geq 0}$ for ${\displaystyle \xi \geq 0}$ and ${\displaystyle 0\leq z\leq -1/\xi }$ for ${\displaystyle \xi <0}$. The corresponding probability density function (pdf) is

${\displaystyle f_{\xi }(z)={\begin{cases}(\xi z+1)^{-{\frac {\xi +1}{\xi }}}&{\text{for }}\xi \neq 0,\\e^{-z}&{\text{for }}\xi =0.\end{cases}}}$

Characterization

The related location-scale family of distributions is obtained by replacing the argument ${\displaystyle z}$ by ${\displaystyle {\frac {x-\mu }{\sigma }}}$ and adjusting the support accordingly. The cumulative distribution function is

${\displaystyle F_{(\xi ,\mu ,\sigma )}(x)={\begin{cases}1-\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-1/\xi }&{\text{for }}\xi \neq 0,\\1-\exp \left(-{\frac {x-\mu }{\sigma }}\right)&{\text{for }}\xi =0.\end{cases}}}$

for ${\displaystyle x\geqslant \mu }$ when ${\displaystyle \xi \geqslant 0\,}$, and ${\displaystyle \mu \leqslant x\leqslant \mu -\sigma /\xi }$ when ${\displaystyle \xi <0}$, where ${\displaystyle \mu \in \mathbb {R} }$, ${\displaystyle \sigma >0}$, and ${\displaystyle \xi \in \mathbb {R} }$.

The probability density function (pdf) is

${\displaystyle f_{(\xi ,\mu ,\sigma )}(x)={\frac {1}{\sigma }}\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{\left(-{\frac {1}{\xi }}-1\right)}}$,

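again for ${\displaystyle x\geqslant \mu }$ when ${\displaystyle \xi \geqslant 0}$, and ${\displaystyle \mu \leqslant x\leqslant \mu -\sigma /\xi }$ when ${\displaystyle \xi <0}$. For ${\displaystyle \xi =0}$ the expression is understood as its limit, ${\displaystyle {\frac {1}{\sigma }}\exp \left(-{\frac {x-\mu }{\sigma }}\right)}$.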
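The cdf and pdf of the location-scale family can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `gpd_cdf` and `gpd_pdf` are chosen here for illustration, the case ${\displaystyle \xi =0}$ is handled by an exact comparison, and values outside the support are mapped to 0 or 1.

```python
import math


def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi) at x (illustrative helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0  # at or below the lower endpoint mu
    if xi == 0:
        return 1.0 - math.exp(-z)  # exponential case
    if xi < 0 and z >= -1.0 / xi:
        return 1.0  # at or above the upper endpoint mu - sigma/xi
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)


def gpd_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """PDF of GPD(mu, sigma, xi) at x (illustrative helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < 0:
        return 0.0  # below the support
    if xi == 0:
        return math.exp(-z) / sigma
    if xi < 0 and z >= -1.0 / xi:
        # Treat the upper endpoint itself as outside the support, which
        # avoids raising 0 to a negative power when xi < -1.
        return 0.0
    return (1.0 + xi * z) ** (-1.0 / xi - 1.0) / sigma


if __name__ == "__main__":
    print(gpd_cdf(1.0, mu=0.0, sigma=1.0, xi=0.5))
    print(gpd_pdf(0.0, mu=0.0, sigma=2.0, xi=0.5))
```

An exact test `xi == 0` is used only for clarity; for very small nonzero ${\displaystyle \xi }$ a numerically careful implementation would likely need a series expansion or `log1p`-based evaluation.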

The pdf is a solution of the following differential equation:

${\displaystyle \left\{{\begin{array}{l}f'(x)(-\mu \xi +\sigma +\xi x)+(\xi +1)f(x)=0,\\f(0)={\frac {\left(1-{\frac {\mu \xi }{\sigma }}\right)^{-{\frac {1}{\xi }}-1}}{\sigma }}\end{array}}\right\}}$
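As a check for ${\displaystyle \xi \neq 0}$, differentiating the pdf gives

${\displaystyle f'(x)=-{\frac {\xi +1}{\sigma ^{2}}}\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-{\frac {1}{\xi }}-2}}$

and, since ${\displaystyle -\mu \xi +\sigma +\xi x=\sigma \left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)}$, multiplying the two yields

${\displaystyle f'(x)(-\mu \xi +\sigma +\xi x)=-{\frac {\xi +1}{\sigma }}\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-{\frac {1}{\xi }}-1}=-(\xi +1)f(x).}$

The stated initial condition is the pdf evaluated at ${\displaystyle x=0}$, which is meaningful only when 0 lies in the support.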

Special cases

• If the shape ${\displaystyle \xi }$ and location ${\displaystyle \mu }$ are both zero, the GPD is equivalent to the exponential distribution.
• With shape ${\displaystyle \xi >0}$ and location ${\displaystyle \mu =\sigma /\xi }$, the GPD is equivalent to the Pareto distribution with scale ${\displaystyle x_{m}=\sigma /\xi }$ and shape ${\displaystyle \alpha =1/\xi }$.
• If ${\displaystyle X\sim GPD(\mu =0,\sigma ,\xi )}$, then ${\displaystyle Y=\log(X)\sim exGPD(\mu =0,\sigma ,\xi )}$, where exGPD stands for the exponentiated generalized Pareto distribution. Unlike the GPD, the exGPD has finite moments of all orders and separate interpretations for the scale parameter and the shape parameter, which leads to more stable and efficient parameter estimation than with the GPD.
• GPD is similar to the Burr distribution.
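
The first two special cases can be checked numerically. The sketch below compares the GPD cdf with the exponential and Pareto cdfs on a grid. The helper names and the parameter values (${\displaystyle \sigma =2}$, ${\displaystyle \xi =0.5}$) are arbitrary choices for illustration, and the exponential distribution is taken with rate ${\displaystyle 1/\sigma }$.

```python
import math


def gpd_cdf(x, mu, sigma, xi):
    """GPD cdf for xi >= 0 (the only case needed in this sketch)."""
    if xi < 0:
        raise ValueError("this sketch only covers xi >= 0")
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-z)
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)


def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def pareto_cdf(x, x_m, alpha):
    """CDF of the Pareto distribution with scale x_m and shape alpha."""
    return 1.0 - (x_m / x) ** alpha if x > x_m else 0.0


sigma, xi = 2.0, 0.5                      # illustrative values
grid = [0.1 * k for k in range(1, 200)]

# Case xi = 0, mu = 0: exponential with rate 1/sigma.
exp_gap = max(
    abs(gpd_cdf(x, 0.0, sigma, 0.0) - exponential_cdf(x, 1.0 / sigma))
    for x in grid
)

# Case xi > 0, mu = sigma/xi: Pareto with x_m = sigma/xi, alpha = 1/xi.
x_m, alpha = sigma / xi, 1.0 / xi
pareto_gap = max(
    abs(gpd_cdf(x, x_m, sigma, xi) - pareto_cdf(x, x_m, alpha))
    for x in grid
)

print("max |GPD - exponential| on grid:", exp_gap)
print("max |GPD - Pareto| on grid:", pareto_gap)
```

Both gaps should be at the level of floating-point rounding if the identities hold.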

Generating generalized Pareto random variables

If U is uniformly distributed on (0, 1], then

${\displaystyle X=\mu +{\frac {\sigma (U^{-\xi }-1)}{\xi }}\sim {\mbox{GPD}}(\mu ,\sigma ,\xi \neq 0)}$

and

${\displaystyle X=\mu -\sigma \ln(U)\sim {\mbox{GPD}}(\mu ,\sigma ,\xi =0).}$

Both formulas are obtained by inversion of the cdf.
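
Spelled out for ${\displaystyle \xi \neq 0}$: since ${\displaystyle 1-U}$ is uniform whenever ${\displaystyle U}$ is, one may set the survival function equal to ${\displaystyle U}$ and solve for ${\displaystyle x}$:

${\displaystyle \left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-1/\xi }=U\quad \Longrightarrow \quad 1+{\frac {\xi (x-\mu )}{\sigma }}=U^{-\xi }\quad \Longrightarrow \quad x=\mu +{\frac {\sigma (U^{-\xi }-1)}{\xi }}.}$

For ${\displaystyle \xi =0}$ the same step reads

${\displaystyle \exp \left(-{\frac {x-\mu }{\sigma }}\right)=U\quad \Longrightarrow \quad x=\mu -\sigma \ln(U).}$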

In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.
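
The two inversion formulas can also be sketched in Python using only the standard library. The function name `gpd_sample` and the optional `rng` argument are illustrative choices, not part of any library. The uniform draw is taken as `1 - random()` so that it lies in (0, 1], matching the interval stated above and keeping the logarithm finite.

```python
import math
import random


def gpd_sample(mu, sigma, xi, rng=random):
    """Draw one GPD(mu, sigma, xi) variate by inverting the cdf."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # random() returns a value in [0, 1), so 1 - random() lies in (0, 1].
    u = 1.0 - rng.random()
    if xi == 0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, only for reproducibility
    draws = sorted(gpd_sample(1.0, 2.0, 0.25, rng) for _ in range(20001))
    sample_median = draws[len(draws) // 2]
    # Theoretical median: mu + sigma * (2**xi - 1) / xi
    theoretical_median = 1.0 + 2.0 * (2 ** 0.25 - 1.0) / 0.25
    print(sample_median, theoretical_median)
```

Comparing the sample median with ${\displaystyle \mu +\sigma (2^{\xi }-1)/\xi }$ is a quick sanity check because the median exists for every ${\displaystyle \xi }$, whereas the mean requires ${\displaystyle \xi <1}$.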

GPD as an Exponential-Gamma Mixture

A GPD random variable can also be expressed as an exponential random variable with a Gamma-distributed rate parameter. If

${\displaystyle X|\Lambda \sim Exp(\Lambda )}$

and

${\displaystyle \Lambda \sim Gamma(\alpha ,\beta )}$

then

${\displaystyle X\sim GPD(\xi =1/\alpha ,\ \sigma =\beta /\alpha )}$

Here ${\displaystyle \alpha }$ and ${\displaystyle \beta }$ are the shape and rate of the Gamma distribution. Since both must be greater than zero, this representation requires ${\displaystyle \xi }$ to be positive.
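
The mixture representation follows from averaging the exponential survival function over the Gamma distribution, which gives ${\displaystyle P(X>x)=E[e^{-\Lambda x}]=(1+x/\beta )^{-\alpha }}$ when ${\displaystyle \beta }$ is a rate parameter. The simulation below is a rough sketch of that check with illustrative parameter values. The helper names are hypothetical, and note that Python's `random.gammavariate` takes a shape and a scale, so a rate of ${\displaystyle \beta }$ is passed as scale ${\displaystyle 1/\beta }$.

```python
import random


def mixture_sample(alpha, beta, rng):
    """Draw X | Lambda ~ Exp(Lambda) with Lambda ~ Gamma(shape=alpha, rate=beta)."""
    lam = rng.gammavariate(alpha, 1.0 / beta)  # second argument is the scale
    return rng.expovariate(lam)                # argument is the rate


def gpd_cdf_positive_shape(x, sigma, xi):
    """GPD cdf with mu = 0 and xi > 0, for x >= 0."""
    return 1.0 - (1.0 + xi * x / sigma) ** (-1.0 / xi)


def max_cdf_gap(alpha, beta, n, seed):
    """Largest gap between the empirical cdf of the mixture and the GPD cdf."""
    rng = random.Random(seed)
    xs = sorted(mixture_sample(alpha, beta, rng) for _ in range(n))
    xi, sigma = 1.0 / alpha, beta / alpha
    return max(
        abs((i + 1) / n - gpd_cdf_positive_shape(x, sigma, xi))
        for i, x in enumerate(xs)
    )


if __name__ == "__main__":
    # alpha = 3, beta = 2 are arbitrary illustrative values.
    print(max_cdf_gap(alpha=3.0, beta=2.0, n=20000, seed=0))
```

A small gap (of order ${\displaystyle 1/{\sqrt {n}}}$) is consistent with the mixture having the stated GPD law; it is a sanity check, not a formal goodness-of-fit test.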

References

1. Coles, Stuart (2001-12-12). An Introduction to Statistical Modeling of Extreme Values. Springer. p. 75. ISBN 9781852334598.
2. Dargahi-Noubary, G. R. (1989). "On tail estimation: An improved method". Mathematical Geology. 21 (8): 829–842. doi:10.1007/BF00894450.
3. Hosking, J. R. M.; Wallis, J. R. (1987). "Parameter and Quantile Estimation for the Generalized Pareto Distribution". Technometrics. 29 (3): 339–349. doi:10.2307/1269343. JSTOR 1269343.
4. Davison, A. C. (1984-09-30). "Modelling Excesses over High Thresholds, with an Application". In de Oliveira, J. Tiago (ed.). Statistical Extremes and Applications. Kluwer. p. 462. ISBN 9789027718044.
5. Embrechts, Paul; Klüppelberg, Claudia; Mikosch, Thomas (1997-01-01). Modelling extremal events for insurance and finance. p. 162. ISBN 9783540609315.
• Pickands, James (1975). "Statistical inference using extreme order statistics". Annals of Statistics. 3 (1): 119–131. doi:10.1214/aos/1176343003.
• Balkema, A.; De Haan, Laurens (1974). "Residual life time at great age". Annals of Probability. 2 (5): 792–804. doi:10.1214/aop/1176996548.
• Lee, Seyoon; Kim, J.H.K. (2018). "Exponentiated generalized Pareto distribution: Properties and applications towards extreme value theory". Communications in Statistics - Theory and Methods. 0: 1–25. arXiv:1708.01686. doi:10.1080/03610926.2018.1441418.
• N. L. Johnson; S. Kotz; N. Balakrishnan (1994). Continuous Univariate Distributions Volume 1, second edition. New York: Wiley. ISBN 978-0-471-58495-7. Chapter 20, Section 12: Generalized Pareto Distributions.
• Barry C. Arnold (2011). "Chapter 7: Pareto and Generalized Pareto Distributions". In Duangkamon Chotikapanich (ed.). Modeling Distributions and Lorenz Curves. New York: Springer. ISBN 9780387727967.
• Arnold, B. C.; Laguna, L. (1977). On generalized Pareto distributions with applications to income data. Ames, Iowa: Iowa State University, Department of Economics.