# Leverage (statistics)

In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations.

High-leverage points, if any, are observations made at extreme or outlying values of the independent variables, where the lack of neighboring observations means that the fitted regression model will pass close to that particular observation.[1]

## Definition

In the linear regression model, the leverage score for the i-th observation is defined as:

${\displaystyle h_{ii}=\left[\mathbf {H} \right]_{ii},}$

the i-th diagonal element of the projection matrix (also called the hat matrix) ${\displaystyle \mathbf {H} =\mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}}$ , where ${\displaystyle \mathbf {X} }$ is the design matrix (whose rows correspond to the observations and whose columns correspond to the independent or explanatory variables).
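As a minimal sketch in plain Python (no linear-algebra library assumed), the leverage scores can be computed directly from the definition: form ${\displaystyle \mathbf {X} ^{\mathsf {T}}\mathbf {X} }$, invert it, and evaluate ${\displaystyle h_{ii}=\mathbf {x} _{i}^{\mathsf {T}}\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {x} _{i}}$ for each row ${\displaystyle \mathbf {x} _{i}}$ of the design matrix. The helper names `invert` and `leverages` and the example data are hypothetical. Gauss-Jordan inversion is only suitable for small, well-conditioned designs; in practice a QR decomposition is usually preferred for numerical stability.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment [a | I] and reduce the left half to the identity.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def leverages(X):
    """Return h_ii = x_i^T (X^T X)^{-1} x_i for every row x_i of the design matrix X."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    inv = invert(xtx)
    return [sum(row[a] * inv[a][b] * row[b] for a in range(p) for b in range(p))
            for row in X]


# Design matrix: an intercept column of ones plus one explanatory variable.
# The last observation (x = 10) lies far from the others.
X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 10]]
h = leverages(X)
print([round(v, 3) for v in h])  # [0.38, 0.28, 0.22, 0.2, 0.92]
```

The isolated observation at x = 10 receives a leverage of 0.92, far larger than that of any other observation.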

## Interpretation

The leverage score is also known as the observation self-sensitivity or self-influence,[2] because of the equation

${\displaystyle h_{ii}={\frac {\partial {\widehat {y\,}}_{i}}{\partial y_{i}}},}$

which states that the leverage of the i-th observation equals the partial derivative of the i-th fitted value ${\displaystyle {\widehat {y\,}}_{i}}$ with respect to the i-th measured value ${\displaystyle y_{i}}$ . This partial derivative describes the degree to which the i-th measured value influences the i-th fitted value. Note that the leverage depends on the values of the explanatory (x-) variables of all observations but not on any of the values of the dependent (y-) variable.

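The equation ${\displaystyle h_{ii}={\frac {\partial {\widehat {y\,}}_{i}}{\partial y_{i}}}}$ follows directly from the computation of the fitted values as ${\displaystyle {\mathbf {\widehat {y}} }={\mathbf {H} }{\mathbf {y} }}$ : since ${\displaystyle {\widehat {y\,}}_{i}=\sum _{j}h_{ij}y_{j}}$ and H does not depend on y, differentiating with respect to ${\displaystyle y_{i}}$ leaves only the coefficient ${\displaystyle h_{ii}}$ .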
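The self-sensitivity identity can be checked numerically. The sketch below assumes simple linear regression (an intercept plus one explanatory variable), where the least-squares fit has the familiar closed form slope = S_xy / S_xx and intercept = ȳ − slope · x̄, and where the leverage reduces to ${\displaystyle h_{ii}=1/n+(x_{i}-{\bar {x}})^{2}/S_{xx}}$. Nudging ${\displaystyle y_{i}}$ by a small δ should move ${\displaystyle {\widehat {y\,}}_{i}}$ by ${\displaystyle h_{ii}\delta }$. The function names and data are hypothetical.

```python
def fitted(x, y):
    """Least-squares fitted values of a simple linear regression with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * xi for xi in x]


def leverage(x, i):
    """Closed-form leverage h_ii = 1/n + (x_i - xbar)^2 / S_xx for simple regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xj - xbar) ** 2 for xj in x)
    return 1 / n + (x[i] - xbar) ** 2 / sxx


x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 8.1, 19.5]
delta = 1e-3
base = fitted(x, y)
derivs, levs = [], []
for i in range(len(x)):
    y_nudged = list(y)
    y_nudged[i] += delta  # perturb only the i-th measured value
    # y_hat is linear in y, so this difference quotient equals the derivative up to rounding
    derivs.append((fitted(x, y_nudged)[i] - base[i]) / delta)
    levs.append(leverage(x, i))

for d, h in zip(derivs, levs):
    print(f"d(y_hat_i)/d(y_i) ~ {d:.6f}   h_ii = {h:.6f}")
```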

## Bounds on leverage

For every observation, the leverage score lies between 0 and 1:

${\displaystyle 0\leq h_{ii}\leq 1.}$

### Proof

First, note that H is an idempotent matrix: ${\displaystyle H^{2}=X(X^{\top }X)^{-1}X^{\top }X(X^{\top }X)^{-1}X^{\top }=XI(X^{\top }X)^{-1}X^{\top }=H.}$ Also, observe that ${\displaystyle H}$ is symmetric (i.e., ${\displaystyle h_{ij}=h_{ji}}$ ), so the (i, i) element of ${\displaystyle H^{2}}$ is ${\displaystyle \sum _{j}h_{ij}h_{ji}=\sum _{j}h_{ij}^{2}}$ . Equating the (i, i) elements of ${\displaystyle H}$ and ${\displaystyle H^{2}}$ , we have

${\displaystyle h_{ii}=h_{ii}^{2}+\sum _{j\neq i}h_{ij}^{2}\geq 0}$

and

${\displaystyle h_{ii}\geq h_{ii}^{2}\implies h_{ii}\leq 1.}$
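A quick numerical check of the facts the proof relies on (H is symmetric and idempotent, and ${\displaystyle h_{ii}=\sum _{j}h_{ij}^{2}}$) and of the resulting bound. The sketch again assumes simple linear regression, where the full hat matrix has the closed form ${\displaystyle h_{ij}=1/n+(x_{i}-{\bar {x}})(x_{j}-{\bar {x}})/S_{xx}}$; the function name `hat_matrix_simple` and the data are hypothetical.

```python
def hat_matrix_simple(x):
    """Hat matrix of simple linear regression with an intercept, from the closed form
    h_ij = 1/n + (x_i - xbar)(x_j - xbar) / S_xx."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 10.0]
H = hat_matrix_simple(x)
n = len(x)
tol = 1e-12
H2 = [[sum(H[i][k] * H[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

symmetric = all(abs(H[i][j] - H[j][i]) < tol for i in range(n) for j in range(n))
idempotent = all(abs(H2[i][j] - H[i][j]) < tol for i in range(n) for j in range(n))
# The identity used in the proof: h_ii = sum_j h_ij^2 (the sum includes the j = i term)
identity = all(abs(H[i][i] - sum(H[i][j] ** 2 for j in range(n))) < tol for i in range(n))
in_bounds = all(0 <= H[i][i] <= 1 for i in range(n))
print(symmetric, idempotent, identity, in_bounds)  # True True True True
```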

## Effect on residual variance

If we are in an ordinary least squares setting with fixed X and homoscedastic regression errors ${\displaystyle \varepsilon _{i},}$

${\displaystyle Y=X\beta +\varepsilon ;\ \ \operatorname {Var} (\varepsilon )=\sigma ^{2}I}$

then the i-th regression residual

${\displaystyle e_{i}=Y_{i}-{\widehat {Y}}_{i}}$

has variance

${\displaystyle \operatorname {Var} (e_{i})=(1-h_{ii})\sigma ^{2}}$

In other words, an observation's leverage score determines the variance of its residual: the higher the leverage, the smaller the residual variance, because the fitted model is pulled toward high-leverage observations.

### Proof

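First, note that ${\displaystyle I-H}$ is idempotent and symmetric, and ${\displaystyle {\widehat {Y}}=HY}$ , so the residual vector is ${\displaystyle e=Y-{\widehat {Y}}=(I-H)Y}$ . Because X is fixed, ${\displaystyle \operatorname {Var} (Y)=\operatorname {Var} (\varepsilon )=\sigma ^{2}I}$ . This gives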

${\displaystyle \operatorname {Var} (e)=\operatorname {Var} ((I-H)Y)=(I-H)\operatorname {Var} (Y)(I-H)^{\top }=\sigma ^{2}(I-H)^{2}=\sigma ^{2}(I-H).}$

Thus ${\displaystyle \operatorname {Var} (e_{i})=(1-h_{ii})\sigma ^{2}.}$
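A small Monte Carlo simulation illustrates the result under the stated assumptions: X is held fixed and the errors are drawn independently from a normal distribution with variance ${\displaystyle \sigma ^{2}}$. The sketch assumes simple linear regression so that the closed-form fit and leverage can be reused; the true coefficients, σ = 2, the seed and the number of replications are arbitrary choices for the illustration.

```python
import random
import statistics


def residuals(x, y):
    """Residuals e = y - y_hat of a simple linear regression with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0, 10.0]  # fixed design
n, sigma, reps = len(x), 2.0, 20000
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

rng = random.Random(0)
draws = [[] for _ in range(n)]
for _ in range(reps):
    # Hypothetical true model: y = 1 + 2x + noise, with noise ~ N(0, sigma^2)
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]
    for i, e in enumerate(residuals(x, y)):
        draws[i].append(e)

simulated = [statistics.variance(d) for d in draws]
theory = [(1 - hi) * sigma ** 2 for hi in h]
for s, t in zip(simulated, theory):
    print(f"simulated {s:.3f}   (1 - h_ii) sigma^2 = {t:.3f}")
```

The high-leverage point at x = 10 (with ${\displaystyle h_{ii}=0.92}$) has by far the smallest residual variance.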

### Studentized residuals

The corresponding studentized residual (the residual adjusted for its observation-specific estimated residual variance) is then

${\displaystyle t_{i}={e_{i} \over {\widehat {\sigma }}{\sqrt {1-h_{ii}\ }}}}$

where ${\displaystyle {\widehat {\sigma }}}$ is an appropriate estimate of ${\displaystyle \sigma .}$
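As a sketch, assuming simple linear regression (p = 2 parameters) and taking ${\displaystyle {\widehat {\sigma }}^{2}=\sum _{i}e_{i}^{2}/(n-p)}$, the usual unbiased estimate and one reasonable choice for the "appropriate estimate" above, the studentized residuals can be computed as follows. Because the same ${\displaystyle {\widehat {\sigma }}}$ is used for every observation, this gives the internally studentized version. The function name and data are hypothetical.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals t_i = e_i / (sigma_hat * sqrt(1 - h_ii))
    for simple linear regression with an intercept (p = 2 parameters)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Assumed estimate of sigma: square root of RSS / (n - p)
    sigma_hat = math.sqrt(sum(ei ** 2 for ei in e) / (n - p))
    return [ei / (sigma_hat * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 8.1, 19.5]
t = studentized_residuals(x, y)
print([round(ti, 2) for ti in t])
```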

### Partial leverage

Modern computer packages for statistical analysis include, as part of their facilities for regression analysis, various quantitative measures for identifying influential observations: among these measures is partial leverage, a measure of how a variable contributes to the leverage of a datum.
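One common definition takes the partial leverage of variable ${\displaystyle x_{j}}$ at observation i to be ${\displaystyle r_{i}^{2}/\sum _{k}r_{k}^{2}}$, where r is the vector of residuals from regressing ${\displaystyle x_{j}}$ on all the other columns of X; this is the leverage of the observation in the added-variable (partial regression) plot, and it equals the amount by which including ${\displaystyle x_{j}}$ in the model increases ${\displaystyle h_{ii}}$. The sketch below assumes a model with an intercept and two explanatory variables, x1 and x2, so that regressing x2 on the remaining columns is a simple regression; the function names and data are hypothetical.

```python
def simple_residuals(u, v):
    """Residuals from regressing v on an intercept and u by least squares."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suu = sum((ui - ubar) ** 2 for ui in u)
    slope = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v)) / suu
    return [vi - vbar - slope * (ui - ubar) for ui, vi in zip(u, v)]


def partial_leverage_x2(x1, x2):
    """Partial leverage of x2 in a model with an intercept, x1 and x2."""
    r = simple_residuals(x1, x2)  # the part of x2 not explained by the other columns
    ss = sum(ri ** 2 for ri in r)
    return [ri ** 2 / ss for ri in r]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 9.0, 5.0]
pl = partial_leverage_x2(x1, x2)
print([round(v, 3) for v in pl])
```

The fifth observation, whose x2 value is unusual given its x1 value, has the largest partial leverage for x2.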

### Mahalanobis distance

Leverage is closely related to the Mahalanobis distance[3] (see proof: [4]).

Specifically, for a data matrix ${\displaystyle X_{n,p}}$ , the squared Mahalanobis distance of a row vector ${\displaystyle {\vec {x_{i}}}=X_{i,\cdot }}$ from the vector of column means ${\displaystyle {\hat {\mu }}={\bar {X}}}$ , of length ${\displaystyle p}$ , with the estimated covariance matrix ${\displaystyle S=\operatorname {cov} (X)}$ , is:

${\displaystyle D^{2}({\vec {x_{i}}})=({\vec {x_{i}}}-{\hat {\mu }})^{T}S^{-1}({\vec {x_{i}}}-{\hat {\mu }})}$

This is related to the leverage ${\displaystyle h_{ii}}$ of the hat matrix of ${\displaystyle X_{n,p}}$ after a column vector of ones is appended to it. When ${\displaystyle S}$ is the sample covariance matrix with divisor ${\displaystyle n-1}$ , the relationship between the two is:

${\displaystyle D^{2}({\vec {x_{i}}})=(n-1)(h_{ii}-{\tfrac {1}{n}})}$
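The relationship can be verified numerically. This sketch assumes that cov(X) is the sample covariance with divisor n − 1 (with divisor n the identity does not hold as written), computes ${\displaystyle D^{2}}$ directly, and computes ${\displaystyle h_{ii}}$ from the design matrix with a column of ones prepended, inverting matrices by Gauss-Jordan elimination. The helper names and data are hypothetical.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def leverages(X):
    """Diagonal of the hat matrix: h_ii = x_i^T (X^T X)^{-1} x_i."""
    p = len(X[0])
    inv = invert([[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)])
    return [sum(r[a] * inv[a][b] * r[b] for a in range(p) for b in range(p)) for r in X]


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the column means,
    using the sample covariance matrix with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / n for k in range(p)]
    d = [[r[k] - mu[k] for k in range(p)] for r in rows]
    S = [[sum(di[a] * di[b] for di in d) / (n - 1) for b in range(p)] for a in range(p)]
    S_inv = invert(S)
    return [sum(di[a] * S_inv[a][b] * di[b] for a in range(p) for b in range(p)) for di in d]


rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 9.0], [6.0, 5.0]]
n = len(rows)
D2 = mahalanobis_sq(rows)
h = leverages([[1.0] + r for r in rows])  # prepend the column of ones
for d2, hi in zip(D2, h):
    print(f"D^2 = {d2:.4f}   (n - 1)(h_ii - 1/n) = {(n - 1) * (hi - 1 / n):.4f}")
```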

## Software implementations

Many programs and statistics packages, such as R, Python, etc., include implementations of leverage.

| Language/Program | Function | Notes |
|---|---|---|
| R | `hat(x, intercept = TRUE)` or `hatvalues(model, ...)` | See |