# Mean and predicted response

In linear regression, the mean response and the predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The two responses have the same value, but their variances differ: the predicted response must additionally account for the scatter of individual observations about the regression line.

## Background

In straight line fitting, the model is

${\displaystyle y_{i}=\alpha +\beta x_{i}+\varepsilon _{i}\,}$

where ${\displaystyle y_{i}}$ is the response variable, ${\displaystyle x_{i}}$ is the explanatory variable, ${\displaystyle \varepsilon _{i}}$ is the random error, and ${\displaystyle \alpha }$ and ${\displaystyle \beta }$ are parameters. The mean (and predicted) response for a given value of the explanatory variable, ${\displaystyle x_{d}}$, is given by

${\displaystyle {\hat {y}}_{d}={\hat {\alpha }}+{\hat {\beta }}x_{d},}$

while the actual response would be

${\displaystyle y_{d}=\alpha +\beta x_{d}+\varepsilon _{d}\,}$

Expressions for the values and variances of ${\displaystyle {\hat {\alpha }}}$ and ${\displaystyle {\hat {\beta }}}$ are given in linear regression.
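As a minimal sketch (illustrative names, plain Python), the least-squares estimates ${\displaystyle {\hat {\alpha }}}$ and ${\displaystyle {\hat {\beta }}}$ and the fitted response at ${\displaystyle x_{d}}$ can be computed from the standard closed-form solutions:

```python
# Least-squares fit of the straight-line model y = alpha + beta*x + eps.
# A minimal sketch; function names are illustrative, not from any library.

def fit_line(xs, ys):
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)          # sum of (x_i - x_bar)^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def response(alpha_hat, beta_hat, x_d):
    # The same value serves as both the mean and the predicted response.
    return alpha_hat + beta_hat * x_d
```

For data lying exactly on the line ${\displaystyle y=1+2x}$, `fit_line` recovers ${\displaystyle {\hat {\alpha }}=1}$, ${\displaystyle {\hat {\beta }}=2}$.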

## Mean response

Since the data consist of an (x, y) pair for every observation, the mean response at a given value of x, say ${\displaystyle x_{d}}$, is an estimate of the mean of the ${\displaystyle y}$ values in the population at ${\displaystyle x=x_{d}}$, that is ${\displaystyle {\hat {E}}(y\mid x_{d})\equiv {\hat {y}}_{d}\!}$. The variance of the mean response is given by

${\displaystyle \operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)=\operatorname {Var} \left({\hat {\alpha }}\right)+\left(\operatorname {Var} {\hat {\beta }}\right)x_{d}^{2}+2x_{d}\operatorname {Cov} \left({\hat {\alpha }},{\hat {\beta }}\right).}$

This expression can be simplified to

${\displaystyle \operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)=\sigma ^{2}\left({\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right),}$

where m is the number of data points.
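The simplified expression translates directly into code. This sketch (illustrative names; ${\displaystyle \sigma ^{2}}$ is passed in as a known or estimated value) evaluates the variance of the mean response at ${\displaystyle x_{d}}$:

```python
# Variance of the mean response at x_d:
#   Var = sigma^2 * (1/m + (x_d - x_bar)^2 / sum((x_i - x_bar)^2))
# A sketch assuming sigma2 (the error variance) is supplied by the caller.

def mean_response_variance(xs, sigma2, x_d):
    m = len(xs)
    x_bar = sum(xs) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma2 * (1 / m + (x_d - x_bar) ** 2 / sxx)
```

Note that the variance is smallest at ${\displaystyle x_{d}={\bar {x}}}$, where it reduces to ${\displaystyle \sigma ^{2}/m}$, and grows as ${\displaystyle x_{d}}$ moves away from the centre of the data.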

To demonstrate this simplification, one can make use of the identity

${\displaystyle \sum (x_{i}-{\bar {x}})^{2}=\sum x_{i}^{2}-{\frac {1}{m}}\left(\sum x_{i}\right)^{2}.}$
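The identity is easy to verify numerically; this small check (with arbitrary illustrative data) confirms both sides agree:

```python
# Numerical check of the identity
#   sum((x_i - x_bar)^2) == sum(x_i^2) - (sum(x_i))^2 / m
xs = [1.0, 2.0, 4.0, 7.0]      # arbitrary illustrative data
m = len(xs)
x_bar = sum(xs) / m
lhs = sum((x - x_bar) ** 2 for x in xs)
rhs = sum(x * x for x in xs) - (sum(xs)) ** 2 / m
```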

## Predicted response

The predicted response distribution is the distribution of a new observation ${\displaystyle y_{d}}$ about the fitted value, that is, of the prediction error ${\displaystyle y_{d}-{\hat {y}}_{d}}$. Because a new observation is independent of the data used to fit the line, the covariance between ${\displaystyle y_{d}}$ and ${\displaystyle {\hat {y}}_{d}}$ vanishes, so the variance is given by

${\displaystyle \operatorname {Var} \left(y_{d}-\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)=\operatorname {Var} (y_{d})+\operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right).}$

The second part of this expression was already calculated for the mean response. Since ${\displaystyle \operatorname {Var} (y_{d})=\sigma ^{2}}$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

${\displaystyle {\begin{aligned}\operatorname {Var} \left(y_{d}-\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)&=\sigma ^{2}+\sigma ^{2}\left({\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right)\\[4pt]&=\sigma ^{2}\left(1+{\frac {1}{m}}+{\frac {(x_{d}-{\bar {x}})^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right).\end{aligned}}}$
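In code the only change from the mean-response case is the extra ${\displaystyle \sigma ^{2}}$ term; a sketch under the same assumptions as before (caller supplies the error variance):

```python
# Variance of the predicted response at x_d:
#   Var = sigma^2 * (1 + 1/m + (x_d - x_bar)^2 / sum((x_i - x_bar)^2))
# Identical to the mean-response variance plus one extra sigma^2 term
# for the scatter of the new observation itself.

def predicted_response_variance(xs, sigma2, x_d):
    m = len(xs)
    x_bar = sum(xs) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma2 * (1 + 1 / m + (x_d - x_bar) ** 2 / sxx)
```

The difference between the two variances is exactly ${\displaystyle \sigma ^{2}}$ at every ${\displaystyle x_{d}}$.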

## Confidence intervals

The ${\displaystyle 100(1-\alpha )\%}$ confidence intervals are computed as ${\displaystyle {\hat {y}}_{d}\pm t_{{\frac {\alpha }{2}},m-n-1}{\sqrt {\operatorname {Var} }}}$, where Var is the relevant variance above. The confidence interval for the predicted response is therefore wider than the interval for the mean response. This is expected intuitively: the variance of the population of ${\displaystyle y}$ values does not shrink when one samples from it, because the random error ${\displaystyle \varepsilon _{i}}$ does not decrease; but the variance of the mean of the ${\displaystyle y}$ values does shrink with increased sampling, because the variances of ${\displaystyle {\hat {\alpha }}}$ and ${\displaystyle {\hat {\beta }}}$ decrease, so the mean response becomes closer to ${\displaystyle \alpha +\beta x_{d}}$.

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases with increased samples.
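A sketch of the two intervals side by side, with illustrative data and fitted value; the critical value ${\displaystyle t_{0.025,\,2}\approx 4.303}$ is taken from a t-table for ${\displaystyle m-2=2}$ degrees of freedom (m = 4 points, straight-line fit), and the error-variance estimate is likewise illustrative:

```python
# Confidence interval for the mean response and prediction interval for
# a new observation, both of the form y_hat_d ± t * sqrt(Var).
# All numbers below are illustrative, including the tabulated t value.

def interval(center, variance, t_crit):
    half = t_crit * variance ** 0.5
    return (center - half, center + half)

xs = [0.0, 1.0, 2.0, 3.0]
m = len(xs)
x_bar = sum(xs) / m
sxx = sum((x - x_bar) ** 2 for x in xs)
sigma2 = 0.25            # illustrative estimate of the error variance
x_d, y_hat_d = 2.5, 4.0  # illustrative fitted value at x_d
t_crit = 4.303           # t_{0.025, 2} from a t-table

var_mean = sigma2 * (1 / m + (x_d - x_bar) ** 2 / sxx)
var_pred = sigma2 * (1 + 1 / m + (x_d - x_bar) ** 2 / sxx)

mean_ci = interval(y_hat_d, var_mean, t_crit)
pred_ci = interval(y_hat_d, var_pred, t_crit)
# The prediction interval strictly contains the mean-response interval.
```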

## General linear regression

The general linear model can be written as

${\displaystyle y_{i}=\sum _{j=1}^{n}X_{ij}\beta _{j}+\varepsilon _{i}\,}$

Therefore, since ${\displaystyle {\hat {y}}_{d}=\sum _{j=1}^{n}X_{dj}{\hat {\beta }}_{j}}$, the general expression for the variance of the mean response is

${\displaystyle \operatorname {Var} \left(\sum _{j=1}^{n}X_{dj}{\hat {\beta }}_{j}\right)=\sum _{i=1}^{n}\sum _{j=1}^{n}X_{di}S_{ij}X_{dj},}$

where S is the covariance matrix of the parameters, given by

${\displaystyle \mathbf {S} =\sigma ^{2}\left(\mathbf {X^{\mathsf {T}}X} \right)^{-1}.}$
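The quadratic form above reduces to the straight-line formula when the design matrix has columns ${\displaystyle (1,x_{i})}$. This sketch evaluates ${\displaystyle \mathbf {S} =\sigma ^{2}(\mathbf {X} ^{\mathsf {T}}\mathbf {X} )^{-1}}$ for a two-column design matrix (using the closed-form 2×2 inverse, to keep the example dependency-free) and computes the variance of the mean response:

```python
# Variance of the mean response in the general linear model for a
# two-column design matrix X (rows [1, x_i]):
#   S = sigma^2 * (X^T X)^{-1},  Var = x_row @ S @ x_row^T
# A sketch using the closed-form inverse of a 2x2 matrix.

def mean_response_variance_general(X, sigma2, x_row):
    a = sum(r[0] * r[0] for r in X)      # entries of X^T X
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    S = [[sigma2 * d / det, -sigma2 * b / det],
         [-sigma2 * b / det, sigma2 * a / det]]
    # Quadratic form x_row S x_row^T
    return sum(x_row[i] * S[i][j] * x_row[j]
               for i in range(2) for j in range(2))
```

For `X = [[1, 0], [1, 1], [1, 2]]` and `x_row = [1, 2]` this agrees with the simple-regression formula ${\displaystyle \sigma ^{2}\left({\tfrac {1}{3}}+{\tfrac {(2-1)^{2}}{2}}\right)}$.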
