# Law of total expectation

The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), the tower rule,[3] Adam's law, and the smoothing theorem,[4] among other names, states that if ${\displaystyle X}$ is a random variable whose expected value ${\displaystyle \operatorname {E} (X)}$ is defined, and ${\displaystyle Y}$ is any random variable on the same probability space, then

${\displaystyle \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)),}$

i.e., the expected value of the conditional expected value of ${\displaystyle X}$ given ${\displaystyle Y}$ is the same as the expected value of ${\displaystyle X}$.

One special case states that if ${\displaystyle {\left\{A_{i}\right\}}_{i}}$ is a finite or countable partition of the sample space, then

${\displaystyle \operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.}$

## Example

Suppose that two factories supply light bulbs to the market. Factory ${\displaystyle X}$'s bulbs work for an average of 5000 hours, whereas factory ${\displaystyle Y}$'s bulbs work for an average of 4000 hours. It is known that factory ${\displaystyle X}$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation to the lifetime ${\displaystyle L}$ of a purchased bulb, we have:

${\displaystyle \operatorname {E} (L)=\operatorname {E} (L\mid X)\operatorname {P} (X)+\operatorname {E} (L\mid Y)\operatorname {P} (Y)=5000(0.6)+4000(0.4)=4600}$

where

• ${\displaystyle \operatorname {E} (L)}$ is the expected life of the bulb;
• ${\displaystyle \operatorname {P} (X)={6 \over 10}}$ is the probability that the purchased bulb was manufactured by factory ${\displaystyle X}$;
• ${\displaystyle \operatorname {P} (Y)={4 \over 10}}$ is the probability that the purchased bulb was manufactured by factory ${\displaystyle Y}$;
• ${\displaystyle \operatorname {E} (L\mid X)=5000}$ is the expected lifetime of a bulb manufactured by ${\displaystyle X}$;
• ${\displaystyle \operatorname {E} (L\mid Y)=4000}$ is the expected lifetime of a bulb manufactured by ${\displaystyle Y}$.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
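
The computation above can be checked numerically. The following sketch first evaluates the formula directly and then confirms it by simulation; the exponential lifetime distribution is an assumption made purely for illustration, since the law itself uses only the conditional means.

```python
import random

# Exact computation via the law of total expectation:
# E(L) = E(L | X) P(X) + E(L | Y) P(Y)
p_x, p_y = 0.6, 0.4
mean_x, mean_y = 5000.0, 4000.0
expected_life = mean_x * p_x + mean_y * p_y  # 4600.0

# Monte Carlo check: pick a factory, then draw a lifetime
# from that factory's (assumed) lifetime distribution.
random.seed(0)
n = 200_000
total = 0.0
for _ in range(n):
    mean = mean_x if random.random() < p_x else mean_y
    total += random.expovariate(1.0 / mean)
print(expected_life, total / n)  # the sample mean is close to 4600
```
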

## Proof in the finite and countable cases

Let the random variables ${\displaystyle X}$ and ${\displaystyle Y}$, defined on the same probability space, assume a finite or countably infinite set of finite values. Assume that ${\displaystyle \operatorname {E} [X]}$ is defined, i.e. ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty }$. If ${\displaystyle \{A_{i}\}}$ is a partition of the probability space ${\displaystyle \Omega }$, then

${\displaystyle \operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.}$

Proof.

${\displaystyle {\begin{aligned}\operatorname {E} \left(\operatorname {E} (X\mid Y)\right)&=\operatorname {E} {\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y){\Bigg ]}\\[6pt]&=\sum _{y}{\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y=y){\Bigg ]}\cdot \operatorname {P} (Y=y)\\[6pt]&=\sum _{y}\sum _{x}x\cdot \operatorname {P} (X=x,Y=y).\end{aligned}}}$

If the series is finite, then we can switch the summations around, and the previous expression will become

${\displaystyle {\begin{aligned}\sum _{x}\sum _{y}x\cdot \operatorname {P} (X=x,Y=y)&=\sum _{x}x\sum _{y}\operatorname {P} (X=x,Y=y)\\[6pt]&=\sum _{x}x\cdot \operatorname {P} (X=x)\\[6pt]&=\operatorname {E} (X).\end{aligned}}}$

If, on the other hand, the series is infinite, then its convergence cannot be conditional, due to the assumption that ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty .}$ The series converges absolutely if both ${\displaystyle \operatorname {E} [X_{+}]}$ and ${\displaystyle \operatorname {E} [X_{-}]}$ are finite, and diverges to infinity when either ${\displaystyle \operatorname {E} [X_{+}]}$ or ${\displaystyle \operatorname {E} [X_{-}]}$ is infinite. In both scenarios, the summations above may be exchanged without affecting the sum.
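
The double-summation argument can be verified concretely on a small joint distribution. The following sketch (an illustration, with an arbitrarily chosen joint pmf) computes the inner sum ${\displaystyle \operatorname {E} (X\mid Y=y)}$, the outer sum over ${\displaystyle y}$, and compares the result with ${\displaystyle \operatorname {E} (X)}$ computed directly, using exact rational arithmetic:

```python
from fractions import Fraction as F

# A small joint pmf P(X=x, Y=y) over x in {0,1,2}, y in {0,1};
# the values are arbitrary but sum to 1.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 4), (2, 0): F(1, 8),
    (0, 1): F(1, 8), (1, 1): F(1, 8), (2, 1): F(1, 4),
}

# Marginal P(Y = y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

# Inner sum: E(X | Y = y) = sum_x x * P(X=x | Y=y)
def cond_exp(y):
    return sum(x * p / p_y[y] for (x, yy), p in joint.items() if yy == y)

# Outer sum: E(E(X|Y)) = sum_y E(X | Y=y) * P(Y=y)
lhs = sum(cond_exp(y) * p_y[y] for y in p_y)

# Direct computation: E(X) = sum_{x,y} x * P(X=x, Y=y)
rhs = sum(x * p for (x, _), p in joint.items())

print(lhs, rhs)  # the two values agree
```
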

## Proof in the general case

Let ${\displaystyle (\Omega ,{\mathcal {F}},\operatorname {P} )}$ be a probability space on which two sub σ-algebras ${\displaystyle {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}\subseteq {\mathcal {F}}}$ are defined. For a random variable ${\displaystyle X}$ on such a space, the smoothing law states that if ${\displaystyle \operatorname {E} [X]}$ is defined, i.e. ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty }$, then

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]=\operatorname {E} [X\mid {\mathcal {G}}_{1}]\quad {\text{(a.s.)}}.}$

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

• ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]}$ is ${\displaystyle {\mathcal {G}}_{1}}$-measurable
• ${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]d\operatorname {P} =\int _{G_{1}}Xd\operatorname {P} ,}$ for all ${\displaystyle G_{1}\in {\mathcal {G}}_{1}.}$

The first of these properties holds by definition of the conditional expectation. To prove the second one, first note that

${\displaystyle {\begin{aligned}\min \left(\int _{G_{1}}X_{+}\,d\operatorname {P} ,\int _{G_{1}}X_{-}\,d\operatorname {P} \right)&\leq \min \left(\int _{\Omega }X_{+}\,d\operatorname {P} ,\int _{\Omega }X_{-}\,d\operatorname {P} \right)\\[4pt]&=\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty ,\end{aligned}}}$

so the integral ${\displaystyle \textstyle \int _{G_{1}}X\,d\operatorname {P} }$ is defined (it does not take the indeterminate form ${\displaystyle \infty -\infty }$).

The second property thus holds since ${\displaystyle G_{1}\in {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}}$ implies

${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]d\operatorname {P} =\int _{G_{1}}\operatorname {E} [X\mid {\mathcal {G}}_{2}]d\operatorname {P} =\int _{G_{1}}Xd\operatorname {P} .}$

Corollary. In the special case when ${\displaystyle {\mathcal {G}}_{1}=\{\emptyset ,\Omega \}}$ and ${\displaystyle {\mathcal {G}}_{2}=\sigma (Y)}$, the smoothing law reduces to

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X].}$
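
On a finite sample space the smoothing law can be checked directly, since there a sub-σ-algebra is generated by a partition and conditional expectation is averaging over each cell. The sketch below (an illustration with arbitrarily chosen values) represents ${\displaystyle {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}}$ by nested partitions and verifies ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]=\operatorname {E} [X\mid {\mathcal {G}}_{1}]}$:

```python
from fractions import Fraction as F

# Finite sample space with uniform measure; two nested partitions
# stand in for the sub-sigma-algebras G1 and G2 (G1 coarser than G2).
omega = list(range(8))
prob = {w: F(1, 8) for w in omega}
x = {w: w * w for w in omega}          # an arbitrary random variable

g2 = [[0, 1], [2, 3], [4, 5], [6, 7]]  # finer partition, generates G2
g1 = [[0, 1, 2, 3], [4, 5, 6, 7]]      # coarser partition, generates G1

def cond_exp(f, partition):
    """E[f | partition]: average f over each cell, constant on the cell."""
    out = {}
    for cell in partition:
        p_cell = sum(prob[w] for w in cell)
        avg = sum(f[w] * prob[w] for w in cell) / p_cell
        for w in cell:
            out[w] = avg
    return out

inner = cond_exp(x, g2)    # E[X | G2]
lhs = cond_exp(inner, g1)  # E[E[X | G2] | G1]
rhs = cond_exp(x, g1)      # E[X | G1]
print(lhs == rhs)          # the two conditional expectations agree
```

Taking `g1 = [omega]` (the trivial σ-algebra) recovers the corollary ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X]}$.
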

## Proof of partition formula

${\displaystyle {\begin{aligned}\sum \limits _{i}\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )\operatorname {P} (d\omega \mid A_{i})\cdot \operatorname {P} (A_{i})\\&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )\operatorname {P} (d\omega \cap A_{i})\\&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )I_{A_{i}}(\omega )\operatorname {P} (d\omega )\\&=\sum \limits _{i}\operatorname {E} (XI_{A_{i}}),\end{aligned}}}$

where ${\displaystyle I_{A_{i}}}$ is the indicator function of the set ${\displaystyle A_{i}}$.

If the partition ${\displaystyle {\{A_{i}\}}_{i=0}^{n}}$ is finite, then, by linearity and the fact that ${\displaystyle \sum _{i=0}^{n}I_{A_{i}}=1}$ on ${\displaystyle \Omega }$, the previous expression becomes

${\displaystyle \operatorname {E} \left(\sum \limits _{i=0}^{n}XI_{A_{i}}\right)=\operatorname {E} (X),}$

and we are done.

If, however, the partition ${\displaystyle {\{A_{i}\}}_{i=0}^{\infty }}$ is infinite, then we use the dominated convergence theorem to show that

${\displaystyle \operatorname {E} \left(\sum \limits _{i=0}^{n}XI_{A_{i}}\right)\to \operatorname {E} (X)\quad {\text{as }}n\to \infty .}$

Indeed, for every ${\displaystyle n\geq 0}$,

${\displaystyle \left|\sum _{i=0}^{n}XI_{A_{i}}\right|\leq |X|I_{\mathop {\bigcup } \limits _{i=0}^{n}A_{i}}\leq |X|.}$

Since every element of the set ${\displaystyle \Omega }$ falls into exactly one cell ${\displaystyle A_{i}}$ of the partition, it is straightforward to verify that the sequence ${\displaystyle {\left\{\sum _{i=0}^{n}XI_{A_{i}}\right\}}_{n=0}^{\infty }}$ converges pointwise to ${\displaystyle X}$. Assuming that ${\displaystyle X}$ is integrable, i.e. ${\displaystyle \operatorname {E} |X|<\infty }$, applying the dominated convergence theorem yields the desired result.
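
The partition formula itself is easy to check on a finite sample space. The sketch below (an illustration with an arbitrarily chosen measure, random variable, and partition) computes ${\displaystyle \sum _{i}\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}$, using ${\displaystyle \operatorname {E} (X\mid A_{i})=\operatorname {E} (XI_{A_{i}})/\operatorname {P} (A_{i})}$, and compares it with ${\displaystyle \operatorname {E} (X)}$:

```python
from fractions import Fraction as F

# Non-uniform finite sample space partitioned into three events A_i.
prob = {0: F(1, 2), 1: F(1, 4), 2: F(1, 8), 3: F(1, 8)}
x = {0: 3, 1: -1, 2: 4, 3: 10}
partition = [[0], [1, 2], [3]]

# sum_i E(X | A_i) P(A_i), with E(X | A_i) = E(X 1_{A_i}) / P(A_i)
total = F(0)
for cell in partition:
    p_a = sum(prob[w] for w in cell)
    e_given_a = sum(x[w] * prob[w] for w in cell) / p_a
    total += e_given_a * p_a

# Direct computation of E(X)
e_x = sum(x[w] * prob[w] for w in prob)
print(total, e_x)  # the two values agree
```
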