1.5. Convergence theorems and elementary inequalities
In the previous section, we defined the Lebesgue integral and the expectation of random variables and showed their basic properties. However, the additivity of the integral is yet to be proved. In addition, since our main interest throughout the textbook is the convergence of random variables and its rate, we need a toolbox for it.
In this section we assume a measure space $(\Omega, \mathcal{F}, \mu)$ with finite measure $\mu$. It is enough to state and prove the theorems for the finite measure case since our interest is in probability spaces.
Convergence theorems
Earlier, I mentioned that to prove the additive property of the Lebesgue integral, we need the monotone convergence theorem. The monotone convergence theorem (MCT) applies to a sequence of nonnegative functions that increases monotonically to the limiting function. There are other convergence theorems (the bounded and the dominated ones) in addition to the MCT. Fatou’s lemma, a corollary of the MCT, is another useful tool for our journey through probability theory.
Monotone convergence theorem (MCT)
$f_n, f: \Omega \to [0,\infty]$: $f_n$ are measurable and $f$ is the limiting function of $\{f_n\}$.
$f_n\uparrow f \text{ a.s.} \implies$ (i) $f$ is measurable, (ii) $\int f_n d\mu \uparrow \int f d\mu$.
To prove this, we need two lemmas.
$\varphi(E) = \int_E s d\mu,~ \forall E\in\mathcal{F} \implies \varphi$ is a measure on $(\Omega, \mathcal{F}).$
$s: \Omega \to [0,\infty]$: a simple function.
$E_n \uparrow E$: $E_n, E$ are measurable.
$\implies \lim_n \int_{E_n} s d\mu = \int_E s d\mu.$
The first lemma is easy to check. The second one follows from the continuity from below of the measure $\varphi$ defined in the first one, and it will be used in the proof of the MCT.
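Proof.
(1) Since $\int f_n d\mu$ is nondecreasing in $n$, the limit $\alpha := \lim_n \int f_n d\mu \in [0,\infty]$ exists. Because $f_n \le f$ a.s. for every $n$, $\alpha \le \int f d\mu$.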
(2) Now, let $s$ be a simple function such that $0 \le s \le f$ and let $0 < c < 1$ be a constant. Define $E_n = \{\omega\in\Omega:~ f_n(\omega) \ge c\cdot s(\omega)\}$. Since $f_n$ increases monotonically, so does $E_n$, and $\cup_{n=1}^\infty E_n = \Omega$ because $c\cdot s(\omega) < f(\omega)$ whenever $f(\omega) > 0$; hence $E_n \uparrow \Omega$. By the second lemma above, this implies $\int_{E_n} s d\mu \uparrow \int s d\mu$. $$ \begin{align} &\int f_n d\mu \ge \int_{E_n} f_n d\mu \ge \int_{E_n} c\cdot s d\mu \\ &\overset{\lim_n}{\implies} \alpha \ge c\int s d\mu \\ &\overset{c\to1}{\implies} \alpha \ge \int s d\mu \\ &\overset{\sup_{0\le s\le f}}{\implies} \alpha \ge \int f d\mu \end{align} $$ By (1) and (2), the desired result follows.
The MCT allows us to prove the additivity property that was left open.
$f, g: \Omega \to [0,\infty]$: measurable.
$\implies \int (f+g) d\mu = \int f d\mu + \int g d\mu.$
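Let $s_n = (\lfloor 2^n f \rfloor / 2^n) \wedge n$ and $t_n = (\lfloor 2^n g \rfloor / 2^n) \wedge n$. These are simple functions with $s_n \uparrow f$, $t_n \uparrow g$ and $s_n + t_n \uparrow f+g$, and additivity for simple functions gives $\int (s_n + t_n) d\mu = \int s_n d\mu + \int t_n d\mu$. Applying the MCT to each of the three sequences and letting $n \to \infty$, the result follows.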
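To see the approximation $s_n = (\lfloor 2^n f \rfloor / 2^n) \wedge n$ at work, here is a minimal Python sketch. The helper name `dyadic_approx` and the sample values are my own illustrative choices, not part of the textbook; the point is only that the sequence is nondecreasing in $n$, stays below the target, and is within $2^{-n}$ of it once $n$ exceeds the value.

```python
import math


def dyadic_approx(value: float, n: int) -> float:
    """Return s_n = min(floor(2^n * value) / 2^n, n) for a value >= 0."""
    return min(math.floor(2**n * value) / 2**n, n)


# Hypothetical values of f at a few points of Omega (illustrative only).
values = [0.3, math.pi, 7.25]

for v in values:
    approximations = [dyadic_approx(v, n) for n in range(1, 12)]
    # The sequence never decreases and never exceeds the target value.
    assert all(a <= b for a, b in zip(approximations, approximations[1:]))
    assert all(a <= v for a in approximations)
    # Once n > v the cap "min(..., n)" is inactive and the error is below 2^-n.
    assert v - approximations[-1] < 2**-11
```

The cap at $n$ is what keeps each $s_n$ bounded (hence simple when $f$ is unbounded); it stops mattering at any fixed point as soon as $n$ passes the value of $f$ there.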
Fatou’s lemma
Fatou’s lemma, another important convergence theorem, can be derived directly from the MCT.
$f_n: \Omega \to [0,\infty]$: measurable.
$\implies \int \liminf\limits_n f_n d\mu \le \liminf\limits_n \int f_n d\mu.$
Proof.
Let $g_n = \inf_{k \ge n} f_k$ so that $g_n \uparrow \liminf\limits_n f_n$. By MCT, $\lim_n \int g_n d\mu = \int \liminf\limits_n f_n d\mu$. Since $g_n$ is monotone and $g_n \le f_n$, we get $\int g_n d\mu \le \int f_n d\mu,~ \forall n$ thus $\lim_n \int g_n d\mu \le \liminf\limits_n \int f_n d\mu$.
It is worth noting that Fatou’s lemma does not require convergence. Thus it can be applied to any sequence of nonnegative measurable functions. In many cases however, the lemma is used in the form of $\int X dP \le \liminf\limits_n \int X_n dP$ where $X_n \to X \text{ a.s.}$.
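The inequality in Fatou’s lemma can be strict. A standard example (my choice, not taken from the textbook) is $f_n = n \cdot \mathbf{1}_{(0, 1/n)}$ on $[0,1]$ with Lebesgue measure: $\liminf_n f_n = 0$ everywhere, yet $\int f_n d\mu = 1$ for every $n$. The sketch below checks this numerically with a midpoint rule; the function names and the grid size are assumptions made for illustration.

```python
def f(n: int, x: float) -> float:
    """f_n equals n on the open interval (0, 1/n) and 0 elsewhere on [0, 1]."""
    return float(n) if 0.0 < x < 1.0 / n else 0.0


def integral(n: int, grid_size: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    total = sum(f(n, (k + 0.5) / grid_size) for k in range(grid_size))
    return total / grid_size


# The integrals stay at 1 no matter how large n is.
integrals = [integral(n) for n in (1, 10, 100)]

# At a fixed x > 0, f_n(x) = 0 as soon as n >= 1/x, so the pointwise limit is 0.
x = 0.05
tail = [f(n, x) for n in range(20, 200)]
```

Here the left-hand side of Fatou’s lemma is $0$ and the right-hand side is $1$. No integrable function dominates all $f_n$, which is why the integrals fail to converge to the integral of the limit.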
Dominated convergence theorem (DCT)
While the MCT is very useful, it can only be applied to a sequence of functions that converges monotonically. Lebesgue’s dominated convergence theorem (DCT) covers not only monotone sequences but any convergent sequence that is dominated by a single integrable function.
$f_n, f: \Omega \to \mathbb{R}$: measurable functions.
$f_n \to f \text{ a.s.}$ and $|f_n| \le g$ for all $n$, where $g$ is integrable.
$\begin{align} \implies & \text{(i) } f \in L^1(\mu). \\ & \text{(ii) } \int |f_n - f| d\mu \to 0. \\ & \text{(iii) } \int f_n d\mu \to \int f d\mu. \end{align}$
The usefulness of the DCT is that it not only shows convergence of the integral, but also integrability of the limiting function and $L^1$ convergence$^{1}$ to it.
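Proof.
(i) Since $|f_n| \le g$ and $f_n \to f$ a.s., we have $|f| \le g$ a.s., so $f$ is integrable.
(ii) $|f_n - f|$ is integrable since $|f_n - f| \le 2g$, and $0 \le 2g - |f_n - f| \le 2g$. Since $2g - |f_n - f| \to 2g$ a.s., by Fatou's lemma, $$ \begin{align} \int 2g d\mu &\le \liminf\limits_n \int (2g - |f_n - f|) d\mu = \int 2g d\mu - \limsup\limits_n \int |f_n - f| d\mu \\ \implies \limsup\limits_n \int |f_n - f| d\mu &\le 0 \\ \therefore \lim_n \int |f_n - f| d\mu &= 0 \end{align} $$
(iii) $\left|\int f_n d\mu - \int f d\mu\right| \le \int |f_n - f| d\mu \to 0.$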
Bounded convergence theorem (BCT)
A special case of the DCT is where the sequence $\{f_n\}$ is uniformly bounded almost surely (i.e. $g = c \in \mathbb{R}$, which is integrable because $\mu$ is finite). In this case we call it the bounded convergence theorem.
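As a quick numerical illustration of the bounded case (the example $f_n(x) = x^n$ on $[0,1]$ with Lebesgue measure is my own choice, not from the textbook): here $|f_n| \le 1$, $f_n \to 0$ almost everywhere, and the exact integral is $1/(n+1) \to 0$. The sketch below approximates the integrals with a midpoint rule; the function name and grid size are assumptions.

```python
def integral_of_power(n: int, grid_size: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of x**n over [0, 1]."""
    total = sum(((k + 0.5) / grid_size) ** n for k in range(grid_size))
    return total / grid_size


# f_n(x) = x**n is bounded by 1 and tends to 0 for every x in [0, 1).
approx = {n: integral_of_power(n) for n in (1, 10, 100, 1000)}

for n, value in approx.items():
    # Compare with the exact value 1 / (n + 1), which tends to 0.
    assert abs(value - 1.0 / (n + 1)) < 1e-5
```

The limit function is $0$ except at the single point $x = 1$, a null set, so the integral of the limit is $0$, in agreement with the theorem.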
Inequalities
Along with the convergence theorems, the following integral inequalities will be used heavily throughout probability theory. In the following theorems, assume $f,g$ are measurable. For $p \ge 1$, if $|f|^p$ is integrable, $\|f\|_p := (\int |f|^p d\mu)^\frac{1}{p}$ is the $L^p(\mu)$-norm of $f$.
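(Jensen's inequality) $\varphi: \mathbb{R} \to \mathbb{R}$: convex, $\mu(\Omega) = 1$.
$\int |f| d\mu < \infty$, $\int |\varphi(f)| d\mu < \infty$.
$\implies \varphi(\int f d\mu) \le \int \varphi(f) d\mu$.
Proof.
Let $t = \int f d\mu$, then there exists $\beta = \sup\limits_{s<t} \frac{\varphi(t)-\varphi(s)}{t-s} \in \mathbb{R}$ such that $\varphi(x) \ge \beta(x-t) + \varphi(t)$ for all $x$. Put $x = f$ and integrate both sides; since $\mu(\Omega) = 1$, the right-hand side integrates to $\beta(\int f d\mu - t) + \varphi(t) = \varphi(t)$, which is the result.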
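A small sanity check of Jensen's inequality on a finite space may help. The three-point space, the weights and the helper `integrate` below are hypothetical choices of mine. The second half of the sketch also shows why total mass one matters: with total mass $2$ and $\varphi(x) = x^2$ the inequality can fail.

```python
import math


def integrate(g, weights, values):
    """Integrate g(f) over a finite space with point masses `weights`."""
    return sum(w * g(v) for w, v in zip(weights, values))


# Hypothetical probability space with three atoms (weights sum to 1).
probabilities = [0.2, 0.5, 0.3]
values = [-1.0, 0.5, 2.0]  # values of f at the three atoms

mean = integrate(lambda v: v, probabilities, values)
for phi in (math.exp, abs, lambda v: v * v):
    # phi(integral of f) <= integral of phi(f) for each convex phi.
    assert phi(mean) <= integrate(phi, probabilities, values)

# With total mass 2 instead of 1, the same statement can fail.
heavy_weights = [1.0, 1.0]
constant_f = [1.0, 1.0]
lhs = integrate(lambda v: v, heavy_weights, constant_f) ** 2  # (2)^2 = 4
rhs = integrate(lambda v: v * v, heavy_weights, constant_f)  # 1 + 1 = 2
assert lhs > rhs
```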
(Hölder's inequality) $p, q \in (1,\infty)$, $\frac{1}{p} + \frac{1}{q} = 1$.
$\begin{align} \implies &\int |fg| d\mu \le (\int |f|^p d\mu)^\frac{1}{p} (\int |g|^q d\mu)^\frac{1}{q}. \\ &\text{i.e. } \|fg\|_1 \le \|f\|_p \|g\|_q. \end{align}$
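Proof.
Let $A = \|f\|_p$ and $B = \|g\|_q$. If $A = 0$ or $B = 0$ then $fg = 0$ a.e. and the inequality is trivial, so assume $A, B > 0$ and let $F=|f|/A$ and $G=|g|/B$. Then $\int F^p d\mu = \int G^q d\mu = 1$.
Our claim is that $x,y \ge 0 \Rightarrow xy \le \frac{x^p}{p} + \frac{y^q}{q}$. Fix $y$ and let $h(x) = xy - x^p/p$, then $h'(x) = y - x^{p-1} = y - x^{p/q}$ and $h$ achieves its maximum at $x = y^{q/p}$, where $h(x) = y^{1+q/p} - y^q/p = y^q/q$.
By the claim, $\int FG d\mu \le \int (F^p/p + G^q/q) d\mu = \frac{1}{p} + \frac{1}{q} = 1$, that is, $\int |fg| d\mu \le AB$, and the desired result follows.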
A special case of Hölder’s inequality is the Cauchy-Schwarz inequality: $\|fg\|_1 \le \|f\|_2 \|g\|_2.$
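Hölder’s inequality is easy to check numerically on a finite measure space. In the sketch below the weights, the functions `f` and `g`, and the helper `lp_norm` are all hypothetical values chosen for illustration; the case $p = q = 2$ is the Cauchy-Schwarz inequality.

```python
def lp_norm(values, weights, p):
    """L^p norm of a function on a finite space with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


# Hypothetical finite measure space with three atoms (total mass 4).
weights = [0.5, 1.5, 2.0]
f = [1.0, -2.0, 0.5]
g = [3.0, 0.25, -1.0]
product = [a * b for a, b in zip(f, g)]

lhs = lp_norm(product, weights, 1.0)  # the L^1 norm of fg
for p in (1.5, 2.0, 4.0):
    q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
    rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)
    assert lhs <= rhs + 1e-12
```

Note that, unlike Jensen's inequality, nothing here requires the total mass to be one.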
(Minkowski's inequality) $p \ge 1$.
$\implies \|f+g\|_p \le \|f\|_p + \|g\|_p.$
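Proof.
For $p = 1$ the claim is the triangle inequality, so let $p > 1$ and $q = p/(p-1)$, so that $(p-1)q = p$. Since $|f+g|^p \le (|f| + |g|)|f+g|^{p-1}$, $$ \int |f+g|^p d\mu \le \int |f||f+g|^{p-1} d\mu + \int |g||f+g|^{p-1} d\mu $$ By Hölder's inequality, $$ \begin{align} \int |f||f+g|^{p-1} d\mu &\le \left(\int |f|^p d\mu \right)^{1/p} \left(\int |f+g|^{(p-1)q} d\mu \right)^{1/q} \\ &= \left(\int |f|^p d\mu \right)^{1/p} \left(\int |f+g|^p d\mu \right)^{1/q} \end{align} $$ Similarly, $\int |g||f+g|^{p-1} d\mu \le \left(\int |g|^p d\mu\right)^{1/p} \left(\int |f+g|^p d\mu \right)^{1/q}$. Thus $$ \begin{align} \int |f+g|^p d\mu &\le \left(\|f\|_p + \|g\|_p\right) \left(\int |f+g|^p d\mu \right)^{1/q} \\ \therefore \|f+g\|_p &\le \|f\|_p + \|g\|_p \end{align} $$ where the last line follows by dividing both sides by $\left(\int |f+g|^p d\mu\right)^{1/q}$ (the case where this quantity is zero is trivial) and using $1 - 1/q = 1/p$.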
Finally, we state the Markov-Chebyshev inequality. Assume a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$ on it.
$\varphi: \mathbb{R} \to [0,\infty)$: measurable, $A \in \mathcal{B}(\mathbb{R})$, $i_A := \inf\{\varphi(y):~ y\in A\}$.
$\implies i_A \cdot P(X \in A) \le \int_{\{X \in A\}} \varphi(X) dP \le E\varphi(X).$
Special cases of the theorem are Markov’s inequality and Chebyshev’s inequality.
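To make the special cases explicit, here is a short worked derivation. The choices of $\varphi$ and $A$ are the standard ones rather than something stated in the text above, and $a > 0$ is an arbitrary constant. $$ \begin{align} \varphi(y) = |y|,~ A = \{y:~ |y| \ge a\} &\implies i_A = a \implies P(|X| \ge a) \le \frac{E|X|}{a}, \\ \varphi(y) = y^2,~ A = \{y:~ |y| \ge a\} &\implies i_A = a^2 \implies P(|X| \ge a) \le \frac{EX^2}{a^2}. \end{align} $$ Replacing $X$ by $X - EX$ in the second line (assuming $EX^2 < \infty$) gives the familiar form $P(|X - EX| \ge a) \le \text{Var}(X)/a^2$.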
Concluding remarks
Since the expectation $EX$ is defined as a mere integral, all of the above theorems can be applied. For instance, if $X_n \to X$ a.s. and $|X_n| \le Y$ for some $Y$ such that $EY < \infty$, then by the DCT $EX_n \to EX$ as $n\to\infty$.
Acknowledgement
This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).
This post is also based on the textbook Real and Complex Analysis, 3rd edition (Rudin, 1986) and the lecture at SNU (instructor: Prof. Insuk Seo).

$^{1}$ $f_n \to f$ in $L^1(\mu)$ is equivalent to $\int |f_n - f| d\mu \to 0$ and $f \in L^1(\mu)$. It will be covered in the next section.