(PTE) 1.5. Convergence theorems and elementary inequalities

$\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits}$ $\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits}$

In the previous section, we defined the Lebesgue integral and the expectation of random variables and showed their basic properties. However, the additivity of the integral has yet to be proved. In addition, since our main interest throughout the textbook is the convergence of random variables and its rate, we need a toolbox for handling limits.

In this section we assume a measure space $(\Omega, \mathcal{F}, \mu)$ with finite measure $\mu$. It is enough to state and prove theorems only on the finite measure case since our interest is in the probability space.


Convergence theorems

Earlier, I mentioned that proving the additivity of the Lebesgue integral requires the monotone convergence theorem. The monotone convergence theorem (MCT) concerns a sequence of non-negative functions that increases monotonically to a limiting function. There are two other convergence theorems besides the MCT: the dominated and the bounded convergence theorems. Fatou’s lemma, a corollary of the MCT, is another useful tool for our journey through probability theory.

Monotone convergence theorem (MCT)

$\{f_n\}_{n\in\mathbb{N}}: \Omega \to [0,\infty]$: a sequence of measurable functions.
$f: \Omega \to [0,\infty]$: the limiting function of $\{f_n\}$.
$f_n\uparrow f \text{ a.s.} \implies$ (i) $f$ is measurable, (ii) $\int f_n d\mu \uparrow \int f d\mu$.

To prove this, we need two lemmas.

$s: \Omega \to [0,\infty]$: a simple function.
$\varphi(E) = \int_E s d\mu,~ \forall E\in\mathcal{F} \implies \varphi$ is a measure on $(\Omega, \mathcal{F}).$

$s: \Omega \to [0,\infty]$: a simple function.
$E_n \uparrow E$: $E_n, E$ are measurable.
$\implies \lim_n \int_{E_n} s d\mu = \int_E s d\mu.$

The first lemma is easy to check. The second follows from the continuity from below of the measure $\varphi$ defined in the first lemma, and it will be used in the proof of the MCT.

Let $\alpha_n = \int f_n d\mu$ so that $\alpha_1 \le \alpha_2 \le \cdots \le \int f d\mu$ and $\lim_n \alpha_n = \alpha$ for some $\alpha$. We need to show that $\alpha = \int f d\mu$.
  (1) It is trivial that $\alpha \le \int f d\mu$.
  (2) Now let $s$ be a simple function such that $0 \le s \le f$, and fix a constant $0 < c < 1$. Define $E_n = \{\omega\in\Omega:~ f_n(\omega) \ge c\cdot s(\omega)\}$. Since $f_n$ increases monotonically, so does $E_n$. Moreover, because $c < 1$ and $f_n \uparrow f \ge s$, every $\omega$ eventually lies in some $E_n$, so $\cup_{n=1}^\infty E_n = \Omega$ and $E_n \uparrow \Omega$. By the second lemma above, this implies $\int_{E_n} s d\mu \uparrow \int s d\mu$. $$ \begin{align} &\int f_n d\mu \ge \int_{E_n} f_n d\mu \ge \int_{E_n} c\cdot s d\mu \\ &\overset{\lim_n}{\implies} \alpha \ge c\int s d\mu \\ &\overset{c\to1}{\implies} \alpha \ge \int s d\mu \\ &\overset{\sup_{0\le s\le f}}{\implies} \alpha \ge \int f d\mu \end{align} $$   By (1) and (2), the desired result follows.
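
As a numerical illustration of the MCT, the following sketch takes a concrete example that is not part of the text above: $\Omega = (0,1]$ with the Lebesgue measure, $f(x) = x^{-1/2}$ and the truncations $f_n = \min(f, n)$, for which $\int f_n d\mu = 2 - 1/n \uparrow 2 = \int f d\mu$. The helper names and the midpoint-rule integrator are assumptions made for illustration only.

```python
def integral_on_unit_interval(h, num_cells=100_000):
    """Midpoint-rule approximation of the integral of h over (0, 1]."""
    width = 1.0 / num_cells
    return sum(h((k + 0.5) * width) for k in range(num_cells)) * width


def f(x):
    # Unbounded near 0 but integrable: the exact integral over (0, 1] is 2.
    return x ** -0.5


def truncated(n):
    """f_n = min(f, n): bounded, non-negative, and increasing to f as n grows."""
    return lambda x: min(f(x), n)


levels = [1, 2, 4, 8]
integrals = [integral_on_unit_interval(truncated(n)) for n in levels]

for n, value in zip(levels, integrals):
    # The exact value of the integral of f_n is 2 - 1/n.
    print(f"n = {n}: numerical {value:.4f}, exact {2 - 1 / n:.4f}")
```

The integrals increase with $n$ and approach $2$, even though $f$ itself is unbounded.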

The MCT allows us to prove the additivity of the integral, which was left unproved in the previous section.

$f,g: \Omega \to [0,\infty]$: measurable.
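$\implies \int (f+g) d\mu = \int f d\mu + \int g d\mu.$

Let $s_n = (\lfloor 2^n f \rfloor / 2^n) \wedge n$ and $t_n = (\lfloor 2^n g \rfloor / 2^n) \wedge n$. These are non-negative simple functions with $s_n \uparrow f$ and $t_n \uparrow g$, so $s_n + t_n \uparrow f+g$, and $\int (s_n + t_n) d\mu = \int s_n d\mu + \int t_n d\mu$ because additivity holds for simple functions. Applying the MCT to each of the three sequences and letting $n \to \infty$ gives the result.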
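
The dyadic approximation used in this proof can be sketched in code. The finite measure space below (three points with masses $1/2, 1/3, 1/6$) and the values of $f$ and $g$ are hypothetical choices for illustration. Exact rational arithmetic is used so that the equalities hold without rounding error.

```python
from fractions import Fraction
from math import floor

# Hypothetical finite measure space: three points with these masses (total mass 1).
masses = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
f = [Fraction(1, 3), Fraction(5, 2), Fraction(7)]
g = [Fraction(2, 7), Fraction(0), Fraction(9, 4)]


def integral(h):
    """Integral of a function on the finite space, given as a list of values."""
    return sum(m * v for m, v in zip(masses, h))


def dyadic(h, n):
    """Pointwise (floor(2^n h) / 2^n) capped at n, computed exactly."""
    return [min(Fraction(floor(2**n * v), 2**n), Fraction(n)) for v in h]


def add(h1, h2):
    return [a + b for a, b in zip(h1, h2)]


lower_sums = []
for n in range(1, 13):
    s_n, t_n = dyadic(f, n), dyadic(g, n)
    # Additivity for simple functions holds exactly at every level n.
    assert integral(add(s_n, t_n)) == integral(s_n) + integral(t_n)
    lower_sums.append(integral(add(s_n, t_n)))

# Remaining gap between the integral of f + g and its level-12 approximation.
gap = integral(add(f, g)) - lower_sums[-1]
```

Once $n$ exceeds the largest value of $f$ and $g$, the cap $\wedge n$ is inactive and each approximation is within $2^{-n}$ of its target, so `gap` is at most $2 \cdot 2^{-12}$ here.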

Fatou’s lemma

Fatou’s lemma, another important convergence theorem, can be derived directly from the MCT.

$\{f_n\}: \Omega \to [0,\infty]$: a sequence of measurable functions.
$\implies \int \liminf\limits_n f_n d\mu \le \liminf\limits_n \int f_n d\mu.$

Let $g_n = \inf_{k \ge n} f_k$ so that $g_n \uparrow \liminf\limits_n f_n$. By MCT, $\lim_n \int g_n d\mu = \int \liminf\limits_n f_n d\mu$. Since $g_n$ is monotone and $g_n \le f_n$, we get $\int g_n d\mu \le \int f_n d\mu,~ \forall n$ thus $\lim_n \int g_n d\mu \le \liminf\limits_n \int f_n d\mu$.

It is worth noting that Fatou’s lemma does not require convergence. Thus it can be applied to any sequence of non-negative measurable functions. In many cases, however, the lemma is used in the form $\int X dP \le \liminf\limits_n \int X_n dP$ where $X_n \ge 0$ and $X_n \to X \text{ a.s.}$
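
The inequality in Fatou’s lemma can be strict. A standard worked example, assuming $\Omega = (0,1]$ with the Lebesgue measure $\lambda$ (a concrete space chosen here for illustration only):

$$ \begin{align} f_n &= n \cdot \mathbf{1}_{(0, 1/n)}, \qquad \int f_n d\lambda = n \cdot \frac{1}{n} = 1 \quad \forall n, \\ \liminf\limits_n f_n(\omega) &= 0 \quad \forall \omega \in (0,1], \text{ since } f_n(\omega) = 0 \text{ for } n \ge 1/\omega, \\ \int \liminf\limits_n f_n d\lambda &= 0 < 1 = \liminf\limits_n \int f_n d\lambda. \end{align} $$

The mass of $f_n$ concentrates on an interval shrinking to the empty set, so it is lost in the pointwise limit.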

Dominated convergence theorem (DCT)

While the MCT is very useful, it applies only to sequences that converge monotonically. Lebesgue’s dominated convergence theorem (DCT) covers any almost surely convergent sequence, monotone or not, provided that the whole sequence is dominated by a single integrable function.

$\{f_n\}: \Omega \to \mathbb{R}$: a sequence of measurable functions.
$f: \Omega \to \mathbb{R}$: a measurable function.
$f_n \to f \text{ a.s.}$ and $|f_n| \le g,~ g$ is integrable.
$\begin{align} \implies & \text{(i) } f \in L^1(\mu). \\ & \text{(ii) } \int |f_n - f| d\mu \to 0. \\ & \text{(iii) } \int f_n d\mu \to \int f d\mu. \end{align}$

The usefulness of the DCT is that it shows not only convergence of the integral, but also integrability of the limiting function and $L^1$ convergence to it (see footnote 1).

(i) Trivial since $|f| \le g$.
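(ii) $|f_n - f|$ is integrable since $|f_n - f| \le 2g$, and $0 \le 2g - |f_n - f| \to 2g$ a.s. By Fatou's lemma, $$ \begin{align} \int 2g d\mu &\le \liminf\limits_n \int (2g - |f_n - f|) d\mu \\ &= \int 2g d\mu - \limsup\limits_n \int |f_n - f| d\mu. \end{align} $$ Since $\int 2g d\mu < \infty$, subtracting it from both sides gives $\limsup\limits_n \int |f_n - f| d\mu \le 0$, and therefore $\lim_n \int |f_n -f| d\mu = 0$.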
(iii) $|\int f_n d\mu - \int f d\mu| \le \int |f_n -f| d\mu \to 0.$
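
To see why the domination hypothesis matters, the sketch below compares two sequences on $\Omega = (0,1]$ with the Lebesgue measure. Both examples are chosen for illustration and do not come from the text above. Both sequences converge to $0$ pointwise, but $f_n(x) = n x e^{-nx}$ is bounded by the constant $1/e$, whereas $h_n(x) = n^2 x e^{-nx}$ admits no integrable dominating function. The midpoint-rule integrator is again only an assumed helper.

```python
from math import exp


def integral_on_unit_interval(h, num_cells=100_000):
    """Midpoint-rule approximation of the integral of h over (0, 1]."""
    width = 1.0 / num_cells
    return sum(h((k + 0.5) * width) for k in range(num_cells)) * width


def dominated(n):
    """f_n(x) = n x exp(-n x): at most 1/e for every n, and f_n -> 0 pointwise."""
    return lambda x: n * x * exp(-n * x)


def not_dominated(n):
    """h_n(x) = n^2 x exp(-n x): h_n -> 0 pointwise, but no integrable bound exists."""
    return lambda x: n * n * x * exp(-n * x)


levels = [1, 5, 25, 125]
dominated_integrals = [integral_on_unit_interval(dominated(n)) for n in levels]
undominated_integrals = [integral_on_unit_interval(not_dominated(n)) for n in levels]

for n, a, b in zip(levels, dominated_integrals, undominated_integrals):
    print(f"n = {n}: dominated {a:.4f}, not dominated {b:.4f}")
```

The exact values are $\int f_n d\mu = (1 - (n+1)e^{-n})/n \to 0$, as the DCT predicts, and $\int h_n d\mu = 1 - (n+1)e^{-n} \to 1$, which differs from the integral of the limit.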


Bounded convergence theorem (BCT)

A special case of the DCT is where the sequence $\{f_n\}$ is uniformly bounded almost surely (i.e. $g = c$ for some constant $c \in \mathbb{R}$, which is integrable because $\mu$ is finite). In this case we call it the bounded convergence theorem.

Inequalities

Along with the convergence theorems, the following integral inequalities will be used extensively throughout probability theory. In the theorems below, assume $f,g$ are measurable. For $p \ge 1$, if $|f|^p$ is integrable, $\|f\|_p := (\int |f|^p d\mu)^\frac{1}{p}$ is the $L^p(\mu)$-norm of $f$.

Jensen’s inequality

$\varphi: \mathbb{R} \to \mathbb{R}$: convex function.
$\mu(\Omega) = 1$, $\int |f| d\mu < \infty$, $\int |\varphi(f)| d\mu < \infty$.
$\implies \varphi(\int f d\mu) \le \int \varphi(f) d\mu$.

Let $t = \int f d\mu$. Since $\varphi$ is convex, $\beta = \sup\limits_{s<t} \frac{\varphi(t)-\varphi(s)}{t-s}$ is finite and $\varphi(x) \ge \beta(x-t) + \varphi(t)$ for all $x \in \mathbb{R}$. Substituting $x = f(\omega)$ and integrating both sides, using $\mu(\Omega) = 1$, gives $\int \varphi(f) d\mu \ge \beta(\int f d\mu - t) + \varphi(t) = \varphi(\int f d\mu)$.
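
Jensen’s inequality relies on the total mass being one. The following sketch checks the inequality for $\varphi(x) = x^2$ on a hypothetical three-point probability space, and then shows that it can fail when every mass is doubled so that $\mu(\Omega) = 2$. All values are made up for illustration.

```python
from fractions import Fraction


def integral(masses, values):
    """Integral of a function on a finite space: sum of mass times value."""
    return sum(m * v for m, v in zip(masses, values))


def phi(x):
    return x * x  # a convex function


def jensen_sides(masses, values):
    """Return (phi(integral of f), integral of phi(f))."""
    return phi(integral(masses, values)), integral(masses, [phi(v) for v in values])


# Probability measure on three points (hypothetical masses and values).
prob = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
f = [Fraction(-1), Fraction(2), Fraction(5)]
lhs_prob, rhs_prob = jensen_sides(prob, f)

# Same points with every mass doubled: total mass 2, not a probability measure.
doubled = [2 * m for m in prob]
ones = [Fraction(1)] * 3
lhs_doubled, rhs_doubled = jensen_sides(doubled, ones)

print(lhs_prob <= rhs_prob)        # Jensen holds under the probability measure
print(lhs_doubled <= rhs_doubled)  # and fails for f = 1 when the total mass is 2
```

With total mass $2$ and $f \equiv 1$, the left side is $\varphi(2) = 4$ while the right side is only $2$.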

Hölder’s inequality

$p,q \in (1,\infty)$ such that $\frac{1}{p} + \frac{1}{q} = 1$.
$\begin{align} \implies &\int |fg| d\mu \le (\int |f|^p d\mu)^\frac{1}{p} (\int |g|^q d\mu)^\frac{1}{q}. \\ &\text{i.e. } \|fg\|_1 \le \|f\|_p \|g\|_q. \end{align}$

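If $\|f\|_p = 0$ or $\|g\|_q = 0$, then $fg = 0$ a.e. and the inequality is trivial; it is also trivial if either norm is infinite. So assume $0 < \|f\|_p, \|g\|_q < \infty$. Let $A = \|f\|_p$ and $B = \|g\|_q$, $F=|f|/A$ and $G=|g|/B$. Then $\int F^p d\mu = \int G^q d\mu = 1$.
  Our claim is that $x,y \ge 0 \Rightarrow xy \le \frac{x^p}{p} + \frac{y^q}{q}$. Fix $y \ge 0$ and let $h(x) = xy - x^p/p$ for $x \ge 0$. Then $h'(x) = y - x^{p-1} = y - x^{p/q}$, so $h$ achieves its maximum at $x = y^{q/p}$, where $h(y^{q/p}) = y^{q/p + 1} - y^q/p = y^q(1 - 1/p) = y^q/q$. This proves the claim.
  By the claim, $\int FG d\mu \le \int (F^p/p + G^q/q) d\mu = \frac{1}{p} + \frac{1}{q} = 1$, that is, $\int |fg| d\mu \le AB$, which is the desired result.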

A special case of Hölder’s inequality is the Cauchy-Schwarz inequality: $\|fg\|_1 \le \|f\|_2 \|g\|_2.$
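
A quick numerical sanity check of Hölder’s inequality, sketched on a hypothetical finite measure space with randomly generated masses and function values. The helper `lp_norm` and all the data are assumptions for illustration. The last lines check the equality case of Cauchy-Schwarz, $g = f$.

```python
import random


def lp_norm(masses, values, p):
    """(integral of |f|^p)^(1/p) on a finite measure space."""
    return sum(m * abs(v) ** p for m, v in zip(masses, values)) ** (1.0 / p)


def l1_norm_of_product(masses, f, g):
    return sum(m * abs(a * b) for m, a, b in zip(masses, f, g))


random.seed(0)
size = 50
masses = [random.random() for _ in range(size)]  # hypothetical finite measure
f = [random.uniform(-3, 3) for _ in range(size)]
g = [random.uniform(-3, 3) for _ in range(size)]

results = []
for p in (1.5, 2.0, 4.0):
    q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
    lhs = l1_norm_of_product(masses, f, g)
    rhs = lp_norm(masses, f, p) * lp_norm(masses, g, q)
    results.append((p, q, lhs, rhs))

# Equality in Cauchy-Schwarz when g = f: both sides equal the integral of f^2.
cs_lhs = l1_norm_of_product(masses, f, f)
cs_rhs = lp_norm(masses, f, 2.0) ** 2
```

Each tuple in `results` has `lhs <= rhs`, and `cs_lhs` agrees with `cs_rhs` up to floating-point rounding.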

Minkowski’s inequality

$p \ge 1$, $\int |f|^p d\mu < \infty$, $\int |g|^p d\mu < \infty$
$\implies \|f+g\|_p \le \|f\|_p + \|g\|_p.$

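The case $p = 1$ follows from $|f+g| \le |f| + |g|$, so let $p > 1$ and $q = p/(p-1)$, so that $(p-1)q = p$. Note that $\int |f+g|^p d\mu < \infty$ since $|f+g|^p \le 2^{p-1}(|f|^p + |g|^p)$. $$ \int |f+g|^p d\mu \le \int |f||f+g|^{p-1} d\mu + \int |g||f+g|^{p-1} d\mu $$ By Hölder's inequality, $$ \begin{align} \int |f||f+g|^{p-1} d\mu &\le \left(\int |f|^p d\mu \right)^{1/p} \left(\int |f+g|^{(p-1)q} d\mu \right)^{1/q} \\ &= \left(\int |f|^p d\mu \right)^{1/p} \left(\int |f+g|^p d\mu \right)^{1/q} \end{align} $$ Similarly, $\int |g||f+g|^{p-1} d\mu \le \left(\int |g|^p d\mu\right)^{1/p} \left(\int |f+g|^p d\mu \right)^{1/q}$. Thus $$ \int |f+g|^p d\mu \le \left(\|f\|_p + \|g\|_p\right) \left(\int |f+g|^p d\mu \right)^{1/q}. $$ If $\int |f+g|^p d\mu = 0$ there is nothing to prove. Otherwise, dividing both sides by $\left(\int |f+g|^p d\mu \right)^{1/q}$ and using $1 - \frac{1}{q} = \frac{1}{p}$ gives $$ \|f+g\|_p \le \|f\|_p + \|g\|_p. $$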

Finally, we state the Markov-Chebyshev inequality. Assume a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$ on it.

$\varphi: \mathbb{R}\to\mathbb{R}$: measurable, $\varphi \ge 0$.
$A \in \mathcal{B}(\mathbb{R})$, $i_A := \inf\{\varphi(y):~ y\in A\}$.
$\implies i_A \cdot P(X \in A) \le \int_{\{X \in A\}} \varphi(X) dP \le E\varphi(X).$

Special cases of the theorem are Markov’s inequality and Chebyshev’s inequality.

$X \ge 0$ a.s., $a > 0$ $\implies P(X \ge a) \le EX/a.$
$a > 0$ $\implies P(|X| \ge a) \le EX^2/a^2.$
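
Both bounds can be verified exactly on a small example. The sketch below assumes $X$ is a fair six-sided die (a hypothetical choice, so $X \ge 0$), and compares the tail probability $P(X \ge a)$ with the first-moment bound $EX/a$ and the second-moment bound $EX^2/a^2$, using exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical example: X is a fair six-sided die, so X >= 0.
outcomes = [Fraction(k) for k in range(1, 7)]
prob = Fraction(1, 6)  # probability of each outcome


def expect(h):
    """E h(X) for the uniform distribution on the outcomes."""
    return sum(prob * h(x) for x in outcomes)


def tail(a):
    """P(X >= a)."""
    return sum(prob for x in outcomes if x >= a)


first_moment = expect(lambda x: x)       # EX
second_moment = expect(lambda x: x * x)  # EX^2

rows = []
for a in range(1, 7):
    markov_bound = first_moment / a
    chebyshev_bound = second_moment / a**2
    rows.append((a, tail(a), markov_bound, chebyshev_bound))

for a, p, m, c in rows:
    print(f"a = {a}: P(X >= a) = {p}, EX/a = {m}, EX^2/a^2 = {c}")
```

Neither bound dominates the other in general: for small $a$ the first-moment bound is smaller, while for large $a$ the second-moment bound decays faster.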


Concluding remarks

Since the expectation $EX$ is simply an integral with respect to $P$, all of the above theorems apply to it. For instance, if $X_n \to X$ a.s. and $|X_n| \le Y$ for some $Y$ such that $E|Y| < \infty$, then by the DCT $EX_n \to EX$ as $n\to\infty$.



Acknowledgement

This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).

This post is also based on the textbook Real and Complex Analysis, 3rd edition (Rudin, 1986) and the lecture at SNU (instructor: Prof. Insuk Seo).

  1. $f_n \to f$ in $L^1(\mu)$ is equivalent to $\int |f_n - f| d\mu \to 0$ and $f \in L^1(\mu)$. It will be covered in the next section.