2.3. Borel-Cantelli lemmas

$\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits}$ $\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits}$

In this section I would like to cover the Borel-Cantelli lemmas, or B-C lemmas for short. The Borel-Cantelli lemmas are quintessential tools for analyzing tail events and for deriving almost sure convergence from convergence in probability.

Limit of a sequence of sets

Borel-Cantelli lemmas are statements about the probability of the “limit” of a sequence of sets. To state and prove the lemmas, we first define the $\limsup$ and $\liminf$ of a sequence of sets.

Let $(A_n)_{n\in\mathbb{N}}$ be a sequence of sets.
$\limsup\limits_n A_n = \bigcap_n \bigcup_{k\ge n} A_k = \{A_n \text{ i.o.}\}.$
$\liminf\limits_n A_n = \bigcup_n \bigcap_{k\ge n} A_k = \{A_n \text{ eventually}\}.$

i.o. stands for “infinitely often”. We call these $\limsup$ and $\liminf$ because of their connection with the corresponding notions for functions: it is not difficult to show that $\limsup\limits_n \mathbf{1}_{A_n} = \mathbf{1}_{\limsup\limits_n A_n}$ and $\liminf\limits_n \mathbf{1}_{A_n} = \mathbf{1}_{\liminf\limits_n A_n}.$ As a remark, by de Morgan’s law $\{A_n \text{ i.o.}\}^c = \{ A_n^c \text{ eventually} \}$, so $P(A_n \text{ i.o.}) = 0$ is equivalent to $P(A_n^c \text{ eventually}) = 1.$
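As a toy check of these definitions, note that when a sequence of sets is eventually periodic, the infinite unions and intersections stabilize, so both limits can be read off from a single period. A minimal Python sketch (the sets and helper names here are my own illustration, not from the text):

```python
from functools import reduce

def tail_limsup(period):
    """limsup A_n for a sequence whose tail repeats `period` forever:
    a point lies in infinitely many A_n iff it lies in some set of the period."""
    return reduce(set.union, period)

def tail_liminf(period):
    """liminf A_n for the same sequence: a point is eventually in every A_n
    iff it lies in every set of the period."""
    return reduce(set.intersection, period)

odd, even = {0, 1}, {1, 2}          # A_n alternates between these two sets
print(tail_limsup([odd, even]))     # points occurring i.o.: {0, 1, 2}
print(tail_liminf([odd, even]))     # points occurring eventually: {1}
```

Note $\liminf_n A_n \subset \limsup_n A_n$ holds here, as it must in general.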

Borel-Cantelli lemmas

(i) $\sum\limits_{n=1}^\infty P(A_n) < \infty \implies P(A_n \text{ i.o.}) = 0.$
(ii) If $A_n$'s are independent, $\sum\limits_{n=1}^\infty P(A_n) = \infty \implies P(A_n \text{ i.o.}) = 1.$

(i) $\sum\limits_{k=1}^n \mathbf{1}_{A_k} \uparrow \sum\limits_{n=1}^\infty \mathbf{1}_{A_n}$, so by MCT $E\sum\limits_{n=1}^\infty \mathbf{1}_{A_n} = \sum\limits_{n=1}^\infty P(A_n) < \infty.$ A random variable with finite expectation is finite a.s., so $\sum\limits_{n=1}^\infty \mathbf{1}_{A_n} < \infty$ a.s., which is exactly $P(A_n \text{ i.o.}) = 0.$
(ii) Using independence and the inequality $1-x \le e^{-x}$, \begin{align*} P(\bigcap\limits_{k \geq m}{A_k}^c) &= \prod\limits_{k \geq m}\big( 1-P({A_k}) \big) \\ &\leq \prod\limits_{k \geq m}e^{-P(A_k)} = e^{-\sum\limits_{k \geq m} P(A_k)} = 0, \:\: \forall m \ge 1. \end{align*} Therefore $P(\bigcup\limits_{k \geq m}{A_k}) = 1$ for every $m$, and taking the intersection over $m$, $P(\limsup\limits_n{A_n}) = P(A_n \text{ i.o.}) = 1.$
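The dichotomy can be eyeballed numerically. In the sketch below (my own illustration; the choices $P(A_n) = 1/n^2$ versus $1/n$ and the horizon are arbitrary), independent events with summable probabilities fire only a handful of times, while the divergent series keeps producing occurrences at roughly a $\log n$ rate:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
n = np.arange(1, N + 1)

# Independent events: A_n occurs iff a fresh uniform falls below P(A_n).
hits_summable = rng.random(N) < 1.0 / n**2   # sum P(A_n) < infinity
hits_divergent = rng.random(N) < 1.0 / n     # sum P(A_n) = infinity

# First lemma: with a summable series, only finitely many A_n occur.
print("summable:", hits_summable.sum())
# Second lemma: with a divergent series, occurrences keep coming; the
# expected count up to N is sum_{m <= N} 1/m ~ log N.
print("divergent:", hits_divergent.sum(), "~", round(float(np.log(N)), 1))
```

Of course a finite run cannot literally verify an “infinitely often” statement; it only shows the qualitative contrast that the lemmas formalize.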

Using the Borel-Cantelli lemmas, we can convert statements about sums of probabilities into statements about probabilities of tail events.

Tail events and the 0-1 law

I have mentioned tail events several times but never actually defined them.

Let $(X_n)$ be a sequence of random variables and let $\mathcal{G}_n = \sigma(X_n,X_{n+1},\cdots).$ We call $\mathcal{T}=\bigcap_n\mathcal{G}_n$ the tail $\sigma$-field of $(X_n)$ and any $A\in\mathcal{T}$ a tail event.

Simply put, a tail event is an event that does not depend on any finite number of the random variables. For instance, writing $S_n=X_1 + \cdots + X_n$, if $c_n \to \infty$ then $\{ \limsup_n S_n/c_n > x \} \in \mathcal{T}$ for every $x \in \mathbb{R}$, while $\{\limsup_n S_n > 0\}\notin\mathcal{T}$ in general.

An event $A$ is $P$-trivial if $P(A) = 0$ or $1.$

The next theorem (Kolmogorov's 0-1 law) implies that, for independent events, the two Borel-Cantelli lemmas are converses of each other: $P(A_n \text{ i.o.})$ is always $0$ or $1$.

If $X_1, X_2, \cdots$ are independent, $A \in \mathcal{T}$, where $\mathcal{T} := \bigcap_n \sigma(X_n, X_{n+1}, \cdots)$, then $A$ is a $P$-trivial event.
Let $A \in \sigma(X_1, \cdots, X_k)$ and $B \in \sigma(X_{k+1}, \cdots, X_{k+j})$ for some $j$. Because the $X_i$'s are independent, $A \perp B$. Thus $\sigma(X_1, \cdots, X_k) \perp \bigcup_j \sigma(X_{k+1}, \cdots, X_{k+j})$. Since both are $\pi$-systems, by Dynkin's $\pi$-$\lambda$ theorem, \begin{equation} \sigma(X_1, \cdots, X_k) \perp \sigma(X_{k+1}, \cdots). \end{equation} Now let $A \in \sigma(X_1, \cdots, X_j)$ for some $j$ and $B \in \mathcal{T} \subset \sigma(X_{j+1}, \cdots)$. By (1) and the same argument, \begin{equation} \sigma(X_1, \cdots) \perp \mathcal{T}. \end{equation} Since $\mathcal{T} \subset \sigma(X_1, \cdots)$, (2) shows that $\mathcal{T}$ is independent of itself.
$\therefore A \in \mathcal{T} \implies P(A) = P(A \cap A) = P(A)P(A)$, so $P(A) \in \{0, 1\}.$

Application of Borel-Cantelli lemmas

Application of the first lemma

The first application connects convergence in probability to almost sure convergence.

$X_n \overset{P}{\to} X$ if and only if for every subsequence $(X_{n_k})$ there exists a further subsequence $(X_{n_{k(m)}})$ such that $X_{n_{k(m)}} \to X$ a.s.

($\Rightarrow$) Since every subsequence of $(X_n)$ also converges to $X$ in probability, it suffices to extract an a.s. convergent subsequence from $(X_n)$ itself. Let $\epsilon_k>0$, $\epsilon_k \downarrow 0.$ For each $k$ there exists $n_k \in \mathbb{N}$, with $n_{k+1} > n_k$, such that $P(|X_n - X| \ge \epsilon_k) < \frac{1}{2^k}$ for all $n \ge n_k.$ Hence $\sum\limits_{k=1}^\infty P(|X_{n_k} - X| \ge \epsilon_k) \le 1 < \infty$ and by the first Borel-Cantelli lemma, $P(|X_{n_k} - X| \ge \epsilon_k \text{ i.o.}) = 0.$ On the complement, $|X_{n_k} - X| < \epsilon_k$ eventually, and since $\epsilon_k \downarrow 0$ this gives $X_{n_k} \to X$ a.s.
($\Leftarrow$) It is easy to show that for a sequence $(y_n)$ in a topological space, if every subsequence has a further subsequence converging to $y$, then $y_n \to y.$ Given $\delta > 0$, let $y_n = P(|X_n - X| > \delta)$, a sequence in $\mathbb{R}$. For any subsequence, the assumed a.s. convergent further subsequence also converges in probability (by dominated convergence), so the corresponding further subsequence of $(y_n)$ converges to $0$. Hence $y_n \to 0$, i.e., $X_n \overset{P}{\to} X.$
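The classical “typewriter” sequence illustrates why the further subsequence is genuinely needed: $X_n \to 0$ in probability, yet $X_n(\omega)$ fails to converge for every $\omega \in [0,1)$, while the subsequence of indicators of $[0, 2^{-k})$ converges a.s. A sketch (this construction is a standard counterexample, not from the text; the function names are mine):

```python
import math

def typewriter(n, omega):
    """X_n(omega): write n = 2^k + j with 0 <= j < 2^k and return the
    indicator of the dyadic interval [j / 2^k, (j + 1) / 2^k)."""
    k = int(math.log2(n))
    j = n - 2**k
    return 1.0 if j / 2**k <= omega < (j + 1) / 2**k else 0.0

omega = 0.3
# Along the full sequence, X_n(omega) hits 1 exactly once per dyadic block,
# so the sequence oscillates between 0 and 1 forever...
full = [typewriter(n, omega) for n in range(1, 64)]   # blocks k = 0, ..., 5
# ...but along the subsequence n_k = 2^k the interval is [0, 2^-k), which
# misses omega for all large k, so X_{n_k}(omega) -> 0.
sub = [typewriter(2**k, omega) for k in range(1, 12)]
print(sum(full), sub)
```

The intervals of length $2^{-k}$ partition $[0,1)$ within each block, so each $\omega$ is hit once per block: infinitely often along the full sequence, but only finitely often along the chosen subsequence.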

Another application is a strong law of large numbers, under a stronger (fourth) moment assumption.

$X_1,X_2,\cdots$ are i.i.d. with $EX_1 = \mu$ and $EX_1^4 \le C < \infty.$ $\implies S_n/n \to \mu$ a.s.

$$ \begin{align} P(|S_n/n - \mu| > \epsilon) &= P(|S_n - n\mu| > n\epsilon) \le \frac{E(S_n - n\mu)^4}{n^4\epsilon^4}. \end{align} $$ Expanding the fourth power, $E(S_n - n\mu)^4 = nE(X_1-\mu)^4 + 3n(n-1)\big(E(X_1-\mu)^2\big)^2 \le C_1 n^2$ for some constant $C_1$ (all other cross terms vanish by independence). Hence $\sum\limits_{n=1}^\infty P(|S_n/n - \mu| > \epsilon) \le \sum\limits_{n=1}^\infty \frac{C_1}{n^2\epsilon^4} < \infty,$ and by the first Borel-Cantelli lemma, $|S_n/n - \mu| > \epsilon$ only finitely often a.s. for each $\epsilon > 0$. Taking $\epsilon \downarrow 0$ along a countable sequence gives $S_n/n \to \mu$ a.s.
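A quick Monte Carlo sanity check (my own illustration; Uniform$(0,1)$ is chosen because all of its moments are finite, so the theorem applies with $\mu = 1/2$):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(1_000_000)          # i.i.d. Uniform(0, 1), mu = 0.5
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

# The running average S_n / n should settle near mu; inspect the tail.
tail_dev = np.abs(running_mean[100_000:] - 0.5).max()
print("max deviation of S_n/n from 1/2 for n > 100000:", tail_dev)
```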

Application of the second lemma

As a sneak peek of the next section, the second Borel-Cantelli lemma leads to a necessary condition for the strong law of large numbers.

$X_1,X_2,\cdots$ are i.i.d. with $E|X_1| = \infty$. $S_n = X_1 + \cdots + X_n.$
$\begin{align} \implies &\text{(i) } P(|X_n|\ge n \text{ i.o.}) = 1. \\ &\text{(ii) } P(\lim_n S_n/n \text{ exists and is finite}) = 0. \end{align}$

Note that this implies $E|X_1| < \infty$ is required for the strong law.

(i) $E|X_1| = \int_0^\infty P(|X_1|>x)\,dx \le \sum\limits_{n=0}^\infty P(|X_1| \ge n) = \infty.$ Since the $X_n$ are i.i.d., $P(|X_n| \ge n) = P(|X_1| \ge n)$ and the events $\{|X_n| \ge n\}$ are independent, so by the second B-C lemma, $P(|X_n| \ge n \text{ i.o.}) = 1.$
(ii) Note that $\frac{S_n}{n}-\frac{S_{n+1}}{n+1} = \frac{S_n}{n(n+1)}-\frac{X_{n+1}}{n+1}.$ Let $C = \{\lim_n S_n/n \text{ exists and is finite}\}$; on $C$, $\frac{S_n}{n(n+1)} = \frac{S_n/n}{n+1} \to 0.$ On $C \cap \{|X_n| \ge n \text{ i.o.}\}$, $$ \begin{align} \limsup\limits_n \Big|\frac{S_n}{n}-\frac{S_{n+1}}{n+1}\Big| &= \limsup\limits_n \Big|\frac{S_n}{n(n+1)}-\frac{X_{n+1}}{n+1}\Big| \\ &\ge \limsup\limits_n \Big|\frac{X_{n+1}}{n+1}\Big| - \lim_n \Big|\frac{S_n}{n(n+1)}\Big| \ge 1. \end{align} $$ But on $C$ the left-hand side is $0$, so $C \cap \{|X_n| \ge n \text{ i.o.}\} = \emptyset.$ By (i) the latter event has probability $1$, hence $P(C) = 0.$
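To see the failure mode concretely, here is a sketch with standard Cauchy samples (an illustrative choice with $E|X_1| = \infty$; the parameters are mine). Occasional huge draws, of the kind part (i) guarantees infinitely often, keep jerking the running average around:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
x = rng.standard_cauchy(N)                      # E|X_1| = infinity
running_mean = np.cumsum(x) / np.arange(1, N + 1)

# Finite-horizon glimpse of (i): indices n with |X_n| >= n.
big = np.flatnonzero(np.abs(x) >= np.arange(1, N + 1))
print("count of n with |X_n| >= n:", big.size)
# S_n / n is itself standard Cauchy for every n, so it never settles;
# compare snapshots of the running average at different horizons.
print(running_mean[[999, 99_999, 999_999]])
```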

Generalization of the second lemma

Another important result is a generalization of the second lemma: full independence can be weakened to pairwise independence, and the conclusion strengthens to a ratio limit.

Let $A_1,A_2,\cdots$ be pairwise independent events.
$\sum\limits_{n=1}^\infty P(A_n) = \infty \implies \frac{\sum_{m=1}^n \mathbf{1}_{A_m}}{\sum_{m=1}^n P(A_m)}\to 1 \text{ a.s.}$

Let $X_m = 1_{A_m}$, $S_n = X_1 + \cdots + X_n$. Note that $0 \le X_m \le 1.$
(Step 1: $\frac{\sum_{m=1}^n \mathbf{1}_{A_m}}{\sum_{m=1}^n P(A_m)} = \frac{S_n}{ES_n} \overset{P}{\to} 1.$)
  $EX_n^2 = EX_n = P(A_n) \le 1.$ By pairwise independence, $EX_iX_j = EX_iEX_j$ for $i \ne j$, so $ES_n^2 = \sum_{i=1}^n EX_i^2 + 2\sum_{1\le i < j \le n}EX_i EX_j < \infty$ and $\text{Var}(S_n) = \sum_{i=1}^n \text{Var}(X_i) \le \sum_{i=1}^n EX_i^2.$ By Chebyshev's inequality, $$ \begin{align} P(|\frac{S_n - ES_n}{ES_n}| > \epsilon) &\le \frac{\text{Var}(S_n)}{\epsilon^2(ES_n)^2} \\ &\le \frac{\sum_{i=1}^n EX_i^2}{\epsilon^2(ES_n)^2} \\ &= \frac{\sum_{i=1}^n EX_i}{\epsilon^2(ES_n)^2} \\ &= \frac{1}{\epsilon^2 ES_n} \to 0, \end{align} $$ since $ES_n = \sum_{m=1}^n P(A_m) \to \infty.$
(Step 2: $\exists n_k$ s.t. $\frac{S_{n_k}}{ES_{n_k}} \to 1 \text{ a.s.}$)
  Let $n_k = \inf\{n:~ ES_n > k^2\}$ and $T_k = S_{n_k}$. Since each increment satisfies $0 \le X_m \le 1$, we get $k^2 \le ET_k \le k^2 + 1 \le (k+1)^2.$ Since $P(|\frac{T_k - ET_k}{ET_k}| > \delta) \le \frac{1}{\delta^2 ET_k} \le \frac{1}{\delta^2 k^2}$, we have $\sum_k P(|\frac{T_k - ET_k}{ET_k}| > \delta) \le \frac{1}{\delta^2}\sum_k \frac{1}{k^2} < \infty.$ By the first Borel-Cantelli lemma, $P(|\frac{T_k - ET_k}{ET_k}| > \delta \text{ i.o.}) = 0$ for every $\delta > 0$, hence $\frac{T_k}{ET_k} \to 1$ a.s.
(Step 3: sandwich)
  Let $\Omega_0 = \{\lim_k \frac{T_k}{ET_k} = 1\}$, so that $P(\Omega_0) = 1$ by Step 2. Pick $\omega \in \Omega_0$. Because $S_n$ is non-decreasing in $n$, for every $n$ there exists $k$ such that $T_k \le S_n \le T_{k+1}$ and $ET_k \le ES_n \le ET_{k+1}$, and thus $\frac{T_k(\omega)}{ET_{k+1}} \le \frac{S_n(\omega)}{ES_n} \le \frac{T_{k+1}(\omega)}{ET_k}.$ Since $\frac{T_k(\omega)}{ET_k} \to 1$, and $k^2 \le ET_k \le ET_{k+1} \le (k+2)^2$ gives $\frac{ET_{k+1}}{ET_k} \to 1$, both bounds converge to $1$. By the sandwich theorem, $\frac{S_n(\omega)}{ES_n} \to 1$, and since $\omega \in \Omega_0$ was arbitrary, we get the result.
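A numerical sketch of the ratio convergence (my own illustration: fully independent events, which are in particular pairwise independent, with $P(A_n) = n^{-1/2}$ so the series diverges fast enough to see the limit at a modest horizon):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
n = np.arange(1, N + 1)

p = n ** -0.5                      # P(A_n) = 1/sqrt(n), sum diverges
hits = rng.random(N) < p           # independent events A_n
S = np.cumsum(hits)                # sum of indicators 1_{A_m}, m <= n
ES = np.cumsum(p)                  # sum of probabilities P(A_m)
ratio = S / ES

print(ratio[[999, 99_999, 999_999]])   # should drift toward 1
```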

Beyond the statement itself, the proof scheme is important: it is reused to prove almost sure convergence of ratios of bounded random variables.


This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).

(TBD: example - head runs)