2.2. Weak laws of large numbers

$\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits}$ $\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits}$

We say a random variable $X_n$ converges in probability (or $P$-converges) to another random variable $X$, and write $X_n \overset{P}{\to} X$, if $\lim_n P(|X_n - X| > \epsilon) = 0$ for all $\epsilon >0.$ We can also define convergence in probability to a constant by letting $X = c\in\mathbb{R}$. It is easy yet useful to know that $X_n \overset{L^p}{\to} X$ with $p > 0$ implies $X_n \overset{P}{\to} X$, by the Markov-Chebyshev inequality with $\varphi(x) = |x|^p$.1
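As a quick sanity check, here is a minimal numerical sketch (a toy setup of my own, not from the textbook): take $X_n = X + Z/n$ with $Z \sim N(0,1)$, so that $E|X_n - X|^2 = 1/n^2 \to 0$, and compare a Monte Carlo estimate of $P(|X_n - X| > \epsilon)$ against the Markov-Chebyshev bound $E|X_n - X|^2/\epsilon^2$.

```python
# Toy sketch: X_n = X + Z/n with Z ~ N(0,1), so X_n -> X in L^2,
# and hence in probability by the Markov-Chebyshev inequality.
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
for n in [1, 10, 100]:
    diff = rng.standard_normal(100_000) / n       # samples of X_n - X
    p_hat = np.mean(np.abs(diff) > eps)           # Monte Carlo estimate
    bound = np.mean(diff**2) / eps**2             # Markov-Chebyshev with phi(x) = x^2
    print(f"n={n:4d}  P(|X_n-X|>eps) ~ {p_hat:.4f}  L2 bound: {bound:.4f}")
```

For small $n$ the bound exceeds 1 and is vacuous, but it decays like $1/n^2$ and always dominates the true probability.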

The weak law of large numbers concerns convergence in probability of the sample mean of independent random variables to their common expectation. In this section, we cover variants of the weak law of large numbers (WLLN), from the simplest $L^2$-WLLN to the most familiar form, which only requires an $L^1$ condition.


$L^2$ weak law

$X_1, X_2$ are uncorrelated if $EX_1X_2 = EX_1 EX_2.$

If $X_i$'s are independent then they are uncorrelated, but the converse does not hold. It is also clear that if $X_i$'s are uncorrelated then $\text{Var}(X_1+\cdots+X_n) = \sum_{i=1}^n \text{Var}(X_i).$ Using Chebyshev's inequality, we easily get our first WLLN for uncorrelated random variables.

Theorem 1 ($L^2$ weak law). Let $X_i$'s be uncorrelated with $EX_i = \mu$ and $\text{Var}(X_i) \le C < \infty$ for all $i$. Let $S_n = X_1 + \cdots + X_n.$ Then $S_n/n \to \mu$ in probability and in $L^2$.

Proof. By Chebyshev's inequality, given $\epsilon > 0$, $$\begin{align} P(|S_n/n - \mu| > \epsilon) \le \text{Var}(S_n/n)/\epsilon^2 \le \frac{C}{n\epsilon^2} \to 0 \end{align} $$ and by the definition of variance, $$ E|S_n/n - \mu|^2 = \text{Var}(S_n/n) \le C/n \to 0. $$
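A quick simulation sketch of Theorem 1 (the heterogeneous variances below are my own choice): independent $X_i$ with common mean $\mu = 0$ and variances bounded by $C = 1$; both $P(|S_n/n| > \epsilon)$ and $E(S_n/n)^2$ shrink as $n$ grows.

```python
# Simulation sketch of the L^2 weak law: independent X_i with common
# mean 0 and heterogeneous variances bounded by C = 1.
import numpy as np

rng = np.random.default_rng(1)
eps, reps = 0.05, 1_000
for n in [100, 1_000, 10_000]:
    sds = rng.uniform(0.1, 1.0, size=n)          # Var(X_i) = sds[i]**2 <= 1
    x = rng.standard_normal((reps, n)) * sds     # reps independent copies of the row
    means = x.mean(axis=1)                       # S_n / n for each replication
    print(f"n={n:6d}  P(|S_n/n|>eps) ~ {np.mean(np.abs(means) > eps):.4f}"
          f"  E(S_n/n)^2 ~ {np.mean(means**2):.6f}")
```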


A notable application of the $L^2$ weak law arises in high-dimensional problems, e.g., Durrett's example showing that a high-dimensional cube is almost the boundary of a ball.

Theorem 1 requires strong conditions: the $X_i$'s must share the same expectation and their variances must be uniformly bounded. Theorem 2 relaxes these conditions so that the result becomes more useful.

Theorem 2. Let $(S_n)$ be a sequence of random variables with $\mu_n = ES_n$, $\sigma_n^2 = \text{Var}(S_n)$, and let $b_n > 0$.
$\frac{\sigma_n^2}{b_n^2} \to_n 0 \implies \frac{S_n - \mu_n}{b_n} \overset{P}{\to} 0.$

Proof. $\sigma_n^2 / b_n^2 = E\left(\frac{S_n - \mu_n}{b_n}\right)^2 \to 0,$ so $\frac{S_n - \mu_n}{b_n} \to 0$ in $L^2$ and hence in probability.

This applies to any sequence of random variables $(S_n)_{n\in\mathbb{N}}$ with finite second moments. (Even uncorrelatedness is not required!)
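To illustrate that uncorrelatedness is not needed, here is a toy sketch of Theorem 2 (the AR(1) construction is my own): the summands are correlated, yet $\text{Var}(S_n) = O(n)$, so taking $b_n = n$ gives $(S_n - \mu_n)/b_n \overset{P}{\to} 0$.

```python
# Sketch of Theorem 2 with correlated summands: partial sums of an AR(1)
# sequence have Var(S_n) = O(n), so (S_n - mu_n)/n -> 0 in probability.
import numpy as np

rng = np.random.default_rng(2)
reps, n_max, phi = 500, 5_000, 0.5
z = rng.standard_normal((reps, n_max))
x = np.empty_like(z)
x[:, 0] = z[:, 0]
for t in range(1, n_max):                  # X_t = phi * X_{t-1} + Z_t (correlated!)
    x[:, t] = phi * x[:, t - 1] + z[:, t]
s = np.cumsum(x, axis=1)                   # S_n for every n simultaneously
for n in [50, 500, 5_000]:
    ratio = s[:, n - 1] / n                # (S_n - mu_n)/b_n with mu_n = 0, b_n = n
    print(f"n={n:5d}  sigma_n^2/b_n^2 ~ {ratio.var():.5f}"
          f"  P(|.|>0.1) ~ {np.mean(np.abs(ratio) > 0.1):.3f}")
```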

Weak law for triangular arrays

Many topics in probability theory concern triangular arrays $(X_{nk})_{1\le k \le n}$ such that $X_{nk}$, $k=1,\cdots,n$, are independent (row-wise independence). We are interested in the limiting behavior of the row sums $S_n := X_{n1} + \cdots + X_{nn}$.

To deal with the case where the finite second moment condition is not ensured, we introduce truncated random variables.


$\bar X := X\mathbf{1}_{(|X| \le M)} = \begin{cases} X &,~ \text{if } |X| \le M \\ 0 &,~ \text{otherwise} \end{cases}$

By discarding the tail of $X$, the truncated random variable $\bar X$ is bounded and therefore has finite moments of all orders.
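A small sketch of why truncation helps (the Cauchy example is my own choice): the Cauchy distribution has no finite mean, yet its truncation at $M = 10$ is bounded, so all of its moments exist and the empirical ones stabilize.

```python
# Truncation sketch: a standard Cauchy sample has wildly unstable moments,
# but the truncated variable X * 1{|X| <= M} is bounded by M.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_cauchy(1_000_000)
M = 10.0
x_bar = np.where(np.abs(x) <= M, x, 0.0)     # truncated variable
print("raw sample mean (unstable):", x.mean())
print("truncated mean:", x_bar.mean())
print("truncated second moment:", np.mean(x_bar**2))
```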

Theorem 3 (weak law for triangular arrays). For each $n$, let $X_{nk}$, $k=1,\cdots,n$, be independent.
Let $b_n > 0$ with $b_n \to_n \infty$, and define $\bar X_{nk} := X_{nk} \mathbf{1}_{(|X_{nk}| \le b_n)}.$ Suppose:
(i) $\sum_{k=1}^n P(|X_{nk}| > b_n) \to_n 0.$
(ii) $\frac{1}{b_n^2} \sum_{k=1}^n E\bar X_{nk}^2 \to_n 0.$
$\implies \frac{S_n - a_n}{b_n} \overset{P}{\to} 0$, where $a_n = \sum_{k=1}^n E\bar X_{nk}.$

Proof. Let $\bar S_n = \bar X_{n1} + \cdots + \bar X_{nn}$ so that $a_n = E\bar S_n.$ Then for an arbitrary $\epsilon>0$, $P( |\frac{S_n - a_n}{b_n}| > \epsilon) \le \underbrace{P(S_n \ne \bar S_n)}_\text{(I)} + \underbrace{P(|\frac{\bar S_n - a_n}{b_n}| > \epsilon)}_\text{(II)}.$
For the first part, by condition (i), $$\begin{align} \text{(I)} &\le P(\bigcup\limits_{k=1}^n \{X_{nk} \ne \bar X_{nk}\}) \le \sum\limits_{k=1}^n P(X_{nk} \ne \bar X_{nk}) \\ & = \sum\limits_{k=1}^n P(|X_{nk}| > b_n) \to_n 0 \end{align} $$ and for the latter, by Chebyshev's inequality and row-wise independence, $$ \begin{align} \text{(II)} &\le E(\frac{\bar S_n - a_n}{b_n})^2 / \epsilon^2 = \text{Var}(\bar S_n) / \epsilon^2b_n^2 \\ &= \sum\limits_{k=1}^n \text{Var}(\bar X_{nk}) / \epsilon^2b_n^2 \\ &\le \sum\limits_{k=1}^n E\bar X_{nk}^2 / \epsilon^2b_n^2 \to_n 0 \end{align} $$ by condition (ii). $\therefore P(|\frac{S_n - a_n}{b_n}| > \epsilon) \to_n 0$ for all $\epsilon >0.$
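As a hedged illustration of Theorem 3 (the Cauchy example is my own, not from this part of the text): for i.i.d. standard Cauchy variables, $b_n = n$ fails condition (i) since $nP(|X_{nk}| > n) \to 2/\pi$, but $b_n = n\log n$ satisfies both conditions, with $a_n = 0$ by symmetry. Indeed $S_n/(n\log n) \overset{P}{\to} 0$ even though $S_n/n$ never concentrates:

```python
# Theorem 3 illustration with i.i.d. standard Cauchy summands: S_n/n is
# itself standard Cauchy for every n, while S_n/(n log n) -> 0 in probability.
import numpy as np

rng = np.random.default_rng(4)
for n in [100, 10_000]:
    sums = np.array([rng.standard_cauchy(n).sum() for _ in range(500)])
    p_n = np.mean(np.abs(sums / n) > 1)                    # stays near 1/2
    p_nlogn = np.mean(np.abs(sums / (n * np.log(n))) > 1)  # shrinks
    print(f"n={n:6d}  P(|S_n/n|>1) ~ {p_n:.3f}"
          f"  P(|S_n/(n log n)|>1) ~ {p_nlogn:.3f}")
```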


Weak law of large numbers

The result for triangular arrays paves the way for the general weak law of large numbers. Before we get to it, let's take a look at a highly useful lemma.

Lemma 4. Let $Y \ge 0$ be a random variable and $p>0.$
$\implies EY^p = \int_0^\infty py^{p-1} P(Y > y) dy.$

Proof. $$ \begin{align} \int_0^\infty py^{p-1} P(Y > y) dy &= \int_0^\infty \int_{(Y>y)} py^{p-1} dP dy \\ &= \int_\Omega \int_0^Y py^{p-1} dy dP \\ &= \int_\Omega Y^p dP = EY^p. \end{align} $$ The second equality follows from Fubini's theorem (the integrand is nonnegative, so Tonelli's version applies).
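A quick numerical check of Lemma 4 (the choice $Y \sim \text{Exp}(1)$, $p = 2$ is mine): both sides should equal $EY^2 = 2$.

```python
# Check of the tail-integral identity E Y^p = int_0^inf p y^(p-1) P(Y>y) dy
# for Y ~ Exp(1) and p = 2, where P(Y > y) = exp(-y) and EY^2 = 2.
import numpy as np
from scipy.integrate import quad

p = 2
integral, _ = quad(lambda y: p * y**(p - 1) * np.exp(-y), 0, np.inf)
mc = np.mean(np.random.default_rng(5).exponential(size=1_000_000) ** p)
print(f"tail integral: {integral:.4f}   Monte Carlo E Y^p: {mc:.4f}")   # both ~ 2.0
```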

Theorem 5. Let $X_1,X_2,\cdots$ be i.i.d. with $S_n = \sum\limits_{i=1}^n X_i$, and suppose
$x P(|X_1| > x) \to 0$ as $x \to \infty.$
$\implies S_n/n - \mu_n \overset{P}{\to} 0$, where $\mu_n = EX_1 \mathbf{1}_{(|X_1|\le n)}.$

  Proof. Use the weak law for triangular arrays (Theorem 3). Let $X_{nk} = X_k$ and $b_n = n.$
  (i) $\sum\limits_{k=1}^n P(|X_k| > n) = n P(|X_1| > n) \to_n 0$ by assumption.
  (ii) Let $\bar X_k = X_k\mathbf{1}_{(|X_k|\le n)}.$ Then $\frac{1}{n^2} \sum\limits_{k=1}^n E\bar X_k^2 = \frac{1}{n} E\bar X_1^2.$ By Lemma 4, $E\bar X_1^2 = \int_0^\infty 2y P(|\bar X_1| > y) dy \le \int_0^n 2y P(|X_1| > y) dy.$ The integrand $g(y) = 2yP(|X_1| > y)$ is bounded and $g(y) \to 0$ as $y\to\infty$, so its average $\frac{1}{n}\int_0^n g(y) dy \to_n 0$; hence $\frac{1}{n} E\bar X_1^2 \to_n 0.$
  By (i), (ii), and Theorem 3, $S_n/n - \mu_n \overset{P}{\to} 0.$

We can use this to obtain the weak law of large numbers.

Theorem 6 (weak law of large numbers). Let $X_1, X_2, \cdots$ be i.i.d. with $E|X_i| < \infty.$
Let $S_n = X_1 + \cdots + X_n$, $\mu = EX_1.$
$\implies S_n/n \overset{P}{\to} \mu.$

Proof. (i) $x P(|X_1| > x) = x E\mathbf{1}_{(|X_1| > x)} \le E|X_1|\mathbf{1}_{(|X_1| > x)}.$ Since $|X_1|\mathbf{1}_{(|X_1| > x)} \le |X_1|$ and $E|X_1| < \infty$, by the DCT $E|X_1|\mathbf{1}_{(|X_1| > x)} \to 0$ as $x\to\infty.$
(ii) Let $\mu_n = EX_1\mathbf{1}_{(|X_1| \le n)}.$ Since $|X_1\mathbf{1}_{(|X_1| \le n)}| \le |X_1|$ and $E|X_1| < \infty$, by the DCT $EX_1\mathbf{1}_{(|X_1| \le n)} \to_n EX_1 = \mu.$
By (i) and Theorem 5, $S_n/n - \mu_n \overset{P}{\to} 0$. Since $\mu_n \to \mu$ as a deterministic sequence by (ii), the desired result follows.

Note that we do not need a finite second moment condition.
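Here is a simulation sketch of Theorem 6 in exactly this regime (the Pareto distribution is my own choice): a classical Pareto with tail index $\alpha = 1.5$ has mean $\alpha/(\alpha-1) = 3$ but infinite variance, yet the sample mean still concentrates around 3.

```python
# WLLN sketch under E|X| < inf but Var(X) = inf: classical Pareto(alpha=1.5,
# x_m=1) has mean alpha/(alpha-1) = 3 and an infinite second moment.
import numpy as np

rng = np.random.default_rng(6)
alpha = 1.5
mu = alpha / (alpha - 1.0)                    # = 3
for n in [10**3, 10**5, 10**7]:
    x = rng.pareto(alpha, size=n) + 1.0       # shift Lomax samples to classical Pareto
    print(f"n={n:8d}  S_n/n = {x.mean():.4f}   (mu = {mu})")
```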

While I leave other examples as further reading, I would like to discuss the coupon collector's problem (Example 2.2.7 in Durrett).

$X_1,X_2,\cdots \overset{\text{iid}}{\sim} \mathcal{U}\{1,2,\cdots,n\}.$ Define a stopping time $\tau_k^n = \inf\{m:~ |\{X_1,\cdots,X_m\}| = k\}$ and let $T_n = \tau_n^n.$ Then $\frac{T_n}{n\log{n}} \overset{P}{\to} 1.$

Here, $\mathcal{U}\{1,2,\cdots,n\}$ is the discrete uniform distribution on $\{1,\cdots,n\}$. In this example, $n$ is interpreted as the total number of distinct coupons, $\tau_k^n$ is the number of trials until $k$ distinct kinds of coupons have been collected, and $T_n$ is the number of trials until all coupons are collected. The result says that asymptotically $n\log n$ trials are required for all $n$ coupons to be collected. In the same way, asymptotic rates of other stochastic processes can be obtained using the weak laws.

Proof. Let $X_{nk} = \tau_k^n - \tau_{k-1}^n$, $k=1,\cdots,n$ (with $\tau_0^n = 0$). Then $X_{nk} \sim \text{Geo}(p)$ where $p=1-\frac{k-1}{n}.$ The memoryless property of the Bernoulli process implies $X_{nk} \perp X_{nj}$, $j=1,\cdots,k-1$, and $T_n = X_{n1} + \cdots + X_{nn}.$
Let $b_n = n\log n.$ Then $ET_n = \sum_{k=1}^n \frac{n}{n-k+1} = n\sum_{k=1}^n \frac{1}{k} \sim n\log n$ and $\text{Var}(T_n) = \sum_{k=1}^n \text{Var}(X_{nk}) \le \sum_{k=1}^n \left(\frac{n}{n-k+1}\right)^2 = n^2 \sum_{k=1}^n \frac{1}{k^2},$ so $\sigma_n^2/b_n^2 \le \frac{n^2\sum_k k^{-2}}{(n\log n)^2} \to_n 0$ and the condition of Theorem 2 is satisfied. Hence $\frac{T_n - ET_n}{n\log n} \overset{P}{\to} 0$, and since $\frac{ET_n}{n\log n} \to_n 1$, we conclude $\frac{T_n}{n\log n} \overset{P}{\to} 1.$
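A direct simulation sketch of this result (implementation mine): draw uniform coupons until all $n$ types appear and compare $T_n$ with $n\log n$.

```python
# Coupon collector sketch: T_n / (n log n) should tend to 1 as n grows.
import numpy as np

rng = np.random.default_rng(7)

def collect(n: int) -> int:
    """Number of uniform draws from {0, ..., n-1} until all n values appear."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(int(rng.integers(n)))
        draws += 1
    return draws

for n in [100, 1_000, 10_000]:
    avg = np.mean([collect(n) for _ in range(20)])
    print(f"n={n:6d}  T_n/(n log n) ~ {avg / (n * np.log(n)):.3f}")
```

Since $ET_n = nH_n = n(\log n + \gamma + o(1))$, the ratio exceeds 1 by roughly $\gamma/\log n$, so the convergence is slow.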



Acknowledgement

This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).

  1. The relationships among $L^p$ convergence, convergence in probability, and almost sure convergence will be covered in detail later.