3.4. Central limit theorem

$\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits}$ $\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits}$

Now that we have all the right tools, we state and prove the central limit theorem (CLT for short), starting from the simplest i.i.d. form and moving on to the Lindeberg and Lyapounov conditions.


Central limit theorem for i.i.d. sequences

We need three short lemmas on complex numbers for the proof of our first CLT.

Let $\xi_1,\cdots,\xi_n,$ $\omega_1,\cdots,\omega_n \in \mathbb{C}$ with $|\xi_i|,|\omega_i| \le \theta.$ Then $$\left|\prod_{i=1}^n \xi_i - \prod_{i=1}^n \omega_i\right| \le \theta^{n-1} \sum\limits_{i=1}^n |\xi_i - \omega_i|.$$

The proof follows by induction on $n,$ using $$\left|\prod_{i=1}^n \xi_i - \prod_{i=1}^n \omega_i\right| \le |\xi_1|\left|\prod_{i=2}^n \xi_i - \prod_{i=2}^n \omega_i\right| + |\xi_1-\omega_1|\left|\prod_{i=2}^n \omega_i\right|$$ together with the bounds $|\xi_1|\le\theta$ and $\left|\prod_{i=2}^n \omega_i\right|\le\theta^{n-1}.$

For $b \in \mathbb{C}$ such that $|b| \le 1,$ $|e^b - (1+b)| \le |b|^2.$

Taylor expansion gives $$e^b-(1+b) = \frac{b^2}{2!} + \frac{b^3}{3!} + \cdots.$$ Since $|b|\le 1,$ $$|e^b-(1+b)| \le \frac{|b|^2}{2}\left(1+\frac{1}{2}+\frac{1}{2^2}+\cdots\right)=|b|^2.$$

If $c_n\to c\in \mathbb{C},$ then $\left(1+\frac{c_n}{n}\right)^n \to e^c.$

Let $\gamma > |c|;$ then $\gamma > |c_n|$ for large $n.$ By the two lemmas above (the first applies with $\theta = e^{\frac{\gamma}{n}},$ which bounds the moduli of both $e^{\frac{c_n}{n}}$ and $1+\frac{c_n}{n}$), $$\begin{aligned} \left|e^{c_n} - \left(1+\frac{c_n}{n}\right)^n\right| &= \left|\left(e^{\frac{c_n}{n}}\right)^n - \left(1+\frac{c_n}{n}\right)^n\right| \\ &\le \left(e^{\frac{\gamma}{n}}\right)^{n-1} n \left|e^\frac{c_n}{n} - \left(1+\frac{c_n}{n}\right)\right| \\ &\le e^\frac{(n-1)\gamma}{n}\, n \left|\frac{c_n}{n}\right|^2 \to 0 \end{aligned}$$ as $n\to\infty.$ Since $e^{c_n}\to e^c,$ the claim follows.

Now we prove the main theorem.

Let $X_1,X_2,\cdots$ be i.i.d. with $EX_1 = \mu$ and $\text{Var}(X_1)=\sigma^2 < \infty,$ and let $S_n = X_1+\cdots+X_n.$ $\implies \frac{S_n - n\mu}{\sigma\sqrt{n}} \overset{w}{\to} \mathcal{N}(0,1)$ as $n\to\infty.$

Without loss of generality, let $\mu=0.$ Let $S=\frac{S_n}{\sigma\sqrt n}$ and let $\varphi$ be the ch.f. of $X_1.$ By the continuity theorem, it is enough to show $\varphi_S(t)\to e^{-\frac{t^2}{2}}$ for every $t.$ Since $EX_1^2 < \infty,$ the upper bound on the error of the second-order expansion of the ch.f. gives the nice approximation $\varphi(t) = 1 - \frac{\sigma^2t^2}{2} + o(t^2)$ as $t\to0.$ $$\varphi_S(t) = \left(\varphi\left(\frac{t}{\sigma\sqrt n}\right)\right)^n \\ = \left(1-\frac{t^2}{2n} + o\left(\frac{t^2}{\sigma^2n}\right)\right)^n \\ = \left(1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right)^n.$$ This with the first lemma (for fixed $t$ and large $n,$ both factors have modulus at most $1$) gives us $$\begin{aligned} &\left|\varphi_S(t) - \left(1-\frac{t^2}{2n}\right)^n\right| \\ &= \left| \left(\varphi\left(\frac{t}{\sigma\sqrt n}\right) \right)^n - \left(1-\frac{t^2}{2n}\right)^n \right| \\ &\le n \left| \varphi\left(\frac{t}{\sigma \sqrt n}\right) - \left( 1-\frac{t^2}{2n} \right) \right| \\ &= n \cdot o\left(\frac{1}{n}\right) \to 0 \end{aligned}$$ as $n\to\infty.$ Since $\left(1-\frac{t^2}{2n}\right)^n \to e^{-\frac{t^2}{2}}$ by the third lemma, $\varphi_S(t)\to e^{-\frac{t^2}{2}}$ and the proof is done.
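To see the theorem numerically, here is a minimal simulation sketch (my own illustration, not from the lecture; it assumes numpy is available). It standardizes sums of i.i.d. Exponential(1) variables, for which $\mu=\sigma^2=1,$ and compares empirical quantiles with those of $\mathcal{N}(0,1).$

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. Exponential(1) summands: mu = 1, sigma^2 = 1.
n, reps = 1_000, 10_000
samples = rng.exponential(scale=1.0, size=(reps, n))

# Standardize (S_n - n*mu) / (sigma * sqrt(n)) for each repetition.
z = (samples.sum(axis=1) - n) / np.sqrt(n)

# Empirical 5%, 50%, 95% quantiles; the N(0,1) values are about -1.645, 0, 1.645.
print(np.quantile(z, [0.05, 0.50, 0.95]))
```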

The next example shows that, unlike for the strong law of large numbers, pairwise independence is not enough for the central limit theorem.

Let $\xi_1,\xi_2,\cdots$ be i.i.d. with $P(\xi_1=1)=P(\xi_1=-1)=\frac{1}{2}.$ Define the $X_i$'s as follows: $$\begin{aligned} & X_1 = \xi_1, \\ & X_2 = X_1\xi_2, \\ & X_3 = X_1\xi_3,~ X_4 = X_2\xi_3, \\ & X_5 = X_1\xi_4,~ X_6 = X_2\xi_4,~ X_7 = X_3\xi_4,~ X_8 = X_4\xi_4, \cdots \end{aligned}$$ Then the $X_i$'s are pairwise independent because distinct $X_i$'s are products of distinct sets of $\xi_j$'s. Each $X_i$ takes values $\pm1$ with probability $\frac{1}{2},$ so $EX_i = 0$ and $\text{Var}(X_i)=1.$ Let $$\begin{aligned} S_{2^n} &= \sum\limits_{i=1}^{2^n} X_i \\ &= \xi_1(1+\xi_2)\cdots(1+\xi_{n+1}) \\ &=\begin{cases} \pm 2^n &,~ \text{with probability } \frac{1}{2^n} \\ 0 &,~ \text{with probability } 1-\frac{1}{2^n} \end{cases} \end{aligned}$$ so that $ES_{2^n}=0$ and, since pairwise independent variables are uncorrelated, $\text{Var}(S_{2^n}) = 2^{n}.$ Then $$\begin{aligned} \frac{S_{2^n}}{\sqrt{\text{Var}(S_{2^n})}} &= \frac{S_{2^n}}{2^{n/2}} \\ &= \begin{cases} \pm 2^{\frac{n}{2}} &,~ \text{with probability } \frac{1}{2^n} \\ 0 &,~ \text{with probability } 1-\frac{1}{2^n} \end{cases} \end{aligned}$$ which converges weakly to the point mass at $0,$ not to $\mathcal{N}(0,1).$
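The construction is easy to simulate. The sketch below (again my own illustration, assuming numpy, not from the source) draws many realizations of $S_{2^n}$ for $n=10$ and confirms that $S_{2^n}/2^{n/2}$ vanishes except on an event of probability about $2^{-n},$ so it is nowhere near standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

def sum_2n(n, rng):
    # One realization of S_{2^n}: start with X_1 = xi_1 and repeatedly
    # multiply the block built so far by the next xi to double its length.
    xi = rng.choice([-1, 1], size=n + 1)
    X = [xi[0]]
    for j in range(1, n + 1):
        X = X + [x * xi[j] for x in X]
    return sum(X)

n, reps = 10, 20_000
z = np.array([sum_2n(n, rng) for _ in range(reps)]) / 2 ** (n / 2)

# Mass concentrates at 0; the only other values are +-2^(n/2) = +-32,
# each occurring with probability about 2^{-n}.
print((z == 0).mean(), sorted(set(z.tolist())))
```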


Lindeberg-Feller condition

Our goal is to extend the CLT to cases where the variables are not necessarily identically distributed. We again take advantage of triangular arrays for this.

Let $\{X_{nk}\}_{k=1}^{r_n}$ be a rowwise independent triangular array with $EX_{nk} = 0$ and $\text{Var}(X_{nk}) = \sigma_{nk}^2 < \infty,$ and let $s_n^2 = \sum_{k=1}^{r_n} \sigma_{nk}^2.$
$\forall\epsilon>0,~ \lim_n \frac{1}{s_n^2} \sum\limits_{k=1}^{r_n} \int_{|X_{nk}|>\epsilon s_n} X_{nk}^2 dP = 0. \;\;\;\;(*)$
$\implies \frac{\sum_{k=1}^{r_n} X_{nk}}{s_n} \overset{w}{\to} \mathcal{ N}(0,1).$

We call $(*)$ the Lindeberg-Feller condition, or just the Lindeberg condition. The condition says that the tail contribution to $s_n^2$ is negligible, so that $s_n^2 \approx \bar s_n^2$ where $\bar s_n^2 = \sum_{k=1}^{r_n} \int_{|X_{nk}|\le \epsilon s_n} X_{nk}^2 dP.$ Intuitively, just as we truncated the random variables for the laws of large numbers, here we truncate their variances.
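For a concrete case where $(*)$ holds trivially, suppose the array is uniformly bounded, say $|X_{nk}|\le M$ for all $n,k,$ and $s_n\to\infty.$ Then for any $\epsilon>0$ the set $\{|X_{nk}|>\epsilon s_n\}$ is empty once $\epsilon s_n > M,$ so $$\frac{1}{s_n^2}\sum_{k=1}^{r_n}\int_{|X_{nk}|>\epsilon s_n} X_{nk}^2\, dP = 0 \text{ for all large } n,$$ and the Lindeberg condition holds.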


Let $S_n = X_{n1}+\cdots+X_{nr_n}$ be the rowwise sum and let $\varphi_{nk}$ be the ch.f. of $X_{nk}.$ Without loss of generality assume that $s_n^2 = 1.$ Then $\varphi_{S_n}(t) = \prod_{k=1}^{r_n}\varphi_{nk}(t)$ and $e^{-\frac{t^2}{2}} = \prod_{k=1}^{r_n} e^{-\frac{1}{2}\sigma_{nk}^2 t^2},$ so, since every factor has modulus at most $1$ (first lemma with $\theta=1$), $$\begin{aligned} &|\varphi_{S_n}(t) - e^{-\frac{t^2}{2}}| \\ &= \left| \prod_{k=1}^{r_n} \varphi_{nk}(t) - \prod_{k=1}^{r_n} e^{-\frac{1}{2}\sigma_{nk}^2 t^2} \right| \\ &\le \sum_{k=1}^{r_n} |\varphi_{nk}(t) - e^{-\frac{1}{2}\sigma_{nk}^2 t^2}| \\ &\le \sum_{k=1}^{r_n} \left\{ \underbrace{\left|\varphi_{nk}(t) - (1-\frac{1}{2}\sigma_{nk}^2t^2)\right|}_\text{(i)} + \underbrace{\left|(1-\frac{1}{2}\sigma_{nk}^2t^2) - e^{-\frac{1}{2}\sigma_{nk}^2 t^2}\right|}_\text{(ii)} \right\} \end{aligned}$$
For fixed $k,$ $$\begin{aligned} \text{(i)} &\le E\left( \frac{|tX_{nk}|^3}{3!} \wedge \frac{2|tX_{nk}|^2}{2!} \right) \\ &\le E\left( |tX_{nk}|^3 \wedge |tX_{nk}|^2 \right) \\ &= \int_{|X_{nk}|\le\epsilon} |tX_{nk}|^3 \wedge |tX_{nk}|^2 \,dP + \int_{|X_{nk}|>\epsilon} |tX_{nk}|^3 \wedge |tX_{nk}|^2 \,dP \\ &\le \int_{|X_{nk}|\le\epsilon} |tX_{nk}|^3 \,dP + t^2 \int_{|X_{nk}|>\epsilon} X_{nk}^2 \,dP \\ &\le \epsilon |t|^3 \sigma_{nk}^2 + t^2 \int_{|X_{nk}|>\epsilon} X_{nk}^2 \,dP. \end{aligned}$$ Summing over $k$ and using $s_n^2 = 1,$ $$\sum\limits_{k=1}^{r_n}\text{(i)} \le \epsilon|t|^3 + t^2\sum\limits_{k=1}^{r_n}\int_{|X_{nk}|>\epsilon} X_{nk}^2 \,dP.$$ By the Lindeberg condition, the right-most term goes to $0$ as $n\to\infty.$ Since $\epsilon>0$ was arbitrary, $\sum_{k=1}^{r_n}\text{(i)}\to0.$
  Now we handle (ii). First note that for each $k$ and any $\epsilon>0,$ $$\begin{aligned} \sigma_{nk}^2 &= \int_{|X_{nk}|\le\epsilon} X_{nk}^2 \,dP + \int_{|X_{nk}|>\epsilon} X_{nk}^2 \,dP \\ &\le \epsilon^2 + \sum\limits_{j=1}^{r_n} \int_{|X_{nj}|>\epsilon} X_{nj}^2 \,dP, \end{aligned}$$ where the last term does not depend on $k$ and goes to $0$ by the Lindeberg condition. Since $\epsilon>0$ was arbitrary, $\max_{1\le k\le r_n}\sigma_{nk}^2 \to 0.$ Hence, for fixed $t$ and all large $n,$ $\left|-\frac{1}{2}\sigma_{nk}^2 t^2\right| \le 1,$ so by the second lemma $$\left|e^{-\frac{1}{2}\sigma_{nk}^2 t^2} - \left(1-\frac{1}{2}\sigma_{nk}^2 t^2\right)\right| \le \left|-\frac{1}{2}\sigma_{nk}^2 t^2\right|^2 = \frac{1}{4}\sigma_{nk}^4 t^4,$$ and it follows that $$\begin{aligned} \sum\limits_{k=1}^{r_n}\text{(ii)} &\le \sum\limits_{k=1}^{r_n} \frac{1}{4} \sigma_{nk}^4 t^4 \\ &\le \frac{1}{4} t^4 \max_{1\le k\le r_n} \sigma_{nk}^2 \sum\limits_{k=1}^{r_n} \sigma_{nk}^2 \\ &= \frac{1}{4} t^4 \max_{1\le k\le r_n} \sigma_{nk}^2 \to 0. \end{aligned}$$
  $\therefore \varphi_{S_n}(t) \to e^{-\frac{t^2}{2}}.$

Before moving on to the Lyapounov condition, I would like to make two more remarks. First, as we can see in the proof, if the Lindeberg condition is met then \[ \max_{1\le k\le r_n} \frac{\sigma_{nk}^2}{s_n^2} = \max_{1\le k\le r_n} \frac{\sigma_{nk}^2}{\sum_{k=1}^{r_n}\sigma_{nk}^2} \to 0. \] Second, the Lindeberg condition always holds for i.i.d. sequences with finite variance. Suppose $X_1,X_2,\cdots$ are i.i.d. with mean $0$ and variance $\sigma^2<\infty;$ then $s_n^2 = n\sigma^2$ and \[ \frac{1}{n\sigma^2}\sum_{k=1}^n \int_{|X_k|>\epsilon\sigma\sqrt{n}} X_k^2 \,dP = \frac{1}{\sigma^2}\int_{|X_1|>\epsilon\sigma\sqrt{n}} X_1^2 \,dP \to 0,~ \forall \epsilon > 0 \] by DCT. In this sense, the Lindeberg-Feller theorem generalizes the CLT for i.i.d. sequences.
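To see the theorem at work numerically, here is a small simulation sketch (the particular array and the numpy usage are my own illustration, not from the source): take $X_{nk}\sim\text{Uniform}(-k,k),$ $k=1,\cdots,n,$ independent, so $\sigma_{nk}^2=\frac{k^2}{3}$ and $s_n^2=\sum_{k=1}^n\frac{k^2}{3}.$ Since $|X_{nk}|\le n$ while $\epsilon s_n$ grows like $n^{3/2},$ the Lindeberg condition holds, and the normalized row sums should be approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row n of the array: X_{nk} ~ Uniform(-k, k), k = 1, ..., n, independent.
n, reps = 500, 20_000
k = np.arange(1, n + 1)
s_n = np.sqrt(np.sum(k ** 2 / 3.0))   # s_n^2 = sum_k Var(X_{nk}) = sum_k k^2 / 3

rows = rng.uniform(low=-k, high=k, size=(reps, n))   # `reps` independent copies of the row
z = rows.sum(axis=1) / s_n

# Mean ~ 0, standard deviation ~ 1, quantiles close to those of N(0,1).
print(z.mean(), z.std(), np.quantile(z, [0.05, 0.95]))
```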

Lyapounov condition

Sometimes it is easier to verify a condition that is sufficient for the Lindeberg condition than to check the Lindeberg condition directly. The Lyapounov condition is one such condition.

Let $\{X_{nk}\}_{k=1}^{r_n}$ be a rowwise independent triangular array, with $\sigma_{nk}^2$ and $s_n^2$ as above. $\{X_{nk}\}_{k=1}^{r_n}$ satisfies the Lyapounov condition if for some $\delta>0,$
(i) $E|X_{nk}|^{2+\delta} < \infty$ for all $k,$
(ii) $\frac{1}{s_n^{2+\delta}} \sum\limits_{k=1}^{r_n} \int |X_{nk}|^{2+\delta} dP \to 0.$
If $\{X_{nk}\}_{k=1}^{r_n}$ satisfies the Lyapounov condition, then it satisfies the Lindeberg condition as well.

$$\begin{aligned} &\frac{1}{s_n^2} \sum\limits_{k=1}^{r_n} \int_{|X_{nk}|>\epsilon s_n} X_{nk}^2 dP \\ &= \frac{1}{s_n^2} \sum\limits_{k=1}^{r_n} \int_{\frac{|X_{nk}|}{\epsilon s_n}>1} X_{nk}^2 dP \\ &\le \frac{1}{s_n^2} \sum\limits_{k=1}^{r_n} \int_{\frac{|X_{nk}|}{\epsilon s_n}>1} \left( \frac{|X_{nk}|}{\epsilon s_n} \right)^\delta \cdot X_{nk}^2 dP \\ &\le \frac{1}{\epsilon^\delta} \frac{1}{s_n^{2+\delta}} \sum\limits_{k=1}^{r_n} \int |X_{nk}|^{2+\delta} dP \to 0. \end{aligned}$$
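For example, an i.i.d. sequence $X_1,X_2,\cdots$ with $EX_1=0,$ $\text{Var}(X_1)=\sigma^2$ and $E|X_1|^3<\infty$ satisfies the Lyapounov condition with $\delta=1$: taking $X_{nk}=X_k$ and $r_n=n,$ we have $s_n^2=n\sigma^2$ and $$\frac{1}{s_n^3}\sum_{k=1}^n E|X_k|^3 = \frac{n\,E|X_1|^3}{\sigma^3 n^{3/2}} = \frac{E|X_1|^3}{\sigma^3\sqrt{n}} \to 0.$$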


Lindeberg and Lyapounov conditions of higher order

In fact, the condition introduced above is sometimes called the Lyapounov condition of order $r=2+\delta.$ The corresponding Lindeberg condition of order $r>2$ is as follows.

$$\forall \epsilon>0,~ \frac{1}{s_n^r}\sum\limits_{k=1}^{r_n} \int_{|X_{nk}|>\epsilon s_n} |X_{nk}|^r dP \to 0 \text{ as } n \to \infty.$$

It can be shown that the two conditions are equivalent for any order $r>2$: indeed, on the set $\{|X_{nk}|\le\epsilon s_n\}$ we have $|X_{nk}|^r \le (\epsilon s_n)^{r-2} X_{nk}^2,$ so the non-tail part of the Lyapounov sum of order $r$ is at most $\epsilon^{r-2}.$

Examples

As an example, we prove the direction of the three series theorem that was left unproven earlier. Recall the setting: $X_1,X_2,\cdots$ are independent, $A>0$ is fixed, $Y_n = X_n 1_{\{|X_n|\le A\}},$ and we now assume that $\sum_n X_n$ converges a.s.

($\Rightarrow$)
(i) Suppose $\sum_n P(|X_n|>A)=\infty.$ Then by the second Borel-Cantelli lemma, $P(|X_n|>A \text{ i.o.}) = 1,$ so the terms $X_n$ do not converge to $0$ a.s. and $\sum_{n=1}^\infty X_n$ does not converge a.s. ($\rightarrow\!\leftarrow$). Hence $\sum_n P(|X_n|>A) < \infty.$
(ii) Suppose $\sum_n \text{Var}(Y_n) = \infty.$ Let $c_n = \sum_{k=1}^n \text{Var}(Y_k)$ and $u_{nk} = \frac{Y_k - EY_k}{\sqrt{c_n}};$ then $Eu_{nk} = 0$ and $s_n^2 = \sum_{k=1}^n \text{Var}(u_{nk}) = 1.$ Since $c_n\to\infty,$ for every $\epsilon>0$ we have $|u_{nk}|\le \frac{2A}{\sqrt{c_n}} < \epsilon$ for large $n,$ so $\int_{|u_{nk}|>\epsilon} u_{nk}^2 dP = 0$ eventually and the Lindeberg condition is met. Hence $\sum_{k=1}^n u_{nk} \overset{w}{\to} \mathcal{N}(0,1). \;(*)$ On the other hand, from (i), $$\begin{aligned} &\sum_n P(|X_n|>A) < \infty \\ &\implies P(X_n=Y_n \text{ eventually}) = 1 \\ &\implies \sum_n X_n \text{ converges iff } \sum_n Y_n \text{ converges.} \end{aligned}$$ Since $\sum_n X_n$ converges a.s. by assumption, $\sum_n Y_n$ converges a.s., and since $c_n\to\infty,$ $\frac{\sum_{k=1}^n Y_k}{\sqrt{c_n}}\to0 \text{ a.s.} \;(**)$
  By $(*),$ $(**)$ and Slutsky's theorem, $-\frac{1}{\sqrt{c_n}}\sum_{k=1}^n EY_k \overset{w}{\to} \mathcal{N}(0,1),$ which is a contradiction since a sequence of (non-random) real numbers cannot converge weakly to a non-degenerate distribution such as $\mathcal{N}(0,1).$ Hence $\sum_n \text{Var}(Y_n) < \infty.$
(iii) By (ii), $\sum_n \text{Var}(Y_n) < \infty,$ so $\sum_n (Y_n - EY_n)$ converges a.s. by theorem 2.5.6. As shown in (ii), $\sum_n Y_n$ converges a.s., so $\sum_n EY_n = \sum_n \big(Y_n - (Y_n - EY_n)\big)$ converges.
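As a standard application of the three series theorem (an illustrative example, not part of the proof above): let $X_n = \frac{\epsilon_n}{n}$ where $\epsilon_1,\epsilon_2,\cdots$ are i.i.d. with $P(\epsilon_1=1)=P(\epsilon_1=-1)=\frac{1}{2}.$ Taking $A=1,$ we have $P(|X_n|>1)=0,$ $EY_n = EX_n = 0$ and $\sum_n \text{Var}(Y_n) = \sum_n \frac{1}{n^2} < \infty,$ so all three series converge and $\sum_n \frac{\epsilon_n}{n}$ converges a.s., even though $\sum_n \frac{1}{n}$ diverges.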


Acknowledgement

This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).