3.6. Poisson convergence
I would like to finish reviewing Probability Theory I by briefly covering Poisson convergence (Section 3.6) and limit theorems in $\mathbb{R}^d$ (Section 3.10).
Poisson convergence concerns the limiting law of row sums of a triangular array of independent, discrete, Bernoulli-like random variables whose expected sum $ES_n$ converges to a constant $\lambda.$ I use the term “Bernoulli-like” because for large enough $n,$ the $X_{nk}$ should behave like Bernoulli random variables. For such arrays, the limiting law is not normal but Poisson.
Basic Poisson convergence
Let $X_{nk},$ $1\le k\le n,$ be independent random variables with $P(X_{nk}=1)=p_{nk}$ and $P(X_{nk}=0)=1-p_{nk}.$ Suppose
(i) $\sum_{k=1}^n p_{nk} \to \lambda \in (0,\infty).$
(ii) $\max_{1\le k\le n} p_{nk} \to 0.$
Then $S_n=\sum_{k=1}^n X_{nk} \overset{w}{\to} \mathcal{P}(\lambda).$
Proof.
As in the proof of the CLT, it suffices to show $\varphi_{S_n}(t) \to \exp(\lambda(e^{it}-1))$ for each fixed $t.$ By independence, $$\begin{aligned} \varphi_{S_n}(t) = \prod_{k=1}^n \varphi_{nk}(t) = \prod_{k=1}^n \left( p_{nk}(e^{it}-1) + 1 \right). \end{aligned}$$ Since $\exp\left(\sum_{k=1}^n p_{nk}(e^{it}-1)\right) \to e^{\lambda(e^{it}-1)}$ by (i), it is enough to bound $$\begin{aligned} &\left| \varphi_{S_n}(t) - e^{\sum_{k=1}^n p_{nk}(e^{it}-1)} \right| \\ &= \left| \prod_{k=1}^n \left( p_{nk}(e^{it}-1) + 1 \right) - \prod_{k=1}^n e^{p_{nk}(e^{it}-1)} \right| \\ &\le \sum_{k=1}^n \left| p_{nk}(e^{it}-1)+1 - e^{p_{nk}(e^{it}-1)} \right| \\ &\;\;\;\;(b = p_{nk}(e^{it}-1),~ |b| \le 2p_{nk} \le 1 \text{ for large } n) \\ &\le \sum_{k=1}^n p_{nk}^2 |e^{it}-1|^2 \le 4 \sum_{k=1}^n p_{nk}^2 \\ &\le 4 \max_{1\le k\le n} p_{nk} \sum_{k=1}^n p_{nk} \to 0. \end{aligned}$$ The first inequality is the lemma we covered before, $\left|\prod_k a_k - \prod_k b_k\right| \le \sum_k |a_k - b_k|$ for complex numbers with $|a_k|, |b_k| \le 1,$ which applies here because $$\begin{aligned} &|p_{nk}(e^{it}-1)+1| \le 1-p_{nk}+p_{nk}|e^{it}| \le 1,\\ &|e^{p_{nk}(e^{it}-1)}| = e^{p_{nk}(\cos t - 1)} \le 1. \end{aligned}$$ The second inequality uses the lemma $|e^b - (1+b)| \le |b|^2$ for $|b| \le 1.$
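As a numerical sanity check of the bound above (a sketch in plain Python; the function name and parameter choices are mine), in the uniform case $p_{nk} = \lambda/n$ the sum $\sum_k p_{nk}$ equals $\lambda$ exactly, so the gap between the two characteristic functions should be at most $4 \max_k p_{nk} \sum_k p_{nk} = 4\lambda^2/n$:

```python
import cmath

def cf_gap(n, lam, t):
    # |phi_{S_n}(t) - exp(lam * (e^{it} - 1))| when p_{nk} = lam / n for all k,
    # so that sum_k p_{nk} = lam exactly and the limiting exponential is exact
    p = lam / n
    phi = (p * (cmath.exp(1j * t) - 1) + 1) ** n   # product of n identical factors
    return abs(phi - cmath.exp(lam * (cmath.exp(1j * t) - 1)))

for n in (10, 100, 1000):
    lam, t = 2.0, 1.0
    bound = 4 * (lam / n) * lam  # 4 * max_k p_nk * sum_k p_nk
    print(n, cf_gap(n, lam, t), bound)
```

Each printed gap stays below the corresponding bound and shrinks at rate $1/n,$ matching the proof.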
As a special case, consider $X_k \overset{\text{iid}}{\sim} \mathcal{B}(p_n)$ for $k=1,\cdots,n,$ where $np_n\to\mu.$ Then $S_n \overset{w}{\to} \mathcal{P}(\mu).$
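To see this special case concretely, here is a small check (plain Python; the helper names and the cutoff `kmax` are mine) that the $\mathcal{B}(n, \mu/n)$ pmf of $S_n$ approaches the $\mathcal{P}(\mu)$ pmf pointwise:

```python
import math

def binom_pmf(n, p, k):
    # P(S_n = k) for S_n = sum of n iid Bernoulli(p), i.e. Binomial(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mu, k):
    # P(X = k) for X ~ Poisson(mu)
    return math.exp(-mu) * mu**k / math.factorial(k)

def max_pmf_gap(n, mu, kmax=40):
    # max over k of |Binomial(n, mu/n) pmf - Poisson(mu) pmf|
    p = mu / n
    return max(abs(binom_pmf(n, p, k) - poisson_pmf(mu, k))
               for k in range(min(n, kmax) + 1))

for n in (10, 100, 1000):
    print(n, max_pmf_gap(n, 3.0))  # gap shrinks as n grows
```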
General Poisson convergence
Let $X_{nk},$ $1\le k\le n,$ be independent nonnegative integer-valued random variables with $P(X_{nk}=1)=p_{nk}$ and $P(X_{nk}\ge2)=\epsilon_{nk}.$ Suppose
(i) $\sum_{k=1}^n p_{nk} \to \lambda \in (0,\infty).$
(ii) $\max_{1\le k\le n} p_{nk} \to 0.$
(iii) $\sum_{k=1}^n \epsilon_{nk} \to 0.$
Then $S_n = \sum_{k=1}^n X_{nk} \overset{w}{\to} \mathcal{P}(\lambda).$
Let $X_{nk}' = 1$ if $X_{nk}=1$ and $X_{nk}' = 0$ otherwise, and set $S_n' = \sum_{k=1}^n X_{nk}'.$ Then $$\begin{aligned} P(S_n \ne S_n') \le \sum_{k=1}^n P(X_{nk} \ge 2) = \sum_{k=1}^n \epsilon_{nk} \to 0 \end{aligned}$$ by (iii). Since $P(X_{nk}'=1)=p_{nk},$ the basic Poisson convergence theorem gives $S_n' \overset{w}{\to} \mathcal{P}(\lambda),$ and the converging together lemma yields the desired result.
Total variation and weak convergence
Lastly, I will introduce the total variation distance, a metric on the space of probability measures on a countable set. As with the Lévy metric, convergence in total variation distance is equivalent to weak convergence on such a space.
For probability measures $\mu$ and $\nu$ on a countable set $\mathcal{S},$ $$\begin{aligned} \|\mu-\nu\| &:= \frac{1}{2} \sum_{z\in\mathcal{S}} |\mu(z) - \nu(z)| \\ &= \sup_{A\subset\mathcal{S}} |\mu(A) - \nu(A)| \end{aligned}$$ is the total variation distance between $\mu$ and $\nu.$
While the first formula is the definition, the second one is a derived property. Note that
$$\begin{aligned} \sum_{z\in\mathcal{S}}|\mu(z)-\nu(z)| &\ge |\mu(A) - \nu(A)| + |\mu(A^c)-\nu(A^c)| \\ &= 2|\mu(A)-\nu(A)|, \end{aligned}$$ where the equality uses $\mu(A^c)-\nu(A^c) = -(\mu(A)-\nu(A)).$ For $A=\{z:~ \mu(z) \ge \nu(z)\}$ the first inequality becomes an equality, since every summand has the same sign on $A$ (and likewise on $A^c$).
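The agreement of the two formulas can be checked directly on a small finite support (a toy example of my own, in plain Python; the pmf values are chosen arbitrarily):

```python
from itertools import chain, combinations

# Two pmfs on the toy support S = {0, 1, 2, 3}
mu = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
nu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

# Definition: half the l^1 distance between the pmfs
tv_half_l1 = 0.5 * sum(abs(mu[z] - nu[z]) for z in mu)

# Derived form: supremum of |mu(A) - nu(A)| over all subsets A of S
all_subsets = chain.from_iterable(
    combinations(mu, r) for r in range(len(mu) + 1))
tv_sup = max(abs(sum(mu[z] - nu[z] for z in A)) for A in all_subsets)

# The supremum is attained at A = {z : mu(z) >= nu(z)}
A_star = [z for z in mu if mu[z] >= nu[z]]
tv_at_A_star = sum(mu[z] - nu[z] for z in A_star)

print(tv_half_l1, tv_sup, tv_at_A_star)  # all three agree (= 0.2)
```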
(ii) $\|\mu_n-\mu\|\to0$ if and only if $\mu_n(z)\to\mu(z)$ for all $z\in\mathcal{S}$ (i.e. $\mu_n \overset{w}{\to} \mu$).
Proof of (ii).
($\Rightarrow$) Given $z\in\mathcal{S},$ $$\begin{aligned} |\mu_n(z)-\mu(z)| \le \sup_{A\subset\mathcal{S}}|\mu_n(A)-\mu(A)| = \|\mu_n-\mu\| \to 0. \end{aligned}$$ ($\Leftarrow$) If $\mu_n(z)\to\mu(z)$ for all $z\in\mathcal{S},$ then $$\begin{aligned} \|\mu_n-\mu\| = \frac{1}{2}\sum_{z\in\mathcal{S}} |\mu_n(z)-\mu(z)| = \sum_{z\in\mathcal{S}} (\mu(z)-\mu_n(z))^+ \to 0 \end{aligned}$$ by the DCT: the second equality holds because $\sum_z (\mu_n(z)-\mu(z)) = 0,$ and the summands are dominated by $(\mu(z)-\mu_n(z))^+ \le \mu(z),$ which is summable.
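Combining (ii) with the special case above: since the $\mathcal{B}(n,\lambda/n)$ pmf converges pointwise to the $\mathcal{P}(\lambda)$ pmf, the total variation distance between them must vanish as well. A rough numerical check (plain Python; the truncation point `kmax` is my own choice, justified because both tails beyond it are negligible):

```python
import math

def tv_binom_poisson(n, lam, kmax=60):
    # Total variation distance between Binomial(n, lam/n) and Poisson(lam),
    # computed as half the l^1 distance, truncated at kmax (tail mass is tiny)
    p = lam / n
    total = 0.0
    for k in range(kmax + 1):
        b = math.comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0
        q = math.exp(-lam) * lam**k / math.factorial(k)
        total += abs(b - q)
    return 0.5 * total

for n in (10, 100, 1000):
    print(n, tv_binom_poisson(n, 3.0))  # decreases toward 0
```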
Acknowledgement
This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).