3.2.1. Weak convergence
Now that we have covered convergence of point estimates (specifically, the sample mean), our next interest is a weaker notion of convergence, one for which convergence in probability is not guaranteed. In undergraduate statistics this is called convergence in distribution. Here we prefer to borrow terminology from measure theory, call it weak convergence, and write $X_n \overset{w}{\to} X$.
Definition
The name “weak convergence” comes from the weak topology generated by bounded continuous functions: convergence in the weak topology is weak convergence. Some textbooks separate the terms weak convergence and convergence in distribution, assigning the former to distribution functions and the latter to random variables. Durrett uses the former for distribution functions and both for random variables.
While Durrett defines the concept as pointwise convergence of $F_n$ to $F$ at continuity points of $F$ (definition 1), Billingsley [1] defines it differently: $\int f\, dF_n \to \int f\, dF$ for every bounded continuous $f$ (definition 2).
The equivalence of the two definitions is not difficult to show. $1\Rightarrow 2$ is immediate from Skorohod’s representation theorem (see below) and the bounded convergence theorem (BCT). To show $2\Rightarrow 1$, consider the function
\[g_{x,\epsilon}(y) = \begin{cases} 1, & y \le x \\ 0, & y \ge x+\epsilon \\ \text{linear}, & x < y < x+\epsilon \end{cases}\]where $x$ is a continuity point of $F$ and $\epsilon >0$ is arbitrary. Then $g_{x,\epsilon}$ is continuous and bounded, with
\[g_{x-\epsilon,\epsilon}(y) \le \mathbf{1}_{\{y \le x\}} \le g_{x,\epsilon}(y),\]and, since definition 2 gives $\int g\, dF_n \to \int g\, dF$ for both of these functions, we get
\[F(x-\epsilon) \le \liminf\limits_n F_n(x) \le \limsup\limits_n F_n(x) \le F(x+\epsilon).\]Since $x$ is a continuity point of $F$ and $\epsilon>0$ was arbitrary, letting $\epsilon \to 0$ yields $F_n(x) \to F(x)$, as desired.
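To make the bump functions concrete, here is a minimal numerical sketch, with $F$ taken to be the standard normal (my own choice for illustration): since the sandwich holds pointwise, it survives taking expectations.

```python
import numpy as np
from scipy import stats

def g(y, x, eps):
    # Piecewise-linear cutoff: 1 for y <= x, 0 for y >= x + eps, linear in between.
    return np.clip((x + eps - y) / eps, 0.0, 1.0)

rng = np.random.default_rng(0)
x, eps = 0.5, 0.1
y = rng.standard_normal(1_000_000)    # samples from F = N(0, 1)

# Pointwise sandwich g_{x-eps,eps} <= 1_{y <= x} <= g_{x,eps}, hence also in mean:
lo, mid, hi = g(y, x - eps, eps).mean(), (y <= x).mean(), g(y, x, eps).mean()
print(lo <= mid <= hi)                                     # True
print(stats.norm.cdf(x - eps), lo, mid, hi, stats.norm.cdf(x + eps))
```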
Converging together lemmas
(i) $X_n\overset{w}{\to} X$ and $Y_n \overset{w}{\to} c$ for $c \in \mathbb{R}$ $\implies X_n+Y_n \overset{w}{\to} X+c.$
(ii) $X_n\overset{w}{\to} X$ and $Y_n \overset{w}{\to} c$ for $c \in \mathbb{R}$ $\implies X_nY_n \overset{w}{\to} cX.$
(iii) $X_n\overset{P}{\to}X$ $\implies X_n \overset{w}{\to} X.$
(iv) $X_n\overset{w}{\to} c$ for $c\in\mathbb{R}$ $\implies X_n \overset{P}{\to} c.$
The proofs are not difficult, so I leave them as exercises.
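Lemmas (i) and (ii) are commonly known as Slutsky’s theorem. A quick sanity check by simulation (not a proof), with $X_n$ a standardized Binomial mean so that $X_n \overset{w}{\to} N(0,1)$ by the CLT, and $Y_n = c + Z/\sqrt{n}$ so that $Y_n \overset{P}{\to} c$; both choices are my own for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n, c = 100_000, 1_000, 2.0

# X_n: standardized Binomial(n, 1/2), so X_n -> N(0, 1) weakly by the CLT.
x_n = (rng.binomial(n, 0.5, size=reps) - 0.5 * n) / np.sqrt(0.25 * n)
# Y_n: c plus vanishing noise, so Y_n -> c in probability.
y_n = c + rng.standard_normal(reps) / np.sqrt(n)

# (i): X_n + Y_n should be close to N(c, 1); (ii): X_n * Y_n close to N(0, c^2).
print(stats.kstest(x_n + y_n, stats.norm(loc=c).cdf).statistic)    # small KS gap
print(stats.kstest(x_n * y_n, stats.norm(scale=c).cdf).statistic)  # small KS gap
```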
Tools
Skorohod’s representation theorem
Even though weak convergence is weak, we can relate it to the much stronger almost sure convergence; Skorohod’s representation theorem allows us to do so: if $F_n \overset{w}{\to} F_\infty$, then there exist random variables $Y_n \sim F_n$, $1 \le n \le \infty$, on a common probability space such that $Y_n \to Y_\infty$ almost surely.
The proof is similar to that of Theorem 1.2.2.
Proof.
Let $(\Omega, \mathcal{F}, P) = ((0,1), \mathcal{B}(0,1), \lambda)$ where $\lambda$ is the Lebesgue measure. Define $Y_n(\omega) = \sup\{y:~ F_n(y) < \omega\}$ so that $Y_n \sim F_n$ for $1\le n \le \infty$. By monotonicity and right continuity of each $F_n$, $$ \begin{align} F_n(y) < \omega &\iff y < Y_n(\omega), \\ F_n(z) \ge \omega &\iff z \ge Y_n(\omega). \end{align} $$ We now need almost sure convergence. Let $a_\omega = \sup\{y:~ F_\infty(y)<\omega \}$, $b_\omega = \inf\{y:~ F_\infty(y)>\omega \}$, and $\Omega_0 = \{\omega:~ (a_\omega, b_\omega) = \emptyset\}.$ The set of $\omega \notin \Omega_0$ is at most countable, since the $(a_\omega, b_\omega)$ are disjoint non-empty intervals containing distinct rational numbers. Thus $P(\Omega_0) = 1.$ Fix $\omega \in \Omega_0$, $\omega' > \omega$, and $\epsilon > 0$. Pick continuity points $y, z$ of $F_\infty$ (continuity points are dense) with $Y_\infty(\omega) - \epsilon < y < Y_\infty(\omega)$ and $Y_\infty(\omega') < z < Y_\infty(\omega') + \epsilon$. Then $F_\infty(y) < \omega$ and $F_\infty(z) \ge \omega' > \omega$, so for all large $n$, $F_n(y) < \omega$ and $F_n(z) > \omega$, i.e., $$ Y_\infty(\omega) - \epsilon < y < Y_n(\omega) \le z < Y_\infty(\omega') + \epsilon. $$ So it follows that $$ Y_\infty(\omega) \le \liminf\limits_n Y_n(\omega) \le \limsup\limits_n Y_n(\omega) \le Y_\infty(\omega'). $$ Letting $\omega' \downarrow \omega$ and using $\omega \in \Omega_0$, so that $Y_\infty(\omega') \downarrow Y_\infty(\omega)$, we conclude $\lim_n Y_n(\omega) = Y_\infty(\omega)$ for all $\omega \in \Omega_0.$
Note that $\Omega_0$ in the proof is the set of levels $\omega$ at which $F_\infty$ is strictly increasing, i.e., crosses the level $\omega$ at a single point, so $F_\infty^{-1}$ is well defined there. The $Y_n$ above are exactly the generalized inverses (quantile functions) $F_n^{-1}$, with $Y_\infty = F_\infty^{-1}$ on $\Omega_0.$
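Here is a minimal sketch of the coupling on $((0,1), \lambda)$, assuming a toy sequence $F_n = N(1/n, (1+1/n)^2) \overset{w}{\to} N(0,1)$ of my own choosing; since these $F_n$ are continuous and strictly increasing, the quantile couplings $Y_n = F_n^{-1}$ converge at every $\omega$.

```python
import numpy as np
from scipy import stats

# Common probability space Omega = (0, 1) with Lebesgue measure.
omega = np.linspace(0.01, 0.99, 99)            # grid of points inside (0, 1)

def Y(n, w):
    # Quantile coupling Y_n(w) = F_n^{-1}(w) with F_n = N(1/n, (1 + 1/n)^2),
    # a hypothetical sequence converging weakly to F_inf = N(0, 1).
    return stats.norm.ppf(w, loc=1/n, scale=1 + 1/n)

y_inf = stats.norm.ppf(omega)                  # Y_inf = F_inf^{-1}
for n in (1, 10, 100, 1000):
    print(n, np.abs(Y(n, omega) - y_inf).max())  # sup gap shrinks: a.s. convergence
```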
Convergence theorems
Skorohod’s representation theorem lets us apply our favorite tools, such as the convergence theorems, with minimal extra effort. For instance, Fatou’s lemma transfers to weak convergence: if $X_n \overset{w}{\to} X_\infty$ and $g \ge 0$ is continuous, then $\liminf_n Eg(X_n) \ge Eg(X_\infty).$
Let $Y_n \sim F_n$ for $1 \le n \le \infty$ be the random variables given by Skorohod's representation theorem, so that $Y_n \to Y_\infty$ a.s. Then $g(Y_n) \to g(Y_\infty)$ a.s. by continuity, and Fatou's lemma gives $\liminf_n Eg(Y_n) \ge Eg(Y_\infty).$ Since $X_n \overset{d}{=} Y_n$ for all $1\le n\le\infty$, the result follows.
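The inequality can be strict. A toy simulation of the classic mass-escaping example $P(X_n = n) = 1/n$ (my own addition, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# X_n = n with probability 1/n, else 0: X_n -> 0 weakly, yet E[X_n] = 1 for all n.
# With g(x) = x, liminf_n E g(X_n) = 1 > 0 = E g(X_inf): Fatou is strict here.
for n in (10, 100, 1000):
    x_n = np.where(rng.uniform(size=200_000) < 1/n, n, 0)
    print(n, x_n.mean())   # ~ 1.0 for every n: mass escapes to infinity
```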
Continuous mapping theorem [2]
If $X_n \overset{w}{\to} X$ and $g$ is a measurable function with $P(X \in D_g) = 0$, where $D_g = \{x:~ g \text{ is discontinuous at } x\}$, then $g(X_n) \overset{w}{\to} g(X).$ Such $D_g$ is called the discontinuity set of $g.$ The theorem extends the possible choice of $g$ from continuous to measurable functions.
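For a concrete instance (again a simulation of my own, with $X_n$ standardized Binomials so $X_n \overset{w}{\to} X \sim N(0,1)$): the sign function is discontinuous only at $0$, and $P(X = 0) = 0$, so the theorem applies and $\mathrm{sign}(X_n) \overset{w}{\to} \mathrm{sign}(X)$.

```python
import numpy as np

rng = np.random.default_rng(3)
reps = 200_000

# g = sign is measurable with discontinuity set D_g = {0}, and P(X = 0) = 0
# for X ~ N(0, 1), so sign(X_n) -> sign(X) weakly; P(sign(X) = 1) = 1/2.
for n in (4, 64, 1024):
    x_n = (rng.binomial(n, 0.5, size=reps) - 0.5 * n) / np.sqrt(0.25 * n)
    print(n, (np.sign(x_n) == 1).mean())   # -> 0.5 as n grows
```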
Portmanteau theorem [1]
The following theorem is important enough to have earned its own name: the portmanteau theorem. It is in fact a collection of equivalent definitions of weak convergence; I found that almost every theorem regarding weak convergence can be proved with it to some extent. For probability measures $P_n$ and $P$, the following are equivalent.
(i) $P_n \overset{w}{\to} P.$
(ii) $\limsup_n P_n(F) \le P(F),~ \forall \text{closed } F.$
(iii) $\liminf_n P_n(G) \ge P(G),~ \forall \text{open } G.$
(iv) $P_n(A) \to P(A),~ \forall A:~ P(\partial A)=0.$
Proof.
((i)$\Rightarrow$(ii)) For given $\epsilon > 0$, let $f^\epsilon(x) = (1 - \inf_{y\in F}|x - y|/\epsilon)^+$ so that $\mathbf{1}_F \le f^\epsilon \le \mathbf{1}_{F^\epsilon}$ where $F^\epsilon = \{x: \inf_{y\in F}|x-y| < \epsilon\}.$ Then $f^\epsilon$ is continuous and bounded, so $$ \limsup_n P_n(F) \le \limsup_n \int f^\epsilon dP_n = \int f^\epsilon dP \le P(F^\epsilon). $$ Letting $\epsilon \to 0$ gives the result, since $F^\epsilon \downarrow F$ as $\epsilon \downarrow 0$ ($F$ is closed) and hence $P(F^\epsilon) \downarrow P(F).$
((ii)$\Leftrightarrow$(iii)) is trivial by taking complements.
((ii)&(iii)$\Rightarrow$(iv)) Since $P(\partial A)=0$ implies $P(\overline A) = P(A^\circ) = P(A)$, $$ \begin{align} \limsup_n P_n(A) &\le \limsup_n P_n(\overline A) \le P(\overline A) = P(A), \\ \liminf_n P_n(A) &\ge \liminf_n P_n(A^\circ) \ge P(A^\circ) = P(A). \end{align} $$ ((iv)$\Rightarrow$(i)) Without loss of generality, let $f$ be a continuous function with $0 \le f \le 1.$ Since $f$ is continuous, $\partial\{f>t\} \subseteq \{f=t\}$, and $P(f=t) > 0$ for at most countably many $t$, so $P(\partial\{f>t\})=0$ for all but countably many $t.$ Hence by (iv) and the bounded convergence theorem, $$ \begin{align} \int f dP_n &= \int_0^\infty P_n(f > t) dt \\ &\to \int_0^\infty P(f > t) dt = \int f dP, \end{align} $$ which is the desired result.
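To see why the boundary condition in (iv) is needed, consider the standard counterexample (my own addition): $P_n = \delta_{1/n} \overset{w}{\to} \delta_0$ with $A = (0,\infty)$, so that $\partial A = \{0\}$ carries full $P$-mass.

```python
# P_n = point mass at 1/n converges weakly to P = point mass at 0, but for
# A = (0, inf) we have P(boundary of A) = P({0}) = 1, so (iv) does not apply:
in_A = lambda x: float(x > 0)
for n in (1, 10, 100, 1000):
    print(n, in_A(1 / n))      # P_n(A) = 1 for every n
print("P(A) =", in_A(0.0))     # = 0: P_n(A) does not converge to P(A)
```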
Metric for weak convergence
The Lévy metric, $$\rho(F,G) = \inf\{\epsilon > 0:~ F(x-\epsilon) - \epsilon \le G(x) \le F(x+\epsilon) + \epsilon ~\text{ for all } x\},$$ is a metric on the space of distribution functions that metrizes weak convergence. That is, $\rho(F_n,F_\infty)\to0$ if and only if $F_n \overset{w}{\to} F_\infty.$ [3] With this metric we can regard the space of probability measures as a metric space, which is well studied.
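A short numerical sketch of $\rho$ (the helper `levy_distance` is my own, not a library function): since the feasibility condition is monotone in $\epsilon$, we can find the infimum by bisection, checking the inequality on a finite grid; the grid makes this only an approximation. Tried on $F_n = N(1/n, 1) \overset{w}{\to} N(0,1)$:

```python
import numpy as np
from scipy import stats

def levy_distance(F, G, grid, lo=0.0, hi=1.0, iters=50):
    # Levy metric rho(F, G) = inf{eps > 0 :
    #   F(x - eps) - eps <= G(x) <= F(x + eps) + eps for all x},
    # with "all x" approximated by a finite grid.
    def ok(eps):
        return np.all((F(grid - eps) - eps <= G(grid))
                      & (G(grid) <= F(grid + eps) + eps))
    for _ in range(iters):                 # bisection: ok(eps) is monotone in eps
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

grid = np.linspace(-10, 10, 4001)
for n in (1, 10, 100):
    Fn = lambda x, n=n: stats.norm.cdf(x, loc=1 / n)   # F_n = N(1/n, 1)
    print(n, levy_distance(Fn, stats.norm.cdf, grid))  # decreases toward 0
```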
Acknowledgement
This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).
This post is also based on the textbook Convergence of Probability Measures, 2nd edition (Billingsley, 1999) and the lecture at SNU (instructor: Prof. Jaeyong Lee).