3.2.1. Weak convergence
Now that we have covered convergence of point estimates (specifically, the sample mean), our next interest is a weaker notion of convergence, one for which convergence in probability is not guaranteed. In undergraduate statistics this is called convergence in distribution. Here we prefer to borrow terminology from measure theory, call it weak convergence, and write $X_n \overset{w}{\to} X$.
Definition
The name “weak convergence” comes from the weak topology generated by bounded continuous functions: convergence in the weak topology is weak convergence. Some textbooks separate the terms weak convergence and convergence in distribution, assigning the former to distribution functions and the latter to random variables. Durrett uses the former for distribution functions and both for random variables.
While Durrett defines the concept as pointwise convergence of $F_n$ to $F$ at continuity points of $F$ (definition 1), Billingsley [1] defines it differently: $\int f\, dF_n \to \int f\, dF$ for every bounded continuous $f$ (definition 2).
The equivalence of the two definitions is not difficult to show. $1\Rightarrow 2$ is immediate from Skorohod’s representation theorem (see below) and the bounded convergence theorem (BCT). To show $2\Rightarrow 1$, consider the function
\[g_{x,\epsilon}(y) = \begin{cases} 1, & y \le x \\ 0, & y \ge x+\epsilon \\ \text{linear}, & x < y < x+\epsilon \end{cases}\]where $x$ is a continuity point of $F$ and $\epsilon >0$ is arbitrary. Then $g_{x,\epsilon}$ is continuous and bounded, with
\[g_{x-\epsilon,\epsilon}(y) \le \mathbf{1}_{\{y \le x\}} \le g_{x,\epsilon}(y),\]and, since definition 2 gives $\int g\, dF_n \to \int g\, dF$ for both of these functions, we get
\[F(x-\epsilon) \le \liminf\limits_n F_n(x) \le \limsup\limits_n F_n(x) \le F(x+\epsilon).\]Since $x$ is a continuity point of $F$ and $\epsilon>0$ was arbitrary, letting $\epsilon \to 0$ yields $F_n(x) \to F(x)$, as desired.
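To make the bump functions concrete, here is a minimal numerical sketch, with $F$ taken to be the standard normal (my own choice for illustration): since the sandwich holds pointwise, it survives taking expectations.

```python
import numpy as np
from scipy import stats

def g(y, x, eps):
    # Piecewise-linear cutoff: 1 for y <= x, 0 for y >= x + eps, linear in between.
    return np.clip((x + eps - y) / eps, 0.0, 1.0)

rng = np.random.default_rng(0)
x, eps = 0.5, 0.1
y = rng.standard_normal(1_000_000)    # samples from F = N(0, 1)

# Pointwise sandwich g_{x-eps,eps} <= 1_{y <= x} <= g_{x,eps}, hence also in mean:
lo, mid, hi = g(y, x - eps, eps).mean(), (y <= x).mean(), g(y, x, eps).mean()
print(lo <= mid <= hi)                                     # True
print(stats.norm.cdf(x - eps), lo, mid, hi, stats.norm.cdf(x + eps))
```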
Converging together lemmas
(i) $X_n\overset{w}{\to} X$ and $Y_n \overset{w}{\to} c$ for $c \in \mathbb{R}$ $\implies X_n+Y_n \overset{w}{\to} X+c.$
(ii) $X_n\overset{w}{\to} X$ and $Y_n \overset{w}{\to} c$ for $c \in \mathbb{R}$ $\implies X_nY_n \overset{w}{\to} cX.$
(iii) $X_n\overset{P}{\to}X$ $\implies X_n \overset{w}{\to} X.$
(iv) $X_n\overset{w}{\to} c$ for $c\in\mathbb{R}$ $\implies X_n \overset{P}{\to} c.$
The proofs are not difficult, so I leave them as exercises.
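Lemmas (i) and (ii) are commonly known as Slutsky’s theorem. A quick sanity check by simulation (not a proof), with $X_n$ a standardized Binomial mean so that $X_n \overset{w}{\to} N(0,1)$ by the CLT, and $Y_n = c + Z/\sqrt{n}$ so that $Y_n \overset{P}{\to} c$; both choices are my own for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n, c = 100_000, 1_000, 2.0

# X_n: standardized Binomial(n, 1/2), so X_n -> N(0, 1) weakly by the CLT.
x_n = (rng.binomial(n, 0.5, size=reps) - 0.5 * n) / np.sqrt(0.25 * n)
# Y_n: c plus vanishing noise, so Y_n -> c in probability.
y_n = c + rng.standard_normal(reps) / np.sqrt(n)

# (i): X_n + Y_n should be close to N(c, 1); (ii): X_n * Y_n close to N(0, c^2).
print(stats.kstest(x_n + y_n, stats.norm(loc=c).cdf).statistic)    # small KS gap
print(stats.kstest(x_n * y_n, stats.norm(scale=c).cdf).statistic)  # small KS gap
```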
Tools
Skorohod’s representation theorem
Even though weak convergence is weak, we can relate it to the much stronger almost sure convergence; Skorohod’s representation theorem allows us to do so: if $F_n \overset{w}{\to} F_\infty$, then there exist random variables $Y_n \sim F_n$, $1 \le n \le \infty$, on a common probability space such that $Y_n \to Y_\infty$ almost surely.
The proof is similar to that of Theorem 1.2.2.
Proof.
Let $(\Omega, \mathcal{F}, P) = ((0,1), \mathcal{B}(0,1), \lambda)$ where $\lambda$ is the Lebesgue measure. Define $Y_n(\omega) = \sup\{y:~ F_n(y) < \omega\}$ so that $Y_n \sim F_n$ for $1\le n \le \infty$. By monotonicity and right continuity of each $F_n$, $$ \begin{align} F_n(y) < \omega &\iff y < Y_n(\omega), \\ F_n(z) \ge \omega &\iff z \ge Y_n(\omega). \end{align} $$ We now need almost sure convergence. Let $a_\omega = \sup\{y:~ F_\infty(y)<\omega \}$, $b_\omega = \inf\{y:~ F_\infty(y)>\omega \}$, and $\Omega_0 = \{\omega:~ (a_\omega, b_\omega) = \emptyset\}.$ The set of $\omega \notin \Omega_0$ is at most countable, since the $(a_\omega, b_\omega)$ are disjoint non-empty intervals containing distinct rational numbers. Thus $P(\Omega_0) = 1.$ Fix $\omega \in \Omega_0$, $\omega' > \omega$, and $\epsilon > 0$. Pick continuity points $y, z$ of $F_\infty$ (continuity points are dense) with $Y_\infty(\omega) - \epsilon < y < Y_\infty(\omega)$ and $Y_\infty(\omega') < z < Y_\infty(\omega') + \epsilon$. Then $F_\infty(y) < \omega$ and $F_\infty(z) \ge \omega' > \omega$, so for all large $n$, $F_n(y) < \omega$ and $F_n(z) > \omega$, i.e., $$ Y_\infty(\omega) - \epsilon < y < Y_n(\omega) \le z < Y_\infty(\omega') + \epsilon. $$ So it follows that $$ Y_\infty(\omega) \le \liminf\limits_n Y_n(\omega) \le \limsup\limits_n Y_n(\omega) \le Y_\infty(\omega'). $$ Letting $\omega' \downarrow \omega$ and using $\omega \in \Omega_0$, so that $Y_\infty(\omega') \downarrow Y_\infty(\omega)$, we conclude $\lim_n Y_n(\omega) = Y_\infty(\omega)$ for all $\omega \in \Omega_0.$
Note that $\Omega_0$ in the proof is the set of levels $\omega$ at which $F_\infty$ is strictly increasing, i.e., crosses the level $\omega$ at a single point, so $F_\infty^{-1}$ is well defined there. The $Y_n$ above are exactly the generalized inverses (quantile functions) $F_n^{-1}$, with $Y_\infty = F_\infty^{-1}$ on $\Omega_0.$
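Here is a minimal sketch of the coupling on $((0,1), \lambda)$, assuming a toy sequence $F_n = N(1/n, (1+1/n)^2) \overset{w}{\to} N(0,1)$ of my own choosing; since these $F_n$ are continuous and strictly increasing, the quantile couplings $Y_n = F_n^{-1}$ converge at every $\omega$.

```python
import numpy as np
from scipy import stats

# Common probability space Omega = (0, 1) with Lebesgue measure.
omega = np.linspace(0.01, 0.99, 99)            # grid of points inside (0, 1)

def Y(n, w):
    # Quantile coupling Y_n(w) = F_n^{-1}(w) with F_n = N(1/n, (1 + 1/n)^2),
    # a hypothetical sequence converging weakly to F_inf = N(0, 1).
    return stats.norm.ppf(w, loc=1/n, scale=1 + 1/n)

y_inf = stats.norm.ppf(omega)                  # Y_inf = F_inf^{-1}
for n in (1, 10, 100, 1000):
    print(n, np.abs(Y(n, omega) - y_inf).max())  # sup gap shrinks: a.s. convergence
```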
Convergence theorems
Skorohod’s representation theorem lets us apply our favorite tools, such as the convergence theorems, with minimal extra effort. For instance, Fatou’s lemma transfers to weak convergence: if $X_n \overset{w}{\to} X_\infty$ and $g \ge 0$ is continuous, then $\liminf_n Eg(X_n) \ge Eg(X_\infty).$
Let $Y_n \sim F_n$ for $1 \le n \le \infty$ be the random variables given by Skorohod's representation theorem, so that $Y_n \to Y_\infty$ a.s. Then $g(Y_n) \to g(Y_\infty)$ a.s. by continuity, and Fatou's lemma gives $\liminf_n Eg(Y_n) \ge Eg(Y_\infty).$ Since $X_n \overset{d}{=} Y_n$ for all $1\le n\le\infty$, the result follows.
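The inequality can be strict. A toy simulation of the classic mass-escaping example $P(X_n = n) = 1/n$ (my own addition, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# X_n = n with probability 1/n, else 0: X_n -> 0 weakly, yet E[X_n] = 1 for all n.
# With g(x) = x, liminf_n E g(X_n) = 1 > 0 = E g(X_inf): Fatou is strict here.
for n in (10, 100, 1000):
    x_n = np.where(rng.uniform(size=200_000) < 1/n, n, 0)
    print(n, x_n.mean())   # ~ 1.0 for every n: mass escapes to infinity
```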
Continuous mapping theorem [2]
If $X_n \overset{w}{\to} X$ and $g$ is a measurable function with $P(X \in D_g) = 0$, where $D_g = \{x:~ g \text{ is discontinuous at } x\}$, then $g(X_n) \overset{w}{\to} g(X).$ Such $D_g$ is called the discontinuity set of $g.$ The theorem extends the possible choice of $g$ from continuous to measurable functions.
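For a concrete instance (again a simulation of my own, with $X_n$ standardized Binomials so $X_n \overset{w}{\to} X \sim N(0,1)$): the sign function is discontinuous only at $0$, and $P(X = 0) = 0$, so the theorem applies and $\mathrm{sign}(X_n) \overset{w}{\to} \mathrm{sign}(X)$.

```python
import numpy as np

rng = np.random.default_rng(3)
reps = 200_000

# g = sign is measurable with discontinuity set D_g = {0}, and P(X = 0) = 0
# for X ~ N(0, 1), so sign(X_n) -> sign(X) weakly; P(sign(X) = 1) = 1/2.
for n in (4, 64, 1024):
    x_n = (rng.binomial(n, 0.5, size=reps) - 0.5 * n) / np.sqrt(0.25 * n)
    print(n, (np.sign(x_n) == 1).mean())   # -> 0.5 as n grows
```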
Portmanteau theorem [1]
The following theorem is important enough to have earned its own name: the portmanteau theorem. It is in fact a collection of equivalent definitions of weak convergence; I found that almost every theorem regarding weak convergence can be proved with it to some extent. For probability measures $P_n$ and $P$, the following are equivalent.
(i) $P_n \overset{w}{\to} P.$
(ii) $\limsup_n P_n(F) \le P(F),~ \forall \text{closed } F.$
(iii) $\liminf_n P_n(G) \ge P(G),~ \forall \text{open } G.$
(iv) $P_n(A) \to P(A),~ \forall A:~ P(\partial A)=0.$
Proof.
((i)$\Rightarrow$(ii)) For given $\epsilon > 0$, let $f^\epsilon(x) = (1 - \inf_{y\in F}|x - y|/\epsilon)^+$ so that $\mathbf{1}_F \le f^\epsilon \le \mathbf{1}_{F^\epsilon}$ where $F^\epsilon = \{x: \inf_{y\in F}|x-y| < \epsilon\}.$ Then $f^\epsilon$ is continuous and bounded, so $$ \limsup_n P_n(F) \le \limsup_n \int f^\epsilon dP_n = \int f^\epsilon dP \le P(F^\epsilon). $$ Letting $\epsilon \to 0$ gives the result, since $F^\epsilon \downarrow F$ as $\epsilon \downarrow 0$ ($F$ is closed) and hence $P(F^\epsilon) \downarrow P(F).$
((ii)$\Leftrightarrow$(iii)) is trivial by taking complements.
((ii)&(iii)$\Rightarrow$(iv)) Since $P(\partial A)=0$ implies $P(\overline A) = P(A^\circ) = P(A)$, $$ \begin{align} \limsup_n P_n(A) &\le \limsup_n P_n(\overline A) \le P(\overline A) = P(A), \\ \liminf_n P_n(A) &\ge \liminf_n P_n(A^\circ) \ge P(A^\circ) = P(A). \end{align} $$ ((iv)$\Rightarrow$(i)) Without loss of generality, let $f$ be a continuous function with $0 \le f \le 1.$ Since $f$ is continuous, $\partial\{f>t\} \subseteq \{f=t\}$, and $P(f=t) > 0$ for at most countably many $t$, so $P(\partial\{f>t\})=0$ for all but countably many $t.$ Hence by (iv) and the bounded convergence theorem, $$ \begin{align} \int f dP_n &= \int_0^\infty P_n(f > t) dt \\ &\to \int_0^\infty P(f > t) dt = \int f dP, \end{align} $$ which is the desired result.
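To see why the boundary condition in (iv) is needed, consider the standard counterexample (my own addition): $P_n = \delta_{1/n} \overset{w}{\to} \delta_0$ with $A = (0,\infty)$, so that $\partial A = \{0\}$ carries full $P$-mass.

```python
# P_n = point mass at 1/n converges weakly to P = point mass at 0, but for
# A = (0, inf) we have P(boundary of A) = P({0}) = 1, so (iv) does not apply:
in_A = lambda x: float(x > 0)
for n in (1, 10, 100, 1000):
    print(n, in_A(1 / n))      # P_n(A) = 1 for every n
print("P(A) =", in_A(0.0))     # = 0: P_n(A) does not converge to P(A)
```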
Metric for weak convergence
The Lévy metric, $$\rho(F,G) = \inf\{\epsilon > 0:~ F(x-\epsilon) - \epsilon \le G(x) \le F(x+\epsilon) + \epsilon ~\text{ for all } x\},$$ is a metric on the space of distribution functions that metrizes weak convergence. That is, $\rho(F_n,F_\infty)\to0$ if and only if $F_n \overset{w}{\to} F_\infty.$ [3] With this metric we can regard the space of probability measures as a metric space, which is well studied.
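A short numerical sketch of $\rho$ (the helper `levy_distance` is my own, not a library function): since the feasibility condition is monotone in $\epsilon$, we can find the infimum by bisection, checking the inequality on a finite grid; the grid makes this only an approximation. Tried on $F_n = N(1/n, 1) \overset{w}{\to} N(0,1)$:

```python
import numpy as np
from scipy import stats

def levy_distance(F, G, grid, lo=0.0, hi=1.0, iters=50):
    # Levy metric rho(F, G) = inf{eps > 0 :
    #   F(x - eps) - eps <= G(x) <= F(x + eps) + eps for all x},
    # with "all x" approximated by a finite grid.
    def ok(eps):
        return np.all((F(grid - eps) - eps <= G(grid))
                      & (G(grid) <= F(grid + eps) + eps))
    for _ in range(iters):                 # bisection: ok(eps) is monotone in eps
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

grid = np.linspace(-10, 10, 4001)
for n in (1, 10, 100):
    Fn = lambda x, n=n: stats.norm.cdf(x, loc=1 / n)   # F_n = N(1/n, 1)
    print(n, levy_distance(Fn, stats.norm.cdf, grid))  # decreases toward 0
```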
Acknowledgement
This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).
This post is also based on the textbook Convergence of Probability Measures, 2nd edition (Billingsley, 1999) and the lecture at SNU (instructor: Prof. Jaeyong Lee).