3.3. Characteristic functions
In undergraduate statistics we learn about moment generating functions and the fact that a moment generating function, when it exists, uniquely determines the distribution. However, the moment generating function does not exist for every distribution. The characteristic function, on the other hand, exists for every real-valued random variable and provides an alternative approach to working with distributions.
- Characteristic function
- Inversion formula
- Continuity theorem
- Moment generating property
- Estimation of the error term
Characteristic function
The characteristic function of a random variable $X$ is defined as $\varphi_X(t) = Ee^{itX},$ $t\in\mathbb{R};$ we write ch.f. for short. Every characteristic function has the following properties, regardless of the distribution.
(i) $\varphi_X(0) = 1.$
(ii) $\varphi_X(-t) = \overline{\varphi_X(t)}.$
(iii) $|\varphi_X(t)| \le 1.$
(iv) $\varphi_X$ is uniformly continuous.
(v) $\varphi_{aX+b}(t) = e^{itb}\varphi_X(at).$
(vi) If $X_1 \perp X_2,$ then $\varphi_{X_1+X_2} = \varphi_{X_1}\varphi_{X_2}.$
(iv) $$ \begin{align} |\varphi_X(t+h) - \varphi_X(t)| &= |E(e^{i(t+h)X} - e^{itX})| \\ &\le E\left[|e^{itX}|\,|e^{ihX} - 1|\right] = E|e^{ihX}-1| \end{align} $$ holds for all $t\in\mathbb{R}.$ Since $|e^{ihX}-1|\le2,$ BCT gives $E|e^{ihX}-1|\to0$ as $h\to0.$ Thus for all $\epsilon>0$ there exists $\delta>0$ such that $|\varphi_X(t+h)-\varphi_X(t)|<\epsilon$ for all $t$ whenever $|h|<\delta,$ which is uniform continuity.
The other proofs are direct from the definition. Another property comes from the linearity of the Lebesgue integral.
Let $F_1,\dots,F_n$ be distribution functions with ch.f.s $\varphi_1,\dots,\varphi_n,$ and let $\lambda_1,\dots,\lambda_n\ge0$ with $\sum_{i=1}^n\lambda_i=1.$ Then $\sum_{i=1}^n \lambda_i F_i$ is a distribution function with ch.f. $\sum_{i=1}^n \lambda_i \varphi_i.$
This simply says that a mixture of distributions has as its ch.f. the corresponding mixture of the ch.f.s.
$$ \begin{align} \varphi_{\sum_{i=1}^n\lambda_iF_i}(t) &= \int e^{itx} \, d\left(\sum_{i=1}^n\lambda_iF_i(x)\right) \\ &= \sum_{i=1}^n \lambda_i \int e^{itx} \, dF_i(x) = \sum_{i=1}^n\lambda_i\varphi_i(t). \end{align} $$
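To see the lemma in action, here is a minimal Monte Carlo sketch; the two normal components, the weights, and the evaluation point $t$ are arbitrary illustrative choices. The empirical ch.f. of the mixture should match the weighted combination of the exact normal ch.f.s.

```python
# Minimal numerical sketch of the mixture lemma; the components N(0,1), N(3,4),
# weights (0.3, 0.7), and t = 0.7 are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
lam = np.array([0.3, 0.7])                      # mixture weights, sum to 1
comp = rng.choice(2, size=n, p=lam)             # which component each draw comes from
x = np.where(comp == 0, rng.normal(0.0, 1.0, n), rng.normal(3.0, 2.0, n))

t = 0.7
phi_mix_mc = np.mean(np.exp(1j * t * x))        # ch.f. of the mixture, estimated by MC
phi1 = np.exp(-t**2 / 2)                        # ch.f. of N(0,1)
phi2 = np.exp(3j * t - 4 * t**2 / 2)            # ch.f. of N(3,4)
phi_mix_exact = lam[0] * phi1 + lam[1] * phi2   # mixture of the ch.f.s

print(np.round(phi_mix_mc, 4), np.round(phi_mix_exact, 4))   # should be close
```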
Using the lemma we can show still another property.
Let $X\sim F,$ and let $X'$ be independent of $X$ with $X'\sim F',$ where $F'$ is the distribution of $-X$ (so that $\varphi_{X'} = \overline{\varphi_X}$).
(i) Let $Y$ be a random variable with distribution $F_Y = 0.5F + 0.5F'.$ By the lemma, $\varphi_Y = 0.5\varphi_X + 0.5\overline{\varphi_X} = \text{Re}\,\varphi_X.$
(ii) Since $X \perp X',$ $\varphi_{X+X'} = \varphi_X\varphi_{X'} = \varphi_X\overline{\varphi_{X}} = |\varphi_X|^2.$
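A quick illustrative check of (ii), taking $X\sim\text{Exp}(1),$ whose ch.f. $\frac{1}{1-it}$ appears in the examples below: the simulated ch.f. of $X+X'$ should be close to $|\varphi_X(t)|^2 = \frac{1}{1+t^2}.$

```python
# Illustrative check of (ii) with X ~ Exp(1) and X' an independent copy of -X;
# the ch.f. of X + X' should be |phi_X|^2 = 1/(1+t^2).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.exponential(1.0, n)
x_prime = -rng.exponential(1.0, n)              # independent, distributed as -X

t = 1.3
phi_sum_mc = np.mean(np.exp(1j * t * (x + x_prime)))
phi_x = 1 / (1 - 1j * t)                        # ch.f. of Exp(1)
print(np.round(phi_sum_mc, 4), np.round(abs(phi_x) ** 2, 4))   # both ~ 1/(1+t^2)
```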
Here are the ch.f.s of some well-known distributions.
(i) (Poisson) $X \sim \mathcal{P}(\lambda).$ $\varphi_X(t) = e^{\lambda (e^{it}-1)}.$
(ii) (Normal) $X\sim \mathcal{N}(0,1).$ $\varphi_X(t) = e^{-t^2/2}.$
(iii) (Exponential) $X\sim\text{Exp}(1).$ $\varphi_X(t) = \frac{1}{1-it}.$
(iv) (Double exponential) $X\sim\text{DE}(1).$ $\varphi_X(t) = \frac{1}{1+t^2}.$
Note that the ch.f. of $\text{DE}(1)$ is, up to the normalizing constant $\frac{1}{\pi},$ the density of the standard Cauchy distribution.
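The following is a rough Monte Carlo sanity check of the four closed forms above; the sample size, the evaluation point $t=0.8,$ and the choice $\lambda=2$ for the Poisson are arbitrary. The empirical averages of $e^{itX}$ should be close to the formulas.

```python
# Monte Carlo sanity check of the listed ch.f.s (illustrative parameter choices).
import numpy as np

rng = np.random.default_rng(2)
n, t = 300_000, 0.8
lam = 2.0

samples = {
    "Poisson(2)":  rng.poisson(lam, n),
    "N(0,1)":      rng.standard_normal(n),
    "Exp(1)":      rng.exponential(1.0, n),
    "DE(1)":       rng.laplace(0.0, 1.0, n),    # density 0.5 * exp(-|x|)
}
exact = {
    "Poisson(2)":  np.exp(lam * (np.exp(1j * t) - 1)),
    "N(0,1)":      np.exp(-t**2 / 2),
    "Exp(1)":      1 / (1 - 1j * t),
    "DE(1)":       1 / (1 + t**2),
}
for name, xs in samples.items():
    mc = np.mean(np.exp(1j * t * xs))           # empirical ch.f. at t
    print(name, np.round(mc, 4), np.round(exact[name], 4))
```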
If a random variable has a density with respect to the Lebesgue measure, its characteristic function is the Fourier transform of that density. So we can naturally expect that a suitable inversion will recover the distribution from the ch.f.
Inversion formula
The inversion formula states that if $\mu$ is a probability measure with ch.f. $\varphi,$ then for $a<b,$ $$ \lim_{T\to\infty} \frac{1}{2\pi}\int_{-T}^T \frac{e^{-ita}-e^{-itb}}{it}\,\varphi(t)\,dt = \mu(a,b) + \frac{1}{2}\mu\{a,b\}. $$ Notice that it is not $\int_{-\infty}^\infty$ but $\lim_T \int_{-T}^T.$ We will find out why during the proof.
Proof.
Let $I_T$ denote the integral on the left-hand side, without the factor $\frac{1}{2\pi}.$ $$ \begin{align} I_T &= \int_{-T}^T \int \frac{e^{-ita}-e^{-itb}}{it} e^{itx} \,d\mu(x)\, dt \\ &= \int \int_{-T}^T \frac{1}{it}\left( e^{it(x-a)} - e^{it(x-b)} \right) dt \,d\mu(x) \\ &= \int \int_{-T}^T \frac{1}{t} \left( \sin t(x-a) - \sin t(x-b) \right) dt \,d\mu(x) \\ &= \int \left( R(x-a, T) - R(x-b, T) \right) d\mu(x) \end{align} $$ where $R(\theta,T) = \int_{-T}^T\frac{\sin \theta t}{t} dt.$ The interchange of integrals in the second equality is justified by Fubini's theorem, since $$ \left|\frac{e^{-ita}-e^{-itb}}{it}\right| = \left|\int_a^b e^{-itx}dx\right| \le b-a, $$ so the integrand is integrable with respect to the finite measure $\mu\times\text{Lebesgue}$ on $\mathbb{R}\times[-T,T].$ The third equality holds because the cosine part $\frac{\cos t(x-a)-\cos t(x-b)}{it}$ is an odd function of $t$ and integrates to $0$ over $[-T,T].$ Now observe that if we let $S(T)=\int_0^T \frac{\sin x}{x}dx,$ $$ R(\theta,T)=\begin{cases} \int_{-T\theta}^{T\theta} \frac{\sin x}{x} dx = 2\int_0^{T\theta} \frac{\sin x}{x}dx = 2S(T\theta) &,~ \theta\ge0 \\ -R(-\theta,T) = -2S(-T\theta) &,~ \theta < 0 \end{cases} $$ so $R(\theta,T) = 2\,\text{sgn}(\theta)S(T|\theta|).$ By the result of exercise 1.7.5 of the textbook, $S(T) \to \frac{\pi}{2}$ as $T\to\infty.$ Thus $$ R(x-a,T)-R(x-b,T) \to \begin{cases} 2\pi &,~ a<x<b \\ \pi &,~ x=a \text{ or } x=b \\ 0 &,~ x<a \text{ or } x>b. \\ \end{cases} $$ Since $|R(\theta,T)| \le 2\sup_y\int_0^y\frac{\sin t}{t}dt < \infty,$ with the help of BCT we get $$ \begin{align} &\lim_T \frac{1}{2\pi} \int_{-T}^T \frac{e^{-ita}-e^{-itb}}{it} \varphi(t) dt \\ &= \lim_T \frac{1}{2\pi} \int \left( R(x-a,T) - R(x-b,T) \right) d\mu(x) \\ &= \frac{1}{2\pi} \int \lim_T \left( R(x-a,T)-R(x-b,T) \right) d\mu(x) \\ &= \frac{1}{2\pi} \left( 2\pi\mu(a,b) + \pi\mu\{a,b\} \right) \\ &= \mu(a,b) + \frac{1}{2}\mu\{a,b\}. \end{align} $$
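As a numerical sanity check of the formula (the standard normal with $a=-1,$ $b=1$ and the truncation level $T$ are arbitrary illustrative choices), the truncated integral should approach $\Phi(1)-\Phi(-1)\approx0.6827,$ since the normal distribution has no atoms.

```python
# Numerical check of the truncated inversion integral for N(0,1) with a = -1, b = 1.
import numpy as np
from math import erf, sqrt

a, b, T = -1.0, 1.0, 50.0
N = 400_000
dt = 2 * T / N
t = -T + (np.arange(N) + 0.5) * dt               # midpoint grid on [-T, T], avoids t = 0
phi = np.exp(-t**2 / 2)                          # ch.f. of N(0,1)
kernel = (np.exp(-1j * t * a) - np.exp(-1j * t * b)) / (1j * t)
approx = np.real(np.sum(kernel * phi) * dt) / (2 * np.pi)

exact = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))   # Phi(b) - Phi(a)
print(approx, exact)                                   # both should be about 0.6827
```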
By the inversion formula, $\varphi_X$ is real if and only if $X$ is symmetric: $\varphi_X$ is real iff $\varphi_X = \overline{\varphi_X} = \varphi_{-X},$ and since the ch.f. determines the distribution, this happens iff $X$ and $-X$ have the same distribution.
If we used $\int_{-\infty}^\infty$ instead, the argument would break down: $\frac{\sin \theta t}{t}$ is not Lebesgue integrable over $\mathbb{R},$ so the integral need not converge absolutely. If, however, $\int |\varphi(t)|\, dt < \infty,$ then it is possible to use $\int_{-\infty}^\infty.$ The next theorem is about this case.
If $\int|\varphi(t)|\,dt < \infty,$ then $\mu$ has a bounded continuous density $$ f(y) = \frac{1}{2\pi}\int_{-\infty}^\infty e^{-ity}\varphi(t)\,dt. $$ Observe that this is exactly the inverse Fourier transform. In fact, $\mu$ is absolutely continuous with respect to the Lebesgue measure and $f=\frac{d\mu}{d\lambda}.$
Proof.
Since $\int|\varphi(t)|\,dt<\infty,$ the $\lim_T\int_{-T}^T$ in the inversion formula can be replaced by $\int_{-\infty}^\infty.$ Moreover, the resulting quantity is bounded by $\frac{h}{2\pi}\int|\varphi(t)|\,dt,$ which tends to $0$ as $h\to0,$ so $\mu$ has no point masses and the atom term in the inversion formula vanishes. Using this and Fubini's theorem we get $$ \begin{align} \mu(x,x+h) &= \frac{1}{2\pi} \int_{-\infty}^\infty \frac{e^{-itx}-e^{-it(x+h)}}{it} \varphi(t)\,dt \\ &= \frac{1}{2\pi} \int_{-\infty}^\infty \int_x^{x+h} e^{-ity}\, dy\, \varphi(t)\,dt \\ &= \int_x^{x+h} \frac{1}{2\pi} \int_{-\infty}^\infty e^{-ity} \varphi(t)\, dt\, dy, \end{align} $$ so $f(y) = \frac{1}{2\pi} \int e^{-ity} \varphi(t)\, dt$ is a density of $\mu.$
$|f(y)| \le \frac{1}{2\pi} \int |\varphi(t)|\,dt < \infty,$ so $f$ is bounded.
$|f(y+h) - f(y)| \le \frac{1}{2\pi} \int |e^{-ity}||e^{-ith}-1||\varphi(t)|\,dt \le \frac{1}{\pi} \int |\varphi(t)|\,dt < \infty,$ and since $|e^{-ith}-1|\to0$ pointwise as $h\to0,$ DCT gives $|f(y+h) - f(y)|\to0.$ Hence $f$ is continuous.
The formula indicates that if the ch.f. is integrable, then the distribution has a bounded continuous density that can be computed directly. As an example of the inversion, recall that the ch.f. of the double exponential distribution is, up to a constant, the density of the Cauchy distribution. Conversely, the Cauchy distribution has ch.f. $e^{-|t|},$ which is, up to a constant, the density of the double exponential distribution.
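Here is a small numerical sketch of the density version of the inversion formula, applied to the integrable ch.f. $\frac{1}{1+t^2}$ of $\text{DE}(1)$; the truncation $T$ and grid size are arbitrary choices. The recovered values should be close to the double exponential density $\frac{1}{2}e^{-|y|}.$

```python
# Numerical inversion of the ch.f. 1/(1+t^2); the output should match 0.5 * exp(-|y|).
import numpy as np

T, N = 1_000.0, 2_000_000
dt = 2 * T / N
t = -T + (np.arange(N) + 0.5) * dt          # midpoint grid over [-T, T]
phi = 1 / (1 + t**2)                        # ch.f. of DE(1), absolutely integrable

for y in [0.0, 0.5, 1.0, 2.0]:
    f_y = np.real(np.sum(np.exp(-1j * t * y) * phi) * dt) / (2 * np.pi)
    print(y, round(f_y, 4), round(0.5 * np.exp(-abs(y)), 4))
```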
Continuity theorem
We now relate pointwise convergence of the characteristic function to weak convergence of distributions.
(i) $\mu_n \overset{w}{\to} \mu_\infty$ $\implies \varphi_n \to \varphi_\infty.$
(ii) If there exists $\varphi$ such that $\varphi_n \to \varphi$ and $\varphi$ is continuous at $0,$ then $\mu_n \overset{w}{\to} \mu$ where $\mu$ is a probability measure with $\varphi$ as its ch.f.
Proof.
(i) This is immediate: for each fixed $t,$ $x\mapsto e^{itx}$ is bounded and continuous, so weak convergence gives $\varphi_n(t) = \int e^{itx}\,d\mu_n(x) \to \int e^{itx}\,d\mu_\infty(x) = \varphi_\infty(t).$
(ii) First we show that $(\mu_n)$ is uniformly tight. Note that $\varphi(0) = \lim_n\varphi_n(0) = 1.$ Since $\varphi$ is continuous at $0,$ given $\epsilon>0$ there exists $\delta>0$ such that $\frac{1}{\delta} \int_{-\delta}^\delta (1-\varphi(t))\, dt < \epsilon.$ Since $|1-\varphi_n|\le2$ and $\varphi_n\to\varphi$ pointwise, BCT gives, for all large $n,$ $$ \frac{1}{\delta} \int_{-\delta}^\delta (1-\varphi_n(t))\, dt = \int 2\left(1-\frac{\sin \delta x}{\delta x}\right) d\mu_n(x) < 2\epsilon, $$ where the equality follows from Fubini's theorem and $\frac{1}{\delta}\int_{-\delta}^\delta(1-e^{itx})\,dt = 2\left(1-\frac{\sin\delta x}{\delta x}\right).$ The left hand side is bounded below by $$ \begin{align} \int 2\left(1-\frac{\sin \delta x}{\delta x}\right) d\mu_n(x) &\ge \int_{\{x:~|x|\ge 2/\delta\}} 2\left(1-\frac{\sin \delta x}{\delta x}\right) d\mu_n(x) \\ &\ge \int_{\{x:~|x|\ge 2/\delta\}} 2\left(1-\frac{1}{\delta|x|}\right) d\mu_n(x) \\ &\ge \int_{\{x:~|x|\ge 2/\delta\}} 1 \, d\mu_n(x) = \mu_n(\{x:|x|\ge2/\delta\}), \end{align} $$ where the first inequality uses $1-\frac{\sin\delta x}{\delta x}\ge0,$ the second uses $\frac{\sin\delta x}{\delta x}\le\frac{1}{\delta|x|},$ and the third uses $\frac{1}{\delta|x|}\le\frac12$ on the set. So $\mu_n(\{x:|x|\ge2/\delta\}) < 2\epsilon$ for all large $n,$ and since finitely many measures are always tight, $(\mu_n)$ is uniformly tight.
By the tightness theorem, there exists a subsequence $(\mu_{n_k})$ that converges weakly to some probability measure $\mu.$ We need to show that every subsequence has a further subsequence that converges weakly to the same $\mu.$ Given a subsequence, again by the tightness theorem there exists a further subsequence that converges weakly to some probability measure $\nu.$ By (i), the ch.f.s along that further subsequence converge to $\varphi_\nu,$ and since $\varphi_n \to \varphi$ along the whole sequence, $\varphi_\nu = \varphi.$ The same argument gives $\varphi_\mu = \varphi,$ and since the ch.f. determines the measure by the inversion formula, $\nu = \mu.$ Now given a bounded continuous function $f,$ $(\int f\, d\mu_n)$ is a sequence in $\mathbb{R}$ every subsequence of which has a further subsequence converging to $\int f\, d\mu;$ hence $\int f\,d\mu_n \to \int f\, d\mu$ and $\mu_n \overset{w}{\to} \mu.$ In particular, $\varphi$ is the ch.f. of $\mu.$
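The continuity theorem can be illustrated numerically with a hypothetical example: the ch.f. of a standardized $\text{Binomial}(n, 1/2)$ converges pointwise to $e^{-t^2/2},$ the ch.f. of $\mathcal{N}(0,1),$ as $n$ grows, which is the de Moivre–Laplace central limit theorem seen through part (ii).

```python
# Pointwise convergence of the ch.f. of a standardized Binomial(n, 1/2) to exp(-t^2/2).
import numpy as np

def phi_std_binomial(t, n, p=0.5):
    """Ch.f. of (S_n - n p) / sqrt(n p (1-p)) for S_n ~ Binomial(n, p)."""
    sigma = np.sqrt(n * p * (1 - p))
    return np.exp(-1j * t * n * p / sigma) * (1 - p + p * np.exp(1j * t / sigma)) ** n

t = 1.5
for n in [10, 100, 1000, 10000]:
    print(n, np.round(phi_std_binomial(t, n), 5))
print("limit", np.round(np.exp(-t**2 / 2), 5))
```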
Moment generating property
The computation in the proof of the continuity theorem shows that the smoothness of the ch.f. $\varphi$ near $0$ controls the tail probability of the underlying measure $\mu.$ To be specific, for $\delta>0$ the following inequality holds: \(\mu(\{x:~ |x|\ge\frac{2}{\delta}\}) \le \frac{1}{\delta} \int_{-\delta}^\delta (1-\varphi(t))\,dt.\)
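A quick check of this tail bound, taking the standard Cauchy distribution (ch.f. $e^{-|t|}$) as an arbitrary heavy-tailed example; both sides have closed forms here.

```python
# Tail bound check for the standard Cauchy: P(|X| >= 2/delta) vs the averaged integral.
import numpy as np

for delta in [0.1, 0.5, 1.0]:
    # (1/delta) * integral_{-delta}^{delta} (1 - exp(-|t|)) dt, in closed form
    rhs = (1 / delta) * (2 * delta - 2 * (1 - np.exp(-delta)))
    lhs = 1 - (2 / np.pi) * np.arctan(2 / delta)     # P(|X| >= 2/delta) for Cauchy
    print(delta, round(lhs, 4), round(rhs, 4), lhs <= rhs)
```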
The bound above leads to another relationship between the ch.f. and its distribution: differentiability and the derivatives of the characteristic function are directly related to the moments of the distribution.
The obvious corollary is that $\varphi^{(n)}(0) = i^nEX^n$ for $1\le n \le p$ if $X\in L^p(\mu),$ since in that case $\varphi$ is $p$ times differentiable with $\varphi^{(n)}(t) = E[(iX)^ne^{itX}].$ The next theorem gives a converse in the $L^2$ case: if $\limsup_{h\downarrow0}\frac{\varphi(h)+\varphi(-h)-2\varphi(0)}{h^2}>-\infty,$ then $EX^2<\infty.$
Proof.
Note that $\frac{e^{ihx} + e^{-ihx} - 2}{h^2} = \frac{2(\cos hx -1)}{h^2} \le 0$ and, by L'Hôpital's rule, $\lim_{h\to0}\frac{2(1-\cos hx)}{h^2} = x^2.$ Then $$ \begin{align} EX^2 &= \int x^2 dF(x) \\ &\le \liminf\limits_{h\downarrow0} \int \frac{2(1- \cos hx)}{h^2} dF(x) \\ &= -\limsup\limits_{h\downarrow0} \frac{\varphi(h)+\varphi(-h)-2\varphi(0)}{h^2} \\ &< \infty. \end{align} $$ The inequality is Fatou's lemma applied to the nonnegative functions $\frac{2(1-\cos hx)}{h^2},$ the second equality follows from the linearity of the integral since $\varphi(h)+\varphi(-h)-2\varphi(0) = \int (e^{ihx}+e^{-ihx}-2)\, dF(x),$ and the last bound is the assumption on $\varphi.$
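As a numerical illustration of the moment relation (an arbitrary example: $X\sim\text{Exp}(1)$ with $EX^2 = 2$), the symmetric second difference of the exact ch.f. at $0$ should approach $\varphi''(0) = i^2EX^2 = -2.$

```python
# Finite-difference sketch: the symmetric second difference of the Exp(1) ch.f. at 0
# should approach -EX^2 = -2 as h shrinks.
import numpy as np

phi = lambda t: 1 / (1 - 1j * t)                # exact ch.f. of Exp(1)
for h in [1e-1, 1e-2, 1e-3]:
    second_diff = (phi(h) + phi(-h) - 2 * phi(0)) / h**2
    print(h, np.round(second_diff, 6))          # should approach -2
```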
Estimation of the error term
Before ending the section, I would like to cover the estimation of ch.f.s by their Taylor expansion about $0.$ This will be used when proving the central limit theorem (the next section) and the existence of the canonical representation of infinitely divisible distributions (section 3.9). The key estimate is $$ \left|e^{ix} - \sum_{m=0}^n \frac{(ix)^m}{m!}\right| \le \frac{|x|^{n+1}}{(n+1)!} \wedge \frac{2|x|^n}{n!} \quad \text{for all } x\in\mathbb{R} \text{ and } n\ge0. $$
Proof.
Let $R_n(x) = e^{ix} - \sum_{m=0}^n \frac{(ix)^m}{m!}.$ We prove the bound by induction. For $n=0,$ $R_0(x) = e^{ix}-1 = \int_0^x ie^{iy}\,dy,$ thus $$ |R_0(x)| \le \begin{cases} |e^{ix}| + 1 = 2, \\ \left|\int_0^x ie^{iy}\, dy\right| \le \left|\int_0^x |ie^{iy}|\, dy\right| = |x|. \end{cases} $$ Now suppose $|R_n(x)| \le \frac{|x|^{n+1}}{(n+1)!} \wedge \frac{2|x|^n}{n!}$ for all $x.$ Since $\frac{d}{dx}R_{n+1}(x) = iR_n(x)$ and $R_{n+1}(0) = 0,$ we have $R_{n+1}(x) = R_n(x) - \frac{(ix)^{n+1}}{(n+1)!} = \int_0^x iR_n(y)\,dy.$ Therefore $$ \begin{align} |R_{n+1}(x)| &\le \left|\int_0^x |R_n(y)|\, dy\right| \\ &\le \int_0^{|x|} \left(\frac{y^{n+1}}{(n+1)!} \wedge \frac{2y^n}{n!}\right) dy \\ &\le \frac{|x|^{n+2}}{(n+2)!} \wedge \frac{2|x|^{n+1}}{(n+1)!}. \end{align} $$
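A brute-force numerical check of the bound over a grid of $x$ values; the grid range and the small tolerance for floating-point error are arbitrary choices.

```python
# Check |R_n(x)| <= min(|x|^{n+1}/(n+1)!, 2|x|^n/n!) on a grid of x values.
import numpy as np
from math import factorial

def remainder(x, n):
    """R_n(x) = e^{ix} - sum_{m<=n} (ix)^m / m!"""
    return np.exp(1j * x) - sum((1j * x) ** m / factorial(m) for m in range(n + 1))

x = np.linspace(-20, 20, 4001)
for n in range(4):
    bound = np.minimum(np.abs(x) ** (n + 1) / factorial(n + 1),
                       2 * np.abs(x) ** n / factorial(n))
    print(n, bool(np.all(np.abs(remainder(x, n)) <= bound + 1e-12)))
```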
Using this bound, if $EX^2<\infty$ we can improve the error term in the second-order Taylor expansion of a ch.f. from $O(|t|^3),$ which would require $E|X|^3<\infty,$ to $o(t^2):$ $$ \varphi_X(t) = 1 + itEX - \frac{t^2}{2}EX^2 + o(t^2) \quad \text{as } t\to0. $$ Here we write $g(t) = o(f(t))$ if $g(t)/f(t) \to 0$ as $t\to0.$
Proof.
Since $e^{itx} = 1 + itx + \frac{(itx)^2}{2} + R_2(tx),$ taking expectations gives $\varphi_X(t) = 1 + itEX - \frac{t^2}{2}EX^2 + ER_2(tX).$ By the previous theorem, $$ \begin{align} |ER_2(tX)| &\le E|R_2(tX)| \\ &\le E\left(\frac{|tX|^3}{3!} \wedge \frac{2|tX|^2}{2!}\right) \\ &\le t^2 E(|t||X|^3 \wedge X^2). \end{align} $$ As $t\to0,$ $|t||X|^3\wedge X^2 \to 0$ pointwise, and $|t||X|^3\wedge X^2 \le X^2$ with $EX^2 <\infty.$ By DCT, $E(|t||X|^3\wedge X^2)\to0,$ so $ER_2(tX) = o(t^2).$
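Finally, a numerical sketch of the $o(t^2)$ statement using the exact ch.f. of $\text{Exp}(1)$ (an arbitrary choice with $EX=1,$ $EX^2=2$): the normalized remainder $|\varphi(t) - (1+itEX-\frac{t^2}{2}EX^2)|/t^2$ should shrink as $t\to0.$

```python
# The remainder of the second-order expansion of the Exp(1) ch.f., divided by t^2,
# should tend to 0 as t -> 0.
import numpy as np

EX, EX2 = 1.0, 2.0
phi = lambda t: 1 / (1 - 1j * t)                        # exact ch.f. of Exp(1)
for t in [0.5, 0.1, 0.02, 0.004]:
    remainder = phi(t) - (1 + 1j * t * EX - t**2 * EX2 / 2)
    print(t, round(abs(remainder) / t**2, 6))           # -> 0 as t -> 0
```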
Acknowledgement
This post series is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and lectures at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).