1.1. Basics on measure theory

$\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits}$ $\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits}$

I spent my first year as an M.S. student in Statistics at SNU learning the theoretical foundations of statistics, and probability theory was by far the most emphasized subject. I would like to take this vacation as an opportunity to review that course. Most of the content comes from the book Probability: Theory and Examples, 5th edition (Durrett, 2019); the rest is drawn from lecture notes and personal communications with colleagues.

The first chapter is devoted to the basics of measure theory. We define a probability space and introduce the essential theorems.

Probability space

A probability space is a special kind of measure space: one whose measure assigns total mass 1 to the whole set. A measure space is a triplet consisting of a set, a $\sigma$-field on that set, and a measure. We define each component in turn.

For a set $\Omega$, a non-empty collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field (or a $\sigma$-algebra) on $\Omega$ if the following conditions are satisfied.
(i) If $A\in \mathcal{F}$, then $A^c \in \mathcal{F}.$
(ii) If $A_i\in\mathcal{F},~ i=1,2,\cdots$, then $\cup_{i=1}^\infty A_i \in \mathcal{F}.$

A $\sigma$-field is basically a set of sets. The first condition states that it is closed under complement and the second one states that it is closed under countable unions.
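
When $\Omega$ is finite, both conditions can be checked mechanically. The following is a minimal sketch and not part of the course material: the helper name `is_sigma_field` and the example collections are hypothetical, and the check relies on the fact that on a finite set countable unions reduce to finite ones, so closure under pairwise unions is enough.

```python
from itertools import combinations

def is_sigma_field(omega, collection):
    """Check the sigma-field conditions for subsets of a FINITE set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    if not sets:
        return False  # the definition requires a non-empty collection
    # (i) closed under complement
    if any(omega - a not in sets for a in sets):
        return False
    # (ii) on a finite omega, closure under pairwise unions implies
    #      closure under all (countable) unions
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
trivial = [set(), omega]
split_in_two = [set(), {1, 2}, {3, 4}, omega]
missing_complement = [set(), {1}, omega]

print(is_sigma_field(omega, trivial))             # True
print(is_sigma_field(omega, split_in_two))        # True
print(is_sigma_field(omega, missing_complement))  # False: {2, 3, 4} is missing
```

The last collection fails because it contains $\{1\}$ but not its complement.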

We next define a function that measures sizes of sets inside the $\sigma$-field.

$\mu: \mathcal{F} \to [0,\infty]$ is a measure if
(i) $\mu(A) \ge \mu(\phi) = 0,~ \forall A \in \mathcal{F}.$
(ii) If $A_i \in \mathcal{F},~ i=1,2,\cdots$ are disjoint, then $\mu(\cup_{i=1}^\infty A_i) = \sum_{i=1}^\infty \mu(A_i).$

The second condition is referred to as $\sigma$-additivity (countable additivity). Both conditions are natural if we think of the everyday notion of size: an empty set should have size zero, and if we put disjoint pieces together, the resulting size should be the sum of the sizes of the pieces. We call $\mu(A)$ the measure of a set $A$.

If there exists a sequence of sets $\{A_n\} \subset \mathcal{F}$ such that $\mu(A_n) < \infty$ for all $n$ and $\cup_{n=1}^\infty A_n = \Omega$, then $\mu$ is called a $\sigma$-finite measure. If the measure of the whole set $\mu(\Omega)$ is finite, we call $\mu$ a finite measure. If, in particular, $\mu(\Omega) = 1$, we call $\mu$ a probability measure (abbreviated PM in the rest of this post series). Most of the time, we denote a PM by the letter $P$ or $Q$.

Finally, we can define a measure space and, as a special case, a probability space.

$(\Omega, \mathcal{F})$, a pair of a set and a $\sigma$-field on it, is called a measurable space. A set $A \in \mathcal{F}$ is called a ($\mathcal{F}$-)measurable set. $(\Omega, \mathcal{F}, \mu)$, a measurable space equipped with a measure $\mu$, is called a measure space. If the measure is a probability measure $P$, we call $(\Omega, \mathcal{F}, P)$ a probability space, $\Omega$ the sample space, and an element of $\mathcal{F}$ an event.
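
To make the triplet concrete, here is a small sketch of a finite probability space for one roll of a fair die. This example is an illustration I am assuming rather than something from the textbook: $\Omega=\{1,\dots,6\}$, $\mathcal{F}$ is the power set, and $P(A)=|A|/6$. The names `power_set` and `prob` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = sorted(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

omega = frozenset(range(1, 7))   # sample space of a die roll
events = power_set(omega)        # the sigma-field: every subset is an event

def prob(event):
    """Uniform probability measure P(A) = |A| / |Omega|."""
    return Fraction(len(frozenset(event) & omega), len(omega))

even, low_odd = frozenset({2, 4, 6}), frozenset({1, 3})

print(len(events))                                        # 64 events
print(prob(omega), prob(frozenset()))                     # 1 0
print(prob(even | low_odd), prob(even) + prob(low_odd))   # 5/6 5/6
```

The last line checks additivity on one pair of disjoint events; exact fractions avoid any floating-point noise.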

If one is familiar with topology, the definition of a $\sigma$-field might look quite familiar. In fact, the Borel $\sigma$-field connects a topological space with a corresponding measurable space.

$\mathcal{B}(\Omega) = \cap_{\mathcal{F} \supset \tau} \mathcal{F}$ is the Borel $\sigma$-field of $\Omega$, where $\tau$ is a topology on $\Omega$ and the intersection runs over all $\sigma$-fields $\mathcal{F}$ on $\Omega$ that contain $\tau$. An element of $\mathcal{B}(\Omega)$ is called a Borel set.

$\mathcal{B}(\Omega)$ is the smallest $\sigma$-field that contains all open sets of $\Omega.$ This definition relies on an important property of $\sigma$-fields: an arbitrary intersection of $\sigma$-fields on the same set is again a $\sigma$-field. (This follows directly from the definition.)
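
On a finite set, the "smallest $\sigma$-field containing a given collection" can be computed directly by closing the collection under complements and unions until nothing new appears. The sketch below is only an illustration under that finiteness assumption (it says nothing about how Borel sets on $\mathbb{R}$ look); `generate_sigma_field` is a hypothetical helper name.

```python
def generate_sigma_field(omega, generators):
    """Smallest sigma-field on a FINITE omega containing the generators."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:  # terminates because the power set is finite
        changed = False
        current = list(sigma)
        for a in current:
            # every set added here must belong to any sigma-field
            # that contains the generators
            for c in [omega - a] + [a | b for b in current]:
                if c not in sigma:
                    sigma.add(c)
                    changed = True
    return sigma

sigma = generate_sigma_field({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(sigma))                 # 8
print(frozenset({2}) in sigma)    # True:  {2} = {1, 2} minus {1}
print(frozenset({3}) in sigma)    # False: 3 and 4 are never separated
```

The generated $\sigma$-field has the atoms $\{1\}$, $\{2\}$ and $\{3,4\}$, hence $2^3 = 8$ elements.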

As an example of a Borel $\sigma$-field and a measure space, consider $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with the Lebesgue measure $\lambda$. One can show that $\mathcal{B}(\mathbb{R})$ is generated by the intervals of the form $(a,b),~(a,b],~[a,b)$ or $[a,b]$ with $a,b\in\mathbb{R}$; that is, it is the smallest $\sigma$-field containing them. The Lebesgue measure assigns each such interval its length, e.g. $\lambda(a,b]=b-a$.

Probability measures

Let’s take a closer look at the properties of probability measures.

Let $P: \mathcal{F} \to [0,1]$ be a probability measure. Then the following properties hold.
(i) (monotonicity) $A\subset B \implies P(A) \le P(B).$
(ii) ($\sigma$ sub-additivity) $A \subset \cup_{i=1}^\infty A_i \implies P(A) \le \sum_{i=1}^\infty P(A_i).$
(iii) (continuity from below) $A_1 \subset A_2 \subset \cdots,~ \bigcup_{i=1}^\infty A_i = A \implies \lim_n P(A_n) = P(A).$
(iv) (continuity from above) $B_1 \supset B_2 \supset \cdots,~ \bigcap_{i=1}^\infty B_i = B \implies \lim_n P(B_n) = P(B).$
Proof.

(i) $B = (B \setminus A) \cup A$ is a disjoint union. Thus $P(B) = P(B \setminus A) + P(A) \ge P(A).$
(ii) Let $A_n' = A_n \cap A$, $B_n = A_n' \setminus \cup_{i=1}^{n-1}A_i'$. Then the $B_n$'s are disjoint, $\cup_{n=1}^\infty B_n = A$, and $B_n \subset A_n$. Hence, by countable additivity and (i), $P(A) = \sum_{n=1}^\infty P(B_n) \le \sum_{n=1}^\infty P(A_n).$
(iii) Let $B_n = A_n \setminus A_{n-1},~ A_0=\phi$ so that the $B_n$'s are disjoint and $\cup_{i=1}^n B_i = A_n$. Then $$\begin{align} P(\cup_n A_n) &= P(\cup_n B_n) = \sum_n P(B_n) \\ &= \lim_n \sum_{i=1}^n P(B_i) \\ &= \lim_n P(\cup_{i=1}^n B_i) \\ &= \lim_n P(A_n) \end{align}$$
(iv) Let $B_n' = B_1 \setminus B_n$. Then $B_n'$ increases to $B_1 \setminus B$, so by (iii) $P(B_1) - P(B_n) \to P(B_1) - P(B)$, which gives $\lim_n P(B_n) = P(B).$
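
The two continuity properties can be watched numerically on a simple discrete example. The sketch below assumes a probability measure on $\{1,2,\ldots\}$ with $P(\{k\}) = 2^{-k}$ (my choice for illustration, not from the textbook), the increasing sets $A_n = \{1,\ldots,n\}$ and the decreasing sets $B_n = \{n, n+1, \ldots\}$. The function names are hypothetical.

```python
from fractions import Fraction

def prob_up_to(n):
    """P(A_n) for A_n = {1, ..., n} when P({k}) = 2^(-k)."""
    return sum((Fraction(1, 2**k) for k in range(1, n + 1)), Fraction(0))

def prob_from(n):
    """P(B_n) for B_n = {n, n+1, ...}, computed as 1 - P(A_{n-1})."""
    return 1 - prob_up_to(n - 1)

for n in (1, 2, 5, 10):
    print(n, prob_up_to(n), prob_from(n))
```

Here $P(A_n) = 1 - 2^{-n}$ increases to $1 = P(\cup_n A_n)$, and $P(B_n) = 2^{-(n-1)}$ decreases to $0 = P(\cap_n B_n)$, in line with (iii) and (iv).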

Characterization of a probability measure

Until now, we have defined measures only on $\sigma$-fields. Measures on other collections of subsets can be defined similarly. Furthermore, we can characterize each measure as the extension of a measure-like function defined on a smaller collection. To do so, we first define collections of sets that can be viewed as generalizations of $\sigma$-fields.

A collection of sets $\mathcal{S}$ is a semi-algebra, if
(i) For all $S \in \mathcal{S}$, $S^c$ is a finite disjoint union of sets $S_i \in \mathcal{S}.$
(ii) $S,T\in\mathcal{S} \implies S\cap T \in \mathcal{S}.$

It is not necessary for a semi-algebra to contain $\phi$. However, in many cases it is convenient to assume that it does.

A collection of sets $\mathcal{A}$ is an algebra, if
(i) $A \in \mathcal{A} \implies A^c \in \mathcal{A}.$
(ii) $A,B\in\mathcal{A} \implies A\cap B \in \mathcal{A}.$

Note that the strengthened condition (i) in the definition of an algebra makes it closed under both finite intersections and finite unions, since $A \cup B = (A^c \cap B^c)^c$.

For example, $\mathcal{S}_1 = \{\phi\} \cup \{(a,b]:~-\infty\le a<b\le \infty\}$ is a semi-algebra on $\mathbb{R}$, where $(a,\infty]$ is read as $(a,\infty)$. $\mathcal{A} = \{A\subset\mathbb{Z}:~ A \text{ or } A^c \text{ is finite}\}$ is an algebra on $\mathbb{Z}$. Both claims are straightforward to verify, so I leave them as exercises.
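
For the half-open intervals, the two semi-algebra conditions can be written out explicitly. The following sketch is only an illustration: an interval $(a,b]$ is represented by the tuple `(a, b)`, `math.inf` stands for $\pm\infty$, the empty set is represented by `None`, and the helper names are hypothetical.

```python
import math

def intersect(s, t):
    """Intersection of (a, b] and (c, d]; None stands for the empty set."""
    lo, hi = max(s[0], t[0]), min(s[1], t[1])
    return (lo, hi) if lo < hi else None

def complement(s):
    """Complement of (a, b] in R as a list of disjoint half-open intervals."""
    a, b = s
    parts = []
    if a > -math.inf:
        parts.append((-math.inf, a))  # the piece (-inf, a]
    if b < math.inf:
        parts.append((b, math.inf))   # the piece (b, inf)
    return parts

print(intersect((0, 2), (1, 3)))      # (1, 2)
print(intersect((0, 1), (1, 2)))      # None: (0, 1] and (1, 2] are disjoint
print(complement((0, 1)))             # [(-inf, 0), (1, inf)]
print(complement((-math.inf, 5)))     # [(5, inf)]
```

The complement of an interval is generally not an interval, but it is always a disjoint union of at most two of them, which is exactly what condition (i) of a semi-algebra asks for.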

An algebra can be generated by a semi-algebra. $\overline{\mathcal{S}} := \{\text{finite disjoint unions of sets in }\mathcal{S}\}$ is the algebra generated by a semi-algebra $\mathcal{S}$. Sometimes we call $\mathcal{S}$ a generator of $\overline{\mathcal{S}}$. Just as the Borel $\sigma$-field is the smallest $\sigma$-field containing the open sets, $\overline{\mathcal{S}}$ is the smallest algebra that contains $\mathcal{S}$.

Similarly, a $\sigma$-field can be generated by a (semi-)algebra. $\sigma(\mathcal{S}) = \sigma(\overline{\mathcal{S}})$ is the smallest $\sigma$-field that contains $\mathcal{S}$.

Now we define “measures” on these structures.

$\mu:\mathcal{A} \to [0,\infty]$ is a measure on an algebra $\mathcal{A}$ if
(i) $\mu(A) \ge \mu(\phi) = 0,~ \forall A \in \mathcal{A}.$
(ii) If $A_i\in \mathcal{A},~ i=1,2,\cdots$ are disjoint and $A=\cup_{i=1}^\infty A_i \in \mathcal{A}$, then $\mu(\cup_{i=1}^\infty A_i) = \sum_{i=1}^\infty \mu(A_i).$

$\sigma$-finiteness is defined as in the $\sigma$-field case.

We can define similar functions on a semi-algebra. Let $\mu: \mathcal{S} \to [0,\infty]$ be a function on a semi-algebra $\mathcal{S}$ that satisfies
(i) $\mu(\phi)=0 \text{ and } \mu(S) \ge 0,~ \forall S \in \mathcal{S}.$
(ii) If $S_i \in \mathcal{S},~ i=1,\cdots,n$ are disjoint and $\cup_{i=1}^n S_i \in \mathcal{S}$, then $\mu(\cup_{i=1}^n S_i) = \sum_{i=1}^n \mu(S_i).$
(iii) If $S_i \in \mathcal{S},~ i=1,2,\cdots$ are disjoint and $\cup_{i=1}^\infty S_i \in \mathcal{S}$, then $\mu(\cup_{i=1}^\infty S_i) \le \sum_{i=1}^\infty \mu(S_i).$
I will call such functions semi-measures [1].

The following theorem states that a semi-measure can be uniquely extended to a measure on the generated algebra. If the extended measure on the algebra is $\sigma$-finite, it can be further extended, again uniquely, to a measure on the generated $\sigma$-field.

(Caratheodory's extension theorem) Let $\mathcal{S}$ be a semi-algebra with $\phi\in\mathcal{S}$
and let $\mu: \mathcal{S} \to [0,\infty]$ be a semi-measure.
$\Rightarrow \exists!$ a measure $\overline{\mu}$ on $\overline{\mathcal{S}}$ that is an extension of $\mu$.
In addition, if $\overline{\mu}$ is $\sigma$-finite, $\exists!$ a measure $\nu$ on $\sigma(\mathcal{S})$ that is an extension of $\overline{\mu}$.

Our main interest is in probability measures on $\mathbb{R}$. In undergraduate statistics, we learned that (cumulative) distribution functions uniquely determine probability distributions while densities cannot. Caratheodory’s extension theorem leads us to that conclusion.

We say a real-valued function on $\mathbb{R}$ is a Stieltjes measure function if it is non-decreasing and right-continuous.

$F$ is a Stieltjes measure function. $\Rightarrow \exists!$ a measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu(a,b] = F(b)-F(a)$.

Let $\mathcal{S} = \{\phi\} \cup \{(a,b]:~ -\infty\le a < b \le \infty\}$ and $\nu(\phi)=0$, $\nu(a,b] = F(b) - F(a)$. Then $\mathcal{S}$ is a semi-algebra and $\nu$ is a semi-measure. Let $S_n = (-n, n] \in \overline{\mathcal{S}}$; then $\nu(S_n) = F(n) - F(-n) < \infty$ and $\cup_{n=1}^\infty S_n = \mathbb{R}$, which shows $\sigma$-finiteness. By Caratheodory's theorem, there is a unique extension of $\nu$ to $\sigma(\mathcal{S}) = \mathcal{B}(\mathbb{R})$.

Since a distribution function $F$ is a special case of a Stieltjes measure function, it follows directly from the theorem that $F$ uniquely determines a probability measure [2].
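
The rule $\mu(a,b] = F(b) - F(a)$ is easy to try out on concrete distribution functions. The sketch below assumes two examples of my own choosing, the exponential distribution with rate 1 and a point mass at 0; all function names are hypothetical. It only evaluates the measure on half-open intervals and finite disjoint unions of them, i.e. on the generated algebra, not on general Borel sets.

```python
import math

def exp_cdf(x, rate=1.0):
    """Distribution function of the exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def point_mass_cdf(x, atom=0.0):
    """Distribution function of a unit point mass; right-continuous at atom."""
    return 1.0 if x >= atom else 0.0

def measure_interval(F, a, b):
    """mu((a, b]) = F(b) - F(a) for a < b."""
    return F(b) - F(a)

def measure_disjoint_union(F, intervals):
    """Measure of a finite disjoint union of half-open intervals."""
    return sum(measure_interval(F, a, b) for a, b in intervals)

print(measure_interval(exp_cdf, 0.0, math.log(2)))   # about 0.5
print(measure_interval(point_mass_cdf, -1.0, 0.0))   # 1.0: the atom 0 lies in (-1, 0]
print(measure_interval(point_mass_cdf, 0.0, 1.0))    # 0.0: the atom 0 is not in (0, 1]
print(measure_disjoint_union(exp_cdf, [(-math.inf, 1.0), (1.0, math.inf)]))
```

The point-mass example shows why right-continuity pairs naturally with intervals that are open on the left and closed on the right: the jump of $F$ at 0 is credited to the interval that contains 0.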

On $\mathbb{R}^d,~ d>1$, we define functions similar to Stieltjes measure functions: $F:\mathbb{R}^d \to \mathbb{R}^+ \cup \{0\}$ such that $F$ is non-decreasing, right-continuous and $\Delta_A F \ge 0$ for all finite rectangles $A = (a_1,b_1] \times \cdots \times (a_d,b_d]$, where $\Delta_A F = \sum_{v\in V}\text{sgn}(v)F(v)$, $V=\{a_1, b_1\} \times \cdots \times \{a_d, b_d\}$ is the set of vertices of $A$, and $\text{sgn}(v) = (-1)^{\#\{i:~ v_i = a_i\}}$. Similar to the above, such $F$ can be uniquely extended to a measure $\mu$ such that $\mu(A) = \Delta_A F$ for all finite rectangles $A$.
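
The alternating sum $\Delta_A F$ is easier to digest in code. The sketch below is an illustration under my own assumptions: `rectangle_mass` is a hypothetical helper, `uniform_cdf` is the distribution function of the uniform distribution on the unit cube, and `bad_F` is a function that is non-decreasing in each coordinate yet assigns negative mass to a rectangle, which is why the extra condition $\Delta_A F \ge 0$ is required.

```python
from itertools import product

def rectangle_mass(F, lower, upper):
    """Delta_A F for A = (a_1, b_1] x ... x (a_d, b_d]."""
    total = 0.0
    for choice in product((0, 1), repeat=len(lower)):
        # choice[i] == 0 picks a_i, choice[i] == 1 picks b_i
        vertex = tuple(u if c else l for c, l, u in zip(choice, lower, upper))
        num_lower = choice.count(0)  # number of a_i's in this vertex
        total += (-1) ** num_lower * F(vertex)
    return total

def uniform_cdf(v):
    """Distribution function of the uniform distribution on [0, 1]^d."""
    result = 1.0
    for x in v:
        result *= min(max(x, 0.0), 1.0)
    return result

def bad_F(v):
    """Non-decreasing in each coordinate, but fails Delta_A F >= 0."""
    x, y = v
    if x >= 1 and y >= 1:
        return 1.0
    if (x >= 1 and 0 <= y < 1) or (y >= 1 and 0 <= x < 1):
        return 2.0 / 3.0
    return 0.0

print(rectangle_mass(uniform_cdf, (0.2, 0.1), (0.7, 0.5)))  # about 0.2 = 0.5 * 0.4
print(rectangle_mass(bad_F, (0.5, 0.5), (1.0, 1.0)))        # about -1/3
```

For $d=2$ the sum reduces to the familiar $F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2)$.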

Dynkin’s $\pi$-$\lambda$ theorem

I will end this subsection by stating a theorem that will be used throughout the course.

A collection of sets $\mathcal{P}$ is a $\pi$-system if $A, B \in \mathcal{P}$ implies $A \cap B \in \mathcal{P}$.
A collection of sets $\mathcal{L}$ is a $\lambda$-system on $\Omega$ if the following hold.
(i) $\Omega \in \mathcal{L}$
(ii) $A \in \mathcal{L} \Rightarrow A^c \in \mathcal{L}$
(iii) If $A_i \in \mathcal{L},~ i=1,2,\cdots$ are disjoint, then $\uplus_{i=1}^\infty A_i \in \mathcal{L}$

Let $\mathcal{P}$ be a $\pi$-system and $\mathcal{L}$ a $\lambda$-system. If $\: \mathcal{P} \subset \mathcal{L}$, then $\sigma(\mathcal{P}) \subset \mathcal{L}$.

The theorem implies that, in order to show that some property holds on a $\sigma$-field, we only need to prove that the collection of sets with that property is a $\lambda$-system and that it contains a $\pi$-system generating the $\sigma$-field. A simple but useful corollary concerns the equality of probability measures.

Let $\mu_1$, $\mu_2$ be probability measures on $(\Omega, \mathcal{F})$ and let $\mathcal{A} \subset \mathcal{F}$ be a $\pi$-system such that $\sigma(\mathcal{A}) = \mathcal{F}$. If $\mu_1(A) = \mu_2(A)$ for all $A \in \mathcal{A}$, then $\mu_1(A) = \mu_2(A)$ for all $A \in \mathcal{F}$.
Let $\mathcal{L} = \{ B \in \mathcal{F} : \mu_1(B) = \mu_2(B) \}$. By construction $\mathcal{A} \subset \mathcal{L}$, and $\mathcal{L}$ is a $\lambda$-system: $\Omega \in \mathcal{L}$ because both measures have total mass 1, complements follow from $\mu_i(B^c) = 1 - \mu_i(B)$, and countable disjoint unions follow from countable additivity. By the $\pi$-$\lambda$ theorem, $\mathcal{F} = \sigma(\mathcal{A}) \subset \mathcal{L}$, which is the desired result.
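
The $\pi$-system assumption in the corollary cannot be dropped. The following sketch uses a small example I am assuming for illustration: on $\Omega = \{1,2,3,4\}$ the collection $\{\{1,2\},\{2,3\}\}$ generates the whole power set but is not closed under intersection, and two different probability measures agree on it. The helper names are hypothetical.

```python
from fractions import Fraction

# two probability measures on {1, 2, 3, 4}, given by their point masses
mu1 = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
mu2 = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}

def measure(mu, event):
    """Measure of an event as the sum of its point masses."""
    return sum((mu[w] for w in event), Fraction(0))

def is_pi_system(collection):
    """Check closure under pairwise intersection."""
    sets = {frozenset(s) for s in collection}
    return all(a & b in sets for a in sets for b in sets)

def agree_on(collection):
    return all(measure(mu1, s) == measure(mu2, s) for s in collection)

generators = [{1, 2}, {2, 3}]
with_intersection = generators + [{2}]

print(is_pi_system(generators), agree_on(generators))                # False True
print(is_pi_system(with_intersection), agree_on(with_intersection))  # True False
```

Once the missing intersection $\{2\}$ is added, the collection becomes a $\pi$-system and the two measures no longer agree on it, so the corollary is not contradicted.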

Other examples can be found here. We will work through them one at a time in this post series.


This post is based on the textbook Probability: Theory and Examples, 5th edition (Durrett, 2019) and the lecture at Seoul National University, Republic of Korea (instructor: Prof. Johan Lim).

  1. I named this “semi-measure” just for convenience. It might differ from the standard definition of a semi-measure. 

  2. Why does this mean that $F$ uniquely determines a probability distribution? Because a probability distribution is defined as a probability measure generated by a special kind of function. This will be discussed in subsection 1.2.