A metric derived from KL divergence
KL divergence is often regarded as “a distance” between two probability distributions. However, it is not a metric in the mathematical sense, since $K(p_0, p_1) \neq K(p_1, p_0)$ in general.
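As a quick numerical check (a sketch of my own, not from the post; the function name `kl` and the example distributions are illustrative), the asymmetry is easy to see already on a two-point distribution:

```python
import numpy as np

def kl(p, q):
    """K(p, q) = sum_i p_i * log(p_i / q_i) for discrete distributions (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

p0, p1 = [0.9, 0.1], [0.5, 0.5]
print(kl(p0, p1))  # ~0.368
print(kl(p1, p0))  # ~0.511, so K(p0, p1) != K(p1, p0)
```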
» continue reading
Determination of random variables and random number generation
($\Rightarrow$) is trivial.
($\Leftarrow$) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}\big( [0, 1] \big)$, and $P=\lambda$, where $\lambda$ is the Lebesgue measure. Define $X(\omega) := \sup\{y: F(y) < \omega\}$; our claim is that $X$ is a random variable that has $F$ as its distribution. To prove the claim, we need to show $$P(X\leq x) = F(x) = P(\{\omega: 0\leq \omega \leq F(x)\}),$$ which follows from the set equality $\{\omega: X(\omega) \leq x\} = \{\omega: \omega \leq F(x)\}$, $\forall x$.
Given $x$, pick $\omega_0 \in \{\omega: \omega \leq F(x)\}$. Since $F(x) \geq \omega_0$ and $F$ is non-decreasing, every $y$ with $F(y) < \omega_0$ satisfies $y < x$; hence $X(\omega_0) \leq x$ and $\omega_0 \in \{\omega: X(\omega) \leq x\}$. $$\therefore \: \{\omega: X(\omega) \leq x\} \supset \{\omega: \omega \leq F(x)\}, \quad \forall x$$ Conversely, given $x$, pick $\omega_0 \notin \{\omega: \omega \leq F(x)\}$; then $\omega_0 > F(x)$. Since $F$ is right-continuous, $\exists\epsilon > 0$ such that $F(x) \leq F(x+\epsilon) < \omega_0$. Then $x+\epsilon \leq X(\omega_0)$ because $X$ is defined as a supremum, which gives $x < X(\omega_0)$ and thus $\omega_0 \notin \{\omega: X(\omega) \leq x\}$. $$\therefore \: \{\omega: X(\omega) \leq x\} \subset \{\omega: \omega \leq F(x)\}, \quad \forall x$$ Hence the claim holds, and $$\begin{align*} F(x) &= \lambda\big( [0, F(x)] \big) = P(\{\omega: \omega \leq F(x)\})\\ &= P(\{\omega: X(\omega) \leq x\}) = P(X \leq x) \end{align*}$$
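This construction is exactly inverse transform sampling. Below is a minimal sketch (my own, assuming the Exponential(1) cdf $F(x) = 1 - e^{-x}$ as a concrete choice, for which the generalized inverse $\sup\{y: F(y) < \omega\}$ has the closed form $-\log(1-\omega)$):

```python
import numpy as np

rng = np.random.default_rng(0)

# omega ~ Uniform[0, 1): the probability space ([0,1], B([0,1]), lambda) from the proof
omega = rng.uniform(size=100_000)

# X(omega) = sup{y : F(y) < omega} for F(x) = 1 - exp(-x) equals -log(1 - omega)
x = -np.log(1 - omega)

# Empirical check that P(X <= t) matches F(t)
for t in [0.5, 1.0, 2.0]:
    print(f"t={t}: empirical={np.mean(x <= t):.4f}, F(t)={1 - np.exp(-t):.4f}")
```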
Borel-Cantelli lemmas are converses of each other
(2) If the $A_n$'s are independent and $\sum\limits_{k=1}^\infty P(A_k) = \infty$, then $P(A_n \:\: i.o.) = 1$
(2) Using $1 - x \leq e^{-x}$ and independence, \begin{align*} P(\bigcap\limits_{k \geq m}{A_k}^c) &= \prod\limits_{k \geq m}\big( 1-P({A_k}) \big) \\ &\leq \prod\limits_{k \geq m}e^{-P(A_k)} = e^{-\sum\limits_{k \geq m} P(A_k)} = 0, \:\: \forall m \geq 1 \end{align*} $\therefore P(\bigcup\limits_{k \geq m}{A_k}) = 1$ for every $m$, and $P(\limsup\limits_n{A_n}) = P(A_n \:\: i.o.) = 1$.
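A quick simulation (my sketch; the horizon $N$ and the choices of $P(A_k)$ are illustrative) contrasts the two lemmas: with $P(A_k) = 1/k$ the sum diverges and events keep occurring arbitrarily late, while with $P(A_k) = 1/k^2$ they stop early:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
k = np.arange(1, N + 1)

# Independent A_k with P(A_k) = 1/k: the sum diverges, so (2) gives P(A_n i.o.) = 1
hits_div = k[rng.uniform(size=N) < 1.0 / k]
# Independent A_k with P(A_k) = 1/k^2: the sum converges, so (1) gives P(A_n i.o.) = 0
hits_conv = k[rng.uniform(size=N) < 1.0 / k**2]

print("latest occurrence with P(A_k)=1/k  :", hits_div.max())   # typically near N
print("latest occurrence with P(A_k)=1/k^2:", hits_conv.max())  # typically tiny
```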
$\therefore$ For $A \in \mathcal{T}$, $P(A) = P(A \cap A) = P(A)P(A)$, and so $P(A) \in \{0, 1\}$.
Borel-Cantelli lemmas are widely used to prove almost sure convergence or the existence of limit points of random variables. For example, by showing that $P(|X_n - X|>\epsilon)$ is summable for every $\epsilon > 0$, one can upgrade convergence in probability to almost sure convergence.
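A standard worked illustration (mine, not necessarily the post's): if $X_1, X_2, \dots$ are i.i.d. with $E X_1 = 0$ and $E X_1^4 < \infty$, and $S_n = X_1 + \cdots + X_n$, then Markov's inequality applied to $S_n^4$ gives $$P\Big(\Big|\frac{S_n}{n}\Big| > \epsilon\Big) \leq \frac{E S_n^4}{n^4 \epsilon^4} = \frac{n\,E X_1^4 + 3n(n-1)\,(E X_1^2)^2}{n^4 \epsilon^4} \leq \frac{C}{n^2 \epsilon^4},$$ which is summable, so the first Borel-Cantelli lemma yields $S_n / n \to 0$ almost surely.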
» continue reading
Limitation of $R^2$
Hence the expectation of $R^2$ increases as the number of predictors increases, regardless of the fit of the model.
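A small simulation (my sketch; the sample size and predictor counts are arbitrary) makes this concrete: regressing a pure-noise response on increasingly many pure-noise predictors drives $R^2$ toward 1 even though nothing is being explained:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
y = rng.normal(size=n)  # response is pure noise: no predictor carries real signal

for p in [1, 5, 10, 25, 45]:
    # design matrix: intercept column plus p independent noise predictors
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    print(f"p = {p:2d}: R^2 = {r2:.3f}")
```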
» continue reading
Irregularity of almost sure convergence
Theorems 1 and 2 combined imply that almost sure convergence does not come from a topology. In fact, while convergence in probability forms a convergence class, a.s. convergence does not. This shows that a.s. convergence is actually not a “convergence” concept in the sense we generally think of.
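A standard witness for this (my example; the post's Theorems 1 and 2 are not quoted here) is the “typewriter” sequence on $([0,1], \mathcal{B}, \lambda)$: for $2^k \leq n < 2^{k+1}$, let $$X_n = \mathbf{1}_{\left[ (n-2^k)/2^k, \; (n-2^k+1)/2^k \right]}.$$ Then $P(X_n > 0) = 2^{-k} \to 0$, so $X_n \to 0$ in probability, and every subsequence has a further subsequence converging to $0$ a.s.; yet $X_n(\omega) = 1$ infinitely often for every $\omega$, so $X_n \not\to 0$ a.s. If a.s. convergence were induced by a topology, the subsequence property would force $X_n \to 0$ a.s., a contradiction.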
» continue reading