1.2. Mean Estimation in the Binary Choice Problem

$\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits}$ $\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits}$

Another motivating example is the estimation of the mean in the binary choice model.

Mean estimation in the binary choice model

Our model is case 2 of the previous article:

\[\begin{gathered} P(Y=1 \mid Z=z) = F_0(z),~ Z\sim Q,~ F_0\in\Lambda, \\ \Lambda := \{F:\mathbb R \to [0,1],~ F \text{ is increasing}\}. \end{gathered}\]

This time the estimand is the population mean

\[\begin{aligned} \theta_0=\theta_{F_0} &:=\int F_0(z) dQ(z) = E F_0(Z) \\ &= P(Y=1) = EY. \end{aligned}\]

Since $E[Y_i \mid Z_i] = F_0(Z_i)$ by definition, it follows that $E\big[ Y_i - F_0(Z_i) \big] = 0.$

We will only consider the case where $Q$ is known. In this case, the MLE of $\theta_0$ becomes

\[\hat\theta_n = \theta_{\hat F_n} = \int \hat F_n(z) dQ(z),\]

where $\hat F_n$ is the MLE of $F_0.$

For $F \in \Lambda,$ define

\[\theta_F = \int F(z) dQ(z),\]

then, for each fixed $F,$ the classical central limit theorem gives

\[\frac 1 {\sqrt n} \sum_{i=1}^n \big( F(Z_i) - \theta_F \big) \overset d \to \mathcal{N}(0, \text{Var}(F(Z))).\]

Again, if we can prove a functional CLT that holds uniformly over $F\in\Lambda$ (so that it also applies to the data-dependent $\hat F_n$), then together with the result from the classical CLT we obtain the asymptotic distribution of the MLE:

$$ \begin{gathered} \frac 1 {\sqrt n} \sum_{i=1}^n \big( \hat F_n(Z_i) - \hat\theta_n \big) = \frac 1 {\sqrt n} \sum_{i=1}^n \big( F_0(Z_i) - \theta_0 \big) + o_P(1) \\ \implies \sqrt n (\hat\theta_n - \theta_0) \overset d \to \mathcal{N}\left( 0, \int F_0(z)(1-F_0(z)) dQ(z) \right). \end{gathered} $$
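As a quick sanity check on the limiting variance, consider the hypothetical choice $Q = \text{Uniform}(0,1)$ and $F_0(z) = z$ on $[0,1]$ (an illustrative assumption, not part of the model above). Then

\[\theta_0 = \int_0^1 z\,dz = \frac12, \qquad \int F_0(z)(1-F_0(z))\,dQ(z) = \int_0^1 z(1-z)\,dz = \frac16, \qquad \text{Var}(Y) = \theta_0(1-\theta_0) = \frac14.\]

In this example the limiting variance $1/6$ is smaller than $\text{Var}(Y) = 1/4,$ the asymptotic variance of $\sqrt n(\bar Y - \theta_0)$; the gap $1/12$ equals $\text{Var}(F_0(Z)).$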

We will take the fact $\frac1n \sum_{i=1}^n \hat F_n(Z_i)=\bar Y$ for granted. By this fact,

$$ \begin{aligned} \sqrt n (\hat\theta_n - \theta_0) &= \sqrt n (\bar Y - \theta_0) - \sqrt n (\bar Y - \hat\theta_n) \\ &= \sqrt n (\bar Y - \theta_0) - \frac 1 {\sqrt n} \sum_{i=1}^n \big( \hat F_n(Z_i) - \hat\theta_n \big) \\ &= \sqrt n (\bar Y - \theta_0) - \frac 1 {\sqrt n} \sum_{i=1}^n \big( F_0(Z_i) - \theta_0 \big) + o_P(1) \\ &= \frac 1 {\sqrt n} \sum_{i=1}^n \big( Y_i - \cancel{\theta_0} - F_0(Z_i) + \cancel{\theta_0} \big) + o_P(1), \end{aligned} $$

where the third equality is the functional CLT approximation stated above. By the classical CLT,

$$ \frac 1 {\sqrt n} \sum_{i=1}^n \big( Y_i - F_0(Z_i) \big) \overset d \to \mathcal{N}\left( 0, \text{Var}(Y-F_0(Z)) \right). $$

To rewrite the variance, note that $E[Y - F_0(Z)] = 0,$ that $Y^2 = Y$ because $Y$ is binary, and that $E[YF_0(Z)] = E\big[F_0(Z)E(Y \mid Z)\big] = EF_0^2(Z).$ Therefore

$$ \begin{aligned} \text{Var}(Y-F_0(Z)) &= E\big[(Y-F_0(Z))^2\big] \\ &= E[Y^2 - 2YF_0(Z) + F_0^2(Z)] \\ &= E[Y^2 - F_0^2(Z)] \\ &= EY - EF_0^2(Z) \\ &= EF_0(Z) - EF_0^2(Z) \\ &= \int F_0(z)(1-F_0(z)) dQ(z). \end{aligned} $$

Hence the result follows from Slutsky's theorem.
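The argument above can be explored numerically. The following is a minimal sketch under several assumptions that are ours, not the source's: $Q = \text{Uniform}(0,1)$ and $F_0(z) = z$; the MLE $\hat F_n$ is computed as the isotonic regression of $Y$ on the sorted $Z$ via the pool-adjacent-violators algorithm; and $\hat F_n$ is extended between data points as a right-continuous step function. The function names `pava`, `plug_in_mean` and `simulate` are hypothetical.

```python
import numpy as np


def pava(y):
    """Nondecreasing least-squares fit to y (pool-adjacent-violators)."""
    sums, counts = [], []  # each block is stored as (sum, count)
    for v in y:
        sums.append(float(v))
        counts.append(1)
        # Merge the last two blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    means = [s / c for s, c in zip(sums, counts)]
    return np.repeat(means, counts)


def plug_in_mean(z, y):
    """Return (theta_hat, fitted) for Q = Uniform(0, 1), assuming no ties in z."""
    order = np.argsort(z)
    zs, fitted = z[order], pava(y[order])
    # Assumed convention: F_hat equals fitted[i] on [zs[i], zs[i+1]),
    # fitted[0] on [0, zs[1]) and fitted[-1] on [zs[-1], 1].
    edges = np.concatenate(([0.0], zs[1:], [1.0]))
    theta_hat = float(np.sum(fitted * np.diff(edges)))
    return theta_hat, fitted


def simulate(n=400, reps=300, seed=0):
    """Monte Carlo variance of sqrt(n) * (theta_hat - theta_0) with F_0(z) = z."""
    rng = np.random.default_rng(seed)
    theta_0 = 0.5  # integral of z over (0, 1)
    draws = np.empty(reps)
    for r in range(reps):
        z = rng.uniform(size=n)
        y = (rng.uniform(size=n) < z).astype(float)  # P(Y = 1 | Z = z) = z
        theta_hat, fitted = plug_in_mean(z, y)
        # The fact taken for granted in the text: fitted values average to Y-bar.
        assert np.isclose(fitted.mean(), y.mean())
        draws[r] = np.sqrt(n) * (theta_hat - theta_0)
    return draws.var()


if __name__ == "__main__":
    print("simulated variance:", simulate())
    print("limit from the text, 1/6:", 1 / 6)
    print("Var(Y) for comparison, 1/4:", 1 / 4)
```

The assertion inside `simulate` checks the identity $\frac1n \sum_{i=1}^n \hat F_n(Z_i)=\bar Y,$ which holds because pooling adjacent blocks preserves the total sum. The simulated variance can be compared with $1/6$ and with $\text{Var}(Y) = 1/4,$ but the result is asymptotic, so close agreement should only be expected for large $n.$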


  • van de Geer. 2000. Empirical Processes in M-estimation. Cambridge University Press.
  • Theory of Statistics II (Fall, 2020) @ Seoul National University, Republic of Korea (instructor: Prof. Jaeyong Lee).