Chapter 3 Statistical Estimation

  • Reading: Wasserman Chapter 6.

Statistical inference, often rebranded as learning in computer science, is the process of inferring properties of a distribution function \(F\) from a sample \(X_1, \dots, X_n \sim F\).

Typically, we don’t know which distribution function our sample comes from. However, with some background theory (or simply to make life easier), we may assume that the data come from a certain family of distributions, which narrows the search. This gives rise to the following definitions.

Definition 3.1 A statistical model \(\mathcal{F}\) is a set of distributions (or densities).

A parametric model is a set \(\mathcal{F}\) that can be parametrized by a finite number of parameters.

A non-parametric model is a statistical model that is not parametric.

Example 3.1

  1. The set of Gaussians is a two-parameter model: \[ \mathcal{F} = \left\{ f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}, \mu \in \mathbb{R}, \sigma > 0 \right\}. \]

  2. The set of Bernoulli distributions is a one-parameter model: \[ \mathcal{F} = \left\{ \mathbb{P}(X = 1) = p, \mathbb{P}(X = 0) = 1 -p, 0\leq p \leq 1 \right\}.\]

  3. Generally, a parametric model has the following form \[\mathcal{F} = \left\{ f(x;\theta) : \theta \in \Theta \right\} ,\] where \(\Theta\) is some parameter space.
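One way to make these examples concrete is to write each family as a function of the data point and the parameter, so that fixing \(\theta\) selects a single member of \(\mathcal{F}\). The sketch below is illustrative only; the function names `gaussian_pdf` and `bernoulli_pmf` are hypothetical and not part of the notes.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density f(x; mu, sigma) of the Gaussian family; requires sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bernoulli_pmf(x, p):
    """Probability P(X = x) in the Bernoulli family; requires 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0  # a Bernoulli variable only takes the values 0 and 1

# Fixing the parameter picks out one distribution from the family.
standard_normal = lambda x: gaussian_pdf(x, mu=0.0, sigma=1.0)
fair_coin = lambda x: bernoulli_pmf(x, p=0.5)
```

Here the parameter space \(\Theta\) is enforced by the argument checks: \(\mathbb{R} \times (0, \infty)\) for the Gaussian family and \([0, 1]\) for the Bernoulli family.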

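Notation. Given a parametric model \(\mathcal{F} = \left\{ f(x;\theta) : \theta \in \Theta \right\} ,\) we denote \[ \mathbb{P}_\theta(X \in A ) = \int_A f(x;\theta) \, dx\] and \[ \mathbb{E}_\theta ( r(X)) = \int r(x) f(x;\theta) \, dx \,.\] The subscript \(\theta\) indicates that the probability or expectation is computed with respect to \(f(x;\theta)\); it does not mean that we average over \(\theta\).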
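The two integrals in this notation can be approximated numerically for a given \(\theta\). The following is a minimal sketch for the Gaussian family with \(\theta = (\mu, \sigma)\), taking \(A\) to be an interval and using a simple midpoint rule. The helper names (`integrate`, `prob_interval`, `expectation`) and the choice to truncate the real line at 10 standard deviations are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density f(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def prob_interval(a, b, mu, sigma):
    """Approximate the probability that X lies in A = [a, b] under theta."""
    return integrate(lambda x: gaussian_pdf(x, mu, sigma), a, b)

def expectation(r, mu, sigma, width=10.0):
    """Approximate E_theta(r(X)), truncating the integral to
    mu +/- width * sigma (an assumption: the tails are negligible)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    return integrate(lambda x: r(x) * gaussian_pdf(x, mu, sigma), lo, hi)

mu, sigma = 1.0, 2.0
p = prob_interval(mu - sigma, mu + sigma, mu, sigma)  # mass within one sd
m1 = expectation(lambda x: x, mu, sigma)              # first moment
m2 = expectation(lambda x: x * x, mu, sigma)          # second moment
```

With these values, `p` should be roughly 0.683, `m1` should be close to \(\mu = 1\), and `m2` should be close to \(\mu^2 + \sigma^2 = 5\). Changing `mu` or `sigma` changes all three numbers, which is exactly the dependence that the subscript in \(\mathbb{E}_\theta\) records.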