3.6 Bayesian Approach

(Wasserman Chapter 11)

Please watch this great video by Philippe Rigollet (MIT) on the Bayesian approach: https://youtu.be/bFZ-0FH5hfs?si=IItsPqGD9g9kCC76

In short, we have that \[ f(\theta | X_1, \dots, X_n) \propto \mathcal{L}_n (\theta) f(\theta), \] where \(f(\theta | X_1, \dots, X_n)\) is the (believed) probability density of the parameter \(\theta\) given the data, called the posterior, \(f(\theta)\) is the (believed) probability density of \(\theta\) before seeing the data, called the prior, and \(\mathcal{L}_n(\theta)\) is the likelihood function.
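To make the proportionality concrete, here is a minimal numerical sketch. The data and the Beta(2, 2) prior are hypothetical choices for illustration, not taken from the text; the point is only that multiplying the likelihood by the prior on a grid and normalizing yields the posterior density.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 10 Bernoulli draws (an assumption for illustration,
# not data from the text).
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
n, s = len(x), x.sum()

theta = np.linspace(0, 1, 1001)               # grid over the parameter space
prior = stats.beta.pdf(theta, 2, 2)           # an illustrative Beta(2, 2) prior
likelihood = theta**s * (1 - theta)**(n - s)  # L_n(theta) for Bernoulli data

unnormalized = likelihood * prior             # proportional to the posterior
posterior = unnormalized / np.trapz(unnormalized, theta)  # normalize to integrate to 1
```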

One can think of this as describing how we update our belief about an unknown truth: we start from a prior belief and revise it after seeing the evidence. This point of view is crucial in science; it is the scientific method written in mathematical form.

We can then construct a Bayes estimator by taking the mean of the posterior:

Definition 3.9 (Bayes estimator: Posterior mean) Let \(\Theta\) be a random variable with density the posterior \(f(\theta | X_1, \dots, X_n)\). Then \[ \hat \theta_n = \mathbb{E}_{\theta|X}(\Theta) = \int \theta f(\theta | X_1, \dots, X_n) \, d\theta.\]

Note that the posterior mean is only one of many candidates the Bayesian approach offers for a point estimator; the posterior median and the posterior mode are other common choices.
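Continuing the hypothetical grid example above, the posterior mean of Definition 3.9 is just a numerical integral of \(\theta\) against the posterior density. A minimal self-contained sketch, again assuming illustrative Bernoulli data and a Beta(2, 2) prior:

```python
import numpy as np
from scipy import stats

# Same hypothetical setup as above: Bernoulli data with a Beta(2, 2) prior.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
n, s = len(x), x.sum()

theta = np.linspace(0, 1, 1001)
unnorm = theta**s * (1 - theta)**(n - s) * stats.beta.pdf(theta, 2, 2)
posterior = unnorm / np.trapz(unnorm, theta)

# Definition 3.9: hat{theta}_n = int theta * f(theta | X) dtheta,
# approximated by trapezoidal integration on the grid.
theta_hat = np.trapz(theta * posterior, theta)
print(theta_hat)  # for a Beta(2, 2) prior this should be near (2 + s) / (4 + n)
```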

Exercise 3.11 Let \(X_1, \dots, X_n\) be a sample from the Bernoulli distribution \(\mathrm{Bernoulli}(p)\). (A numerical sanity check for part (1) appears after this exercise.)

  1. Suppose that at the beginning we believe that \(p\) obeys \(\mathrm{Beta}(\alpha, \beta)\) (see the definition of the Beta distribution). What is the posterior distribution of \(p\) after observing the above sample?

  2. Compute the Bayes estimator (posterior mean) associated with the above posterior. Compare this with the MLE of \(p\).

  3. Suppose that at the beginning we believe that \(p\) obeys \(\mathrm{Uniform}([0,1])\). What is the posterior distribution of \(p\) after observing the above sample? Compare this with part (1). Explain what you see.
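The following sketch is not a derivation, but it can be used to check an answer to part (1) numerically: it compares the grid posterior against a candidate Beta density with updated parameters. The sample and the hyperparameters \(\alpha = 3\), \(\beta = 5\) are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta_ = 3.0, 5.0                   # hypothetical prior hyperparameters
x = rng.binomial(1, 0.4, size=50)         # hypothetical Bernoulli(0.4) sample
s, n = int(x.sum()), len(x)

theta = np.linspace(0, 1, 2001)
unnorm = theta**s * (1 - theta)**(n - s) * stats.beta.pdf(theta, alpha, beta_)
grid_post = unnorm / np.trapz(unnorm, theta)

# Candidate answer for part (1): a Beta density with updated parameters.
candidate = stats.beta.pdf(theta, alpha + s, beta_ + n - s)
print(np.max(np.abs(grid_post - candidate)))  # tiny if the candidate matches
```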

Exercise 3.12 (Rice, 8.10.4) Suppose \(X\) is a random variable with the following distribution \[ \begin{aligned} &\mathbb{P}(X= 0) = \frac{2}{3}\theta \\ & \mathbb{P}(X=1)=\frac{1}{3} \theta \\ & \mathbb{P}(X=2)=\frac{2}{3}(1-\theta) \\ & \mathbb{P}(X=3)=\frac{1}{3}(1-\theta) \end{aligned} \] where \(0 \leq \theta \leq 1\) is a parameter. The following 10 independent observations were taken from such a distribution: \((3,0,2,1,3,2,1,0,2,1)\). (A short numerical sketch for the computational parts appears after this exercise.)

  1. Find the method of moments estimate of \(\theta\).
  2. Find an approximate standard error for your estimate.
  3. What is the maximum likelihood estimate of \(\theta\)?
  4. What is an approximate standard error of the maximum likelihood estimate?
  5. If the prior distribution of \(\Theta\) is uniform on \([0,1]\), what is the posterior density? Plot it. What is the mode of the posterior?
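As with the previous exercise, a short sketch can sanity-check the computational parts once you have done the algebra by hand. It assumes the moment equation \(\mathbb{E}[X] = 7/3 - 2\theta\) and the fact that the likelihood depends on the data only through the number of observations equal to 0 or 1; both should be verified before trusting the output.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([3, 0, 2, 1, 3, 2, 1, 0, 2, 1])
n = len(data)

# Part 1: by hand, E[X] = 7/3 - 2*theta for this distribution, so the
# method-of-moments estimate solves x_bar = 7/3 - 2*theta.
theta_mom = (7/3 - data.mean()) / 2

# Part 3: the likelihood is proportional to theta^k * (1-theta)^(n-k),
# where k counts the observations equal to 0 or 1; the MLE is k/n.
k = int(np.sum(data <= 1))
theta_mle = k / n

# Part 5: with a Uniform([0,1]) prior, the posterior is proportional to
# the likelihood itself; plot it on a grid and read off the mode.
theta = np.linspace(0, 1, 1001)
unnorm = theta**k * (1 - theta)**(n - k)
posterior = unnorm / np.trapz(unnorm, theta)
print(theta_mom, theta_mle, theta[np.argmax(posterior)])

plt.plot(theta, posterior)
plt.xlabel(r"$\theta$")
plt.ylabel("posterior density")
plt.show()
```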