## 3.2 Confidence set

In elementary statistics, given a sample \(X_1, \dots, X_n\), we define a *confidence interval* with significance level \(\alpha\) to be an interval \((a,b)\) such that \(\mathbb{P}_\theta( \theta \in (a,b) ) \geq 1 - \alpha\) for every value of the parameter \(\theta\).

Note that \((a,b)\) depends on your sample, i.e., \(a = a(X_1, \dots, X_n), b = b(X_1, \dots, X_n)\).

It must be stressed that \(\theta\) is fixed and \((a,b)\) is random.

In higher dimensions, or for other kinds of data, the notion of a confidence interval is replaced by that of a confidence set.

**Definition 3.6** Given a sample \(X_1, \dots, X_n\), a *confidence set* with significance level \(\alpha\) is a random set \(C_n\) (depending on the sample)
such that
\[ \mathbb{P}_\theta(\theta \in C_n) \geq 1 - \alpha. \]

A confidence set is not a probability statement about the parameter \(\theta\), which is fixed. It is rather a statement about the uncertainty coming from your data.

**Example 3.1 (Example 6.14 in Wasserman)** Let \(\theta \in \mathbb{R}\), and let \(X_1, X_2\) be independent random variables with distribution
\(\mathbb{P}(X_i = 1) = \mathbb{P}(X_i = -1) = 1/2\).
Suppose \(Y_i = \theta + X_i\) are your observed data.
Define
\[ C = \begin{cases}
\{ Y_1 - 1\} & \text{if } Y_1 = Y_2 \,, \\
\{ (Y_1 + Y_2)/2 \} & \text{if } Y_1 \neq Y_2 \,.
\end{cases}\]

For all \(\theta\in \mathbb{R}\), \(\mathbb{P}_\theta(\theta \in C ) = 3/4\): if \(Y_1 \neq Y_2\) (probability \(1/2\)), then \((Y_1 + Y_2)/2 = \theta\) exactly, while if \(Y_1 = Y_2\), then \(Y_1 - 1 = \theta\) precisely when \(X_1 = X_2 = 1\), which happens with probability \(1/2\) given \(Y_1 = Y_2\).

Suppose we observe \(Y_1 = 9\) and \(Y_2 = 11\), so \(C = \{ 10 \}\). Since \(Y_1 \neq Y_2\), we know for sure that \(\theta = 10\); therefore \(\mathbb{P}(\theta \in C \mid Y_1, Y_2) = 1\), even though the unconditional coverage is only \(3/4\).
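The \(3/4\) coverage can be checked by simulation. The sketch below draws the \(X_i\), forms \(C\) exactly as defined above, and counts how often \(C\) contains \(\theta\) (the choice \(\theta = 10\) is illustrative; by the example, any \(\theta\) gives the same answer):

```python
import random

# Monte Carlo check of Example 3.1: the random set C covers theta
# with probability 3/4, for any fixed theta (theta = 10 chosen here).
random.seed(0)
theta = 10
trials = 100_000
hits = 0
for _ in range(trials):
    y1 = theta + random.choice([-1, 1])
    y2 = theta + random.choice([-1, 1])
    if y1 == y2:
        c = y1 - 1          # equals theta only when X_1 = X_2 = 1
    else:
        c = (y1 + y2) / 2   # always equals theta in this case
    hits += (c == theta)
print(hits / trials)  # ~0.75
```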

**Exercise 3.3** Recall Hoeffding’s inequality
\[ \mathbb{P}\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| \geq t \right)
\leq 2 \exp\left( - \frac{2 n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right) \]
for independent \(X_i \in [a_i, b_i]\) with \(\mathbb{E}X_i = 0\).

Apply this to the Bernoulli parametric model \[\mathcal{F} = \left\{ \mathbb{P}(X= 1) = p, \mathbb{P}(X = 0) = 1-p; p \in [0,1] \right\}.\]

**Question:** Suppose our sample comes from a Bernoulli distribution.
What is a confidence interval that gives significance level \(\alpha\)?

Try with two approaches: Hoeffding and Chebyshev.
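As a numerical sketch of what the two approaches yield (the derivations themselves are the exercise): for Bernoulli data, \(b_i - a_i = 1\), so Hoeffding applied to \(X_i - p\) gives \(\mathbb{P}(|\widehat{p}_n - p| \geq t) \leq 2\exp(-2nt^2)\), and setting the bound equal to \(\alpha\) yields a half-width \(t\); Chebyshev with \(\mathbb{V}(\widehat{p}_n) = p(1-p)/n \leq 1/(4n)\) yields another. The values \(n = 1000\), \(\alpha = 0.05\) below are illustrative assumptions.

```python
import math

# Half-widths of two (1 - alpha) confidence intervals for a Bernoulli
# parameter p, both centered at the sample mean p_hat.
#
# Hoeffding: 2 exp(-2 n t^2) = alpha  =>  t = sqrt(log(2/alpha) / (2n))
# Chebyshev: 1 / (4 n t^2)   = alpha  =>  t = 1 / (2 sqrt(n alpha))

def hoeffding_halfwidth(n, alpha):
    return math.sqrt(math.log(2 / alpha) / (2 * n))

def chebyshev_halfwidth(n, alpha):
    return 1 / (2 * math.sqrt(n * alpha))

n, alpha = 1000, 0.05
print(hoeffding_halfwidth(n, alpha))  # ~0.0430
print(chebyshev_halfwidth(n, alpha))  # ~0.0707
```

For this \(\alpha\), Hoeffding gives the shorter interval; Chebyshev only uses the variance bound, so its tail control is weaker.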

**Exercise 3.4** Let \(X_1, \ldots, X_n \sim \operatorname{Bernoulli}(p)\) and let \(\widehat{p}_n=n^{-1} \sum_{i=1}^n X_i\).

Compute \(\mathbb{V}(X_i)\) and \(\mathbb{V}(\widehat{p}_n)\).

Suppose we don’t know \(\mathbb{V}(\widehat{p}_n)\), so we use an estimator of this quantity, \[\widehat{\mathrm{se}}^2 = {\widehat{p}_n\left(1-\widehat{p}_n\right) / n} \,.\] Convince yourself that, by the Central Limit Theorem, \(\widehat{p}_n \approx N\left(p, \widehat{\mathrm{se}}^2\right)\).

Find the confidence interval for the significance level \(\alpha\).

Compare this with the confidence interval from the previous exercise. You should see that the Normal-based interval is shorter, but its coverage is only approximately correct, and only when the sample size is large.
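The comparison can be sketched numerically. The Normal-based half-width is \(z_{\alpha/2}\,\widehat{\mathrm{se}}\), versus Hoeffding’s \(\sqrt{\log(2/\alpha)/(2n)}\); the values \(n = 1000\), \(\alpha = 0.05\), and \(\widehat{p}_n = 0.3\) below are illustrative assumptions, not from the exercise.

```python
import math
from statistics import NormalDist

# Normal-based (CLT) half-width vs. Hoeffding half-width for a
# hypothetical Bernoulli sample with p_hat = 0.3, n = 1000.
n, alpha, p_hat = 1000, 0.05, 0.3
z = NormalDist().inv_cdf(1 - alpha / 2)            # ~1.96
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)        # estimated std. error
normal_hw = z * se_hat                             # ~0.0284
hoeffding_hw = math.sqrt(math.log(2 / alpha) / (2 * n))  # ~0.0430
print(normal_hw, hoeffding_hw)
```

The Normal interval is noticeably shorter, but only its large-sample coverage is (approximately) \(1 - \alpha\), whereas Hoeffding’s guarantee holds for every \(n\).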