4.1 Procedure

Denote by \(\theta\) the quantity of interest (a parameter or functional). We assume that \(\theta\) takes values in a space \(D\).

Suppose that, based on previous experience, we believe that \(\theta\) can only lie in \(D_0\) or in \(D_1\) (but not both), where \(D_0 \cap D_1 = \emptyset\) and \(D_0 \cup D_1 = D\).

The components of our test would be:

  1. \(H_0 : \theta \in D_0\) versus \(H_1: \theta \in D_1\)
  2. Sample: \(X_1, \dots, X_n\)
  3. Rejection (Critical) region \(C\): determines the decision rule as follows:
    • Reject \(H_0\) if \((X_1, \dots, X_n) \in C\)
    • Reject \(H_1\) if \((X_1, \dots, X_n) \not\in C\)

For \(i=0,1\), if \(D_i = \{ \theta_i \}\) (that is, \(D_i\) is a singleton), then \(H_i\) is called a simple hypothesis. A hypothesis is composite if it is not simple. So, we could have cases where one of the hypotheses is simple and the other is composite, both are simple, or both are composite.
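The three components above can be sketched concretely. The following is a minimal illustration, with all specifics (the model \(X_i \sim N(\theta, 1)\), the simple null \(H_0: \theta = 0\), the composite alternative \(H_1: \theta > 0\), and the cutoff \(c\)) chosen purely for the example:

```python
# Hypothetical setup: X_1, ..., X_n ~ N(theta, 1).
# H0: theta = 0 (simple) versus H1: theta > 0 (composite).
# Critical region: C = { (x_1, ..., x_n) : sample mean > c }.

def decide(sample, c):
    """Apply the decision rule for the critical region {mean > c}."""
    xbar = sum(sample) / len(sample)
    return "reject H0" if xbar > c else "reject H1"

# Sample mean here is 0.05, which does not exceed c = 0.5.
print(decide([0.1, -0.2, 0.3, 0.0], c=0.5))
```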

Due to the probabilistic nature of the procedure, there are a few possible scenarios:

|                 | \(H_0\) is true               | \(H_1\) is true                |
|-----------------|-------------------------------|--------------------------------|
| Reject \(H_0\)  | Type I error (false positive) | Correct decision               |
| Reject \(H_1\)  | Correct decision              | Type II error (false negative) |

One of the working assumptions of science is that all models are wrong and are only approximations of reality. So, the goal of science is to try to reject the hypothesis \(H_0\). The harder it is to reject \(H_0\), the better the theory is as an approximation of reality.

However, there is a conundrum: both types of errors can occur, and it is not possible to minimize both at the same time.

Example 4.1 Let \(C = \emptyset\). Then the probability for Type I error is \(0\) as one will never reject \(H_0\). However, if it turns out that \(H_1\) is true, then the probability for Type II error would be \(1\).
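The trade-off in Example 4.1 can be made quantitative in a toy setting. Assume (purely for illustration) a single observation \(X \sim N(\theta, 1)\), a simple null \(H_0: \theta = 0\) versus a simple alternative \(H_1: \theta = 1\), and the critical region \(C = \{x > c\}\). Then the Type I error probability is \(1 - \Phi(c)\) and the Type II error probability is \(\Phi(c - 1)\), so lowering one raises the other:

```python
import math

def Phi(x):
    # Standard normal CDF, expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

# Single observation X ~ N(theta, 1); H0: theta = 0 vs H1: theta = 1.
# Critical region C = {x > c}.
def type_I(c):   # P_0( X in C )
    return 1.0 - Phi(c)

def type_II(c):  # P_1( X not in C )
    return Phi(c - 1.0)

for c in (0.0, 1.0, 2.0):
    print(f"c={c}: Type I = {type_I(c):.3f}, Type II = {type_II(c):.3f}")
```

As the cutoff \(c\) grows, the Type I error probability shrinks while the Type II error probability grows, exactly the conundrum described above.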

Often, we consider a false positive to be worse than a false negative (imagine \(H_0\) being "you have a disease": a test that tells you that you don't have it while you do commits a Type I error). So, we want to do the following:

  1. Choose small probability \(\alpha\) and find reasonable critical regions that make the probability for Type I error be bounded by \(\alpha\).

  2. Among these critical regions, minimize the probability for Type II error.

Note that we have \[ 1 - \mathbb{P}_\theta ( \text{Type II error}) = \mathbb{P}_\theta \left[ (X_1, \dots, X_n) \in C \right].\]

This inspires the following definitions.

Definition 4.1 (size (significance level) of critical region) We say that a critical region \(C\) is of size (or significance level) \(\alpha\) if \[ \alpha = \sup_{\theta \in D_0} \mathbb{P}_\theta \left[ (X_1, \dots, X_n) \in C \right].\]
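For a composite null, the supremum in Definition 4.1 matters. As an illustration (all specifics are assumptions for the example), take \(X_1, \dots, X_n \sim N(\theta, 1)\), a composite null \(D_0 = \{\theta \le 0\}\), and \(C = \{\bar{X} > c\}\); since \(\mathbb{P}_\theta[\bar{X} > c] = 1 - \Phi(\sqrt{n}(c - \theta))\) is increasing in \(\theta\), the supremum is attained at the boundary \(\theta = 0\):

```python
import math

def Phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

n, c = 25, 0.329  # illustrative sample size and cutoff

def rejection_prob(theta):
    # P_theta( mean(X) > c ) when mean(X) ~ N(theta, 1/n)
    return 1.0 - Phi(math.sqrt(n) * (c - theta))

# Approximate the supremum over D0 = {theta <= 0} on a grid.
size = max(rejection_prob(t / 100) for t in range(-300, 1))
print(f"size ≈ {size:.3f}")
```

With these illustrative values the size comes out near \(0.05\), and the maximizing grid point is the boundary value \(\theta = 0\), as the monotonicity argument predicts.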

Definition 4.2 (Power function of critical region) The power function of a critical region \(C\) is the function \(\gamma_C: D_1 \to [0,1]\) given by \[ \gamma_C(\theta) = \mathbb{P}_\theta \left[ (X_1, \dots, X_n) \in C \right].\]
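Continuing the illustrative one-sided setting (\(X_i \sim N(\theta,1)\), \(D_1 = \{\theta > 0\}\), \(C = \{\bar{X} > c\}\), with \(n\) and \(c\) chosen purely for the example), the power function \(\gamma_C(\theta) = 1 - \Phi(\sqrt{n}(c-\theta))\) can be tabulated directly:

```python
import math

def Phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

n, c = 25, 0.329  # illustrative values

def power(theta):
    # gamma_C(theta) = P_theta( mean(X) > c ), with mean(X) ~ N(theta, 1/n)
    return 1.0 - Phi(math.sqrt(n) * (c - theta))

for theta in (0.1, 0.329, 0.6, 1.0):
    print(f"gamma_C({theta}) = {power(theta):.3f}")
```

The power increases in \(\theta\): the farther the true parameter is from the null, the more likely the test is to (correctly) reject \(H_0\).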

Remark. From the above discussion, the quality of a hypothesis test is really determined by the choice of the critical region \(C\). So one test is better than another when its critical region is better than the critical region of the other. We thus need to be able to compare critical regions.

Definition 4.3 Given two critical regions \(C_1\) and \(C_2\) of size \(\alpha\), \(C_1\) is better than \(C_2\) (denoted by \(C_1 \succeq C_2\)) if \[\gamma_{C_1}(\theta) \geq \gamma_{C_2}(\theta)\,, \forall \theta \in D_1.\]

Note that not every pair of critical regions is comparable.
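A simple illustration of non-comparability (all specifics assumed for the example): with \(X_i \sim N(\theta, 1)\), \(H_0: \theta = 0\), and \(D_1 = \mathbb{R} \setminus \{0\}\), the right-tail region \(C_1 = \{\bar{X} > c\}\) and the left-tail region \(C_2 = \{\bar{X} < -c\}\) have the same size by symmetry, yet neither dominates the other on \(D_1\):

```python
import math

def Phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

n, c = 25, 0.329  # illustrative values; both regions have the same size

def power_right(theta):  # C1 = { mean(X) > c }
    return 1.0 - Phi(math.sqrt(n) * (c - theta))

def power_left(theta):   # C2 = { mean(X) < -c }
    return Phi(math.sqrt(n) * (-c - theta))

for theta in (-0.5, 0.5):
    print(f"theta={theta}: gamma_C1={power_right(theta):.3f}, "
          f"gamma_C2={power_left(theta):.3f}")
```

\(C_1\) has much higher power for \(\theta > 0\), and \(C_2\) has much higher power for \(\theta < 0\), so \(C_1 \not\succeq C_2\) and \(C_2 \not\succeq C_1\).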