Do you think the results of the consumer group’s survey have practical importance? Explain your opinion.
9.5: Type II Error Probabilities and Sample Size Determination (Optional)
Chapters 9 and 11
As we have seen, we usually take action (for example, advertise a claim) on the basis of having rejected the null hypothesis. In this case, we know the chances that the action has been taken erroneously because we have prespecified α, the probability of rejecting a true null hypothesis. However, sometimes we must act (for example, use a day’s production of camshafts to make V6 engines) on the basis of not rejecting the null hypothesis. If we must do this, it is best to know the probability of not rejecting a false null hypothesis (a Type II error). If this probability is not small enough, we may change the hypothesis testing procedure. In order to discuss this further, we must first see how to compute the probability of a Type II error.
As an example, the Federal Trade Commission (FTC) often tests claims that companies make about their products. Suppose coffee is being sold in cans that are labeled as containing three pounds, and also suppose that the FTC wishes to determine if the mean amount of coffee μ in all such cans is at least three pounds. To do this, the FTC tests H0: μ ≥ 3 (or μ = 3) versus Ha: μ < 3 by setting α = .05. Suppose that a sample of 35 coffee cans yields x̄ = 2.9973. Assuming that σ equals .0147, we see that because
$$z = \frac{\bar{x} - 3}{\sigma/\sqrt{n}} = \frac{2.9973 - 3}{.0147/\sqrt{35}} = -1.0866$$
is not less than −z.05 = −1.645, we cannot reject H0: μ ≥ 3 by setting α = .05. Since we cannot reject H0, we cannot have committed a Type I error, which is the error of rejecting a true H0. However, we might have committed a Type II error, which is the error of not rejecting a false H0. Therefore, before we make a final conclusion about μ, we should calculate the probability of a Type II error.
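The test just described can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the sample mean 2.9973 is an assumption recovered from the interval limit 2.9532 = x̄ − 3σ quoted later in this section.

```python
from math import sqrt
from statistics import NormalDist

# Values from the coffee-fill example. xbar = 2.9973 is the sample mean
# implied by the 3-sigma interval quoted in the text (an assumption here).
xbar, mu0, sigma, n, alpha = 2.9973, 3.0, 0.0147, 35, 0.05

standard_error = sigma / sqrt(n)
z = (xbar - mu0) / standard_error

# Left-tailed rejection point -z_alpha (about -1.645 when alpha = .05).
z_crit = NormalDist().inv_cdf(alpha)

# Reject H0: mu >= 3 only if z falls below the rejection point.
reject_h0 = z < z_crit
print(f"z = {z:.4f}, rejection point = {z_crit:.3f}, reject H0: {reject_h0}")
```

Because z is about −1.09, which is not below the rejection point, the sketch reports that H0 is not rejected, in agreement with the text.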
A Type II error is not rejecting H0: μ ≥ 3 when H0 is false. Because any value of μ that is less than 3 makes H0 false, there is a different Type II error (and, therefore, a different Type II error probability) associated with each value of μ that is less than 3. In order to demonstrate how to calculate these probabilities, we will calculate the probability of not rejecting H0: μ ≥ 3 when in fact μ equals 2.995. This is the probability of failing to detect an average underfill of .005 pounds. For a fixed sample size (for example, n = 35 coffee can fills), the value of β, the probability of a Type II error, depends upon how we set α, the probability of a Type I error. Since we have set α = .05, we reject H0 if
$$\frac{\bar{x} - 3}{\sigma/\sqrt{n}} < -z_{.05} = -1.645$$
or, equivalently, if
$$\bar{x} < 3 - 1.645\,\frac{\sigma}{\sqrt{n}} = 3 - 1.645\left(\frac{.0147}{\sqrt{35}}\right) = 2.99591$$
Therefore, we do not reject H0 if x̄ ≥ 2.99591. It follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.995, is
$$\beta = P(\bar{x} \ge 2.99591 \text{ when } \mu = 2.995) = P\left(z \ge \frac{2.99591 - 2.995}{.0147/\sqrt{35}}\right) = P(z \ge .37) = 1 - .6443 = .3557$$
This calculation is illustrated in Figure 9.12. Similarly, it follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.99, is
$$\beta = P(\bar{x} \ge 2.99591 \text{ when } \mu = 2.99) = P\left(z \ge \frac{2.99591 - 2.99}{.0147/\sqrt{35}}\right) = P(z \ge 2.38) = 1 - .9913 = .0087$$
Figure 9.12: Calculating β When μ Equals 2.995
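It also follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.985, is
$$\beta = P(\bar{x} \ge 2.99591 \text{ when } \mu = 2.985) = P\left(z \ge \frac{2.99591 - 2.985}{.0147/\sqrt{35}}\right) = P(z \ge 4.39)$$
This probability is less than .00003 (because z is greater than 3.99).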
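The three Type II error probabilities above follow the same recipe: find the sample-mean cutoff implied by α, then find the chance that x̄ lands on the "do not reject" side of that cutoff when μ takes the alternative value. Below is a minimal sketch of that recipe for a less than alternative. The function name `beta_less_than` is hypothetical, and because the code uses exact normal probabilities rather than z values rounded to two decimals, its results can differ from the table-based figures in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def beta_less_than(mu_alt, mu0, sigma, n, alpha):
    """P(do not reject H0: mu >= mu0) when the true mean is mu_alt < mu0."""
    std = NormalDist()                      # standard normal distribution
    se = sigma / sqrt(n)                    # standard deviation of xbar
    cutoff = mu0 + std.inv_cdf(alpha) * se  # reject H0 if xbar < cutoff
    # Beta is the area to the right of the cutoff under the
    # sampling distribution of xbar centered at mu_alt.
    return 1.0 - std.cdf((cutoff - mu_alt) / se)

# Alternative values of mu considered in the text.
for mu_alt in (2.995, 2.99, 2.985):
    beta = beta_less_than(mu_alt, mu0=3.0, sigma=0.0147, n=35, alpha=0.05)
    print(f"mu = {mu_alt}: beta = {beta:.5f}")
```

The printed values shrink quickly as the alternative value of μ moves away from 3, which is the pattern described in the text.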
In Figure 9.13 we illustrate the values of β that we have calculated. Notice that the closer an alternative value of μ is to 3 (the value specified by H0: μ = 3), the larger is the associated value of β. Although alternative values of μ that are closer to 3 have larger associated probabilities of Type II errors, these values of μ have associated Type II errors with less serious consequences. For example, we are more likely to not reject H0: μ = 3 when μ = 2.995 (β = .3557) than we are to not reject H0: μ = 3 when μ = 2.99 (β = .0087). However, not rejecting H0: μ = 3 when μ = 2.995, which means that we are failing to detect an average underfill of .005 pounds, is less serious than not rejecting H0: μ = 3 when μ = 2.99, which means that we are failing to detect a larger average underfill of .01 pounds. In order to decide whether a particular hypothesis test adequately controls the probability of a Type II error, we must determine which Type II errors are serious, and then we must decide whether the probabilities of these errors are small enough. For example, suppose that the FTC and the coffee producer agree that failing to reject H0: μ = 3 when μ equals 2.99 is a serious error, but that failing to reject H0: μ = 3 when μ equals 2.995 is not a particularly serious error. Then, since the probability of not rejecting H0: μ = 3 when μ equals 2.99, which is .0087, is quite small, we might decide that the hypothesis test adequately controls the probability of a Type II error. To understand the implication of this, recall that the sample of 35 coffee cans, which has x̄ = 2.9973, does not provide enough evidence to reject H0: μ ≥ 3 by setting α = .05. We have just shown that the probability that we have failed to detect a serious underfill is quite small (.0087), so the FTC might decide that no action should be taken against the coffee producer. Of course, this decision should also be based on the variability of the fills of the individual cans.
Because x̄ = 2.9973 and σ = .0147, we estimate that 99.73 percent of all individual coffee can fills are contained in the interval [x̄ ± 3σ] = [2.9973 ± 3(.0147)] = [2.9532, 3.0414]. If the FTC believes it is reasonable to accept fills as low as (but no lower than) 2.9532 pounds, this evidence also suggests that no action against the coffee producer is needed.
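The interval for individual fills is simple arithmetic, and the 99.73 percent figure is the standard normal area within three standard deviations of the mean. The sketch below assumes that individual fills are approximately normally distributed and uses x̄ = 2.9973, the sample mean implied by the lower limit 2.9532; the variable names are illustrative.

```python
from statistics import NormalDist

# Sample mean (implied by the lower limit 2.9532) and population
# standard deviation from the coffee-fill example.
xbar, sigma = 2.9973, 0.0147

# Estimated interval containing 99.73 percent of individual fills.
lower, upper = xbar - 3 * sigma, xbar + 3 * sigma

# Area under the standard normal curve between -3 and +3.
std = NormalDist()
coverage = std.cdf(3) - std.cdf(-3)

print(f"[{lower:.4f}, {upper:.4f}] covers about {coverage:.2%} of fills")
```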
Figure 9.13: How β Changes as the Alternative Value of μ Changes
Suppose, instead, that the FTC and the coffee producer had agreed that failing to reject H0: μ ≥ 3 when μ equals 2.995 is a serious mistake. The probability of this Type II error, which is .3557, is large. Therefore, we might conclude that the hypothesis test is not adequately controlling the probability of a serious Type II error. In this case, we have two possible courses of action. First, we have previously said that, for a fixed sample size, the lower we set α, the higher is β, and the higher we set α, the lower is β. Therefore, if we keep the sample size fixed at n = 35 coffee cans, we can reduce β by increasing α. To demonstrate this, suppose we increase α to .10. In this case we reject H0 if
$$\frac{\bar{x} - 3}{\sigma/\sqrt{n}} < -z_{.10} = -1.282$$
or, equivalently, if
$$\bar{x} < 3 - 1.282\,\frac{\sigma}{\sqrt{n}} = 3 - 1.282\left(\frac{.0147}{\sqrt{35}}\right) = 2.99681$$
Therefore, we do not reject H0 if x̄ ≥ 2.99681. It follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.995, is
$$\beta = P(\bar{x} \ge 2.99681 \text{ when } \mu = 2.995) = P\left(z \ge \frac{2.99681 - 2.995}{.0147/\sqrt{35}}\right) = P(z \ge .73) = 1 - .7673 = .2327$$
We thus see that increasing α from .05 to .10 reduces β from .3557 to .2327. However, β is still too large, and, besides, we might not be comfortable making α larger than .05. Therefore, if we wish to decrease β and maintain α at .05, we must increase the sample size. We will soon present a formula we can use to find the sample size needed to make both α and β as small as we wish.
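The trade-off between α and β at a fixed sample size can be tabulated directly. The sketch below recomputes β for the alternative value μ = 2.995 at several choices of α. The function name `type_ii_error` and the extra value α = .01 are illustrative additions, and exact normal probabilities are used, so the results may differ slightly from the table-based .3557 and .2327.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, mu_alt=2.995, mu0=3.0, sigma=0.0147, n=35):
    """Beta for testing H0: mu >= mu0 versus Ha: mu < mu0 at level alpha."""
    std = NormalDist()
    se = sigma / sqrt(n)
    cutoff = mu0 + std.inv_cdf(alpha) * se  # reject H0 if xbar < cutoff
    return 1.0 - std.cdf((cutoff - mu_alt) / se)

# Larger alpha moves the cutoff toward mu0, which shrinks beta.
betas = {alpha: type_ii_error(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}: beta = {beta:.4f}")
```

With n held at 35, each increase in α buys a smaller β, but even α = .10 leaves β above .2, which is why the text turns to a larger sample size.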
Once we have computed β, we can calculate what we call the power of the test.
The power of a statistical test is the probability of rejecting the null hypothesis when it is false.
Just as β depends upon the alternative value of μ, so does the power of a test. In general, the power associated with a particular alternative value of μ equals 1 − β, where β is the probability of a Type II error associated with the same alternative value of μ. For example, we have seen that, when we set α = .05, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.99 is .0087. Therefore, the power of the test associated with the alternative value 2.99 (that is, the probability of rejecting H0: μ ≥ 3 when μ equals 2.99) is 1 − .0087 = .9913.
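Power can also be computed directly as the probability that x̄ falls in the rejection region when μ takes a given alternative value. The following is a minimal sketch for the less than alternative in the coffee example; the function name `power_less_than` and its default arguments are illustrative assumptions, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist

def power_less_than(mu_alt, mu0=3.0, sigma=0.0147, n=35, alpha=0.05):
    """P(reject H0: mu >= mu0) when the true mean equals mu_alt."""
    std = NormalDist()
    se = sigma / sqrt(n)
    cutoff = mu0 + std.inv_cdf(alpha) * se  # reject H0 if xbar < cutoff
    # Power is the area to the left of the cutoff under the
    # sampling distribution of xbar centered at mu_alt, i.e. 1 - beta.
    return std.cdf((cutoff - mu_alt) / se)

power_299 = power_less_than(2.99)
print(f"power at mu = 2.99: {power_299:.4f}")

# Power at each alternative value considered in this section.
for mu_alt in (2.995, 2.99, 2.985):
    print(f"mu = {mu_alt}: power = {power_less_than(mu_alt):.5f}")
```

As a sanity check on the sketch, evaluating the function at μ = 3 (the boundary of H0) returns α, because the chance of rejecting H0 there is exactly the Type I error probability.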
Thus far we have demonstrated how to calculate β when testing a less than alternative hypothesis. In the following box we present (without proof) a method for calculating the probability of a Type II error when testing a less than, a greater than, or a not equal to alternative hypothesis: