Hypothesis testing and the legal system
If the value of the test statistic z is far enough above 0, we reject H0 in favor of Ha. To see how large z must be in order to reject H0, we must understand that a hypothesis test rejects a null hypothesis H0 only if there is strong statistical evidence against H0. This is similar to our legal system, which rejects the innocence of the accused only if evidence of guilt is beyond a reasonable doubt. For instance, the network will reject H0: μ ≤ 50 and run the trash bag commercial only if the test statistic z is far enough above 0 to show beyond a reasonable doubt that H0: μ ≤ 50 is false and Ha: μ > 50 is true. A test statistic that is only slightly greater than 0 might not be convincing enough. However, because such a test statistic would result from a sample mean that is slightly greater than 50, it would provide some evidence to support rejecting H0: μ ≤ 50, and it certainly would not provide strong evidence supporting H0: μ ≤ 50. Therefore, if the value of the test statistic is not large enough to convince us to reject H0, we do not say that we accept H0. Rather we say that we do not reject H0 because the evidence against H0 is not strong enough. Again, this is similar to our legal system, where the lack of evidence of guilt beyond a reasonable doubt results in a verdict of not guilty, but does not prove that the accused is innocent.
Type I and Type II errors and their probabilities
To determine exactly how much statistical evidence is required to reject H0, we consider the errors and the correct decisions that can be made in hypothesis testing. These errors and correct decisions, as well as their implications in the trash bag advertising example, are summarized in Tables 9.1 and 9.2. Across the top of each table are listed the two possible “states of nature.” Either H0: μ ≤ 50 is true, which says the manufacturer’s claim that μ is greater than 50 is false, or H0 is false, which says the claim is true. Down the left side of each table are listed the two possible decisions we can make in the hypothesis test. Using the sample data, we will either reject H0: μ ≤ 50, which implies that the claim will be advertised, or we will not reject H0, which implies that the claim will not be advertised.
Table 9.1: Type I and Type II Errors
Table 9.2: The Implications of Type I and Type II Errors in the Trash Bag Example
In general, the two types of errors that can be made in hypothesis testing are defined here:
Type I and Type II Errors
If we reject H0 when it is true, this is a Type I error.
If we do not reject H0 when it is false, this is a Type II error.
As can be seen by comparing Tables 9.1 and 9.2, if we commit a Type I error, we will advertise a false claim. If we commit a Type II error, we will fail to advertise a true claim.
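The four combinations of state of nature and decision described above can be written out as a small lookup. The following is only an illustrative sketch; the function name classify_outcome and its arguments are hypothetical and do not come from the text.

```python
def classify_outcome(h0_is_true: bool, reject_h0: bool) -> str:
    """Label one combination of state of nature and test decision."""
    if reject_h0 and h0_is_true:
        return "Type I error"       # rejected H0 although it is true
    if not reject_h0 and not h0_is_true:
        return "Type II error"      # did not reject H0 although it is false
    return "correct decision"


# Enumerate the two states of nature against the two possible decisions.
for h0_is_true in (True, False):
    for reject_h0 in (True, False):
        label = classify_outcome(h0_is_true, reject_h0)
        print(f"H0 true: {h0_is_true!s:5}  reject H0: {reject_h0!s:5}  ->  {label}")
```

In the trash bag example, the "Type I error" case corresponds to advertising a false claim, and the "Type II error" case corresponds to failing to advertise a true claim.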
We now let the symbol α (pronounced alpha) denote the probability of a Type I error, and we let β (pronounced beta) denote the probability of a Type II error. Obviously, we would like both α and β to be small. A common (but not the only) procedure is to base a hypothesis test on taking a sample of a fixed size (for example, n = 40 trash bags) and on setting α equal to a small prespecified value. Setting α low means there is only a small chance of rejecting H0 when it is true. This implies that we are requiring strong evidence against H0 before we reject it.
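To make "strong evidence" concrete, the sketch below shows how a prespecified α can be turned into a cutoff for z. It assumes the standard one-sided z test with a known population standard deviation σ, so that z = (sample mean − 50)/(σ/√n) and H0 is rejected when z exceeds the point with upper-tail area α under the standard normal curve. The sample mean and σ used here are hypothetical values chosen for illustration, not data from the trash bag study.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(alpha: float) -> float:
    """Point with upper-tail area alpha under the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - alpha)


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """One-sample z statistic, assuming sigma is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def reject_h0(z: float, alpha: float) -> bool:
    """Reject H0: mu <= mu0 in favor of Ha: mu > mu0 when z is large enough."""
    return z > critical_z(alpha)


# Hypothetical inputs (assumptions for illustration only).
z = z_statistic(sample_mean=50.5, mu0=50.0, sigma=1.6, n=40)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: cutoff = {critical_z(alpha):.3f}, "
          f"z = {z:.3f}, reject H0: {reject_h0(z, alpha)}")
```

The smaller α is, the larger the cutoff, so the same sample can lead to rejecting H0 at one value of α and not at a smaller one.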
We sometimes choose α as high as .10, but we usually choose α between .05 and .01. A frequent choice for α is .05. In fact, our former student tells us that the network often tests advertising claims by setting the probability of a Type I error equal to .05. That is, with α set equal to .05, the network will run a commercial making a claim if the sample evidence allows it to reject a null hypothesis that says the claim is not valid in favor of an alternative hypothesis that says the claim is valid. Since a Type I error is deciding that the claim is valid when it is not, the policy of setting α equal to .05 says that, in the long run, the network will advertise no more than 5 percent of all invalid claims made by advertisers.
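The long-run interpretation of α can be checked with a small simulation. The sketch below repeatedly draws samples of n = 40 from a normal population whose mean sits exactly at the boundary value μ = 50, so that H0 is true, and counts how often a one-sided z test rejects H0. The normal population, the value of σ, the number of trials, and the random seed are all assumptions made for illustration. For means below 50 the rejection rate would be smaller than α.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_rate(alpha: float, mu0: float = 50.0, sigma: float = 1.6,
                         n: int = 40, trials: int = 5000, seed: int = 1) -> float:
    """Estimate how often H0: mu <= mu0 is rejected when mu equals mu0."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1.0 - alpha)     # one-sided cutoff for z
    rejections = 0
    for _ in range(trials):
        # Draw one sample from a population in which H0 is (just barely) true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > z_crit:
            rejections += 1                        # a Type I error occurred
    return rejections / trials


rate = simulate_type_i_rate(alpha=0.05)
print(f"Estimated Type I error rate with alpha = .05: {rate:.3f}")
```

The estimated rate should fall close to .05, varying slightly with the seed and the number of trials.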
One might wonder why the network does not set α lower—say at .01. One reason is that it can be shown that, for a fixed sample size, the lower we set α, the higher is β, and the higher we set α, the lower is β. Setting α at .05 means that β, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if α were set at .01. As long as (1) the claim to be advertised is plausible and (2) the consequences of advertising the claim even if it is false are not terribly serious, then it is reasonable to set α equal to .05. However, if either (1) or (2) is not true, then we might set α lower than .05. For example, suppose a pharmaceutical company wishes to advertise that it has developed an effective treatment for a disease that has formerly been very resistant to treatment. Such a claim is (perhaps) difficult to believe. Moreover, if the claim is false, patients suffering from the disease would be subjected to false hope and needless expense. In such a case, it might be reasonable for the network to set α at .01 because this would lower the chance of advertising the claim if it is false. We usually do not set α lower than .01 because doing so often leads to an unacceptably large value of β. We explain some methods for computing the probability of a Type II error in optional Section 9.6. However, β can be difficult or impossible to calculate in many situations, and we often must rely on our intuition when deciding how to set α.
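The trade-off between α and β for a fixed sample size can be illustrated numerically. The sketch below assumes the one-sided z test with known σ and evaluates β at one specific alternative value of μ, using β = Φ(z_α − (μ_a − 50)/(σ/√n)), where Φ is the standard normal cumulative distribution function and z_α is the cutoff with upper-tail area α. The alternative mean μ_a = 50.5 and σ = 1.6 are hypothetical values, and this is not necessarily the method presented in Section 9.6.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(alpha: float, mu_a: float, mu0: float = 50.0,
                        sigma: float = 1.6, n: int = 40) -> float:
    """Probability of not rejecting H0: mu <= mu0 when the true mean is mu_a."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)       # cutoff for the chosen alpha
    shift = (mu_a - mu0) / (sigma / sqrt(n))       # mean of z when mu = mu_a
    return std_normal.cdf(z_crit - shift)          # P(z <= z_crit | mu = mu_a)


# Hypothetical alternative: the true mean breaking strength is 50.5.
beta_05 = type_ii_probability(alpha=0.05, mu_a=50.5)
beta_01 = type_ii_probability(alpha=0.01, mu_a=50.5)

print(f"alpha = .05 -> beta = {beta_05:.3f}")
print(f"alpha = .01 -> beta = {beta_01:.3f}")
```

Under these assumptions, lowering α from .05 to .01 raises β, which is the reason given above for not setting α too low when the sample size is fixed.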