Measuring the weight of evidence against the null hypothesis
We have seen that in some situations the decision to take an action is based solely on whether a null hypothesis can be rejected in favor of an alternative hypothesis by setting α equal to a single, prespecified value. For example, in the trash bag case the television network decided to run the trash bag commercial because H0: μ = 50 was rejected in favor of Ha: μ > 50 by setting α equal to .05. Also, in the payment time case the management consulting firm decided to claim that the new electronic billing system has reduced the Hamilton trucking company’s mean payment time by more than 50 percent because H0: μ = 19.5 was rejected in favor of Ha: μ < 19.5 by setting α equal to .01. Furthermore, in the Valentine’s Day chocolate case, the candy company decided to base its production of valentine boxes on the ten percent projected sales increase because H0: μ = 330 could not be rejected in favor of Ha: μ ≠ 330 by setting α equal to .05.
Although hypothesis testing at a fixed α level is sometimes used as the sole basis for deciding whether to take an action, this is not always the case. For example, consider again the payment time case. The reason that the management consulting firm wishes to make the claim about the new electronic billing system is to demonstrate the benefits of the new system both to the Hamilton company and to other trucking companies that are considering using such a system. Note, however, that a potential user will decide whether to install the new system by considering factors beyond the results of the hypothesis test. For example, the cost of the new billing system and the receptiveness of the company’s clients to using the new system are among other factors that must be considered. In complex business and industrial situations such as this, hypothesis testing is used to accumulate knowledge about and understand the problem at hand. The ultimate decision (such as whether to adopt the new billing system) is made on the basis of nonstatistical considerations, intuition, and the results of one or more hypothesis tests. Therefore, it is important to know all the information, called the weight of evidence, that a hypothesis test provides against the null hypothesis and in favor of the alternative hypothesis. Furthermore, even when hypothesis testing at a fixed α level is used as the sole basis for deciding whether to take an action, it is useful to evaluate the weight of evidence. For example, the trash bag manufacturer would almost certainly wish to know how much evidence there is that its new bag is stronger than its former bag.
The most informative way to measure the weight of evidence is to use the p-value. For every hypothesis test considered in this book we can interpret the p-value to be the probability, computed assuming that the null hypothesis H0 is true, of observing a value of the test statistic that is at least as extreme as the value actually computed from the sample data. The smaller the p-value is, the less likely the sample results are if the null hypothesis H0 is true. Therefore, the smaller the p-value, the stronger the evidence that H0 is false and that the alternative hypothesis Ha is true. Experience with hypothesis testing has led statisticians to draw the following (somewhat subjective) conclusions: