Type I and II errors
There are two kinds of errors that can be made in
significance testing: (1) a true null
hypothesis can be incorrectly rejected and (2) a false null
hypothesis can fail to be rejected. The former error is called a Type
I error and the latter error is called a Type II error. These two
types of errors are defined in the table. The probability of a Type I
error is designated by the Greek letter alpha (α) and is
called the Type I error rate; the probability of a Type II error (the
Type II error rate) is designated by the Greek letter beta (β).
A Type II error is only an error in the sense that an opportunity
to reject the null hypothesis correctly was lost. It is not an error
in the sense that an incorrect conclusion was drawn since no
conclusion is drawn when the null hypothesis is
not rejected.
A Type I error, on the other hand, is an error in every sense of
the word. A conclusion is drawn that the null hypothesis is false
when, in fact, it is true. Therefore, Type I errors are generally
considered more serious than Type II errors. The probability of a
Type I error (α) is called the significance
level and is set by the experimenter. There is a tradeoff between
Type I and Type II errors. The more an experimenter protects against
Type I errors by choosing a low α level, the greater
the chance of a Type II error. Requiring very strong evidence to
reject the null hypothesis makes it very unlikely that a true null
hypothesis will be rejected. However, it increases the chance that a
false null hypothesis will not be rejected, thus lowering
power. The Type I error rate is almost
always set at .05 or at .01, the latter being more conservative since
it requires stronger evidence to reject the null hypothesis at the
.01 level than at the .05 level.
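The tradeoff can be made concrete with a small calculation. The sketch below is an illustration only: it assumes a two-tailed z test, and the standardized effect `delta = 2.5` (true difference divided by its standard error) is an arbitrary value chosen for the example, not a figure from the text. The function name `type_ii_error_rate` is likewise hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def type_ii_error_rate(alpha: float, delta: float) -> float:
    """Beta for a two-tailed z test.

    delta is the true difference divided by its standard error
    (an assumed, illustrative quantity).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # The test fails to reject whenever the z statistic falls
    # between -z_crit and +z_crit.
    return STD_NORMAL.cdf(z_crit - delta) - STD_NORMAL.cdf(-z_crit - delta)


for alpha in (0.05, 0.01):
    beta = type_ii_error_rate(alpha, delta=2.5)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

With the effect held fixed, moving from the .05 level to the .01 level raises the critical value, so beta goes up and power goes down.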
Power
Power is the probability of correctly rejecting a false null
hypothesis. Power is therefore defined as 1 − β, where
β is the Type II error probability.
If the power of an experiment is low, then there is a good chance that the experiment
will be inconclusive. That is why it is so important to consider power in the design
of experiments. There are methods for estimating the power
of an experiment before the experiment is conducted. If the power is too low, then
the experiment can be redesigned by changing one of the factors
that determine power.
Consider a hypothetical experiment designed to test whether rats brought up in
an enriched environment can learn mazes faster than rats brought up in the typical
laboratory environment (the control condition). Two groups of 12 rats each are tested.
Although the experimenter does not know it, the population
mean number of trials it takes to learn the maze is 20 for the enriched condition
and 32 for the control condition. The null hypothesis that the enriched environment
makes no difference is therefore false.
The question is, "What is the probability that the experimenter
is going to be able to demonstrate that the null hypothesis is false
by rejecting it at the .05 level?" This is
the same thing as asking "What is the power of the test?" Before the
power of the test can be determined, the
standard deviation (σ) must be known. If σ = 10 then the power of the significance test is .82.
This means that there is a .82 probability that the experimenter will
be able to reject the null hypothesis. Since power = .82, β = 1 − .82 = .18.
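The text does not say which test or approximation produced the .82 figure. As a rough check, the sketch below computes power for the rat example under stated assumptions: a two-tailed test at the .05 level, a known standard deviation of 10 in both groups, and a normal (z) approximation. The function name `power_two_group_z` is hypothetical. Under these assumptions the result should fall in the neighborhood of the quoted value rather than match it exactly; an exact t-based calculation would give a somewhat different number.

```python
from math import sqrt
from statistics import NormalDist


def power_two_group_z(mean_1, mean_2, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group z test (sigma assumed known)."""
    std_normal = NormalDist()
    # Standard error of the difference between two independent group means.
    standard_error = sigma * sqrt(2 / n_per_group)
    # True difference expressed in standard-error units.
    delta = abs(mean_1 - mean_2) / standard_error
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the z statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - delta)) + std_normal.cdf(-z_crit - delta)


power = power_two_group_z(mean_1=20, mean_2=32, sigma=10, n_per_group=12)
beta = 1 - power
print(f"approximate power = {power:.2f}, beta = {beta:.2f}")
```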
It is important to keep in mind that power is not about whether or
not the null hypothesis is true (it is assumed to be false). It is
the probability that the data gathered in an experiment will be sufficient
to reject the null hypothesis. The experimenter does not know that
the null hypothesis is false. The experimenter asks the question: If
the null hypothesis is false with specified population means and
standard deviation, what is the probability that the data from the
experiment will be sufficient to reject the null hypothesis?
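One way to see what this probability means is to simulate the hypothetical experiment many times and count how often the null hypothesis is rejected. The sketch below is a minimal Monte Carlo illustration, not the method used in the text. It assumes normally distributed scores, a two-tailed pooled-variance t test, and the tabled critical value 2.074 for 22 degrees of freedom at the .05 level. The name `simulate_power`, the seed, and the number of simulated experiments are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, variance

N_PER_GROUP = 12      # rats per group, as in the example
T_CRIT_DF22 = 2.074   # two-tailed .05 critical t for df = 12 + 12 - 2 = 22


def simulate_power(n_experiments=5000, seed=1):
    """Proportion of simulated experiments that reject the null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Population means 20 and 32, standard deviation 10 (assumed normal).
        enriched = [rng.gauss(20, 10) for _ in range(N_PER_GROUP)]
        control = [rng.gauss(32, 10) for _ in range(N_PER_GROUP)]
        # With equal group sizes, the pooled variance is the plain average.
        pooled_variance = (variance(enriched) + variance(control)) / 2
        standard_error = sqrt(pooled_variance * 2 / N_PER_GROUP)
        t_statistic = (mean(control) - mean(enriched)) / standard_error
        if abs(t_statistic) > T_CRIT_DF22:
            rejections += 1
    return rejections / n_experiments


print(f"estimated power = {simulate_power():.2f}")
```

The estimate is the long-run proportion of experiments that would be conclusive if the population values were as specified. It varies slightly with the seed and the number of simulated experiments.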
If the experimenter discovers that the probability of rejecting
the null hypothesis is low (power is low) even if the null hypothesis
is false to the degree expected (or hoped for), then
the experiment should probably be redesigned. Otherwise, considerable
time and expense will go into a project that has a small chance of
being conclusive even if the theoretical ideas behind it are correct.
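Sample size is one factor an experimenter can usually change when redesigning. As a hedged sketch, the code below uses a normal approximation (two-tailed test, known standard deviation) to show how power grows with the number of subjects per group, and to search for the smallest group size that reaches a target power. The function names and the target of .90 are assumptions made for illustration. A t-based calculation would typically call for slightly larger groups.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_diff, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-tailed, two-group comparison."""
    std_normal = NormalDist()
    delta = abs(mean_diff) / (sigma * sqrt(2 / n_per_group))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return (1 - std_normal.cdf(z_crit - delta)) + std_normal.cdf(-z_crit - delta)


def smallest_n_for_power(mean_diff, sigma, target_power, alpha=0.05, max_n=10_000):
    """Smallest per-group n whose approximate power reaches target_power."""
    for n in range(2, max_n + 1):
        if approx_power(mean_diff, sigma, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n")


# Expected difference of 12 trials and standard deviation of 10, as in the rat example.
for n in (6, 12, 24):
    print(f"n per group = {n:2d}  approximate power = {approx_power(12, 10, n):.2f}")

print("n per group for power of .90:", smallest_n_for_power(12, 10, target_power=0.90))
```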
