Research Methods 1: Statistics
Problem-Sheet 6: Dependent-Means t-tests:
1. Ten subjects take a test
of motor coordination, once after drinking a pint of beer and once without
drinking alcohol. Their times in seconds to complete the task were:
Subject: | With beer: | Without beer:
1        | 12.4       | 10.0
2        | 15.5       | 14.2
3        | 17.9       | 18.0
4        | 9.7        | 10.1
5        | 19.6       | 14.2
6        | 16.5       | 12.1
7        | 15.1       | 15.1
8        | 16.3       | 12.4
9        | 13.3       | 12.7
10       | 11.6       | 13.1
Perform a matched-pairs (also
known as a dependent-means) t-test to test whether drinking beer makes you
slower at the task. [Answer: t = 2.18, p>0.05]. How would you design this
experiment, in order to ensure a fair result?
2. An experiment is performed
to see if motorcycling makes people happier. Ten individuals are given a
happiness questionnaire before and after 30 minutes' motorcycle riding. The
scores are shown below (high score = high degree of happiness).
Subject: | Before: | After:
1        | 5       | 7
2        | 4       | 8
3        | 6       | 9
4        | 4       | 10
5        | 7       | 11
6        | 3       | 3
7        | 4       | 5
8        | 6       | 5
9        | 3       | 3
10       | 5       | 5
(a) Calculate the mean and
standard deviation for each condition. (Use the s.d. formula which gives the s.d.
as an estimate of the population s.d.). [Answers: 4.7 and 1.34; 6.6 and 2.84].
(b) Perform a dependent-means
t-test. [Answer: t = 2.63, p<0.05].
What problems are there in
performing this experiment, in practice? (Apart from difficulty in persuading the
individuals concerned to stop riding the bikes!)
3. Five articles are selected
randomly from each of twelve "Daily Stir" journalists. The mean
number of lies and libellous comments per article is recorded for each
journalist. These journalists are then sent on a creative writing course, after
which five of their more recent articles are selected at random and analysed as
before. Here are the data:
journalist:                | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12
no. of lies before course: | 3   | 9   | 16  | 5   | 6   | 10  | 12  | 11  | 6   | 9   | 2   | 6
no. of lies after course:  | 5   | 8   | 42  | 7   | 12  | 12  | 10  | 14  | 7   | 11  | 5   | 6
(a) What are the mean and
standard deviation for each condition? [Answers: 7.92 and 4.01; 11.58 and
10.03].
(b) Perform a matched-pairs
t-test on these data. [Answer: t = 1.73,
p>.10].
(c) Journalist number three
gets a new job at the Daily Moron, on the strength of his outstanding
performance following the creative writing course. Reanalyse the data, omitting
this subject. [Answers: mean and standard deviation for "before"
condition are 7.18 and 3.25; mean and s.d. for "after" condition are
8.82 and 3.12; t = 2.52, p<0.05].
4. An experiment is performed
to determine which is more unpleasant: singing by Kylie Minogue or singing by
Jason Donovan. Subjects are manacled to a chair, and forced to listen to either
a Kylie song and then a Jason song, or vice versa. The dependent variable is
the time (in seconds) before the first scream is uttered. Which singer is considered
more unpleasant, Jason or Kylie? Here are the data (presented in this way for
clarity: subjects 1 to 7 heard Jason before Kylie, while subjects 8 to 14
suffered Kylie before Jason). Calculate the mean and standard deviation for
each condition, and perform a dependent-means t-test on the data. [Answer: mean
and s.d. for Kylie condition = 4.43 and 2.62; mean and s.d. for Jason condition
= 4.71 and 3.12; t = 0.56, p>.50].
subject number: | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14
Kylie song:     | 6   | 4   | 7   | 2   | 1   | 8   | 2   | 3   | 1   | 3   | 4   | 5   | 9   | 7
Jason song:     | 8   | 3   | 3   | 1   | 1   | 9   | 3   | 2   | 3   | 6   | 3   | 7   | 11  | 6
First-Year Research Methods: Worked
Solutions to Problem Sheet 6:
Question 1:
(a) Work out the difference score (D) for each subject:
            | with beer: | without beer: | difference (D):
subject 1:  | 12.4       | 10.0          | 2.4
subject 2:  | 15.5       | 14.2          | 1.3
subject 3:  | 17.9       | 18.0          | -0.1
subject 4:  | 9.7        | 10.1          | -0.4
subject 5:  | 19.6       | 14.2          | 5.4
subject 6:  | 16.5       | 12.1          | 4.4
subject 7:  | 15.1       | 15.1          | 0.0
subject 8:  | 16.3       | 12.4          | 3.9
subject 9:  | 13.3       | 12.7          | 0.6
subject 10: | 11.6       | 13.1          | -1.5
(b) Find $\sum D$, the sum of the difference scores: $\sum D = 16.0$.
Divide $\sum D$ by n (the number of difference scores) to get the mean
difference score: $\bar{D} = \sum D / n = 16.0 / 10 = 1.6$.
(c) Find the standard deviation of the difference scores.
Here is the formula:
$$s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1}}$$
For these data, $\sum D^2 = 73.96$, so $s_D = \sqrt{(73.96 - 25.6)/9} \approx 2.32$.
(Don't get confused by the terminology here. "S.D." or "s.d." are
abbreviations in English for "standard deviation". The mathematical symbol
for the sample standard deviation is "s", and $s_D$ in this context stands
for "standard deviation of the Difference scores"!)
This standard deviation is a measure of the spread of the difference scores
around the mean difference score. If this s.d. is large, it means that the
difference scores varied widely: some subjects might show a large difference
between the two conditions that they participated in, while others might show a
small difference; and/or the differences might go in opposite directions (i.e.,
with some subjects performing better in one condition, and others performing
better in the other. This seems to be the case in our experiment, as shown by
the fact that some differences are negative in sign while others are positive).
(d) Find the standard error of the mean of the difference
scores. This is simply the standard deviation of the difference scores, divided
by the square root of the number of difference scores:
$$s_{\bar{D}} = \frac{s_D}{\sqrt{n}} = \frac{2.32}{\sqrt{10}} \approx 0.73$$
(Again, it's easy to get confused by the notation here: the standard error,
$s_{\bar{D}}$, is designated by an s followed by a D with a bar over the top,
whereas the standard deviation, $s_D$, has the same symbols but without the
little bar. The $s_D$ within this equation stands for the whole of the standard
deviation equation in section (c) above).
(e) Finally, find t. To get t, take the mean difference score (the result of
step b), and divide it by the standard error of the mean of the difference
scores (the result of step d):
$$t = \frac{\bar{D}}{s_{\bar{D}}} = \frac{1.6}{0.733} \approx 2.18$$
This gives t = 2.18.
The degrees of freedom are given by the number of difference scores minus one;
in this case, d.f. = 10-1 = 9 degrees of freedom. Consulting a table of
critical t-values, we find that the critical t at the p = 0.05 significance
level for a two-tailed test is 2.262. Our obtained t is smaller than this;
hence we conclude that it is not statistically significant. In other words,
there is no reason to reject the null hypothesis, that the difference between
our two conditions is due merely to chance (i.e., that what we effectively have
is two samples from the same "population", which differ merely due to
random sampling variation).
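Steps (a) to (e) can also be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function name `dependent_t` and the variable names are illustrative choices, not part of the original problem sheet.

```python
import math

# Times in seconds from Question 1, subjects 1 to 10.
with_beer = [12.4, 15.5, 17.9, 9.7, 19.6, 16.5, 15.1, 16.3, 13.3, 11.6]
without_beer = [10.0, 14.2, 18.0, 10.1, 14.2, 12.1, 15.1, 12.4, 12.7, 13.1]


def dependent_t(first, second):
    """Return (mean_d, s_d, se_d, t, df) for paired scores, following steps (a)-(e)."""
    diffs = [a - b for a, b in zip(first, second)]   # (a) difference scores
    n = len(diffs)
    mean_d = sum(diffs) / n                          # (b) mean difference score
    ss = sum((d - mean_d) ** 2 for d in diffs)       # sum of squared deviations
    s_d = math.sqrt(ss / (n - 1))                    # (c) s.d. of the differences
    se_d = s_d / math.sqrt(n)                        # (d) standard error of the mean
    t = mean_d / se_d                                # (e) t
    return mean_d, s_d, se_d, t, n - 1


mean_d, s_d, se_d, t, df = dependent_t(with_beer, without_beer)
print(f"mean D = {mean_d:.2f}, sD = {s_d:.2f}, SE = {se_d:.2f}, t = {t:.2f}, df = {df}")
# prints: mean D = 1.60, sD = 2.32, SE = 0.73, t = 2.18, df = 9
```

The script only reproduces the arithmetic; the obtained t still has to be compared with the critical value from a table, as described above.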
How would you design this experiment, in order to ensure
a fair result? You would have to take care to avoid order effects, which are
always a potential problem with repeated measures designs. You would have to
make sure that half of the subjects did the two conditions in one order, and
the other half did the conditions in the opposite order; and that you allowed
enough time between tests for the effects of the beer to wear off, for those
people who experienced this condition first!
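One way to set up the counterbalancing described above is to shuffle the subjects and give each half a different order of conditions. This is only a sketch: the function name `counterbalance`, the condition labels and the fixed seed are assumptions made for illustration.

```python
import random


def counterbalance(subject_ids, seed=None):
    """Randomly assign half of the subjects to each order of the two conditions."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for s in ids[:half]:
        orders[s] = ("beer", "no beer")      # beer condition first
    for s in ids[half:]:
        orders[s] = ("no beer", "beer")      # beer condition second
    return orders


orders = counterbalance(range(1, 11), seed=1)
for subject in sorted(orders):
    print(subject, orders[subject])
```

The script does not handle the gap between the two sessions (long enough for the beer to wear off); that has to be built into the testing schedule.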
Question 2:
            | before: | after: | difference (D):
subject 1:  | 5       | 7      | -2
subject 2:  | 4       | 8      | -4
subject 3:  | 6       | 9      | -3
subject 4:  | 4       | 10     | -6
subject 5:  | 7       | 11     | -4
subject 6:  | 3       | 3      | 0
subject 7:  | 4       | 5      | -1
subject 8:  | 6       | 5      | 1
subject 9:  | 3       | 3      | 0
subject 10: | 5       | 5      | 0
Just looking at the difference scores, we can see that
most subjects were happier after riding the motorcycle than they were before. Three
subjects showed no difference, and one subject was happier before the ride than
after.
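The calculation follows the same steps as in Question 1. As a sketch of the arithmetic, using the difference scores in the table (before minus after) and the computational form of the standard deviation:

```latex
\sum D = -19, \qquad \bar{D} = \frac{-19}{10} = -1.9, \qquad \sum D^2 = 83

s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1}}
    = \sqrt{\frac{83 - 36.1}{9}} \approx 2.283

s_{\bar{D}} = \frac{s_D}{\sqrt{n}} = \frac{2.283}{\sqrt{10}} \approx 0.722

t = \frac{\bar{D}}{s_{\bar{D}}} = \frac{-1.9}{0.722} \approx -2.63
```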
We have ten difference scores, so
d.f. = 10 - 1 = 9.
Comparing our obtained t, -2.63, to the critical t for a 0.05 significance level with a
two-tailed test (2.262), we find that our obtained t is larger. (Ignore the
sign when comparing them: the absolute value of t is 2.63, which is bigger than
2.262). Therefore we reject the null hypothesis and conclude that our two
conditions differ as a consequence of what we did to the subjects (i.e., the
motorcycle ride). In other words, the difference between our sample means is so
big that it is unlikely to have occurred merely by chance.
However, in practice there would be problems in
interpreting this result as showing that "motorcycle riding makes people
happier". All of our subjects were tested in the same order, i.e. first
before riding and secondly after riding. Consequently, we have not eliminated
other possible causes for our observed results. It might be, for example, that
people tested twice on a happiness questionnaire usually score higher on the
second test - regardless of any experimental manipulation we might choose to
insert between tests. If this were true, the motorcycle ride might have nothing
to do with our results at all. This is an important point: a statistics test
performed correctly on data from a badly-designed experiment may give you a
plausible result which is nevertheless actually worthless in terms of the
conclusions which you can validly draw from it. What would be a better design
in this particular case? We have the problem that we cannot run half of our
subjects in the opposite order - you cannot undo a motorcycle ride! We could
assign subjects randomly to two groups, an experimental group (who get a
motorcycle ride) and a control group (who do not). Then, we could compare the
happiness scores of the two groups, using an independent-means t-test.
Question 3:
(b)
journalist:         | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12
lies before course: | 3   | 9   | 16  | 5   | 6   | 10  | 12  | 11  | 6   | 9   | 2   | 6
lies after course:  | 5   | 8   | 42  | 7   | 12  | 12  | 10  | 14  | 7   | 11  | 5   | 6
difference (D):     | -2  | 1   | -26 | -2  | -6  | -2  | 2   | -3  | -1  | -2  | -3  | 0
There are (12-1) = 11 degrees of freedom.
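As a sketch of the intermediate arithmetic for these twelve difference scores (before minus after), following the same steps as in Question 1:

```latex
\sum D = -44, \qquad \bar{D} = \frac{-44}{12} \approx -3.667, \qquad \sum D^2 = 752

s_D = \sqrt{\frac{752 - \frac{(-44)^2}{12}}{11}}
    = \sqrt{\frac{590.667}{11}} \approx 7.328

s_{\bar{D}} = \frac{7.328}{\sqrt{12}} \approx 2.115

t = \frac{-3.667}{2.115} \approx -1.73
```

Note how much of $\sum D^2$ comes from journalist 3 alone: $(-26)^2 = 676$ out of 752.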
The critical value of t (obtained from a table of t-values)
for 11 d.f. is 2.201, for a two-tailed test and a 0.05 significance level. Our
obtained t is smaller than this value, and so we cannot reject the null
hypothesis, that there is no difference between the number of lies produced by
these writers before and after their creative writing course. (Note that, even
if there were, we would have problems attributing the change to the creative
writing course because of the uncontrolled order problems, as in the previous
question).
(c) Reanalysing the data, with journalist number three
omitted (and the remaining journalists renumbered accordingly):
journalist:         | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11
lies before course: | 3   | 9   | 5   | 6   | 10  | 12  | 11  | 6   | 9   | 2   | 6
lies after course:  | 5   | 8   | 7   | 12  | 12  | 10  | 14  | 7   | 11  | 5   | 6
difference (D):     | -2  | 1   | -2  | -6  | -2  | 2   | -3  | -1  | -2  | -3  | 0
t = -2.52, with (11-1) = 10 degrees of freedom. Ignoring its sign, our obtained
t value is larger than the critical value of t for a two-tailed test at the
0.05 significance level (2.228). We can therefore reject the null
hypothesis, that there is no difference in the number of lies produced by the
journalists in the two conditions (i.e., before and after the creative writing
course), in favour of the alternative hypothesis that there is a difference.
However, we are still faced with the problem of interpreting these results
because of the problems with the experiment's design. The solution would be to
use an independent-measures design: randomly allocate journalists to one of two
groups, journalists who took a creative writing course and journalists who did
not. Their subsequent capacities for lying could then be compared with an
independent-means t-test.
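To see how strongly journalist 3 affects the outcome, the two analyses can be run side by side. This is a minimal sketch using Python's standard `statistics` module, whose `stdev` function uses the n-1 formula; the helper name `paired_t` is an illustrative choice.

```python
import math
import statistics

# Mean lies per article for the twelve journalists in Question 3.
before = [3, 9, 16, 5, 6, 10, 12, 11, 6, 9, 2, 6]
after = [5, 8, 42, 7, 12, 12, 10, 14, 7, 11, 5, 6]


def paired_t(first, second):
    """Return (t, df) for a dependent-means t-test on paired scores."""
    diffs = [a - b for a, b in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # stdev uses n - 1
    return statistics.mean(diffs) / se, len(diffs) - 1


t_all, df_all = paired_t(before, after)

# Omit journalist 3, who sits at list index 2.
keep = [i for i in range(len(before)) if i != 2]
t_trim, df_trim = paired_t([before[i] for i in keep], [after[i] for i in keep])

print(f"all 12 journalists:   t = {t_all:.2f}, df = {df_all}")
print(f"without journalist 3: t = {t_trim:.2f}, df = {df_trim}")
# prints:
# all 12 journalists:   t = -1.73, df = 11
# without journalist 3: t = -2.52, df = 10
```

The single difference score of -26 inflates the standard deviation of the differences (about 7.33 with it, about 2.16 without it), which is why t is smaller in absolute size when journalist 3 is included, even though the mean difference is larger.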
Question 4:
subject no.: | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14
Kylie song:  | 6   | 4   | 7   | 2   | 1   | 8   | 2   | 3   | 1   | 3   | 4   | 5   | 9   | 7
Jason song:  | 8   | 3   | 3   | 1   | 1   | 9   | 3   | 2   | 3   | 6   | 3   | 7   | 11  | 6
D:           | -2  | 1   | 4   | 1   | 0   | -1  | -1  | 1   | -2  | -3  | 1   | -2  | -2  | 1
Just looking at the difference scores, we can tell
that there is not much difference between these two conditions: the differences
are small, and about half of the subjects thought Kylie was worse than Jason
while the other half thought the opposite.
There are (14-1) = 13 d.f.
The critical t-value is 2.160 for a two-tailed test. Our obtained
t of -0.56 is much smaller than this, and so we have no reason to reject the
null hypothesis that there is no difference between the two singers - both
cause people to scream very quickly!
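The means, standard deviations and t for this question can be checked with a few lines of Python. This is a sketch using the standard `statistics` module (`stdev` gives the s.d. as an estimate of the population s.d., i.e. with n-1 in the denominator); the variable names are illustrative.

```python
import math
import statistics

# Seconds before the first scream, subjects 1 to 14 (Question 4).
kylie = [6, 4, 7, 2, 1, 8, 2, 3, 1, 3, 4, 5, 9, 7]
jason = [8, 3, 3, 1, 1, 9, 3, 2, 3, 6, 3, 7, 11, 6]

# Mean and s.d. for each condition.
for name, scores in (("Kylie", kylie), ("Jason", jason)):
    print(f"{name}: mean = {statistics.mean(scores):.2f}, "
          f"s.d. = {statistics.stdev(scores):.2f}")

# Dependent-means t-test on the difference scores (Kylie minus Jason).
diffs = [k - j for k, j in zip(kylie, jason)]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
print(f"t = {t:.2f}, df = {n - 1}")
# prints:
# Kylie: mean = 4.43, s.d. = 2.62
# Jason: mean = 4.71, s.d. = 3.12
# t = -0.56, df = 13
```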
[N.B. In Questions 2 to 4, our t-values are negative (in Question 1, t is
positive). The sign simply reflects the order in which the two data columns
were subtracted: had we ordered our two data columns the opposite way round,
then we would have obtained positive t-values. In comparing obtained and
critical t-values, ignore the sign of the obtained t when doing a two-tailed
test].
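The point about the sign of t can be illustrated with the Question 2 data. In this sketch (the helper name `paired_t` is an illustrative choice), swapping the order of the two columns changes only the sign of t, not its size.

```python
import math
import statistics

# Happiness scores from Question 2.
before = [5, 4, 6, 4, 7, 3, 4, 6, 3, 5]
after = [7, 8, 9, 10, 11, 3, 5, 5, 3, 5]


def paired_t(first, second):
    """Return t for a dependent-means t-test, with D = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


t_before_minus_after = paired_t(before, after)
t_after_minus_before = paired_t(after, before)
print(f"{t_before_minus_after:.2f} {t_after_minus_before:.2f}")
# prints: -2.63 2.63
```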