One variable: graphs and descriptive statistics
WHEN DO YOU NEED THEM?
When does one ever want to look at results for just one variable? True, most
classical research involves questions/hypotheses that entail looking at
relationships between at least two variables. But here are some common
situations where you want to look at graphs and descriptive statistics for
cases on one variable:
● Some studies look directly at
more or less all of an entire population of interest, and have questions
about single variables, so the results just need to be given as graphs and
descriptive statistics. E.g.
-- A census survey to find out how many people in Wales speak Welsh shows that
20.8% claim to.
-- A teacher has the feeling (hypothesis) that her vocab teaching (or the
students' learning of vocab) is not very successful. She tests her class to see
how much of the vocabulary of the last ten lessons they have learnt, as part of
an action research project to improve her vocabulary teaching.
● In any study, however many variables are involved, it is often valuable to
describe your cases in terms of each of a set of variables that are not central
to the investigation, as part of your control of unwanted factors (cf
CVs and SAMPLING). E.g.
-- You are interested in the opinions of Greek learners in private schools in
Greece about their English course materials, so you send 50 questionnaires to
your friend who works in one to distribute for you. You get thirty-one back. On
the questionnaire, apart from questions about the central variables of your
study, you might ask questions to elicit their first language, age, gender,
experience of English outside school etc. You then check each of these
separately to see if it suggests your sample is unusual in any way
(unexpectedly many girls?), has odd cases in it (two who say their first
language is Bulgarian?) etc. You might well display proportions and graphs for
genders, age groups, respondents vs non-respondents etc. as part of your report
on your subjects.
● In any study, however many variables are involved, you may be in the position
of forming groups of cases on the basis of information gathered about
them, rather than in advance. E.g.
-- In the above example you might actually want to make up groups out of your
subjects, using their questionnaire responses, to use as EVs. Examining the
'experience of English outside school' variable you find there are ten who have
been abroad to English speaking countries, so you might set up a two category
EV on this basis to see if opinions about course materials relate to having/not
having this experience at all. If so, would you have any hypothesis about the
answer? Anyway, you might display a graph or table showing the proportions.
● In any study, however many variables are involved, it is often valuable to
look at results for all cases on each dependent variable or condition
separately and/or in each group separately as well as doing what is necessary
to establish relationships between variables. E.g.
-- In our example of gender and attitude to RP, the hypothesis is that there is
an attitude difference between genders. Apart from doing the relevant
two-variable graphs and statistics, one would do well to explore the data with
histograms for each group separately and for the whole set of subjects as if
one group. One does not necessarily report in the write-up every statistic or
graph one calculates if it does not prompt anything interesting to say, but
even statisticians often comment that researchers 'don't look at their data
enough, but just want to do a significance test and get on to the next thing'.
DECISIONS ABOUT THE RIGHT GRAPHS AND DESCRIPTIVE STATISTICS
If you are not concerned with one-variable inferential statistics (which we are
omitting here), the choices to be made are simple:
| Scale type of variable: | Interval | Rank order | Categories of any sort | Counts in a continuum |
| --- | --- | --- | --- | --- |
| Graphic presentation: | histogram of scores, frequency polygon | ordered list of cases | bar chart, pie chart (of frequencies or percent) | single bar |
| Centrality statistic: | mean, median score, modal score | median rank | modal category | frequency |
| Variation statistic: | standard deviation, range | quartile deviation | index of commonality | |
-- Only the italicised ones are commonly met and will be dealt with here.
-- The modal category is simply the one with the most cases in it - the most
popular one.
-- The mean (denoted by M or X-bar) is what we usually call the average in
everyday English.
-- The standard deviation (SD) is a measure of the spread of scores. Roughly it
is the average of the differences between each score and the mean score (see
any stats book for the formula). So if everyone in a group scores the same,
which will be the mean for the group, then the average of the differences of
each score from the mean is 0 (SD=0; no variation). The more each score differs
from the mean, the higher the SD gets, indicating more variation or
'disagreement' in the group. Usually one 'wants' small SDs.
-- Similar concepts to SD, calculated in various ways, are called 'error' and
'variance' in statistics.
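To make the 'roughly' above concrete, here is a minimal Python sketch using only
the standard library. The scores and category labels are invented for
illustration and do not come from any study. It computes a modal category, a
mean, the literal 'average of the differences' from the mean, and the SD as
stats books usually define it (the differences are squared before averaging and
the square root is then taken), so the two measures of spread come out similar
but not identical.

```python
import statistics

# Hypothetical data, invented for illustration only.
specialisms = ["arts", "science", "arts", "arts", "science"]
scores = [4, 6, 7, 7, 8, 10]

# Modal category: the category with the most cases in it.
modal_category = statistics.mode(specialisms)

# Mean: the everyday 'average'.
mean = statistics.mean(scores)

# Literal 'average of the differences' between each score and the mean.
mean_abs_dev = sum(abs(x - mean) for x in scores) / len(scores)

# SD as usually defined: square the differences, average them, take the root.
sd_population = statistics.pstdev(scores)  # divides by n
sd_sample = statistics.stdev(scores)       # divides by n - 1

# If everyone scores the same, there is no variation at all.
sd_identical = statistics.pstdev([7, 7, 7, 7])

print(f"modal category           = {modal_category}")
print(f"mean                     = {mean}")
print(f"average abs. difference  = {mean_abs_dev:.2f}")
print(f"SD (n in denominator)    = {sd_population:.2f}")
print(f"SD (n - 1 in denominator) = {sd_sample:.2f}")
print(f"SD when all scores equal = {sd_identical}")
```

Whether to divide by n or n - 1 depends on whether the cases are treated as the
whole population of interest or as a sample from it; statistics packages
commonly report the n - 1 version by default.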
The great majority of
people have more than the average number of legs. Amongst the 57 million people
in Britain there are probably 5,000 people who have only one leg. Therefore the
average number of legs is
((5000 x 1) + (56,995,000 x 2)) / 57,000,000 = 1.9999123.
Since most people have two legs... need I say more?
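The arithmetic can be checked with a few lines of Python. The figures are the
ones used in the example (57 million people, of whom 5,000 are assumed to have
one leg); the variable names are purely illustrative. The sketch also works out
the modal number of legs, which is arguably the more sensible 'typical value'
here.

```python
# Figures taken from the example in the text.
population = 57_000_000
one_legged = 5_000
two_legged = population - one_legged  # 56,995,000

# Mean: total number of legs divided by number of people.
mean_legs = (one_legged * 1 + two_legged * 2) / population

# Mode: the number of legs that the most people have.
modal_legs = 2 if two_legged > one_legged else 1

# Proportion of people whose number of legs exceeds the mean.
share_above_mean = two_legged / population

print(f"mean number of legs  = {mean_legs:.7f}")
print(f"modal number of legs = {modal_legs}")
print(f"share above the mean = {share_above_mean:.4%}")
```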
A FEW EXAMPLES OF SIMPLE GRAPHS: HOW TO MISLEAD
1. Shows two versions of a histogram of results
for one group of 16 learners on one variable ('interval' scores for quality of
each subject's written composition). Which version is better and why, or is
neither optimal? What distinguishes a histogram from a bar graph/chart (seen in
2)? When to use each?
2. Is a bar graph (=bar chart) showing the broad subject specialism of
participants in a study. I.e. it displays how all cases are categorised on a
two category variable. How would you improve it for inclusion in a write-up?
3. Shows two bar graphs for the same data – a set of
several mean scores. One group of learners has given their ratings (on a five
point scale) of how much they think eight different aspects of their
compositions improved when done by word processing. The average ratings for
each of these 8 variables are displayed together. Which is the better version
and why?


SIMPLE PERCENTAGES: HOW TO MISLEAD
1) Which sounds more impressive, A or B?
A) 2 out of 4 subjects agreed      B) 50% of subjects agreed
A) 40 out of 80 subjects agreed    B) 50% of subjects agreed
OK, but which result would you actually trust more? How should one report such
results?
2) What is unclear? How could this be restated better?
In our survey we polled 50 people, though 10 declined to participate. …. 60%
said yes to the question ‘Do you like the English class?’…
3) Why do the percentages differ in A and B? Which would the statistician
prefer and why?
A) Analysis with subjects as cases: percentage scores and their mean
| Case | Correct | Incorrect | Total occurrences | Percent correct score | Mean percent correct |
| --- | --- | --- | --- | --- | --- |
| Learner 1 | 12 | 12 | 24 | 50 | |
| Learner 2 | 8 | 12 | 20 | 40 | |
| Learner 3 | 3 | 9 | 12 | 25 | |
| Total | 23 | 33 | 56 | | |
B) Analysis with occurrences as cases: group percent
| | Total frequency | Percent |
| --- | --- | --- |
| Correct | 23 | 41.1% |
| Incorrect | 33 | 58.9% |
| Total occurrences | 56 | |
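The two analyses can be reproduced with a short Python sketch that uses only the
counts shown in tables A and B; the variable names are illustrative. It suggests
one way to see why the figures differ: analysis A gives every learner equal
weight however many occurrences they produced, whereas analysis B gives every
occurrence equal weight, so learners with more occurrences count for more.

```python
# (correct, incorrect) counts per learner, as given in table A.
learners = {
    "Learner 1": (12, 12),
    "Learner 2": (8, 12),
    "Learner 3": (3, 9),
}

# A) Subjects as cases: one percent correct score per learner, then the mean.
percent_scores = {
    name: 100 * correct / (correct + incorrect)
    for name, (correct, incorrect) in learners.items()
}
mean_percent = sum(percent_scores.values()) / len(percent_scores)

# B) Occurrences as cases: pool all occurrences, then take a single percentage.
total_correct = sum(correct for correct, _ in learners.values())
total_occurrences = sum(c + i for c, i in learners.values())
group_percent = 100 * total_correct / total_occurrences

for name, score in percent_scores.items():
    print(f"{name}: {score:.0f}% correct")
print(f"A) mean of the learners' percent scores: {mean_percent:.1f}%")
print(f"B) pooled percent correct:               {group_percent:.1f}%")
```

Only analysis A also yields a spread (e.g. an SD) across learners, since it
keeps one score per person.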
PJS rev 05