One variable: graphs and descriptive statistics

 

WHEN DO YOU NEED THEM?

 

When does one ever want to look at results for just one variable? True, most classical research involves questions or hypotheses about relationships between at least two variables. But here are some common situations where you will want graphs and descriptive statistics for cases on a single variable:

 

● Some studies look directly at all, or nearly all, of a population of interest and ask questions about single variables, so the results need only be given as graphs and descriptive statistics. E.g.

-- A census survey to find out how many people in Wales speak Welsh shows that 20.8% claim to.

-- A teacher has the feeling (hypothesis) that her vocab teaching (or the students' learning of vocab) is not very successful. She tests her class to see how much of the vocabulary of the last ten lessons they have learnt, as part of an action research project to improve her vocabulary teaching.

 

● In any study, however many variables are involved, it is often valuable to describe your cases in terms of each of a set of variables that are not central to the investigation, as part of your control of unwanted factors (cf. CVs and SAMPLING). E.g.

-- You are interested in the opinions of Greek learners in private schools in Greece about their English course materials, so you send 50 questionnaires to a friend who works in one of these schools, to distribute for you. You get 31 back. On the questionnaire, apart from questions about the central variables of your study, you might ask questions to elicit the respondents' first language, age, gender, experience of English outside school, etc. You then check each of these separately to see if it suggests your sample is unusual in any way (?unexpectedly many girls), has odd cases in it (?two who say their first language is Bulgarian), etc. You might well display proportions and graphs for genders, age groups, respondents vs non-respondents, etc. as part of your report on your subjects.

 

● In any study, however many variables are involved, you may find yourself forming groups of cases on the basis of information gathered about them, rather than deciding the groups in advance. E.g.

-- In the above example you might actually want to form groups of your subjects from their questionnaire responses, to use as EVs. Examining the 'experience of English outside school' variable, you find that ten have been to English-speaking countries, so you might set up a two-category EV on this basis to see whether opinions about course materials relate to having or not having this experience at all. If you did, would you have any hypothesis about the outcome? Either way, you might display a graph or table showing the proportions.

 

● In any study, however many variables are involved, it is often valuable to look at results for all cases on each dependent variable or condition separately, and/or in each group separately, as well as doing what is necessary to establish relationships between variables. E.g.

-- Take our example of gender and attitude to RP, with the hypothesis that there is an attitude difference between the genders. Apart from doing the relevant two-variable graphs and statistics, one would do well to explore the data with histograms for each group separately and for the whole set of subjects treated as one group. One need not report in the write-up every statistic or graph one calculates if it prompts nothing interesting to say, but even statisticians often comment that researchers 'don't look at their data enough, but just want to do a significance test and get on to the next thing'.
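The displays suggested in these examples (proportions for a category variable, and histograms for each group and for everyone together) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `proportions` and `text_histogram` are illustrative, the 10-out-of-31 split comes from the questionnaire example, and the attitude scores are invented purely to have something to plot. A crude one-star-per-case histogram is enough for exploring data; a write-up would normally use proper graphing software.

```python
from collections import Counter


def proportions(labels):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


def text_histogram(scores, low, high):
    """Print one row per possible score from low to high, one '*' per case."""
    counts = Counter(scores)
    for score in range(low, high + 1):
        print(f"{score:2d} | {'*' * counts[score]}")
    return counts


# Two-category EV from the questionnaire example: of 31 returns,
# 10 respondents have been to an English-speaking country.
abroad = ["been abroad"] * 10 + ["not abroad"] * 21
for category, (n, pct) in proportions(abroad).items():
    print(f"{category:12} {n:3d} {pct:5.1f}%")

# Hypothetical attitude-to-RP scores (1-5), invented purely for illustration.
groups = {
    "female": [4, 5, 3, 4, 4, 2, 5],
    "male": [2, 3, 3, 1, 4, 2],
}
for name, scores in groups.items():
    print(name)
    text_histogram(scores, 1, 5)

# The whole set of subjects treated as one group.
print("all subjects")
everyone = [s for scores in groups.values() for s in scores]
all_counts = text_histogram(everyone, 1, 5)
```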

 

DECISIONS ABOUT THE RIGHT GRAPHS AND DESCRIPTIVE STATISTICS

 

If you are not doing one-variable inferential statistics (which we omit here), the choices to be made are simple:

 

Scale type of variable:   Interval            Rank order        Categories              Counts
                          (of any sort)       (in a continuum)

Graphic presentation:     histogram of        ordered list      bar chart,              single bar
                          scores, frequency   of cases          pie chart
                          polygon                               (of frequencies
                                                                or percent)

Centrality statistic:     mean,               median rank       modal category          frequency
                          median score,
                          modal score

Variation statistic:      standard            quartile          index of
                          deviation,          deviation         commonality
                          range
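For the rank order and category columns, a minimal Python sketch (Python 3.8 or later, standard `statistics` module) might look like the following. It covers only the median, the quartile deviation (taken here as half the interquartile range) and the modal category; the index of commonality is not shown. The function names and the ratings and language labels are hypothetical, for illustration only, and packages differ slightly in how they place quartiles.

```python
import statistics


def median_and_quartile_deviation(values):
    """Median and quartile deviation, i.e. (Q3 - Q1) / 2."""
    # statistics.quantiles uses the 'exclusive' method by default; textbooks
    # and packages differ slightly in how they place the quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q3 - q1) / 2


def modal_category(labels):
    """The category with the most cases (the first one met, if there is a tie)."""
    return statistics.mode(labels)


# Hypothetical rank-order (ordinal) ratings, for illustration only.
ratings = [1, 2, 2, 3, 3, 3, 4, 5]
median, qd = median_and_quartile_deviation(ratings)
print(f"median = {median}, quartile deviation = {qd}")

# Hypothetical first-language categories from a small questionnaire sample.
first_languages = ["Greek", "Greek", "Bulgarian", "Greek"]
print("modal category:", modal_category(first_languages))
```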

 

-- Only the italicised ones are commonly met and will be dealt with here.

-- The modal category is simply the one with the most cases in it - the most popular one.

-- The mean (denoted by M or X-bar) is what we usually call the average in everyday English.

-- The standard deviation (SD) is a measure of the spread of scores. Roughly, it is the average of the differences between each score and the mean score (strictly, it is the square root of the average squared difference; see any stats book for the exact formula). So if everyone in a group gets the same score, that score is also the group mean, every difference from the mean is 0, and so SD = 0 (no variation). The more the scores differ from the mean, the higher the SD gets, indicating more variation or 'disagreement' in the group. Usually one 'wants' small SDs.

-- Concepts similar to SD, calculated in various ways, are called 'error' and 'variance' in statistics (the variance, for instance, is simply the SD squared).
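The interval-scale statistics above can be checked with Python's standard `statistics` module. This is a minimal sketch on a hypothetical set of five composition scores; it uses the sample SD and variance (dividing by n - 1), which is what `statistics.stdev` and `statistics.variance` compute, and also prints the plain average distance from the mean so you can see that the SD is close to it but not the same thing.

```python
import statistics

# Hypothetical composition scores for five learners (illustration only).
scores = [12, 15, 15, 18, 20]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
score_range = max(scores) - min(scores)

# Sample SD and variance (divide by n - 1); the variance is the SD squared.
sd = statistics.stdev(scores)
variance = statistics.variance(scores)

# The plain average distance from the mean, for comparison with the SD.
mean_abs_dev = sum(abs(x - mean) for x in scores) / len(scores)

print(f"M = {mean}, median = {median}, mode = {mode}, range = {score_range}")
print(f"SD = {sd:.2f}, variance = {variance:.2f}, mean |x - M| = {mean_abs_dev:.2f}")

# If everyone gets the same score, there is no variation at all.
print(statistics.stdev([15, 15, 15]))
```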

Joke from WWW: Most of us have A Greater Than Average Number of Legs

The great majority of people have more than the average number of legs. Amongst the 57 million people in Britain there are probably 5,000 people who have only one leg. Therefore the average number of legs is

    ((5000 x 1) + (56,995,000 x 2)) / 57,000,000 = 1.9999123.

Since most people have two legs... need I say more?
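The joke's arithmetic can be reproduced from a frequency table, which also shows why the median or mode is the more sensible 'average' here. A minimal Python sketch using the joke's own figures; `median_from_frequencies` is a hypothetical helper that returns the lower median of a table of {value: number of cases}.

```python
# Figures from the joke: 57,000,000 people, about 5,000 of whom have one leg.
legs_frequencies = {1: 5_000, 2: 56_995_000}

total_people = sum(legs_frequencies.values())
mean_legs = sum(legs * n for legs, n in legs_frequencies.items()) / total_people
mode_legs = max(legs_frequencies, key=legs_frequencies.get)


def median_from_frequencies(freqs):
    """Lower median of a frequency table {value: number of cases}."""
    half = sum(freqs.values()) / 2
    running = 0
    for value in sorted(freqs):
        running += freqs[value]
        if running >= half:
            return value


print(f"mean = {mean_legs:.7f}")
print("median =", median_from_frequencies(legs_frequencies))
print("mode =", mode_legs)
```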

 


A FEW EXAMPLES OF SIMPLE GRAPHS: HOW TO MISLEAD

 

1. Two versions of a histogram of results for one group of 16 learners on one variable ('interval' scores for the quality of each subject's written composition). Which version is better and why, or is neither optimal? What distinguishes a histogram from a bar graph/chart (seen in 2)? When should each be used?

 

2. A bar graph (= bar chart) showing the broad subject specialism of participants in a study, i.e. how all cases are categorised on a two-category variable. How would you improve it for inclusion in a write-up?

 

3. Two bar graphs of the same data: a set of several mean scores. One group of learners has rated (on a five-point scale) how much they think each of eight different aspects of their compositions improved when done by word processing. The average ratings for these eight variables are displayed together. Which is the better version and why?

 

 


 

SIMPLE PERCENTAGES: HOW TO MISLEAD

 

1) Which sounds more impressive, A or B?

 

                                    A) 2 out of 4 subjects agreed      B) 50% of subjects agreed

           

                                    A) 40 out of 80 subjects agreed     B) 50% of subjects agreed

 

OK, but which result would you actually trust more? How should one report such results?
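One common convention, though not the only one, is to give the raw count and the base alongside any percentage, so that readers can judge for themselves how much weight the figure will bear. A minimal Python sketch; `report_agreement` is a hypothetical helper, not a standard function.

```python
def report_agreement(agreed, total):
    """Report a percentage together with the raw count and the base it comes from."""
    percent = 100 * agreed / total
    return f"{percent:.0f}% of subjects agreed ({agreed} out of {total})"


print(report_agreement(2, 4))
print(report_agreement(40, 80))
```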

 

2) What is unclear here? How could it be restated better?

 

            In our survey we polled 50 people, though 10 declined to participate. …. 60% said yes to the question ‘Do you like the English class?’…
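A quick way to see what is at stake is to work out how many people the '60%' stands for under each possible base. A minimal Python sketch using only the numbers in the quoted report; the variable names are illustrative.

```python
polled = 50
declined = 10
participated = polled - declined

# The same "60%" implies different numbers of people depending on the base.
yes_if_base_is_participants = 60 * participated / 100
yes_if_base_is_polled = 60 * polled / 100

print(f"60% of {participated} participants = {yes_if_base_is_participants:.0f} people")
print(f"60% of {polled} people polled = {yes_if_base_is_polled:.0f} people")
```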

 

3) Percentage scores versus group/aggregate percent.
 
Two ways of handling data where different people produce different numbers of potential occurrences. In this imaginary example, three subjects have been recorded in quasi-natural conversation, and counts have been made of their NS-like (correct) use of third-person -s.

Why do the percentages differ in A and B? Which would a statistician prefer, and why?

 

A)    Analysis with subjects as cases: percentage scores and their mean

 

Case        Correct   Incorrect   Total occurrences   Percent correct score   Mean percent correct
Learner 1   12        12          24                  50
Learner 2    8        12          20                  40
Learner 3    3         9          12                  25
Total       23        33          56

 

 

 

B) Analysis with occurrences as cases: group percent

 

 

 

                    Total frequency   Percent
Correct             23                41.1%
Incorrect           33                58.9%
Total occurrences   56
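The two calculations in tables A and B can be set side by side in a short Python sketch, using the imaginary counts from the tables. In A each learner is one case and the three percent scores are averaged; in B every occurrence is one case, so learners who produced more occurrences contribute more to the pooled figure. The variable names are illustrative only.

```python
# (correct, incorrect) counts for each learner, from tables A and B.
learners = {
    "Learner 1": (12, 12),
    "Learner 2": (8, 12),
    "Learner 3": (3, 9),
}

# A: learners as cases -> one percent score per learner, then their mean.
percent_scores = {
    name: 100 * correct / (correct + incorrect)
    for name, (correct, incorrect) in learners.items()
}
mean_percent_correct = sum(percent_scores.values()) / len(percent_scores)

# B: occurrences as cases -> pool all occurrences, then take one percentage.
total_correct = sum(correct for correct, _ in learners.values())
total_occurrences = sum(correct + incorrect for correct, incorrect in learners.values())
group_percent_correct = 100 * total_correct / total_occurrences

for name, pct in percent_scores.items():
    print(f"{name}: {pct:.0f}% correct of {sum(learners[name])} occurrences")
print(f"A) mean percent correct  = {mean_percent_correct:.1f}%")
print(f"B) group percent correct = {group_percent_correct:.1f}%")
```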

 

 

 

PJS rev 05