Evaluation of CALL Software

P.J. Scholfield, Dept of Language and Linguistics, University of Essex, UK. Work in progress.

General foundation points about CALL evaluation

The judgmental method of evaluation of CALL by checklist

The empirical methods of evaluation of CALL

My checklist for judgmental CALL evaluation

The sort of checklist I don’t like

For other people’s ideas about how to perform CALL evaluation see:


Chapelle 2001 Chapter 3 and Levy 1997 p144ff.

Hubbard 1996 ‘Elements of CALL methodology’ in ed. Pennington The Power of CALL esp. pp27-29.

Higgins 1995 Computers and English Language Learning ch10.

L. Murray and A. Barnes 1998 ‘Beyond the “wow” factor – evaluating multimedia language learning from a pedagogical point of view’ System 26.

Odell 1986 ‘Evaluating CALL software’ in ed. Leech and Candlin Computers in English Language Teaching and Research.

 

For evaluations of specific CALL programs/tasks see references interspersed in my schedule.

Basic definitions

'CALL software' here can involve any software or programs potentially usable by language learners in connection with learning/teaching or use of language (esp. EFL/ESL). That includes both material claimed to be designed for this purpose ('dedicated') and material that is not. The latter includes both specific programs, like adventure games for native speaker children, and 'generic' or content-free software, like email or word processing. It also includes whatever hard copy support materials (booklets etc.) any software comes with. See further our Intro.

"Evaluation is a matter of judging the fitness of something for a particular purpose" (Hutchinson and Waters 1989: 96). 'Evaluation' therefore implies an activity where something is declared suitable or not and consequent decisions are to be made or action taken. Evaluating something therefore is not the same as researching it, though research may be done to find out things which then inform the value judgment and hopefully make it better. Research on its own may just end up with information, not judgment and action.

CALL software and general teaching materials and tasks - a parallel?

Much of what we say below about evaluation of CALL software is similar to what one would say for 'materials evaluation' generally in language teaching. CALL software is often analogous to an individual exercise or task in a book, though some series of CDROMs constitute entire courses and so are parallel with complete coursebooks. The parallel is valuable... up to a point. There are some important differences, however.

Firstly, a book is not typically dynamic or interactive; a program, by contrast, may not always present an exercise the same way every time you use it, and can usually give some response to the user dependent on what they click or type in. That is why CALL programs have often been seen as replacing a teacher rather than just teaching materials, though that clearly does not fit all software.

Secondly, a book is more limited in its media capability. CALL can involve sound as well as pictures, diagrams and text all in the same package.

Thirdly, use of written materials has few technological prerequisites: eyes and a desk to put them on will do. CALL by contrast requires computers, network access etc.

Fourthly, the language content of material in a coursebook is essentially unalterable, while some CALL software allows 'authoring': i.e. the teacher can put in his/her own choice of text, words etc. for the program to make an exercise out of, or whatever. In fact some software, such as a wordprocessing program, is essentially content-free and is nothing unless someone enters text to make an exercise, or designates a task for learners to do with it (see next).

Fifthly, the activities to be done with each section of a coursebook are usually heavily constrained by the book itself, though there may be some latitude for the teacher to implement exercises in different ways, and of course skip some material. A CALL program on the other hand may be very constrained (e.g. a hangman game), or may be almost entirely open in this respect (e.g. email).

The last two differences are important for evaluation, as they can make it hard to draw a line between evaluating the software and evaluating the specific language material a teacher has put in, or a specific task done with the software which is not determined by the software itself. I.e. the borderline between evaluating software ‘in itself’ as a material and evaluating some proposed or imagined use of the software can become impossible to maintain.

The importance of evaluation

Evaluation is one of three key aspects of CALL that need consideration: Creation, Use and Evaluation. See Intro.

CALL shares one important thing with teaching materials and tasks in general: all of these are under-evaluated. Just as new coursebooks and types of task are constantly being proposed and promoted by their creators … and adopted and used … so are CALL programs and activities (Chapelle top of p10). What rarely happens is any proper evaluation of the value or effectiveness of any of this … by teachers or researchers. Correction: some teachers may well do a lot of evaluation of what they use … but, if so, it remains within their personal teaching process and is not published. Hence we have no idea how much of this goes on, or what evaluation methods and criteria are used; furthermore, nobody else gets the benefit of the information arising from the evaluation.

The three key components in CALL evaluation

Mostly evaluation cannot be done in the abstract, i.e. things are rarely universally good or bad. With CALL you may feel some programs have features which in NO situation would be any good. Possible candidates for ‘universal’ status could be software glitches (e.g. the program crashes whenever the help icon is clicked) and inaccuracy of language (e.g. multiple choice exercises where the option counted as correct is actually wrong). However, a lot is really 'relative', and it is better to start off by thinking of everything as potentially relative than the reverse. As Chapelle says (2001 p52): ‘Evaluation of CALL is a situation-specific argument’.

Clearly most features may be good for one type of person, situation etc. but bad for another: for example, the kind of vocabulary included, or the kind of computer knowledge required to work the program. This is as true of general materials evaluation as of evaluation of CALL specifically. So one important aspect of evaluation is to establish the specific users (learners and teachers), situation, purpose etc. etc. that you are evaluating the materials for. This means that you cannot really evaluate without also thinking of how the material will be used in the learning and teaching process. It is quite possible for one and the same program to seem 'good' when used one way with a class and 'bad' when used another way, or with a different class.

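Software and materials evaluation in ELT, then, can be seen as an activity where you match materials to teaching/learning situations. I.e. there are three things to think about:

(a) the software/materials themselves;

(b) the teaching/learning situation: the specific learners, teachers, purposes etc.;

(c) the match or fit between (a) and (b).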

One may of course do that for just one piece of software at any one time, but it is often easier to evaluate two or more programs of the same type together. Comparisons are often revealing. In addition, one may often usefully compare a CALL activity/program with a non-CALL (pen and paper) counterpart, as has widely been done in writing research (pen versus wordprocessor).

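Furthermore you can deal with the above three components one of two ways round:

i) start from a given teaching/learning situation and ask whether a particular program suits it;

ii) start from a given program and ask what teaching/learning situations or learners it would suit.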

When the evaluation is done

It is also worth noting that there can be several types of occasion when evaluation of teaching materials, including CALL, may occur (leaving aside evaluation done while the software is actually under development):

1) Evaluation of materials prior to purchasing them or providing access to them for any learners. I.e. as a result of evaluating materials you decide whether or not to buy or adopt them for some specific learners. (Direction i usually, though ii is also possible).

2) Evaluation after purchasing or otherwise acquiring the software, but before use. Here the question is usually what learners it would suit, so the consequent action is to use it with/recommend it to these learners not those, and so on. (Direction ii, or i).

3) Evaluation after the program has been acquired and used with some learners for a while. Here the question is whether it was a success, and the action is to use/not use the program again with these or other learners, or to alter the way it is used in some way. (Direction ii).

This account is focussed more on 1 and 2, since most of us are not teachers who have just been using CALL with any actual learners, but the same ideas pervade all three situations. In all of them you decide if the materials are good or bad, not just what they consist of or 'do' etc.

Who evaluates

The evaluators we are thinking of here are primarily language teachers, though of course other people evaluate materials too - curriculum/program planners, government education departments, reviewers writing for journals, researchers in applied linguistics...etc. In the realm of CALL, it is especially necessary for teachers to be good at evaluating. There is a lot of poor material about; publishers are especially prone to hype; curriculum designers who might evaluate to choose suitable coursebooks for a course are less likely to extend this activity to CALL, so the job is left to the teacher; only a few teachers write their own CALL software (compared with the number who might write bits and pieces of their own non-CALL teaching materials) - most rely on professional products (though remember programs may require or allow some teacher 'authoring').

Methods of evaluation (A): Introspective judgmental evaluation; checklists

There are two broad ways of actually carrying out evaluation studies (A and B here). In many ways A suits situations 1 and 2 above, and B suits situation 3. (Cf Chapelle 2001 p53).

Introspection means relying on one's own judgment/experience, and perhaps on published consensus about what should be there and what is good or bad, or on applied linguistic (AL) theory.

(A1) Evaluation can be done purely individually, subjectively, globally and introspectively. I.e. the teacher simply looks through the material, or in our case tries out the program (or just reads the blurb about it in a catalogue), and comes to an overall intuitive judgment about whether it would suit their class or what class it would suit. When teachers evaluate in this way it may help to try to place themselves in the role of some type of learner using the material. When trying out a CALL program it is often especially useful to make deliberate mistakes to see how the program responds, e.g. to give wrong answers, press the wrong keys etc.

This could be described as the global 'expert judgment' method of evaluation. The evaluator introspects and somehow accesses an unanalysed notion of some users of the software, an unanalysed impression of the software, and matches the two using often inexplicit criteria.

(A2) However, to regard evaluation as in any way systematic it is necessary at the very least to 'unpack' this armchair approach a bit. The teacher (or anyone else) acting alone as evaluator should break down the 'overall' or global judgment into parts. This means (a) looking carefully at different aspects of the materials separately, (b) thinking of all the relevant different aspects of the learning situation, learners, potential use etc. etc., and (c) judging each aspect of (a) in respect of (b), point by point. This last step in part resembles the process of assessing 'content validity', often talked about in language testing: one can check an achievement test by analysing the aspects of language tested and comparing them with what the syllabus or the preceding teaching course covered. Another general principle of language testing also applies here: other things being equal, tests with more items are more reliable than shorter ones, and a set of agree/disagree items circling round some issue is more reliable than a single one targeting it. So here, the summary of a whole series of introspective judgments of specific aspects is more reliable than one global judgment.
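The testing principle appealed to here is usually formalised as the Spearman-Brown prophecy formula. It is offered only as a rough illustration, assuming (unrealistically for introspective judgments) that the separate judgments behave like parallel test items, each of reliability ρ; a composite of k of them then has predicted reliability ρ_k:

```latex
\rho_k = \frac{k\,\rho}{1 + (k-1)\,\rho}
\qquad\text{e.g. } \rho = 0.3,\ k = 10:\quad
\rho_{10} = \frac{10 \times 0.3}{1 + 9 \times 0.3} = \frac{3}{3.7} \approx 0.81
```

The figure of 0.3 is invented for illustration; the point is only the direction of the effect, that pooling many modestly reliable judgments can give a considerably more reliable overall one.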

This is where 'checklists' come in. These are written records of the sort of 'breakdowns' just described. They may be made by the teacher/evaluator, or adopted from someone else. They at least provide a way of ensuring that important aspects do not get forgotten and that there is some consistency if the same person evaluates several things. However, the evaluation still remains individual, introspective and maybe pretty subjective. Checklists generally take the form of sets of headings to be considered or sets of questions to ask oneself. They may or may not include a system for weighting different elements, or adding up a total score in some way. Two I know of for CALL are the list of points in Jones and Fortescue, and a more reasoned and systematic framework by Odell (in Leech and Candlin). More recently Chapelle has proposed a set of 6 points from an SLA research perspective (2001 p54ff). John Roberts has a much bigger collection of such checklists used in general materials evaluation.
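Where a checklist does use weighting and a total score, the arithmetic can be as simple as a weighted mean. The sketch below is a hypothetical illustration only: the point names, the 0 to 4 rating scale and the weights are invented and are not taken from Jones and Fortescue, Odell, Chapelle or Roberts.

```python
from dataclasses import dataclass

MAX_RATING = 4  # hypothetical 0-4 rating scale


@dataclass
class ChecklistPoint:
    name: str      # hypothetical checklist heading
    weight: float  # relative importance assigned by the evaluator
    rating: int    # evaluator's judgment, 0 (poor) to MAX_RATING (excellent)

    def __post_init__(self) -> None:
        if not 0 <= self.rating <= MAX_RATING:
            raise ValueError(f"rating for {self.name!r} must be 0-{MAX_RATING}")


def weighted_score(points: list[ChecklistPoint]) -> float:
    """Weighted mean rating, expressed as a percentage of the maximum possible."""
    total_weight = sum(p.weight for p in points)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    achieved = sum(p.weight * p.rating for p in points)
    return 100 * achieved / (total_weight * MAX_RATING)


points = [
    ChecklistPoint("runs on the school's computers", weight=3, rating=4),
    ChecklistPoint("instructions understandable by these learners", weight=2, rating=2),
    ChecklistPoint("vocabulary suits learners' level", weight=2, rating=3),
]
print(f"{weighted_score(points):.1f}%")  # 22 of a possible 28 weighted points
```

A single total like this can mask a decisive weakness on one point (e.g. the program will not run on the available hardware at all), which is one argument for treating basic practicality as a prior filter rather than as just another weighted item.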

However, many published checklists strike one as a rather miscellaneous collection of points or questions, not clearly distinguishing between (a) and (b) and (c) above, and not obviously exhausting the types of point that should be considered, or organising them in a motivated way.

For teachers, often the checklist-based evaluation just described is the only one feasible, since it is the one that can be done quickly and easily and before the materials have been extensively used or even bought. It can be enhanced by incorporating the views, arrived at in a similar way perhaps, of more than one person. I.e. the teacher can get other teachers to do the same sort of evaluation, or read reviews in journals etc. This makes it less individual, though still introspective and rather subjective.

(A3) Additionally the teacher may enhance the checklist approach, if he/she has the time and energy, by doing things that in a loose sense could be called 'research'. By this I mean looking systematically, with some analytic techniques etc., at aspects under the (a) or (b) head above, not just deciding what they are on an instant introspective basis. This may focus more on the (a) side: e.g. linguistic analysis of the structures used in the content of the program (if it is fixed), checking the frequency level of the vocabulary against a standard reference list, grading the exercise types that are incorporated on a recognised scale of task difficulty etc. This might be called 'materials analysis'. Or it may focus on the (b) side: e.g. finding out what the syllabus for the current year actually says my learners should be doing, doing an analysis of learners' needs or interests, finding out what the school budget actually has available, etc. This is in effect 'analysis of the learning/teaching situation'. These are all things that might appear on a checklist, and of course they can all alternatively be decided by the evaluator just "off the top of his/her head".
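Checking vocabulary against a reference list, one of the 'materials analysis' tasks just mentioned, might look roughly like this. It is a minimal sketch assuming the program's text can be extracted as plain text and that the reference list (say, the most frequent few thousand words of some frequency list) is available as a set of lowercase words; the tiny list and sample sentence below are placeholders. It ignores lemmatisation, so 'learners' would not match 'learner'.

```python
import re
from collections import Counter


def coverage(text: str, reference: set[str]) -> tuple[float, list[str]]:
    """Share of running words (tokens) found in the reference list, plus the
    distinct words outside it, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, []
    outside = Counter(t for t in tokens if t not in reference)
    share = 1 - sum(outside.values()) / len(tokens)
    return share, [word for word, _ in outside.most_common()]


# Placeholder reference list; a real one would hold thousands of words.
reference = {"the", "a", "man", "went", "to", "shop", "and", "bought", "some", "bread"}
sample = "The man went to the shop and bought some bread and a baguette."
share, unknown = coverage(sample, reference)
print(f"{share:.0%} of tokens covered; outside the list: {unknown}")
```

The words outside the list are arguably more useful to the evaluator than the percentage, since they show exactly which items learners at a given level might stumble on.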

Further, with respect especially to (c), the suitability judgment itself, the evaluator may do some 'research' in the form of reading up what theory, research studies and so forth have to say. Suppose you have a program with certain characteristics and you want to use it with young learners (as the publishers indeed claim it is suited to be). Instead of just relying on one's own judgment of what is suitable, one can read up what the collective wisdom of psychologists, educators etc. has to say about the characteristics of young learners and so what suits them. Similarly the general wisdom on how to construct multiple choice items (e.g. in books on testing) may help evaluate the suitability of m/c items in a CALL package. Research studies of the way learners use CALL, teaching with CALL etc. may also be worth looking at, and indeed, if a program is supposedly designed to aid reading, the general wisdom on the teaching of reading, reading strategies and so forth. However, there is always the danger that supposedly 'general' research findings do not actually apply in your situation for some reason.

But if you are using the checklist approach there are some key things not to forget:

Be explicit about where the list comes from and which existing one is being used/adapted, and have as many detailed subsections as possible. Make sure whatever system/list you use covers all three aspects, (a), (b) and (c).

Cover the (a) aspect. A detailed description of how the program works and what it does (a), with examples of actual items, screens etc., has to be included, since the reader cannot be assumed to be familiar with the software. If part of what you are evaluating is a particular task that is not part of the software itself, or some language element supplied by the teacher, make that clear. But description alone is not an evaluation.

Cover the (b) aspect. Give a full account of (imagined or real) target learners in a situation in a particular country at a particular level etc. Evaluation for some generalised 'learner' is not very convincing.

Don't forget (c) i.e. explanation of how each feature of the program (a) does or doesn't fit (b). This needs to be supported wherever possible by more than your expert intuition - reference to applied linguistic concepts, research, models etc. (E.g. Chapelle 2001 pp45-51). This is the crux of evaluation.

The actual organisation of the writeup of such an evaluation can be done in several ways. The most popular, and probably the most sensible, is to describe (b) fully in advance, together with the research/theory background relevant to (c). Then go through a systematic set of (a) points (different aspects of the materials), giving a clear description of each aspect and the actual evaluation (c) of each in relation to (b).

Some people use the overt structure of the specific materials themselves as the (a) basis for proceeding. E.g. instead of having a prior idea of what categories to look at (e.g. from a published checklist), and using headings such as 'language content', 'balance of focus on the four skills' etc., they proceed through a list like 'reading passage', 'cloze exercises' (i.e. things the programmers present as separate parts of the materials). That is in some ways 'easier', but of course, instead of the evaluator imposing a relevant set of categories of things to look at, it puts the materials in the driving seat and may mean that relevant things do not get looked at. Compare what happens when you visit TESCO without a shopping list of your own made in advance, and just use the shelves of the store as a prompt for what to buy as you go round!

Methods of evaluation (B): Empirical evaluation

Other methods of evaluation generally require much more work, and require the materials to have been used for some time by learners/in actual classes (compare situation 3), so they are often firmly fixed in a specific teaching/learning situation (b). However, they do move away from the purely introspective approach. They incorporate activities just like those we would otherwise regard as typical of regular empirical 'research': measurement, surveys etc. I.e. they may entail using questionnaires and interviews, systematic observation, eliciting 'think-aloud' data from software users, or testing users. They may mean doing 'studies' (experimental or not) comparing the success of one material against another and so forth, or indeed doing 'action research' with CALL. (See Chapelle, Jamieson and Park 1996 in ed. Pennington The Power of CALL for an overview of types of empirical research done on CALL, classified by the kinds of methods used; and Chapelle 2001 pp66-94 for more detailed coverage, in relation to CALL tasks of the more communicative type and classic SLA research issues looked at in CALL.)

In themselves these 'research' type activities are non-evaluative in the sense considered here (except action research). They are best seen as scientific means of gathering facts and testing hypotheses. The results can then either remain as cold statements of fact about how effective the materials are or what people's opinions about them are, or be exploited for practical ends as part of an evaluation exercise, i.e. to make decisions like those described at the start.

Examples are:

Doing a survey of teachers and/or learners who have used the material and finding out how they use it, their difficulties, attitudes to the interest and usefulness of the content, tasks etc. Checklists can come in here again. E.g. one can base a questionnaire to users around the same set of (a) and (b) points that might otherwise be the points one asks oneself about in A above.

Observing a class using the program, taping and making systematic notes on their difficulties, actions, strategies, what they say, the teacher's involvement etc. Or one can ask learners to keep a diary of their reactions.

Getting the computer to store records of actions performed by learners using a program and analysing them to infer learner strategies and processes (e.g. revisions when wordprocessing, accesses made to an online glossary when reading). Example in T. Johns 1997 ‘Contexts’ in ed. Wichmann et al. Teaching and Language Corpora (Longman).

The classic research comparison of those using one program with those using another differing in a small or large way (or no program… just doing non-computer equivalent tasks) over a period, with before and after tests to check on how much has been learnt.
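The third example above (computer-stored records of learner actions) can be illustrated with a minimal sketch that counts glossary look-ups per learner and how often each word was looked up. Everything about the log is a hypothetical assumption: the tab-separated format (learner ID, event type, detail), the event name 'glossary' and the sample data. Real programs log differently, if they log at all.

```python
from collections import Counter, defaultdict

# Hypothetical log: learner ID, event type, detail (tab-separated).
LOG = """\
s01\tpage\ttext1
s01\tglossary\tubiquitous
s02\tglossary\tubiquitous
s01\tglossary\tscarce
s02\tpage\ttext1
s02\tglossary\tubiquitous
"""


def glossary_lookups(log: str) -> tuple[dict[str, int], Counter]:
    """Count glossary look-ups per learner, and how often each word was looked up."""
    per_learner: dict[str, int] = defaultdict(int)
    per_word: Counter = Counter()
    for line in log.splitlines():
        if not line.strip():
            continue  # skip blank lines
        learner, event, detail = line.split("\t")
        if event == "glossary":
            per_learner[learner] += 1
            per_word[detail] += 1
    return dict(per_learner), per_word


per_learner, per_word = glossary_lookups(LOG)
print(per_learner)              # look-ups per learner
print(per_word.most_common(1))  # most looked-up word and its count
```

Counts like these show only what learners did, not why; they feed into evaluation only when interpreted against a particular situation and used to make a decision.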

If A type and B type evaluation are both done, the connection between the two needs to be spelt out. If the A evaluation resulted in adoption of the software, did the B evaluation show that was a good decision?

If actually doing B, remember to use all the paraphernalia of headings and sections as for any piece of empirical research (see LG475/575): background - research questions - method - results and interpretation, with an evaluative interpretation of all that added.

 A Checklist for Judgmental CALL Evaluation

The beginnings of a CALL checklist follow, inspired mainly by Odell 1986 ‘Evaluating CALL software’ in ed. Leech and Candlin Computers in English Language Teaching and Research and John Roberts’ 1996 article in System 24, but not exactly following either. This is definitely not meant to be exhaustive. You are invited to add to it, and subdivide into more detail, especially in the pedagogical area, as you look at actual software and think of points that aren't covered. It is meant to apply as much to generic software like the Internet used in some way for CALL as to a dedicated MMCD.

Remember you can organise an account in various ways – e.g. describe all the (b) first, then the (a) then finally do (c); or you can make a list of points each of which deals with (a,b,c) in one.

Some side questions I am not sure of the answer to:

How much CALL evaluation can be done using 'universal' criteria, and how much is inevitably local to particular learners and situations? Chapelle 2001 ch3, from an SLA perspective, tends to emphasise the former; I, from an ELT perspective, tend to emphasise the latter.

Should one pay any attention to the claims of the producers of software? Should one just evaluate the program for one's own purposes regardless? Or should one also consider separately (i) whether the program does what it says it does, and (ii) whether what it says it does is suitable for the target teaching/learning situation? Some suggest evaluation should have these two stages. External: relevance to particular needs of particular learners (e.g. specific level, ESP, syllabus). Internal: quality of the work per se in meeting its declared specification/aims. A program may be unsuitable (alone, or compared with another) EITHER because it is perfectly good but at the wrong level of sophistication, coverage of items etc. for some class, OR because it is just badly made.

As you try out CALL software, BOTH evaluate the software using the checklist, whatever comes to your 'expert' mind and my hints (aimed at making you focus in more depth on either (a) or (b) elements), AND revise the checklist to make it more comprehensive.

Specification (External prerequisites of the software, which usually need to be considered before any consideration of real pedagogical value. Used to assess the basic practicality of using the software.)

(a) Aspects of software that are usually present and need to be looked at separately for evaluation:

What price (if not free), for multiple or single users? (Bought? Shareware? Freeware? Licensed? Homemade?)

Is it readily available?

What hardware platform required (type of computer PC/Macintosh, speed of processor, amount of memory, type of CD/disk drive, type of graphics screen capability, printer...)?

What other software needed as prerequisite (e.g. Windows, Soundblaster, particular fonts...)?

Does it have restricted compatibility with operating systems (e.g. Windows NT) or networks? Does it allow multiple use, backups?

What management required - i.e. someone's time to set things up and keep them running properly?

(b) Aspects of the teaching/learning situation that are usually present and which are relevant to deciding if (a) is suitable or not:

Specific school/learners - what do they have or can they afford in the above categories?

What school resources of staff and expertise are there to get things working and manage them?

(c) Does a fit b? OR What b would a fit?

… Go through all the a/b points above checking the match.

Can one even begin to consider this program - no point unless one has or can afford the platform etc?

Program design (A lot of these points broadly relate to the 'user-friendliness' of the software, or the ‘computer-user interface’, largely independently of any pedagogical value, though overlapping with it a bit)

(a) Aspects of software that are usually present and need to be looked at separately for evaluation:

How is the program loaded and run?

Speed?

What typing, deleting, mouse use, clicking buttons and suchlike basics are required?

What is the navigation means (menus, buttons, icons etc.) to jump back, forward, begin again, see where you are in the program etc? Organisation of component exercises etc.?

What means like Escape/f10/Home etc. to exit program at any point?

Does the program readily crash or hang when the wrong keys are pressed (e.g. Break, Escape...)? Or when you click fast with the mouse? Is it idiot-proof?

Does it deal with responses with trailing spaces, mixed cases, numbers when words are required etc. etc., or consider them 'wrong' or crash?

Does it cope with typos, slight misspellings?

What output features: Sound, Graphics, Video, Written fonts, Screen layout? Presentation? How multimedia is it?

Clarity of screen layout – e.g. text size, chunking, margins?

Clarity of icons and their style (cartoon?)?

Can features like sound be switched on and off? Can graphics be skipped when one doesn't want to wait while they appear, but get on with the task?

What instructions provided - amount of them and the language they are in, and level of difficulty? (A reflection of how far the software is general purpose versus targeted on a specific set of learners in a particular class/country/level)

Separate booklet and/or online help about how to work things?

Opportunity to print?

Opportunity to save uncompleted tasks or scores under individual ID and carry on next time?

Is the content fixed, or does it allow/require content to be provided by the teacher etc.? Authoring procedures? Or indeed is the software only an authoring language?

Kind of program in computational terms (pattern matching, AI, parsing....)? If on WWW is it in HTML, Java…?
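The questions above about trailing spaces, mixed case and typos can be made concrete with a minimal sketch of the sort of answer checking a program might do. It assumes a simple design (the function names and the one-edit tolerance are illustrative choices, not those of any particular package): the response is trimmed and lowercased, then accepted outright if it matches a listed answer, or flagged as a likely typo if it is within one Levenshtein edit (insertion, deletion or substitution) of one.

```python
def normalise(answer: str) -> str:
    """Trim, collapse internal spaces and lowercase the learner's response."""
    return " ".join(answer.split()).lower()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def judge(response: str, correct: list[str], tolerance: int = 1) -> str:
    """Return 'right', 'near miss' (likely typo) or 'wrong'."""
    given = normalise(response)
    targets = [normalise(c) for c in correct]
    if given in targets:
        return "right"
    if any(edit_distance(given, t) <= tolerance for t in targets):
        return "near miss"
    return "wrong"


print(judge("  Went ", ["went"]))  # extra spaces and capital: "right"
print(judge("wen", ["went"]))      # one letter missing: "near miss"
print(judge("wnet", ["went"]))     # swapped letters count as 2 edits: "wrong"
```

Trying variants like these by hand is a quick way to probe real software. Note too that a one-edit tolerance would also accept 'want' for 'went', so how lenient a program ought to be is itself a (c) question about particular learners and tasks, not a purely technical one.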

(b) Aspects of the teaching/learning situation that are usually present and which are relevant to deciding if (a) is suitable or not:

Specific users - what can they manage, given their prior experience of computers? What do they find clear and 'friendly'? Are they even familiar with the QWERTY keyboard?

Specific users - what appeals to them as attractive/important in a program? How sophisticated are they?

Specific users - what instructions can they understand easily (given their competence in the language the instructions are in)? What computer actions do they know already, and which would they need to be trained to do?

What facilities for hard copy and individual scoring are needed by course requirements?

Teacher - what time/inclination to author, what expertise at authoring?

(c) Does a fit b? OR What b would a fit?

… Go through all the a/b points above checking the match. E.g.

Are the program features too poor? too unattractive? sound obtrusive/irrelevant? … given the experience and expectations of these learners.

Is there so much that is unfamiliar that the students and/or teacher would spend too much time just mastering the technology, not doing real language work?

etc.

Pedagogically relevant features (These are mostly to do with either the language or the task).

Note 1: where a program is content-free or allows authoring, some of the things listed here under (a) are not so much features of the program itself as of whatever someone put into it. One needs to distinguish between evaluating what the program does pedagogically and evaluating pedagogical aspects of things not fixed by the program but probably organised by the teacher, such as some of its language content or the precise task done with it. This applies especially to content-free utilities like email or word processing: just as one cannot pedagogically evaluate a pen, one cannot evaluate email without considering some specific kind of communication or task performed with it – i.e. its use.

Note 2: often a good deal of what one evaluates here would be the same for an equivalent task done in a non-computer form. Maybe one needs to distinguish between evaluation of matters that are specific to the fact that the material is on computer rather than in book form and those that are just to do with the language or task etc. and would arise whether it was computerised or not. You may find quite a lot of evaluation points you come up with are ones that really apply to evaluation of teaching materials, tasks, tests and EFL reference materials (dictionaries etc.) in general.

Chapelle 2001 reduces all the following to five points on p55ff. …

(a) Some aspects of software that are often present and need to be looked at separately for evaluation:

What are the stated objectives of the program, if any? Is it 'dedicated' to EFL/learning some language, or not (e.g. 'generic' like WWW)?

Does the program address only the learner or also the teacher?

What type of language is involved? General or ESP, UK or US, RP or regional....? What style/register/genre? Authentic or made-up? Is it accurate for the type of language intended?

What language proficiency level is the language at? Beginner - Intermediate - Advanced - Native Speaker? What is the targeted level of learner?

What (combinations of) areas/levels/skills of language are most focussed on/most have to be handled by the user?

Levels: Sounds - Orthography/spelling/punctuation - Grammar - Vocab - Discourse

Skills: Reading - Writing - Speaking - Listening

Within the above, what specific structures, bits of language, subskills etc. are focussed on (i.e. forced to be used, or have opportunities presented for their use, etc.)… or is the language element minimal and other non-language aspects dominate (e.g. gaining electronic literacy)?

How is comprehension ensured? I.e. how is input made meaningful? Translation, pictures...? How much focus is on meaning?

What sorts of offscreen language activity does it promote/allow?

Does it involve users in handling language in an integrative or discrete point manner?

Metalinguistic element: how far is the work performance-oriented versus awareness-oriented? (I.e. does it develop knowledge and use of language more, or knowledge about language?) If the latter, what terminological tradition does it follow?

What is the non-linguistic content, if any: topic, theme etc. (e.g. a whodunnit story, general knowledge of the target culture, selling burgers, ....)? What cultural or pragmatic content (NL or TL)? Is it realistic or stereotyped?

Which of the following stages of the 'teaching sequence' does it relate to best?

Initial 'Presentation' of new language. If communicative, what sort of focus on form techniques does it use? (Cf Chapelle 2001 p49).

'Practice'/Learning of language already introduced elsewhere, or game involving such

'Production'/Use in a realistic more or less communicative way of language already learnt.

'Testing'/scoring of ability

or all? Complete course?

What task type(s) are involved? Fluency/accuracy? On a scale between

Open (i.e. with no specific language points inevitably focussed on, and no way the computer checks on the language the learner supplies; learner in control)?

adventure, simulation, 'real' communication/interaction e.g. with penpal....

Closed (i.e. with specific language points inevitably focussed on, and the computer can check entirely and say right/wrong; computer/teacher in control)?

a/c - m/c - o/c. One right answer or several allowed?

completion - substitution - order change - deletion - matching - translation - imitation/repetition - monitoring.....

Strict start to finish regime (programmed learning style), or cyclical, or allows browsing, selection, exploration? If there is an inbuilt order, is it difficulty based?

Is the task unique to computer, or similar to a non-computer task? (So how does it differ?)

How reusable is it? How much variation is there in the program? Does it work the same way, with the same material, every time it is used? Can it be authored again with new material?

Does it require/allow whole class work, or pair and group work or only individual work?

Does it allow/involve time limits? Competition with self or others and score keeping?

What strategies does it teach/practise? How far are they language ones (e.g. revising strategies for use when writing), versus universal ones (e.g. metacognitive ones like planning in advance how to tackle a task), as against ones that just help you do that sort of task (rather than learn/use language successfully), e.g. short-term memory, logical reasoning, strategies for beating the program? (It may help to think whether the task would be a challenge even if done purely in the learner's NL.)

Feedback type(s) it supplies or engenders? Amount of interaction between computer and user?

computer-as-teacher feedback - peer feedback - self feedback

immediate error trapping or delayed till end of task?

hints provided, or just right/wrong?

how detailed and tailored to individual users/items?

coping with near answers? silly answers?

variety of response, or always the same type?

affective aspect of feedback (e.g. just 'correct' or 'well done Suzy!' with a tune?)

allows 'cheating' - go direct to complete answer?

provides error profile or score at the end?

What is the overall 'role' of the computer? (Levy 1997 p83-4, 126-129 and ch7; Jones and Fortescue p5-6)

magister - pedagogue - stimulus (Pennington and Stevens 6) - reference resource - tool - learner...?

What philosophy of teaching/learning does it seem to espouse? (Programmed learning/Audiolingual? Communicative? Krashen/input hypothesis? Humanistic? Learner autonomy?....) Cf Levy 1997 p154-155

Etc. etc. Let the program itself prompt more pedagogic questions: try its various options.

(b) Some aspects of the teaching/learning situation that may well need to be taken into account when deciding if the aspects of the program listed in (a) are suitable or not for particular learners in a particular situation:

Learners

What level of English/FL are they at? So what would be 'easy', what 'difficult' for them, what i+1?

What real language learning needs do they have? So what would be 'important' for them?

What exam needs?

What age are they? Hence what cognitive and affective requirements?

What non-linguistic interests/prior knowledge do they have?

Are they already 'motivated' or not, and to perform what kinds of language task? What are their attitudes, willingness to communicate, etc.?

What is their typical learning style?

etc. etc.

The teaching

The syllabus - what type is it? (Structural, Functional, Task-based...?)

Where are they currently in the syllabus - what are they doing in class concurrently?

What teaching method are they used to, primarily (Grammar-Translation, Communicative, learner-centred...)?

What exercise and task types are they already familiar with (e.g. cloze, role play, multiple choice, free composition......)? What other materials/media are they using?

What 'management' arrangement are they used to? Individual, pair, group or whole class?

etc. etc.

(c) Does a fit b? Could it be authored/used in some way so that it fitted them better? Or: what b would a fit?

… Go through all the a/b points above, checking the match. For example:

Will the task fail for some reason with these learners? NB This may well require consideration of general principles of language learning and teaching to establish what would suit your learners.

Is the task well defined (e.g. by program or teacher)?

Is the task familiar to the learners? Does it need to be?

Does the language element meet the needs of the learners - i.e. is it the right type of language etc. and is it accurate?

Is there a match between the level of ability of the learners and the level of the language in the program?

Does the program fit the learners' interests? Is it motivating for that reason, or just by its novelty?

Are the translations, if any, misleading? Is misunderstanding possible?

Is the language etc. pragmatically plausible?

(It can help to think, ‘what would my ideal version of this have that the one I am evaluating hasn’t got?’ It can also help to think, ‘how could this program be used so as to get the best out of it and minimise the bad features for particular learners?’)

The sort of checklist I don’t like…

…but which students often come up with, and which is often found in dissertations and brief articles, is the type which:

·        Is a mixture of statements and questions

·        Has no declared basis in other people’s lists, or in any sources in the literature such as models of language teaching, task-based SLA etc.

·        Appears to be a jumbled set of items in no particular order, where pedagogical points are interspersed higgledy-piggledy with software programming points: no apparent rationale

·        Does not appear to exhaust obvious sets of things: e.g. it has a point about graphics but not sound, or about listening but not reading…: no completeness, just apparently subjective eclecticism

·        Mixes items that address a feature of the program alone with ones that address the match between a program feature and something to do with the learners or teaching situation. For example, the first question below is purely about the software (my a above), so it has no evaluative element unless one adds, from some unstated source, a consideration of the learners' needs, classroom writing instruction, etc. (b), and then considers the match or suitability of one to the other (c). The second question, however, embodies all three elements in one: it is a complete evaluation question, and is in fact so general that it subsumes the first item!

o       How does the software use the writing medium?

o       How well does the software match pupils’ expectations and the needs of the course?

·        Has items with hidden, unstated evaluation criteria. Questions of the first type above, though mentioning only an aspect of the software, often seem to imply the b and c aspects without stating them. Sometimes that may be OK if a universal requirement is involved, as in the first item below: nobody would question that instructions should be clear for any user. But in the second it is by no means certain that all learners need the feature mentioned (e.g. Saudi Arabian learners of English, since in SA it is usually considered appropriate to teach English through materials set in an SA context). Loaded items like this should be challenged and justified, not just slipped in:

o       How clear are the instructions for users? (Implies: Instructions should be clear for any user)

o       Does the software attempt to create a target language context? (?Implies: A target language context would be a good feature for all learners?)

Such lists may be defensible if they have been tailored to evaluate a specific program in a specific situation, where certain things may not be relevant to include. But often it is not made clear how the refinement was made from larger, more complete lists, or how otherwise, and on what reasoning, just this set of points was arrived at. One just has to take the judgment of the person offering the checklist, unsupported, as sufficient endorsement of its validity. But in language learning and teaching research, ‘I think this is a useful checklist’ is not enough…

PJS rev Jan 03