ADVICE ON DOING RESEARCH INTO THE ACQUISITION OF SYNTAX

Andrew Radford, October 2001

INTRODUCTION

PhD-level research in the area of first language acquisition is often based on a substantial naturalistic corpus, i.e. a large sample of the spontaneous speech of one or more young children acquiring whatever language you are interested in. There are two possible sources of data for a corpus-based study. One is to use data from existing public or private corpora: e.g. the CHILDES corpus, which is publicly available on compact disc or via the internet, or the new (as yet unanalysed) JEROME corpus (a Canadian mother's record of the daily development of her four English-speaking children from birth until their teenage years), which is available from me. I also have a substantial private corpus of data on the acquisition of English as an L1 by children from one to four years of age, a corpus of Italian data (collected by a colleague in Tuscany) and a corpus of Mexican Spanish data (collected by one of my former PhD students).

The alternative is to build a corpus of your own by collecting your own naturalistic data samples. One possibility along these lines is a cross-sectional study, which involves making recordings of a number of different children in a given age range (e.g. 1;6 to 2;6 years) acquiring a particular language; a second is a longitudinal study of one or more children, recording weekly samples of each child's spontaneous speech in interaction with his or her caretaker/s (though note that because a regular longitudinal study involves extensive intrusion into private family life, it is generally only practicable for one of your own children, or for the child of a very close friend or relative).

The main advantage of using a public corpus like the CHILDES data-base is that the relevant data are already transcribed and tagged (for computer searches); the obvious disadvantage is that these corpora have already been analysed dozens of times: are you really likely to have something radically new to say about them? The obvious advantage of making recordings of your own is that you end up with novel data which nobody else in the world has analysed; the disadvantage is that it can take an inordinate amount of time to collect, transcribe and (if you decide to) tag the data. See the section COLLECTING A CORPUS below.

If you collect your own data, you may find it useful to supplement it with elicited data. For example, if you are studying the acquisition of negation and the child you are recording doesn't produce many negative sentences, it may be a good idea to elicit some. Two recent books giving details of elicitation techniques suitable for use with young children are

TOPIC OF YOUR STUDY

You have to be prepared to be flexible when using corpus-based data, and not to narrow down your thesis topic prematurely. This is particularly true if you are using a fairly small corpus of your own: you have to be prepared to shift the focus of your study in the light of the data you collect. For example, suppose you wanted to study the acquisition of negation in English. A crucial question which arises (for a variety of theoretical reasons) is whether very young children (e.g. aged 20 months) position negatives before or after subjects (e.g. do they say 'No Daddy have one' or 'Daddy no have one'?). Alas, what you will find is that they typically say 'No have one': their negative sentences typically have null subjects, making it almost impossible to be sure whether the negative is positioned before or after the (invisible) subject. If so, you have a choice between supplementing your data with elicited data (see above) and focusing on a different aspect of the data in your corpus. For similar reasons, it's difficult to study (say) the syntax of noun phrases in early child grammars. What you want to establish is (e.g.) whether young children systematically position determiners before prenominal adjectives, whether they make adjectives and determiners agree with the nouns they modify, and whether they have the right morphosyntax for noun complements (e.g. 'a picture of Mummy', with genitive of). But what you will inevitably find is that young children use very few nominal modifiers of any kind and almost never use noun+complement structures, so you simply won't have enough data to make a workable thesis.

My advice (based on two decades of working on child language corpora) would be to start out with a general topic along the lines of Aspects of the Acquisition of Clause Structure in L (where L is the language you choose to study). There are both theoretical and developmental reasons for choosing such a topic. The theoretical reason is that clause structure is at the very hub of a lot of work in theoretical syntax (cf. Pollock's 1989 work on verb raising, Chomsky's 1995, 1998, 1999 and 2001 work on Minimalism, Rizzi's 1997 work on the syntax of the Left Periphery, etc.), so there are plenty of ideas around for you to test against developmental data. The developmental reason is that children produce lots and lots of clause structures in their spontaneous speech, so you are unlikely to be short of data if you are doing a naturalistic study of the acquisition of clause structure. A thesis on the acquisition of clause structure usually gives you plenty of scope for 'specialist' chapters on particular topics (e.g. one chapter on the syntax of interrogatives, one on the use of null arguments, one on the morphosyntax of verbs, one on case-marking, etc.). Of course, if you end up with a lot of data on a particular topic that interests you (e.g. the syntax of wh-constructions, or case-marking), you can (at that point) choose to narrow your thesis down to that topic; but it's unwise to start out with the assumption that you're going to work on one narrow topic.

OVERALL ORGANISATION OF YOUR THESIS

A PhD thesis by supervised research is up to 80,000 words in length at Essex. An 80,000-word thesis will typically contain around 6 chapters (e.g. an introduction, a conclusion, and four core chapters). You can break your chapters up into numbered sections if you wish (e.g. 1.1, 1.2, 1.3, 1.4, etc.), and even into subsections (e.g. 1.1.1, 1.1.2, 1.1.3), but avoid having too many sections and subsections (as a rule of thumb, no subsection should be less than 1,000 words long). Use footnotes only if you have to, and avoid making them excessively long (since they break up the reader's reading rhythm); it's a good idea to print your footnotes at the foot of the relevant pages.

Two of your chapters have a standard format: chapter 1 is an Introduction which sets out the background to your study, and the last chapter is the Conclusion, in which you summarise your main findings, highlight the strengths and weaknesses of your research, and point to possible future areas of research. This leaves four central chapters which form the core of your work, i.e. where you analyse various aspects of the data in your corpus. Your core chapters should be of roughly equal length. A thesis which has (say) one core chapter of 20 pages and another of 80 looks 'unbalanced', so if any particular chapter starts to become longer than the others, try to split it into two separate chapters. There is a particular danger that the first chapter (the Introduction) can grow to disproportionate length, as you keep adding bits and pieces (e.g. to your historical background section). However, the first chapter (which sets out your research agenda and involves a literature review of some kind) is in many ways the least original, so it is unwise to make it more prominent than the other chapters by making it unduly long.

GETTING STARTED ON YOUR FIRST CHAPTER

It's a good idea to start writing as soon as you can. Writing is a skill which comes with practice (and it takes an awful lot of practice for most people). Don't delude yourself into thinking that you'll spend three years collecting and analysing your data, and then write up the results in the summer vacation of your final year: the odds are that you won't even be able to complete a decent draft of the introduction over the summer, let alone the whole thesis. Moreover, as your supervisor, I will certainly want you to make substantial corrections to any draft chapter you produce; indeed, it's not uncommon for drafts to have to be rewritten several times. So, the best idea is to start writing as soon as you start your research, and the obvious place to start is chapter one (Introduction).

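Your first chapter might be organised into sections in the following way (where T is the topic of your thesis, and L is the language you are working on):

1.1 The topic: why study T in L?
1.2 Descriptive framework
1.3 Clause structure in adult L
1.4 Theories of the acquisition of clause structure
1.5 Previous studies of the acquisition of T in L
1.6 Data-base and research questions
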
(though if the chapter threatens to exceed 10,000 words, you should move some of the material from e.g. 1.3 and 1.5 into later chapters). In section 1.1, you outline your topic (say, the acquisition of clause structure in Portuguese) and explain why you chose it (e.g. much contemporary theoretical and developmental work is oriented to clause structure; cite some references), why you chose to study language L (e.g. there has been little serious developmental research on L so far), and why you chose to focus on the particular stage of development you are studying (if you are studying children under 2;6 years, you can say that children are generally agreed to have acquired most of adult clause structure after that age, and that the main debate in the acquisition literature concerns the nature of clause structure in children under 2;6).

In section 1.2, you say what descriptive framework you will be using, and why you chose it. If your chosen framework is PPT (Principles and Parameters Theory), the main point to note is that PPT acknowledges the symbiotic relationship between theoretical and developmental research. For example, Chomsky has claimed in various works that the ultimate goal for linguistic theory is explanatory adequacy, and that this can only be achieved by a theory which can explain acquisition (find some suitable quotations). There is now general acceptance that learnability is a key criterion of adequacy for linguistic theory (cf. work by Wexler and others). Rizzi (1994) has argued that developmental data provide an important means of testing linguistic theory, etc. You should also note that PPT makes explicit claims about the nature of language acquisition (viz. that it involves parameter-setting, etc.), and that it has spawned a whole range of insightful acquisition studies, one of the earliest and most notable being Nina Hyams' (1986) study of the acquisition of the Null Subject Parameter. If you wish, you can include a brief outline of the main modules in a PPT-based grammar or (more simply) give some examples of typical parameters. BUT note that many students are rather weak at providing detailed technical descriptions of the framework they are using, so it might be wise for you to avoid doing this.

In section 1.3, you can provide a brief outline of adult clause structure in the language you are studying: don't make this section too long (indeed, you can even omit it if you want to). Such a section would be useful if you are studying a language very different from English, but is less important if you are studying the acquisition of English (where the relevant ideas should be relatively well known). If you happen to know (or want to say) a lot about clause structure in your language, and if your language is different from English in interesting ways, you can even expand this into a separate chapter (e.g. chapter two). However, think carefully before you do this, since if all you are going to do is summarise the work of other people, you will end up with two chapters which largely summarise other people's work -- and that's not such a good idea (since the main criterion by which a thesis is judged is its originality). On the other hand, if you have original ideas on the syntax of clauses in your chosen language, a full chapter on adult clause structure would be fine: the revised emphasis can then be incorporated into your title, which might be (e.g.) 'The nature of clause structure in adult and child grammars of L' (L = the language being studied).

In section 1.4, you provide an outline of alternative theories of acquisition (e.g. the Radford/Guilfoyle & Noonan/Vainikka/Tsimpli structure-building model, the Clahsen and Penke impoverished functional architecture model, Rizzi's truncation model, Wexler's optional/root infinitives model, Schütze's underspecification model, etc.). Be aware that examiners generally expect a thesis to focus on a specific set of hypotheses to be investigated (and evaluated) within a specific model. If (for example) you choose the underspecification model (e.g. testing Schütze's 1997 claim that all early child clauses are IPs but that INFL may be underspecified for tense and/or agreement), you must justify choosing it rather than the others: in general terms, this means highlighting the strengths of your chosen model and the weaknesses of the alternatives (pointing out the empirical and theoretical problems they pose). But beware of giving the impression that you are slavishly adhering to a particular model and are blind to its potential weaknesses; you should show some awareness of potential problems with your chosen model (and indeed one of the aims of your thesis should be to see whether, and if so how, these can be overcome). Examiners want to see evidence that you evaluate all ideas with equal objectivity (which doesn't preclude you from having a 'favourite' model of your own, as long as you show that you're aware that it has potential imperfections). How far back you go in your literature review is up to you; personally, I think there is no need to go back more than 10-15 years.

In section 1.5, you should write a review of existing studies of the acquisition of particular aspects of clause structure in the language you are studying (of course, it may only be after analysing quite a large part of your data that you decide exactly what the main focus of your study will be). Your review should be selective, not simply an unreflective summary of a series of papers. By being selective, I mean that you should single out only key articles for discussion, highlighting the main strengths and weaknesses of each, and not get embroiled in detailed discussion of how the authors analyse particular constructions (e.g. a particular author's analysis of case errors would be better discussed in the background section of a separate chapter on the acquisition of case marking). Bear this in mind if you present a review of a particular paper to a seminar group meeting: if your seminar paper is 10 pages long (and you have another 10 or so key papers to review in the literature review section of your thesis), your review is clearly far too long to 'fit' into your thesis. If there is too much important background literature to review in this section (or if other sections of chapter 1 become so long that this section is being squeezed), consider including the relevant material in your core chapters (e.g. putting your review of existing work on case in your chapter on case, your review of existing work on agreement in your chapter on agreement, and so on). For the reasons noted above, it's not wise to let chapter 1 become too long.

In section 1.6, you give a detailed justification of the data-base used for your study. Are you using naturalistic data and/or elicited data, and what are the reasons for your choice? What are the advantages and pitfalls of naturalistic and elicited data? What is the nature of (and rationale for) the corpus of data you are using: e.g. how many children did you record? What were their backgrounds and ages? What equipment did you use for making the recordings? What practical problems did you have in recording, transcribing and analysing the data? Of course, you can't really answer most of these questions until after you've collected your data, so you'll probably have to write this section later. It is also important to identify a key set of research questions which you are setting out to answer through your research, i.e. a set of hypotheses which you are aiming to test.

A final point about chapter 1 is that you may have to revise it extensively after you have written all your core research chapters. Detailed analysis of the relevant data in specific core chapters may change the overall focus of your research, and this then MUST be reflected in appropriate modifications to chapter 1. For example, if your original version of chapter 1 lays a lot of stress on the theoretical significance of tense errors but your detailed analyses in the core chapters highlight the significance of case errors, there is an obvious mismatch which (you can be sure) will not escape the attention of the examiners.

COLLECTING A CORPUS

There are several questions which you have to ask yourself when collecting a naturalistic corpus. One is what stage of acquisition you want to study. By and large, the most interesting stages (from both a developmental and a theoretical point of view) are the earliest ones: what's most interesting about child syntax is how it differs from adult syntax, and the differences are greatest before the age of two and a half or three years, by which time most children are capable of producing adult-like structures. So, an ideal period to study is the period between one and a half and two and a half years of age. One possibility is to try to study the whole period. The most obvious way of doing this is by a longitudinal study of one child followed over the period. But protracted longitudinal studies are fraught with difficulties: the child may reach the two-word stage much later than you expect; the child's speech may initially be relatively unintelligible; the child may go away on holiday at a crucial stage, making it impossible for you to continue your recordings (or indeed you may go away on holiday); either you or the child may be ill for a period of time... and so on. Worst of all, you will have to wait 12-18 months before you have finished collecting your data, which means it will be a long time before you can start analysing it, let alone writing up your analysis. In practical terms, it makes much more sense to do an intensive study of a one-month period in the child's development, perhaps recording the child for one hour a day over a month. (Poeppel and Wexler, in a 1993 article in Language, took this approach to a new extreme in discussing one day's speech output from a boy called Andreas.) As a rough guide, I'd suggest that you need to collect around 30 hours of recordings for the child you study, which should give you a corpus of 5,000-10,000 child utterances.

The more data you have from a given child, the longer it takes to collect and transcribe, but the more 'evidence' it provides you with about language development. It's important to have enough data to be able to present quantitative data (e.g. on how many subjects in the child's root clauses were nominative pronouns, how many were accusative pronouns, and how many were genitive pronouns). My own advice (if you want to do a naturalistic study) would be to do an Andreas-like intensive study of one child over a short period (e.g. recording 5-10 hours per week over a period of 3-4 weeks).
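As a rough illustration of the kind of quantitative summary meant here, the following Python sketch tallies the case of pronouns in utterance-initial position. It assumes the child's utterances are already available as plain strings, treats the first word of each utterance as its subject (a crude heuristic whose hits would still need checking by hand), and uses small hypothetical pronoun lists; the function names subject_case and tally_subject_case are illustrative only.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical, deliberately small English pronoun lists. "you" and "it"
# (nominative or accusative) and "her" (accusative or genitive) are
# ambiguous for case, so they are left out rather than guessed at.
NOMINATIVE = {"i", "he", "she", "we", "they"}
ACCUSATIVE = {"me", "him", "us", "them"}
GENITIVE = {"my", "his", "our", "their"}


def subject_case(utterance: str) -> Optional[str]:
    """Return the case of an utterance-initial pronoun, or None.

    Crude heuristic: the first word is treated as the subject, so subjects
    after 'no' or a wh-word are missed, and a first-word possessive
    determiner ('my teddy') is wrongly counted as a genitive subject.
    """
    words = utterance.lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?")
    if first in NOMINATIVE:
        return "nominative"
    if first in ACCUSATIVE:
        return "accusative"
    if first in GENITIVE:
        return "genitive"
    return None


def tally_subject_case(utterances: List[str]) -> Counter:
    """Count utterance-initial pronoun subjects by case, skipping non-pronouns."""
    return Counter(filter(None, map(subject_case, utterances)))


if __name__ == "__main__":
    sample = ["Me want biscuit.", "I got it.", "My make a tower.",
              "Daddy go.", "Me do it."]
    counts = tally_subject_case(sample)
    total = sum(counts.values())
    for label, n in counts.most_common():
        print(f"{label}: {n}/{total} ({100 * n / total:.0f}%)")
```

Run on the small sample, this prints each case label with its count and percentage of all pronoun subjects found. The point of keeping the heuristic this simple is that it gives a quick first pass over a large corpus; the counts you report in the thesis should come from checking each hit in context.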

There are three main problems you have to be wary of in choosing a child to record. One is that some children have a relatively immature phonology, and it can be difficult to work out exactly what they are saying (such children don't make good subjects). Another is that some children don't produce a very wide range of structures (e.g. don't ask questions), and in extreme cases may produce mainly one-word utterances (these aren't very good subjects either). A third is that some children are so advanced (even at a very young age) that their grammar is virtually error-free, and I doubt whether you can fill a dissertation with insightful and original observations about a 'perfect' child. Instead, try to find a child who makes frequent grammatical errors (either of commission or of omission).

RECORDING CHILDREN

One of the choices you face is whether to use an audio or a video recorder. A video recording provides more contextual information, but it tends to be more intrusive (and the sound quality is often not that good). A more practical alternative is a small battery-operated portable audio recorder (e.g. a Sony Walkman recorder or even a minidisc recorder) with a good-quality external microphone: this will generally produce very high sound quality, provided you can keep the microphone near enough to the child. Run the recorder on batteries rather than mains electricity (the latter can produce a 'hum' on the sound track and impair the recordings); it might be a good idea to invest in a set of rechargeable batteries.

Always remember that your goal in recording a given child is to see just how wide a range of structures that child is capable of producing in his or her 'free speech'. So, the first question to ask is 'Who is the best person to talk to the child?' Children very often don't like talking to strangers like you, so it may be a better idea to record them in conversation with their mother, father, brother, sister or friends: they may 'open up' a lot more if they are talking to someone they know well. It's helpful to be present when the recordings are made, so that you know the context in which a particular utterance was produced. It can also be important to have recordings of the mother's (or other caretaker's) speech: when children produce null subjects, for example, one question you will inevitably ask is 'Are they producing them more or less frequently than adults do?'

There are several obvious things to avoid in talking to children. One is asking questions which elicit one-word answers (e.g. looking at a picture book and asking 'What's that?', to which the child will reply 'Train', etc.). Another is 'routines' (e.g. nursery rhymes and other rote-learned chunks of language), since these are not genuinely representative of the child's spontaneous speech. Don't record children while they're watching their favourite TV cartoon: they won't say much! Try to get the child used to the microphone (initially he'll want to bite it, shake it, break it, etc.). And remember to be really nice to the parents (tell them how amazingly interesting their child is); otherwise they won't invite you back!

TRANSCRIPTION

The most tedious part of data-collection is always transcribing the data. The Department has some transcription machines which you can borrow. You should make an orthographic transcription of everything the child says (adding an English gloss if the language is not English) and of what other people say, as well as notes on the context in which each utterance was produced. Where what the child says is unclear, give a phonetic transcription. You can expect it to take 10-20 hours to transcribe one hour of recorded speech, so the 30 hours of recordings suggested above would take roughly 300-600 hours to transcribe. It is sensible to use the CHAT computerised transcription and coding system developed for the CHILDES data-base, since CHAT files can be searched with the accompanying CLAN programs, which enable you to (e.g.) find and print out all examples of nominative subjects in the corpus at the press of a button.
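The CLAN programs are the proper tool for searching CHAT files, and their actual commands are not reproduced here. Purely to illustrate the kind of search involved, the following Python sketch pulls one speaker's utterances out of a CHAT-format transcript and filters them with a regular expression. It assumes the usual CHAT layout (main-tier lines such as *CHI: followed by a tab, dependent tiers beginning with %, headers with @, and tab-initial continuation lines); the function names utterances_for and find are hypothetical, and the sketch ignores most CHAT coding conventions.

```python
import re
from typing import List, Optional, Tuple

# Assumed CHAT layout: main-tier lines such as "*CHI:\tme make a tower ."
# begin with "*", a speaker code and a colon; dependent tiers begin with
# "%", headers with "@", and tab-initial lines continue the previous tier.
MAIN_TIER = re.compile(r"^\*([A-Z0-9]+):\s*(.*)$")


def utterances_for(chat_text: str, speaker: str = "CHI") -> List[str]:
    """Return one speaker's main-tier utterances from CHAT-format text.

    Simplified sketch, not a full CHAT parser: CHAT codes such as
    retracing markers are left in the returned strings.
    """
    tiers: List[Tuple[Optional[str], str]] = []
    for line in chat_text.splitlines():
        if line.startswith("\t") and tiers:
            # Continuation line: append it to the previous tier.
            spk, text = tiers[-1]
            tiers[-1] = (spk, text + " " + line.strip())
            continue
        match = MAIN_TIER.match(line)
        if match:
            tiers.append((match.group(1), match.group(2)))
        else:
            # Header or dependent tier: kept so continuations attach correctly.
            tiers.append((None, line))
    return [text for spk, text in tiers if spk == speaker]


def find(utterances: List[str], pattern: str) -> List[str]:
    """Return the utterances matching a regular expression (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [u for u in utterances if rx.search(u)]


if __name__ == "__main__":
    sample = (
        "@Begin\n"
        "*MOT:\twhat are you doing ?\n"
        "*CHI:\tme make a tower .\n"
        "%com:\tbuilding with blocks\n"
        "*CHI:\tI got it .\n"
        "@End\n"
    )
    child = utterances_for(sample)
    print(child)
    print(find(child, r"\b(me|him|us|them)\b"))  # utterances with accusative pronouns
```

Passing speaker="MOT" instead returns the mother's utterances, which is one simple way of setting up the child/adult comparisons mentioned in RECORDING CHILDREN.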