Wednesday, January 8, 2014

Concordance of Offline Reading 1

Pinar Tankut's workshop on the reading book "Offline Reading 1" was very enlightening.  I hadn't realized that this book, produced at the Ankara campus of METU in 2001, had used the outdated 1984 University Word List (see http://jbauman.com/aboutUWL.html) as the basis for its vocabulary lists.  Nor had I realized that the book hadn't been updated since 2001.  What is more, Pinar's explanation of how the rather bizarre title 'Offline Reading' had been chosen prompted me to consider ways to exploit the 'offline' texts in a more accessible 'online' fashion.

Having been involved in creating corpora for the last decade, and publishing concordances online, I thought it would be a useful project, from a lexical point of view, to create a corpus from the OLR1 reading texts and publish a concordance that teachers and students alike could access.  So, I approached Pinar with the idea. We agreed to work together on the project and Pinar, on her own time, copied each reading text from OLR1 and saved them as individual files.  Thanks to her efforts, in my lunch break I was able to compile a corpus from these files and create an online concordance.  The concordance can be accessed from any browser at http://sanalokul.biz/metunccsfl/olr1-conc/framconc.htm.

The concordance is interactive in that you can select any word from the pane on the left and that word will be shown in the pane on the right as a keyword in context (KWIC).  The number beside the word in the right pane indicates how many times it appears in the entire OLR1 book.  Each occurrence of the word is shown in context on one line in the pane on the right.  The specific reading in which that occurrence can be found is shown on the far right.  Clicking on the link to the text will display the full context in the pane at the bottom.
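For readers curious about how a KWIC display like this can be produced, here is a minimal sketch in Python.  It is not the tool used to build the OLR1 concordance; the function name `kwic`, the sample texts and the file names are all invented for illustration, and it assumes each reading is available as plain text.

```python
import re

def kwic(texts, keyword, width=30):
    """Return one (left, match, right, source) tuple per occurrence of keyword.

    `texts` maps a (hypothetical) file name to the full text of one reading.
    """
    # \b keeps "any" from matching inside longer words such as "many"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for name, text in texts.items():
        flat = " ".join(text.split())  # collapse line breaks and extra spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            lines.append((left, m.group(0), right, name))
    return lines

# Invented sample readings, NOT taken from OLR1
sample_texts = {
    "unit1.txt": "You can choose any book you like. Any student may join.",
    "unit2.txt": "There were not many books left.",
}

hits = kwic(sample_texts, "any")
print(f"any ({len(hits)})")
for left, word, right, name in hits:
    # keyword centred, context on either side, source text on the far right
    print(f"{left:>30} {word.upper()} {right:<30} {name}")
```

The count printed beside the word corresponds to the frequency figure in the concordance, and the last column plays the role of the link to the source reading.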

This can yield some very interesting insights. For example, the 'classic' grammar explanation about the use of ANY is that "Usually, we use SOME in positive (+) sentences and ANY in negative (-) and question (?) sentences." (http://www.englishclub.com/grammar/adjectives-determiners-some-any.htm). However, in reality this is not the case, and it is reflected quite clearly in the OLR1 concordance extract below.  We can see that ANY occurs 16 times in the entire OLR1 book, in all units except unit 2.  But, what is quite striking is the fact that the majority of examples in the readings are of ANY used in positive sentences.


There are other interesting glimpses into the contexts and meanings of words, for example:
  • students see HIDE three times as a noun, meaning the skin of an animal, and only once as a verb.
  • students only see AGREE once in the entire reading book, and see no other forms (e.g., AGREEMENT, AGREED, DISAGREE).
  • students only see REPORT once in the entire reading book, as a noun.
  • STUDY appears in the text almost exclusively as a noun. Yet, our students mostly use STUDY as a verb.
What is particularly striking is the paucity of exposure to different word forms, meanings and contexts. The current thinking is that students need a certain amount of exposure to a word before they can ‘acquire’ that word (in terms of receptive knowledge).  The actual number is open to debate (see http://utpjournals.metapress.com/content/xk7q2k77gp4j772w/) but the minimum figure is often set at seven exposures.  So, to consider that a student has seen a word enough times in a reading programme, one would expect that word to appear in at least seven different texts.  This made me curious to see how many words students would 'acquire' at the end of a semester of reading from the OLR1 book.  To find out, I created a lexical frequency profile of the OLR1 corpus and focused on the 2,709 most commonly used words in English (see http://www.sciencedirect.com/science/article/pii/S0889490608000355).  From my very rough and initial analysis, I discovered the following with respect to the OLR1 reading texts:
  1. The exposure to vocabulary needed to acquire receptive knowledge seems to be limited to A1 and A2 in the Common European Framework of Reference for Languages (CEFR).
  2. There is very little exposure to B1 lexis and beyond – which tends to contain words that people often refer to as more ‘academic’ vocabulary. 
  3. About 1,046 of the 2,709 most commonly used words are NOT in OLR1 at all. 
  4. About 1,177 words of the 2,709 most commonly used words DO NOT HAVE SUFFICIENT exposure (that is, they only appear in six or fewer readings). 
  5. That leaves about 486 of the 2,709 most commonly used words that DO APPEAR IN SEVEN READINGS OR MORE, which one would expect students to ‘know’ receptively.
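The three-way split above can be sketched roughly as follows.  This is only an illustration of the idea, not the procedure I actually used: the function names, the tiny sample texts and the three-word list are invented, and the sketch matches exact word forms rather than grouping them into word families.

```python
import re

def text_range(texts):
    """Map each word form to the number of different texts it appears in."""
    counts = {}
    for text in texts.values():
        # set() so a word counts once per text, however often it recurs there
        for word in set(re.findall(r"[a-z]+", text.lower())):
            counts[word] = counts.get(word, 0) + 1
    return counts

def exposure_bands(word_list, texts, threshold=7):
    """Split word_list into absent / insufficient / sufficient exposure."""
    counts = text_range(texts)
    absent = [w for w in word_list if counts.get(w, 0) == 0]
    insufficient = [w for w in word_list if 0 < counts.get(w, 0) < threshold]
    sufficient = [w for w in word_list if counts.get(w, 0) >= threshold]
    return absent, insufficient, sufficient

# Invented sample readings and word list, NOT taken from OLR1
sample_texts = {
    "t1": "Any study helps.",
    "t2": "They study any topic.",
    "t3": "A study of birds.",
}
sample_words = ["any", "study", "report"]

# threshold lowered to 3 only because the sample has just three texts
absent, insufficient, sufficient = exposure_bands(sample_words, sample_texts, threshold=3)
print(absent, insufficient, sufficient)

# The three figures reported in the list account for the whole word list
assert 1046 + 1177 + 486 == 2709
```

With the real corpus, the threshold would stay at seven and the word list would be the 2,709 most commonly used words.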
These findings, in fact, confirm what I found out at the end of last year in the extended semester by testing students' knowledge of the most common words in English at the end of two semesters of study, starting in the beginner level.  See the graph below: the first five students from the left were in the PIN extended semester.  Although the sample was small, it does illustrate the problem of lack of vocabulary development, as none of the students came close to the minimum vocabulary that research suggests is needed to cope with academic study in English (shown by the TARGET bars at the far right).  I also used a placement test to determine their proficiency level according to CEFR, and was shocked to find two students were only at A2, and the rest at B1.  Given that the general 'rule of thumb' is to allow 150 hours of study to progress from one CEFR level to another, it appears that a majority of students were taking 480 hours to progress one level.  This raises a lot of questions about the effectiveness of our current approach.


Not surprisingly, the students' proficiency levels correlated quite closely with their vocabulary knowledge, which again is what research suggests.  To see how a successful student compared, I included the vocabulary knowledge of one student who had passed the EPE in June before the extended semester began (the last student on the right, next to the TARGET values in the graph above).  This student's profile matched the target quite closely, suggesting that a measure of vocabulary knowledge is a good indicator of language proficiency.  This particular student was extremely hard-working and had embarked on a self-directed programme of extended reading throughout both semesters.

From this rather superficial analysis, it would seem that students need to do much more reading than they do, and certainly cannot afford to rely only on exposure to vocabulary in the reading texts supplied by the Ankara Campus of METU.  It would also suggest that, in an intensive period of study, a much more systematic approach to vocabulary development (both receptive and productive) is needed.

Indeed, looking beyond the EPE and into the undergraduate programme, when I tested some of the METU students in the freshman year on their knowledge of the first 10,000 words in English, the average was well below what is normally expected as the minimum vocabulary knowledge required to cope adequately with university study where English is the medium of instruction. See the graph below.  It would be interesting to test students in their final year of study to see what their vocabulary knowledge is like after four years of academic study.

It would seem that vocabulary development is a key factor in the success of students acquiring English, and that currently our approach is not providing the students with the opportunity to develop their vocabulary to the required level within the limit of the intensive period of study of two semesters.  The fact that many students appear to be taking 480 hours to progress one level in the Common European Framework of Reference for Languages (and some take even longer), rather than the general 'rule of thumb' of 150 hours, also calls into question the efficacy of 30 hours of face-to-face instruction every week.

The million-dollar question is: what can we do about it?