Corpora were not just used for linguistic research; they were also used to compile dictionaries (beginning with The American Heritage Dictionary of the English Language in 1969) and grammar guides such as A Comprehensive Grammar of the English Language, published in 1985.
Experts in the field hold different views on annotating a corpus. These views range from John McHardy Sinclair, who advocates minimal annotation so that texts speak for themselves, to the Survey of English Usage team (University College, London), who advocate annotation as enabling greater linguistic understanding through rigorous recording.
A corpus is a large collection of written and spoken texts in a language, stored on a computer. Scientists who study a language this way (corpus linguists) gather as much published material in that language (English, for example) as they can and put it on a computer: texts from newspapers, books, magazines, pamphlets, newsletters, medicine leaflets… anything they can get hold of is saved on a powerful computer. All this information gathered in one place is called a written corpus (after all, it contains only written texts).
As for the spoken corpus, things get much more interesting. Linguists record (with people’s permission) conversations at work, in the supermarket, at home, on the phone, on the street, on park benches, on buses, etc. They also record TV shows, interviews, radio shows, news, etc. Afterwards, they transcribe everything and transfer it to the computer, thus obtaining the spoken corpus (the data of the spoken language).
With these two sets of data, the written corpus and the spoken corpus, we linguistic researchers can check all sorts of things with the help of a program designed to search the information in the corpus. That is how we discover interesting things.
For example, did you know that the most used word in the English language is the article “the”? That is in the written corpus! However, if we look only at the spoken corpus, we find that the most used word is the pronoun “I”! If we put the two corpora together, “the” beats every other word.
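For the curious, here is a minimal sketch of how such a word ranking could be computed. It is only an illustration: the two sample texts are made up and stand in for real corpora, and the function name `word_frequencies` is my own, not part of any corpus tool.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word appears, ignoring upper/lower case."""
    # Very rough tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Tiny invented samples standing in for a written and a spoken corpus.
written_sample = ("The report says the minister opened the new bridge "
                  "over the river near the station.")
spoken_sample = "I think I saw him, but I am not sure I did."

written_counts = word_frequencies(written_sample)
spoken_counts = word_frequencies(spoken_sample)
# Adding two Counters sums the counts word by word.
combined_counts = written_counts + spoken_counts

print(written_counts.most_common(1))   # top word in the written sample
print(spoken_counts.most_common(1))    # top word in the spoken sample
print(combined_counts.most_common(1))  # top word when both are merged
```

A real corpus has millions of words and needs more careful tokenization, but the counting idea is the same.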
Another curiosity: did you know that the passive voice in English is used much more often in scientific and journalistic texts than in everyday conversation? In other words, if you want to learn English just to travel and make friends, you don’t need to memorize the rules of the passive voice. But if you want to be a good journalist or write good scientific texts, it is a different story.
With the corpus we also discover which words are most often used together with other words (collocations). We find that the past simple is used more often than the present perfect. And we also find that the present simple is by far the most used tense in the English language.
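To give an idea of how collocations can be found, here is a small sketch that counts which words appear close to a chosen word. Everything in it is an assumption made for illustration: the sample sentences are invented, and `collocates` and its `window` parameter are hypothetical names, not the interface of any real corpus program.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count the words found within `window` words of `node`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == node:
            start = max(0, i - window)
            # Neighbours to the left and right, excluding the node itself.
            # Sentence boundaries are ignored in this simplified version.
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text standing in for a corpus.
sample = "We made a decision. They made a mistake. She made a decision too."

neighbours_of_decision = collocates(sample, "decision", window=2)
print(neighbours_of_decision.most_common(3))
```

In this toy sample, “made” turns up next to “decision” more often than any other word, which is the kind of pattern that tells us English speakers “make a decision”. Real tools go further and use statistical measures to separate true collocations from words that are simply frequent everywhere, such as “a”.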
Anyway, with this wonderful science, English teachers can get an idea of what to teach their students. Book authors can give more accurate information about one grammatical structure or another, and they can also tell readers and students how words are used together with other words.
And that, folks, is how, based on this information, I can tell you how one word or another is used in English and where a word ranks in frequency. Remember that the explanation given here is very simple and meant only to satisfy the curiosity of many. After all, there is still a lot to be said about corpus linguistics and its benefits for teaching and learning a language.