A linguistic corpus is defined as “a set of texts of the same origin”. Its function is to compile documents such as essays, plays and transcripts, among others, so that the use of a term in the language at any given time can be consulted in a single database or program. In our case, the Spanish-language corpora are maintained by the Royal Spanish Academy and cover the twelfth century to the present.
A linguistic corpus gives us reliable information about how frequently words are used in a language. This lets us determine which term is most suitable for a translation, a piece of research or a language study (of morphology, syntax or lexicon, for example), and identify the register of the term, that is, whether it is formal or informal.
Finally, linguistic corpora are the basis for any study of a language and a reliable source to consult when choosing a term for a translation, because they provide a real sample of the language in use.
Corpus linguistics is a branch of linguistics that bases its research on data obtained from corpora, that is, actual samples of language use. Strictly speaking, the term does not designate a linguistic discipline, such as morphology, syntax or pragmatics, but a methodological approach that can be adopted in diverse disciplines and that stands in contrast to a methodology based primarily on introspection.
In the specialized literature, the term corpus (plural corpora, although the form corpus is also used) has two different senses. In the first sense, it designates a compilation of linguistic material made for a specific research purpose, whether samples of sentences, utterances or texts. This use of the term is frequent in applied linguistics, especially in research on language acquisition and learning. For example, a researcher interested in studying some aspect of the interlanguage of learners of an L2 can gather a series of productions (long or short, made expressly for this purpose or originally intended for another learning objective), which will constitute the study corpus; these are the data used for that specific study. In the second sense, a corpus is an extensive collection of texts (written, oral or both) compiled to serve as a representative sample of a language, that is, a set of real linguistic data reflecting the use of the language (or of the specific variety of language) that it is meant to represent. It is this second sense that the term corpus linguistics refers to.
In its conception, corpus linguistics is very old, since many classic works have based their descriptions on real samples of language. This is, for example, the only working method that studies in historical linguistics or acoustic phonetics have known. It is also the method followed by many classical grammars (essentially sentence corpora) and adopted in the last century by some works on vocabulary, for example Gougenheim et al. (1956), as well as the approach A. Juilland and E. Chang-Rodríguez took in compiling their frequency dictionary of Spanish in 1964. As a branch of linguistics, however, corpus linguistics has flourished since the 1960s and 1970s, encouraged by the possibilities that computing offered for processing and managing sets of texts with an ever larger number of words.
Large computerized corpora are a very rich source of information on language use, whether grammatical, semantic, lexical, discursive or of another kind. They are used, for example, as a source of information for compiling dictionaries. They have also enabled great advances in computational linguistics, which is concerned with the automatic processing of natural language: applying the automatic analysis tools of this discipline to corpora, in combination with statistical analysis programs, makes it possible to obtain vocabulary frequency lists and to detect recurrent linguistic structures. Other applications of corpora (Biber, 1993) are translation, through the use of bilingual corpora, and speech processing.
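As a rough illustration of how a vocabulary frequency list and a list of recurrent word pairs might be obtained from a corpus, here is a minimal sketch in Python. It assumes a deliberately simple tokenizer (lowercased runs of letters) and a two-sentence toy corpus; the function names `tokenize`, `frequency_list` and `frequent_bigrams`, as well as the sample sentences, are hypothetical and do not come from any particular corpus tool. Real corpus software would normally add lemmatization, part-of-speech tagging and statistical association measures.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters (accented letters included)."""
    # [^\W\d_]+ matches word characters that are neither digits nor underscores.
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(texts, top=10):
    """Return the `top` most frequent word forms across all texts, with counts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top)


def frequent_bigrams(texts, top=10):
    """Return the `top` most frequent pairs of adjacent words, with counts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Pair each token with its successor; pairs never cross text boundaries.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(top)


# Toy corpus, invented for illustration only.
sample_corpus = [
    "El corpus recoge el uso real de la lengua.",
    "El uso real de la lengua cambia con el tiempo.",
]

if __name__ == "__main__":
    for word, count in frequency_list(sample_corpus, top=5):
        print(word, count)
    for pair, count in frequent_bigrams(sample_corpus, top=5):
        print(" ".join(pair), count)
```

Counting raw word forms keeps the sketch short; grouping inflected forms under a single lemma would give frequencies closer to those a dictionary compiler would want.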
In second language teaching, corpora have been used primarily as a research tool for detecting and identifying the most frequent uses and structures, with a view to including them in the curriculum. In particular, word frequency lists have served as a starting point for preparing the vocabulary lists to be taught at the various levels of learning. Applications in the preparation of learners' dictionaries should also be highlighted; here the Cobuild dictionary for advanced students of English (Sinclair, 1987), which pioneered this field, deserves special mention.