Definition of corpus
Wikipedia defines a corpus as follows:
“In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (now usually electronically stored and processed). They are used to do statistical analysis, checking occurrences or validating linguistic rules on a specific universe.” (Wikipedia)
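There are several types of corpora. To make them more useful for linguistic research, they are often subjected to a process known as annotation, for example part-of-speech tagging, in which each word is labeled with its part of speech (noun, verb, adjective, etc.).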
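To make "checking occurrences" and "annotation" more concrete, here is a minimal sketch in Python. It assumes a toy corpus of three made-up sentences and a hand-written lexicon (`TOY_LEXICON`, a hypothetical name used only for illustration). Real corpora are far larger, and real annotation is usually done by trained taggers or human annotators, so this only illustrates the idea.

```python
import re
from collections import Counter

# Toy corpus: three made-up sentences standing in for a real text collection.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A dog sat on the log.",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Statistical analysis: count how often each word occurs across the corpus.
frequencies = Counter(
    token for sentence in corpus for token in tokenize(sentence)
)

# Hypothetical hand-made lexicon mapping words to part-of-speech labels.
# This is an assumption for the example, not a real tagging resource.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "log": "NOUN",
    "sat": "VERB", "chased": "VERB",
    "on": "ADP",
}

def annotate(sentence):
    """Pair each token with its label; unknown words get 'UNK'."""
    return [(tok, TOY_LEXICON.get(tok, "UNK")) for tok in tokenize(sentence)]

if __name__ == "__main__":
    print(frequencies.most_common(3))
    print(annotate(corpus[1]))
```

The frequency count corresponds to the "statistical analysis" use mentioned in the definition, while `annotate` shows what an annotated sentence might look like: each word carries an extra label that later queries can rely on.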
English language:
- American National Corpus
- Bank of English
- British National Corpus
- Corpus Juris Secundum
- Corpus of Contemporary American English (COCA): 360 million words, 1990-2007. Freely available online.
- Brown Corpus, forming part of the “Brown Family” of corpora, together with LOB, Frown and F-LOB.
- Oxford English Corpus
- Scottish Corpus of Texts & Speech
Other languages:
- Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.)
- Bijankhan Corpus: a contemporary Persian corpus for NLP research
- Croatian National Corpus
- Hamshahri Corpus: a contemporary Persian corpus for IR research
- Neo-Assyrian Text Corpus Project
- Persian Today Corpus
- Thesaurus Linguae Graecae (Ancient Greek)
Bibliography
http://en.wikipedia.org/wiki/Text_corpus