CREA & CORDE

June 22, 2008 at 11:27 pm (Language resources)

CREA
CREA (Corpus de Referencia del Español Actual) is a reference corpus of present-day Spanish. It covers all the varieties that Spanish has nowadays. The corpus is a mixture of written and oral texts, dating from 1975 to the present.
Its working method follows these steps:
  1. Determining the size of the corpus
  2. Classifying texts chronologically, geographically, etc.
  3. Acquiring the texts
  4. Classifying the acquired texts
Work on it began in 1996. Its first phase, covering texts published between 1975 and 1999, concluded in December 2000. That phase is finished but under continuous review. It contains 130 million forms, which fulfilled the aims set at the beginning of the project. By the end of 2004, the corpus contained 170 million forms.
Its materials are selected according to a series of parameters:
1. Medium:
90% corresponds to written language and 10% to oral language.
Of this 90%, 49% is books, another 49% is press, and the remaining 2% gathers the texts we call miscellany: leaflets, prospectuses.
2. Chronological:
In periods of five years: 1975-79; 1980-84; 1985-89; 1990-94; 1995-99.
3. Origin:
50% of the texts come from Spain and the other 50% from Spanish America. The Spanish American half is distributed among linguistic zones according to the number of speakers. The zones are: Andean, Caribbean, Central, Chilean, Mexican.
4. Type of text:
It is made up of three blocks of materials: books and press, miscellany, and transcriptions of spoken language.
Books and press
They are divided into two blocks: fiction and non-fiction.
These two blocks are further divided into seven thematic blocks, which differ in size and number of forms. They are:
     1. Science and technology
     2. Social sciences
     3. Politics, economy, and finance
     4. Arts
     5. Leisure
     6. Health
     7. Fiction: novels, short stories, theatre
Miscellany
It is divided into two blocks:
   1. Printed
   2. Not printed
Oral corpus
It draws its material from television and radio programmes.
It consists of one large block with the following subtypes:
   1. News
   2. Reports
   3. Interviews
   4. Debates
   5. Talk shows
   6. Documentaries
   7. Sports news
   8. Magazine programmes
   9. Sports magazine programmes
  10. Variety shows
  11. Prize draws and contests
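
The selection parameters described above are simple proportions, so a rough sketch of how they translate into numbers of forms may be useful. The Python below is only an illustration: it applies the percentages quoted in this post to the 130 million forms of the first phase, and it assumes (the sources do not say so) that the percentages apply to forms rather than to the number of texts. The names WRITTEN_SPLIT, forms_by_medium and period_for_year are invented for this example.

```python
from fractions import Fraction

TOTAL_FORMS = 130_000_000  # size of the first phase, as quoted in the post

# Shares quoted in the post. Applying them to form counts is an assumption.
WRITTEN = Fraction(90, 100)
ORAL = Fraction(10, 100)
WRITTEN_SPLIT = {
    "books": Fraction(49, 100),
    "press": Fraction(49, 100),
    "miscellany": Fraction(2, 100),
}

# Five-year periods used for the chronological parameter.
PERIODS = [(1975, 1979), (1980, 1984), (1985, 1989), (1990, 1994), (1995, 1999)]


def forms_by_medium(total=TOTAL_FORMS):
    """Return an approximate number of forms for each medium."""
    result = {
        name: int(total * WRITTEN * share)
        for name, share in WRITTEN_SPLIT.items()
    }
    result["oral"] = int(total * ORAL)
    return result


def period_for_year(year):
    """Return the (start, end) five-year period that a publication year falls in."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    raise ValueError(f"{year} is outside the 1975-1999 range of the first phase")


if __name__ == "__main__":
    for medium, forms in forms_by_medium().items():
        print(f"{medium}: about {forms:,} forms")
    print(period_for_year(1987))
```

Under these assumptions, books and press would each account for a little over 57 million forms, miscellany for a little over 2 million, and the oral part for 13 million.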

CORDE
The RAE created this corpus, CORDE (Corpus Diacrónico del Español), because of the good results obtained during the first months of the project.
CORDE consists of slightly more than 300 million forms, drawn from texts dating from the origins of the language until 1974.
Bibliography
http://corpus.rae.es/cordenet.html
http://corpus.rae.es/ayuda_c.htm
http://www4.ujaen.es/~ncontrer/interele/links.htm
http://216.239.59.104/search?q=cache:vYr6ppD3loUJ:paginaspersonales.deusto.es/abaitua/konzeptu/ta/soria00.ppt+crea+corde&hl=es&ct=clnk&cd=9&gl=es