COES: General Information and Distribution
Santiago Rodríguez and Jesús Carretero
November 2010
This set of data files implements a Spanish (castellano) dictionary
with approximately 54,000 roots and their derived forms. The number
of roots increases every day, but new versions of COES are not
released until they have been tested for correctness.
The current distribution of COES includes a speller for the Spanish
language.
COES must be used together with the international ispell
program (version 3.1.13 or later) or with aspell.
Several releases of COES have been publicly distributed.
The latest COES release can be obtained from
http://www.datsi.fi.upm.es/~coes/espa~nol-1.11.tar.gz. This package
contains the affix file and the Spanish word list.
If you use aspell, two installable packages are available.
General information about ispell is available in a separate entry.
Aspell is another spell checker that can be used with COES.
To use the Spanish dictionary with ispell, you must undefine the
NO8BIT macro in the local.h configuration file.
The distribution is included in the espa~nol-X.X.tar.gz file
(X.X is the version number).
To extract sources from files ending in `.tar.gz', use the command:
gzip -d < espa~nol-X.X.tar.gz | tar xf -
where espa~nol-X.X.tar.gz is the name of the file.
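The extraction step above can be wrapped in a small helper when the files should land in a separate directory. This is only a sketch: the function name extract_coes and the optional destination argument are hypothetical and are not part of the COES distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of COES): unpack an archive into a directory.
# Usage: extract_coes ARCHIVE [DESTDIR]   (DESTDIR defaults to ".")
extract_coes() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    # Same gzip | tar pipeline as above; the subshell changes into the
    # destination first, so tar writes the extracted files there.
    gzip -d < "$archive" | (cd "$dest" && tar xf -)
}
```

For example, extract_coes espa~nol-X.X.tar.gz coes-src would place the files in a coes-src directory (an assumed name) instead of the current one.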
The archive expands to the following files:
espa~nol.aff : Affix file.
espa~nol.words : Contains a list of words that appear in the
official Spanish dictionary (Diccionario de la Real Academia Española
de la Lengua, 21st edition).
espa~nol.nofl : Contains a list of words that do not appear in the
official dictionary but are used in everyday Spanish and are
considered "correct" words.
espa~nol.comp : Contains a list of words that do not appear in the
official dictionary but are used in computer-related texts.
antiguas.words : Contains a list of words that appear in the
official Spanish dictionary but are old and no longer in
current use.
espa~nol.words+ : Contains the expanded list of words generated
from the espa~nol.words and espa~nol.comp word files.
e~nes : Script that replaces 'n and 'N with ~n and ~N in
espa~nol.aff, espa~nol.words and espa~nol.words+. You must run
this script if you choose the ~n notation for this letter. The
script uses the sed utility and has been tested with GNU sed
version 2.05. To run it, make sure that GNU sed is installed and type:
make e~ne
Makefile : Makefile for building the hash file
(espa~nol.hash ) from the affix file and the espa~nol.words file.
First, you have to decide how to represent the letter eñe.
There are two options: 'n 'N and ~n ~N. If you choose the
second option, you must run the e~nes script. This script uses
the sed utility and has been tested with GNU sed version 2.05.
To run it, make sure that GNU sed is installed and type:
make e~ne
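To make the difference between the two notations concrete, the following sketch rewrites the apostrophe notation as the tilde notation with sed. It only illustrates the kind of substitution involved; it is an assumption about what the conversion amounts to, not the actual e~nes script, and the function name to_tilde_notation is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumed effect of the conversion, not the real e~nes script.
# Reads text on stdin and rewrites 'n and 'N as ~n and ~N on stdout.
to_tilde_notation() {
    sed -e "s/'n/~n/g" -e "s/'N/~N/g"
}

# Two sample words written in the apostrophe notation.
printf '%s\n' "espa'nol" "ESPA'NOL" | to_tilde_notation
# prints:
# espa~nol
# ESPA~NOL
```

Note that a plain substitution like this also rewrites any apostrophe that merely happens to precede an n, so it is only safe on files that use the apostrophe exclusively as a character code.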
To generate the Spanish dictionary (espa~nol.hash file) type:
make
Building the hash file this way needs about 50 MB of paging space and
100 MB of temporary disk space. Make sure that you have
enough disk space in the temporary partition (usually /usr/tmp).
If you do not, set the TMPDIR environment variable to
a path where 100 MB of temporary disk storage is available.
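Before starting the build it can help to check whether a directory has enough free space. The sketch below uses the POSIX output format of df; the function name has_tmp_space and the example paths in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory DIR has at least MB megabytes free.
# Usage: has_tmp_space DIR MB
has_tmp_space() {
    # df -P prints one header line and one data line; field 4 of the data
    # line is the available space, reported in kilobytes because of -k.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Possible use before running make (paths are examples only):
#   has_tmp_space /usr/tmp 100 || export TMPDIR=/path/with/enough/space
```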
If you want to create the espa~nol.hash from the expanded word
list (espa~nol.words+ ), just type:
make build
This method needs much less temporary space.
The size of the Spanish dictionary (espa~nol.hash) is about
4 MB. If the file you get is much bigger, the cause is probably the
sort command of the operating system (Solaris 2.7 has this problem). In this
case we recommend installing the GNU textutils package and making sure that
the sort command you use is the one from textutils.
To install the hash file, become root and type:
make install
Six different formats are supported by COES.
Default format: The accented characters are coded as follows:
Code | Char
'a | á
'e | é
'i | í
'o | ó
'u | ú
'n | ñ
"u | ü
'A | Á
'E | É
'I | Í
'O | Ó
'U | Ú
'N | Ñ
"U | Ü
TeX format: The accented characters are coded as follows:
Code | Char
\'a | á
\'e | é
\'{\i} | í
\'o | ó
\'u | ú
\'n | ñ
\"u | ü
\'A | Á
\'E | É
\'{\I} | Í
\'O | Ó
\'U | Ú
\'N | Ñ
\"U | Ü
plainTeX format: The accented characters are coded as follows:
Code | Char
\'{a} | á
\'{e} | é
\'{\i} | í
\'{o} | ó
\'{u} | ú
\'{n} | ñ
\"{u} | ü
\'{A} | Á
\'{E} | É
\'{\I} | Í
\'{O} | Ó
\'{U} | Ú
\'{N} | Ñ
\"{U} | Ü
html format: The accented characters are coded as follows:
Code | Char
&aacute; | á
&eacute; | é
&iacute; | í
&oacute; | ó
&uacute; | ú
&Aacute; | Á
&Eacute; | É
&Iacute; | Í
&Oacute; | Ó
&Uacute; | Ú
&ntilde; | ñ
&Ntilde; | Ñ
&uuml; | ü
&Uuml; | Ü
latin1 format: The accented characters are coded as specified
in the ISO 8859-1 character set.
msdos format: The accented characters are coded as specified
in the MS-DOS extended ASCII code.
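As an illustration of the default format listed above, the following sketch expands its two-character codes into the characters they stand for. It is not part of COES or ispell, and the function name decode_default_format is hypothetical; the script text itself is assumed to be saved as UTF-8.

```shell
#!/bin/sh
# Sketch only: expand default-format codes ('a, 'n, "u, ...) into the
# corresponding characters. Reads stdin, writes stdout.
decode_default_format() {
    sed -e "s/'a/á/g" -e "s/'e/é/g" -e "s/'i/í/g" -e "s/'o/ó/g" \
        -e "s/'u/ú/g" -e "s/'n/ñ/g" -e 's/"u/ü/g' \
        -e "s/'A/Á/g" -e "s/'E/É/g" -e "s/'I/Í/g" -e "s/'O/Ó/g" \
        -e "s/'U/Ú/g" -e "s/'N/Ñ/g" -e 's/"U/Ü/g'
}

# Three sample words written in the default format.
printf '%s\n' "espa'nol" "ping\"uino" "canci'on" | decode_default_format
# prints:
# español
# pingüino
# canción
```

The substitution is purely textual, so an apostrophe or double quote used as punctuation directly before one of these letters would also be rewritten.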
To run ispell with one of the formats above,
type:
ispell -T <formatter> -d espa~nol <file>
The espa~nol.hash file is available for MS-DOS users at:
http://www.datsi.fi.upm.es/~coes/espa~nol.zip
Note that the affix list and the word list are still under
development, and we are currently working on them. If you find words
that do not appear in the word list, or words that should not appear in
the word list, please send a message to
espanol-bugs@datsi.fi.upm.es.
It is very important that you send us the words that do not appear in the
dictionary but should. You can easily do this by sending the file
.ispell_espa~nol, stored in each user's home directory, to the
email address above.
COES was developed at the Universidad Politécnica de Madrid.
Prof. Jesús Carretero has since moved to Universidad Carlos III
de Madrid and continues to collaborate on the project. The postal
addresses of both authors follow.
Santiago Rodríguez
Departamento de Arquitectura
y Tecnología de Sistemas Informáticos (DATSI)
Facultad de Informática.
Universidad Politécnica de Madrid
Campus de Montegancedo s/n.
28660 Boadilla del Monte, Madrid, España.
Email: srodri@fi.upm.es
Jesús Carretero
Universidad Carlos III de Madrid
Despacho 2.2.A.25
Edificio Sabatini
Campus de Leganés
Avda de la Universidad, 30
28911, Leganés, Madrid, España
Email: jesus.carretero@uc3m.es
Copyright (c) 1994 1995 1996 1999 2001 2005 2008 2010
Santiago Rodríguez and Jesús Carretero
Two kinds of license are available for this package:
GNU. This package is distributed as free software; you can
redistribute it and/or modify it under the terms of the GNU General
Public License as published by the Free Software Foundation. This
program is distributed in the hope that it will be useful but WITHOUT
ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
PRO. For applications where the GNU license is not usable,
please contact the authors.