Computer Architecture Department Universidad Politécnica de Madrid Universidad Carlos III de Madrid

COES. General Information and Distribution

Santiago Rodríguez and Jesús Carretero
May 2008


1. What Is COES and Why Do I Want It?


This set of data files implements a Spanish (castellano) dictionary with approximately 53,000 roots and their derived forms. The number of roots increases every day, but new versions of COES are not released until they have been tested for correctness.

The current distribution of COES includes a speller for the Spanish language.

COES tools must be used together with the international ispell program (version 3.1.13 or later) or with aspell.

Several releases of COES have been publicly distributed, on the following schedule:

Version Date
V 1.1 December 1994
V 1.2 January 1995
V 1.3 February 1995
V 1.4 April 1995
V 1.5 November 1996
V 1.6 April 1999
V 1.7 June 2001
V 1.8 March 2005
V 1.9 November 2005
V 1.10 May 2008
COES PROfessional NON GPL version


2. Where can I get COES?


The latest COES release can be obtained from http://www.datsi.fi.upm.es/~coes/espa~nol-1.10.tar.gz. This package contains the affix file and the Spanish word list.

If you use aspell, you can get an installable package from http://www.datsi.fi.upm.es/~coes/aspell-es-0.50-2.tar.bz2. This package is based on COES (release 1.7). The original distribution can be found at http://aspell.sourceforge.net.

3. What are Ispell/Aspell and how to get them?


A special entry on ispell general information.

Aspell is another spell checker; more information is available at http://aspell.sourceforge.net.

4. How to install COES?


To use the Spanish dictionary, you must undefine the NO8BIT macro in the ispell local.h configuration file.
The distribution is packaged in the file espa~nol-X.X.tar.gz (X.X is the version number). To extract the sources from a file ending in .tar.gz, you can use the command
gzip -d < espa~nol-X.X.tar.gz | tar xf -
where espa~nol-X.X.tar.gz is the name of the file.
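As a minimal sketch, the extraction command can be wrapped in a small shell function that takes the version number as its argument. The function name unpack_coes is hypothetical, and the sketch assumes the archive has already been downloaded into the current directory.

```shell
# Hypothetical helper: unpack espa~nol-<version>.tar.gz from the current directory.
# The file name is quoted so the "~" is always taken literally.
unpack_coes() {
    version="$1"
    archive="espa~nol-${version}.tar.gz"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # Same pipeline as the command above: gzip decompresses to stdout,
    # tar reads the archive from stdin.
    gzip -d < "$archive" | tar xf -
}

# Example: unpack_coes 1.10
```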
This file is expanded to the following files:

5. How to generate the dictionaries?


First, decide how to represent the e~nes. There are two options: 'n 'N and ~n ~N. If you choose the second option, you must run the script e~nes. This script uses the sed utility and has been tested with GNU sed version 2.05. To run it, make sure that GNU sed is installed and type:
make e~ne

To generate the Spanish dictionary (espa~nol.hash file) type:
make

Building the hash file this way needs about 50 MB of paging space and 100 MB of temporary disk space. Please make sure that you have enough disk space in the tmp partition (usually /usr/tmp). If you do not, set the TMPDIR environment variable to a path where 100 MB of temporary disk storage is available.
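The space check and the TMPDIR setting can be sketched as follows. Both function names are hypothetical, the 100 MB threshold is the figure quoted above, and the directory passed in is whatever location you choose; none of this is part of the COES Makefile.

```shell
# Hypothetical helper: print the free space, in megabytes, of the file system
# that holds the given directory.
free_mb() {
    # df -P selects the portable one-line-per-file-system output;
    # -k reports sizes in 1024-byte blocks, so column 4 is free kilobytes.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Hypothetical helper: run make with TMPDIR set to a directory that has
# at least 100 MB free.
build_with_tmpdir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ "$(free_mb "$dir")" -lt 100 ]; then
        echo "less than 100 MB free in $dir" >&2
        return 1
    fi
    # The assignment applies only to this invocation of make.
    TMPDIR="$dir" make
}

# Example: build_with_tmpdir /var/tmp
```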
If you want to create the espa~nol.hash from the expanded word list (espa~nol.words+), just type:
make build

This method needs much less temporary space.
The size of the Spanish dictionary (espa~nol.hash) is 4 Mbytes. If your file is much bigger, the cause is probably the sort command of the operating system (Solaris 2.7 has this problem). In that case we recommend installing the GNU textutils package and making sure that the sort command you use is the one from textutils.
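A quick way to check both points is sketched below. The function names are hypothetical, and the test for GNU sort relies on the assumption that the GNU implementation prints "GNU" in the first line of its --version output, while other implementations may not accept that option at all.

```shell
# Hypothetical helper: succeed only if the sort found first in PATH
# identifies itself as the GNU implementation.
is_gnu_sort() {
    sort --version 2>/dev/null | head -n 1 | grep -q 'GNU'
}

# Hypothetical helper: print the size of a file in bytes.
size_bytes() {
    # tr strips the padding that some wc implementations add.
    wc -c < "$1" | tr -d ' '
}

# Example:
#   is_gnu_sort || echo "sort is not GNU sort" >&2
#   size_bytes "espa~nol.hash"
```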

6. How to install the dictionary?


To install the hash file, become root and type
make install

7. Which character maps are supported by COES?


Six different formats are supported by COES.
Default format: The accented characters are coded as follows:
Code Char
'a á
'e é
'i í
'o ó
'u ú
'n ñ
"u ü
'A Á
'E É
'I Í
'O Ó
'U Ú
'N Ñ
"U Ü

TeX format: The accented characters are coded as follows:
Code Char
\'a á
\'e é
\'{\i} í
\'o ó
\'u ú
\'n ñ
\"u ü
\'A Á
\'E É
\'{\I} Í
\'O Ó
\'U Ú
\'N Ñ
\"U Ü

plainTeX format: The accented characters are coded as follows:
Code Char
\'{a} á
\'{e} é
\'{\i} í
\'{o} ó
\'{u} ú
\'{n} ñ
\"{u} ü
\'{A} Á
\'{E} É
\'{\I} Í
\'{O} Ó
\'{U} Ú
\'{N} Ñ
\"{U} Ü

html format: The accented characters are coded as follows:
Code Char
&aacute; á
&eacute; é
&iacute; í
&oacute; ó
&uacute; ú
&Aacute; Á
&Eacute; É
&Iacute; Í
&Oacute; Ó
&Uacute; Ú
&ntilde; ñ
&Ntilde; Ñ
&uuml; ü
&Uuml; Ü

latin1 format: The accented characters are coded as specified in ISO 8859-1.
msdos format: The accented characters are coded as specified in the MSDOS extended ASCII code.
To run ispell with one of the formats above, type:
ispell -T <formatter> -d espa~nol <file>
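To illustrate the default coding, the sketch below rewrites coded text into the accented characters themselves, using the 'n notation from section 5. It is not part of COES: the function name decode_default is hypothetical, and the output is in whatever encoding the script itself is saved in.

```shell
# Hypothetical helper (not part of COES): rewrite text in the default coding
# into the accented characters. Reads stdin, writes stdout.
# Caveat: a real apostrophe or double quote placed directly before one of
# these letters is rewritten too, so this only illustrates the table.
decode_default() {
    sed -e "s/'a/á/g" -e "s/'e/é/g" -e "s/'i/í/g" -e "s/'o/ó/g" -e "s/'u/ú/g" \
        -e "s/'A/Á/g" -e "s/'E/É/g" -e "s/'I/Í/g" -e "s/'O/Ó/g" -e "s/'U/Ú/g" \
        -e "s/'n/ñ/g" -e "s/'N/Ñ/g" \
        -e 's/"u/ü/g' -e 's/"U/Ü/g'
}

# Example: printf "%s\n" "espa'nol" | decode_default
```

If the formatter names accepted by -T match the format names listed in this section (an assumption; the affix file is the authority), checking a TeX file might look like ispell -T tex -d espa~nol informe.tex, where informe.tex is a made-up file name.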

8. Is there an MSDOS dictionary?


The espa~nol.hash file is available for MSDOS users at:

http://www.datsi.fi.upm.es/~coes/espa~nol.zip

9. Where to send bug reports?


Note that the affix list and the word list are under development; we are currently working on them. If you find words that are missing from the word list, or words that should not be in it, please send a message to
espanol-bugs@datsi.fi.upm.es.
It is very important that you send us the words that do not appear in the dictionary but should. You can easily do this by sending the file .ispell_espa~nol, stored in each user's home directory, to the e-mail address above.

10. Who developed COES?

COES was developed at the Universidad Politécnica de Madrid. Prof. Jesús Carretero moved to Universidad Carlos III de Madrid and continues to collaborate on the project. The postal addresses of both authors follow.
Santiago Rodríguez
Departamento de Arquitectura
y Tecnología de Sistemas Informáticos (DATSI)
Facultad de Informática.
Universidad Politécnica de Madrid
Campus de Montegancedo s/n.
28660 Boadilla del Monte, Madrid, España.
Email: srodri@fi.upm.es

Jesús Carretero
Universidad Carlos III de Madrid
Despacho 2.2.A.25
Edificio Sabatini
Campus de Leganés
Avda de la Universidad, 30
28911, Leganés, Madrid, España
Email: jesus.carretero@uc3m.es

11. Copyright


Copyright (c) 1994 1995 1996 1999 2001 2005 2008 Santiago Rodríguez and Jesús Carretero

Two kinds of license are available for this package:
  • GNU. This package is distributed as free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
  • PRO. For applications where the GNU license is not usable, please contact the authors.


    COES
    2008-05-14