A METHOD TO REDUCE LARGE NUMBER OF CONCORDANCES.
Maria Pozzi, Javier Becerra, Jaime Rangel, Luis Fernando Lara.
Diccionario del Español de México.
El Colegio de México
Camino al Ajusco #20, México 20, D.F.
MEXICO
Summary
In order to help solve the problem of analysing the large number of concordances of a given word W, the 'Diccionario del Español de México' (DEM) has implemented a programme that
i) Reduces this number, so as to obtain the maximum possible information with the minimum number of concordances to be handled.
ii) Sorts and rearranges the output so that similar concordances are printed out together.
This was done by comparing up to four words to the left and to the right of word W, through the whole set of concordances, and associating together those concordances in which the same words were repeated in a particular context. Once this was known, some significant concordances were selected to be printed out, and the rest were discarded.
I Introduction
In the composition of a dictionary, those involved in the definition of each word have to study its set of concordances very conscientiously, so that no meaning or use is missed.
There are, of course, some difficulties. On the one hand, the sample is never large enough to ensure the occurrence of all the different meanings and uses of every word to be defined. This problem is solved by consulting other dictionaries and experts on the particular subject.
On the other hand, there are words with a very large number of occurrences, which makes their analysis a very difficult task, since it is not possible to keep in mind everything that is being analysed. At first thought this could be solved by taking a smaller number of concordances at random; however, when reducing in this way, one is likely to lose the grammatical and semantic information contained in the concordances taken away. Hence a method had to be implemented that retains the maximum possible information.
In order to solve this problem, the DEM presents a method whose aim is to obtain optimal information with the minimum number of concordances to be handled.
For each concordance, this method analyses and compares four words to the left and four to the right of word W, together with their associated grammatical categories, and establishes which concordances are identical to which others in a particular context. A tree structure is generated.
Once this is known, the number of concordances is reduced by selecting some of them that are considered to be representative.
II Preliminary Requirements
Our sample (Corpus del Español Mexicano Contemporáneo: CEMC) consists of 1,973,151 occurrences, resulting in 65,200 different types [1], whose frequencies vary from 1 to 68,252 [2].
Some preliminary work has been done, consisting in the automatic labelling of each and every word of the corpus with its grammatical category [2]. Of the total number of occurrences, 1,083,945 were resolved automatically and the rest had to be resolved by hand; the computer was then fed with the results, thus yielding the complete labelled sample. We took advantage of this work, since otherwise it would have been impossible to try to reduce the number of concordances in terms of the same grammatical category.
The next step was to implement a programme that produces, for any given word, its set of concordances, with each word stating its own grammatical category. This is stored in a file called CONCUERDA, which is organized in the following way:
Every concordance has three lines, each one of them consisting of:
- 6 characters (nnnnnn) reserved for the number of the occurrence.
- 12 characters (tttppplll) reserved for the register of that line according to the original text, stating text code, page and line.
- 72 characters reserved for the actual text.
- 18 characters for the label of each word of the line, stating the grammatical category code. The first two characters indicate the number of words in the line.
Figure No. 1 shows part of file CONCUERDA and its organization.
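The record layout above can be illustrated with a short Python sketch (the original system was written in ALGOL, so this is only an illustration). It assumes that the four fields follow each other in the order listed, and that the 16 label characters after the word count hold one category code per word; the paper does not state either point explicitly. The names `ConcordanceLine` and `parse_line`, and the register and category values of the sample record, are made up for the example.

```python
from typing import NamedTuple, Optional

# Field widths as listed above; their left-to-right order is an assumption.
WIDTHS = (6, 12, 72, 18)


class ConcordanceLine(NamedTuple):
    occurrence: Optional[int]  # number of the occurrence (None if blank)
    register: str              # text code, page and line
    text: str                  # the actual text of the line
    word_count: int            # first two characters of the label
    categories: str            # remaining label characters (assumed one per word)


def parse_line(line: str) -> ConcordanceLine:
    """Split one fixed-width CONCUERDA line into its four fields."""
    line = line.ljust(sum(WIDTHS))
    fields, pos = [], 0
    for width in WIDTHS:
        fields.append(line[pos:pos + width])
        pos += width
    number, register, text, label = fields
    return ConcordanceLine(
        occurrence=int(number) if number.strip() else None,
        register=register.strip(),
        text=text.rstrip(),
        word_count=int(label[:2]) if label[:2].strip() else 0,
        categories=label[2:].rstrip(),
    )


# Made-up record: only the text is taken from the concordance of Figure No. 2.
sample = (
    "12".rjust(6)
    + "001045030".ljust(12)
    + "COMO ENTREGA. POR ESO LA ADOLESCENCIA NO ES SO/LO LA EDAD DE LA".ljust(72)
    + "13" + "1234567812345".ljust(16)
)
record = parse_line(sample)
print(record.occurrence, record.word_count, record.text)
```

Each line is 108 characters long under these assumptions (6 + 12 + 72 + 18).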
III The Algorithm
3.1 Association of the i-th Concordance to table ORDENA.
For each concordance, a table ORDENA is built in the following way:
- The word in question, W, is located in the middle line and assigned to ORDENA(5).
- Four words are selected to the right and four to the left of W, since they are assumed to carry the most significant grammatical and semantic information about the word W [3]. We took this idea from the work of the Centre du Trésor de la Langue Française on the treatment of binary groups.
- Each of the next four words to the right of W, $w_{5+i}$ with $i = 1, \dots, 4$, takes its place in $O_{5+i}$ (that is, ORDENA(5+i)) if and only if there is no punctuation mark $P_i$ such that
$w_{5+i-1}\ P_i\ w_{5+i}$ and $P_i \in \{\,.\ ,\ ;\ :\ ¿\ ?\ ¡\ !\,\}$,
as these marks are considered to break up the continuity of a context.
- In a similar way, the words to the left of W are assigned to their places in ORDENA.
Figure No. 2 shows how to construct table ORDENA from a given concordance.
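The construction of ORDENA can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' program: the concordance is assumed to be already split into tokens with punctuation marks as separate tokens, the set of breaking marks is the one reconstructed above, and filling is assumed to stop at the first breaking mark on each side (so every slot beyond it stays void). Grammatical categories are left out for brevity, and the function name `build_ordena` is hypothetical. The list is 0-based, so ORDENA(5) corresponds to index 4.

```python
# Punctuation marks assumed to break the continuity of a context.
BREAKS = set(".,;:¿?¡!")


def build_ordena(tokens, index, span=4):
    """Return the 9-slot table for the word at tokens[index].

    Slot span (index 4) holds W; slots to its left and right hold up to
    `span` neighbouring words. A slot is None (void) when it lies beyond
    a breaking punctuation mark or beyond the end of the text.
    """
    ordena = [None] * (2 * span + 1)
    ordena[span] = tokens[index]
    for step in (1, -1):              # first the right side, then the left
        pos = index + step
        for i in range(1, span + 1):
            if pos < 0 or pos >= len(tokens) or tokens[pos] in BREAKS:
                break                 # context broken: remaining slots stay void
            ordena[span + step * i] = tokens[pos]
            pos += step
    return ordena


# The concordance of Figure No. 2, tokenized by hand.
tokens = ("POR ESO LA ADOLESCENCIA NO ES SO/LO LA EDAD DE LA SOLEDAD , "
          "SINO TAMBIE/N").split()
print(build_ordena(tokens, tokens.index("EDAD")))
# ['NO', 'ES', 'SO/LO', 'LA', 'EDAD', 'DE', 'LA', 'SOLEDAD', None]
```

As in Figure No. 2, the last slot is void because a comma follows 'SOLEDAD'.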
3.2 Generation of a Tree Structure starting from ORDENA.
Once this set of up to nine words has been obtained, a tree structure is constructed for the words to the right of W and another one for the words to the left of W.
Only the construction of the right branch of the tree will be described here. The left one is generated immediately afterwards, in symmetric form:
- The tree has a root node, which is the word W itself, and has five levels, the root being in level 5.
- A direct descendant of a node $w_i$ is given by the word $w_j$ such that $w_i w_j$ are adjacent, i.e. if $w_i \in \mathrm{ORDENA}_i$ and $w_j \in \mathrm{ORDENA}_{i+1}$ then $w_j$ is a direct descendant of $w_i$.
- The label of each node consists of:
- The associated word w.
- Its grammatical category.
- Its frequency.
And pointers to:
- Its direct ascendant.
- Its first direct descendant.
- The next node whose direct ascendant is the same as its own.
- Another file called CONCORD, which stores the number of the concordance or concordances where that word in that particular context came from, thus making the retrieval operation possible.
- A node has as many branches as there are different words found to be direct descendants of that word, with the same grammatical category, through the whole set of concordances.
The process repeats itself until the last concordance has been processed.
[Figure No. 1. Part of file CONCUERDA and its organization.]
Figure No. 3 shows, for a set of 14 concordances, the left and right trees generated.
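The generation of the right tree can be sketched in Python as below. It is a simplified illustration, not the authors' implementation: instead of the first-descendant and next-brother pointers described above, each node keeps its direct descendants in a dictionary keyed by (word, category), and the concordance numbers are kept in a list on the node rather than in a separate CONCORD file. The names `Node` and `add_right_context`, and the category code "NN" used in the example, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    word: str
    category: str
    level: int
    parent: Optional["Node"] = field(default=None, repr=False)
    frequency: int = 0
    concordances: list = field(default_factory=list)
    # (word, category) -> Node; insertion order keeps descendants left to right
    children: dict = field(default_factory=dict, repr=False)


def add_right_context(root, ordena, number):
    """Insert the right context of one concordance into the tree.

    `ordena` is a 9-slot list of (word, category) pairs or None, with W
    at index 4; `number` is the number of the concordance.
    """
    root.frequency += 1
    root.concordances.append(number)
    node = root
    for level in range(6, 10):            # levels 6 to 9 lie to the right of W
        entry = ordena[level - 1]
        if entry is None:                 # void slot: the context ends here
            break
        child = node.children.get(entry)
        if child is None:                 # new word in this context: new node
            child = Node(entry[0], entry[1], level, parent=node)
            node.children[entry] = child
        child.frequency += 1
        child.concordances.append(number)
        node = child


root = Node("EDAD", "NN", 5)
add_right_context(root, [None] * 4 + [("EDAD", "NN"), ("DE", "PR"), ("ORO", "NN"), None, None], 1)
add_right_context(root, [None] * 4 + [("EDAD", "NN"), ("DE", "PR"), ("HIERRO", "NN"), None, None], 2)
de = root.children[("DE", "PR")]
print(de.frequency, [child.word for child in de.children.values()])
```

Two nodes are merged only when the word, its category, its level and its direct ascendant all coincide, which is what looking up the key in the parent's dictionary amounts to.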
Concordance (text of its three lines):
SI/, LA ADOLESCENCIA NO PUEDE SER SUPERADA SINO COMO OLVIDO DE SI/,
COMO ENTREGA. POR ESO LA ADOLESCENCIA NO ES SO/LO LA EDAD DE LA
SOLEDAD, SINO TAMBIE/N LA E/POCA DE LOS GRANDES AMORES, DEL HEROI/SMO Y
ORDENA[1:9]
1  NO       1
2  ES       9
3  SO/LO    1
4  LA       6
5  EDAD     8
6  DE       4
7  LA       6
8  SOLEDAD  8
9  (void)
Figure No. 2 Table ORDENA is obtained from a given concordance. Note that ORDENA(9) is void, since there is a comma (,) after the word 'soledad'.
[Listing of 14 concordances of the word EDAD, followed by the left and right trees generated from them.]
Figure No. 3 Left and right trees generated from a set of 14 concordances.
3.3 The algorithm to select significant concordances.
Once the tree is fully constructed, the actual reduction is carried out.
There are some facts to be considered beforehand:
The more words are repeated exactly in the same context, the greater the probability that the meaning of the word W in that context is the same.
A set of words repeated a small number of times may be more significant than another one repeated a larger number of times, since there are not so many different meanings or grammatical functions of a word W followed by the same set of words.
The procedure is described next:
In order to analyse the tree, a leftmost path is followed.
- A 6th level branch of the tree is analysed first (remember that the root is in level 5, and that the tree to the right of W is being analysed). If its frequency is greater than 1, then its leftmost direct descendant is analysed in the same way.
If a 9th level node is reached in this way, and its frequency is n > 1, it means that the word W followed by these four words occurred in n different concordances. As was said before, there is a good probability that the meaning of the word W in this particular context is the same in all of the n concordances; hence, by taking only one or two of them, by means of a random function, we obtain a significant concordance, and the remaining (n - 1) or (n - 2) can be safely omitted from the final output.
- If at some intermediate level it is found that the frequency of the word associated to that node is 1, the analysis of that branch would have to stop; however, it was thought that a further reduction was possible, not by identical words but by identical grammatical category. The procedure then finds all direct descendants of the node's own direct ascendant that have the same frequency and grammatical category, and the number of these concordances is reduced.
The process takes into account that the closer the level of reduction is to 5, the less significant the context is; hence a larger number of concordances has to be chosen to maintain the required quality of information.
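The traversal just described can be sketched in Python as follows. The sketch only collects the groups of concordances that are candidates for reduction, together with their level; it does not perform the selection itself. Several points are assumptions rather than statements of the paper: a node with frequency greater than 1 but without descendants (its context having been cut by punctuation) is treated as a group at its own level, and nodes of frequency 1 grouped by grammatical category are assigned the level of those nodes. The names `Node` and `reduction_groups` and the category code "NN" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    category: str
    level: int
    concordances: list = field(default_factory=list)
    children: list = field(default_factory=list)   # left-to-right order

    @property
    def frequency(self):
        return len(self.concordances)


def reduction_groups(node, last_level=9):
    """Return (level, concordance numbers) groups found below `node`."""
    groups = []
    singles = {}   # category -> concordances of frequency-1 descendants
    for child in node.children:                     # leftmost first
        if child.frequency > 1:
            if child.level == last_level or not child.children:
                groups.append((child.level, list(child.concordances)))
            else:
                groups.extend(reduction_groups(child, last_level))
        else:
            singles.setdefault(child.category, []).extend(child.concordances)
    for numbers in singles.values():                # same category, frequency 1
        if len(numbers) > 1:
            groups.append((node.level + 1, numbers))
    return groups


oro = Node("ORO", "NN", 7, [1, 2])
hierro = Node("HIERRO", "NN", 7, [3])
de = Node("DE", "PR", 6, [1, 2, 3], [oro, hierro])
adulta = Node("ADULTA", "AJ", 6, [4])
media = Node("MEDIA", "AJ", 6, [5])
y = Node("Y", "CO", 6, [6])
root = Node("EDAD", "NN", 5, [1, 2, 3, 4, 5, 6], [de, adulta, media, y])
print(reduction_groups(root))   # [(7, [1, 2]), (6, [4, 5])]
```

Concordances that fall in no group (here numbers 3 and 6) are kept in the output unchanged.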
After some study and many trials, our team of linguists* decided empirically that a reasonable pattern of reduction was the following, where F is the frequency and Q the number of concordances selected:
If the level of reduction is 4 or 6, then
Q = F//2 + 1 for F ≤ 30
Q = F//4 for F > 30
If the level is 7 or 3, then
Q = F//3 + 1 for F ≤ 50
Q = F//5 for F > 50
If the level is 8 or 2, then
Q = F//4 + 1 for F ≤ 70
Q = F//7 for F > 70
Finally, if the level is 9 or 1, then
Q = F//5 + 1 for F ≤ 50
Q = F//10 + 1 for F > 50
* At this point, we would like to thank Paulette Levy in particular for her valuable discussions and interesting suggestions.
It has to be mentioned here that this pattern of reduction may be changed according to the word analysed, so as to obtain the best results each time.
Once the number of concordances to be chosen is known (Q out of F), they are selected, again by means of a random function, and each one of them is marked as selected, so that none of them is selected more than once.
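The pattern of reduction and the random selection can be sketched in Python as follows. The sketch assumes that `//` in the formulas denotes integer division, as it does in Python, and that the thresholds are inclusive (F ≤ 30, and so on). Sampling without replacement plays the role of marking each chosen concordance so that it is not selected twice. The names `quota` and `select` are hypothetical.

```python
import random

# Distance from the root level 5 -> (threshold, rule up to it, rule above it).
_PATTERN = {
    1: (30, lambda f: f // 2 + 1, lambda f: f // 4),        # levels 4 and 6
    2: (50, lambda f: f // 3 + 1, lambda f: f // 5),        # levels 3 and 7
    3: (70, lambda f: f // 4 + 1, lambda f: f // 7),        # levels 2 and 8
    4: (50, lambda f: f // 5 + 1, lambda f: f // 10 + 1),   # levels 1 and 9
}


def quota(level, frequency):
    """Number Q of concordances to keep out of `frequency` at `level`."""
    distance = abs(level - 5)
    if distance not in _PATTERN:
        raise ValueError(f"no reduction pattern for level {level}")
    threshold, up_to, above = _PATTERN[distance]
    return up_to(frequency) if frequency <= threshold else above(frequency)


def select(concordances, level, rng=random):
    """Pick Q distinct concordance numbers at random."""
    pool = list(concordances)
    q = min(quota(level, len(pool)), len(pool))
    return rng.sample(pool, q)        # sampling without replacement


print(quota(6, 30), quota(6, 31), quota(9, 60))   # 16 7 7
chosen = select(range(1, 41), level=6, rng=random.Random(0))
print(len(chosen))                                # 10
```

Passing a seeded `random.Random` instance makes a run reproducible, which can help when linguists compare the reduced output of the same word across trials.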
3.4 Output.
The final output is presented indicating the group of words repeated, the grammatical category of the last word when applicable, and the frequency. The Q concordances chosen are then listed below it.
Figure No. 4 shows the form in which the output is presented.
IV The Computational System
The system was implemented in the University of Norway version of ALGOL 60 (NUALGOL) for a UNIVAC 1106 computer of the "Centro de Procesamiento Arturo Rosenblueth" of the Secretaría de Educación Pública (Ministry of Education), with 262K words of 36 bits of central memory and 8,000,000 characters on disc.
4.1 Data Storage.
We made use of 3 files:
a) File CONCUERDA, described above, where the whole set of concordances of the word W was stored.
b) Files ARBOL and CONCORD. These two files contain the information obtained while generating the right and left trees.
ARBOL: Each node of the tree is stored in a line composed of 72 characters, distributed in the following way:
7 for its own address in file ARBOL
1 for the level
24 for the word
2 for the grammatical category
3 for the length of the word
4 for the frequency
7 for the address of its direct ascendant
7 for the address of the next direct descendant of its own direct ascendant (i.e. its next brother)
7 for the address of the first direct descendant
4 for the number of direct descendants (i.e. the number of branches emerging from it) and
6 for the address in file CONCORD where the number of the concordance it comes from is stored.
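The 72-character ARBOL record can be illustrated with the following Python sketch, which packs a node into a fixed-width line and reads it back. The field names and the alignment (text left-justified, numbers right-justified, blank for a missing pointer) are assumptions made for the example; only the widths and their order come from the description above. The values of the sample node are illustrative.

```python
# (field name, width) in the order listed above; the names are illustrative.
FIELDS = [
    ("address", 7), ("level", 1), ("word", 24), ("category", 2),
    ("length", 3), ("frequency", 4), ("parent", 7), ("next_sibling", 7),
    ("first_child", 7), ("n_children", 4), ("concord", 6),
]
TEXT_FIELDS = {"word", "category"}
RECORD_WIDTH = sum(width for _, width in FIELDS)   # 72


def pack(node: dict) -> str:
    """Format a node as one fixed-width ARBOL line."""
    parts = []
    for name, width in FIELDS:
        value = node.get(name)
        text = "" if value is None else str(value)
        if len(text) > width:
            raise ValueError(f"{name}={text!r} does not fit in {width} characters")
        # Assumed alignment: text to the left, numbers to the right.
        parts.append(text.ljust(width) if name in TEXT_FIELDS else text.rjust(width))
    return "".join(parts)


def unpack(line: str) -> dict:
    """Read a fixed-width ARBOL line back into a dictionary."""
    node, pos = {}, 0
    for name, width in FIELDS:
        raw = line[pos:pos + width].strip()
        pos += width
        if name in TEXT_FIELDS:
            node[name] = raw
        else:
            node[name] = int(raw) if raw else None   # blank pointer -> None
    return node


node = {
    "address": 31596, "level": 6, "word": "DE", "category": "PR",
    "length": 2, "frequency": 1, "parent": 30936, "next_sibling": None,
    "first_child": None, "n_children": 0, "concord": 4647,
}
line = pack(node)
print(len(line), unpack(line) == node)   # 72 True
```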
From the computational point of view, each one of the trees is generated in the following way:
- The root, whose associated node is the word W, is at a prefixed address, and it will be present in every concordance. This word is taken from ORDENA(5).
- The next word in ORDENA is stored by means of a hash function. It is decided to be the same node as one previously stored if and only if the word, its grammatical category, its level and its direct ascendant are exactly the same; in that case the frequency is incremented by one, and the number of this concordance is stored in file CONCORD in addition to the previous ones.
Otherwise it will be a new node.
Figure No. 5 shows part of file ARBOL while the word EDAD (AGE) is being processed.
[Figure No. 4. Form in which the output is presented.]
TREE STRUCTURE GENERATED FOR WORD *EDAD* (AGE).
[Listing of ARBOL records, one per node: address, level, word, grammatical category, length, frequency and pointers.]
Figure No. 5 File ARBOL, where the tree structure is generated.
V Results And Applications
The first results were very encouraging, since for those words with a medium number of concordances (say, up to 600) we were able to achieve a reduction of between 30% and 40%, according to the word in question.
No loss of information was reported (by comparing the original set of concordances with the reduced version).
It is expected that for words with higher frequency the method described here will be more efficient.
However, from the computational point of view, there are still some difficulties, since the generation of each tree becomes very time consuming as the frequency of the word in question increases. We are still working to optimize it.
The most important application, besides the original main objectives, is that this method makes it possible to find expressions and patterns of language that are repeated and used consistently.

References

1. Roberto Ham Chande: Del 1 al 100 en Lexicografía, in Investigaciones Lingüísticas en Lexicografía, Jornadas 89, El Colegio de México, 1979.

2. Isabel García Hidalgo: La Formalización del Analizador Gramatical del DEM, and Luis Fernando Lara y Roberto Ham Chande: Base Estadística del DEM, in Investigaciones Lingüísticas en Lexicografía, Jornadas 89, El Colegio de México, 1979.

3. G. Gorcy, R. Martin, J. Masscourt, R. Vienney, Centre du Trésor de la Langue Française: Le Traitement des Groupes Binaires. Cahiers de Lexicologie.
