A DISTRIBUTED ARCHITECTURE FOR TEXT 
ANALYSIS IN FRENCH: AN APPLICATION TO 
COMPLEX LINGUISTIC PHENOMENA PROCESSING 
Marie-Hélène STEFANINI and Karine WARREN
Equipe CRISTAL-GRESEC
University Stendhal
BP 25 38040 Grenoble Cedex 9 France
E-mails: stefanini@stendhal.grenet.fr, karine@ull-cristal.grenet.fr
Abstract 
Most Natural Language Processing systems use a sequential
architecture embodying classical linguistic layers. When one works
with a general language and not a sublanguage, ambiguities arise at
each of the classical levels. In particular, when one analyses
complex language phenomena (coordination, ellipsis, negation...),
it is difficult to take into account all the different types of
these constructions with a general grammar: the drawback of this
approach is the risk of a combinatorial explosion. We have
therefore defined the TALISMAN architecture, which includes
linguistic agents that correspond either to classical levels in
linguistics (morphology, syntax, semantics) or to the analysis of
complex language phenomena.
1 Introduction 
The goal of this paper is to show that complex linguistic phenomena
such as coordination, ellipsis or negation can be defined and
processed in a distributed architecture. In the processing of a
very large corpus, the problem is to find an approach allowing the
best interaction between different knowledge levels (morphological,
syntactic, semantic...) in order to reduce the generation of the
ambiguities that occur within any general system of sequential
analysis.
Most NLP systems use a sequential architecture embodying classical
linguistic layers. Among them one can find systems for English
analysis such as ASK [Thomson 85], LOQUI [Binot & al. 85], TEAM
[Pereira 85] and for French analysis such as SAPHIR [Erli 87] or
LEADER [Benoit & al. 86]. Because the different modules need to
cooperate, we have turned to multi-agent system techniques for the
construction of the TALISMAN architecture [Stefanini 93]. This
system also uses linguistic models of the CRISTAL system [MMI2 89].
The TALISMAN architecture includes linguistic agents that
correspond either to classical levels in linguistics (morphology,
syntax, semantics) or to the analysis of complex language phenomena
(coordination, ellipsis, negation...).
2 Ambiguities in text analysis 
2.1 Examples of ambiguities at different 
levels in the CRISTAL system: 
In NLP, when one works with a general language and not a
sublanguage, there are different cases of ambiguities at different
classical levels.
Preprocessing: the characters are standardized and the text is cut
into forms. Punctuation marks can therefore be ambiguous. For
example, a full stop can indicate an abbreviation or the end of the
sentence. Example: M. Clavier (proper noun/common noun).
Morphology: the text forms are processed individually by the
morphological analyser [Aho & Corasick 1975], which attributes one
or more interpretations to each form in terms of a pair (lexical
entry, category).
One of the difficulties is to find the verb in a homonymous
sequence D/Y (determiner/preverbal pronoun) F/V (noun/verb). It is
possible to predict either the beginning of a noun phrase (SN) or
of a verbal phrase (SV).
Example: Pilots (1) like (2) flying (3) planes (1) can (1) be
dangerous.
(1) Pilots, planes and can are either verbs (to pilot/to plane) or
nouns (a pilot/a plane/a can of beer) (V/F).
(2) like is either a verb (to like) or a preposition (like) (V/P).
The cooperation between agents in the TALISMAN system is detailed
in [Koning & al. 95].
Syntax: A general grammar has rules which interfere with other
rules. For example, the rule N" -> N" N" builds an N" from the
concatenation of two N" (N being a noun or an adjective). This rule
allows the juxtaposition of noun phrases to be constructed.
Example: Le lycée Louis (F(nom,ppr)) Le Grand. But this rule is
also applied in the following example, where the two adjacent noun
phrases should not be joined: On associe à [chaque étudiant]SN
[un numéro de carte]SN (literally "one assigns to [each student]
[a card number]").
Semantics: there are notions of ambiguity and paraphrase. Modals
can be paraphrased in a variety of ways. Example: he may come /
it's possible that he comes/will come / I permit/authorize/empower
him to come.
2.2 Disambiguation methods
An ambiguity appears when several solutions are 
possible for the same problem. These ambiguities 
are produced by a module or are the consequence 
of different analysis modules. 
2.2.1 Local grammars for disambiguation:
We advocate the use of local grammars to disambiguate some of the
multiple solutions produced by a module. For example, we can use
contextual laws for some morphological disambiguation. Indeed, the
following laws are always valid for the analysis of written French:
- Law 1: Determiner + (Noun or Verb) -> Determiner(D) + Noun(F)
English example: "the address ..."
- Law 2: Pronoun + (Noun or Verb) -> Pronoun(Y) + Verb(V)
English example: "I address ..."
These laws can be viewed as partial solutions to combinatorial
explosion.
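As an illustration, the two contextual laws can be sketched as a
small rewriting pass over tagged forms. This is only a minimal
sketch in Python, not the TALISMAN implementation: the function
name apply_contextual_laws and the representation of each form as
a pair (form, set of categories) are assumptions made for the
example; only the category codes D, Y, F and V come from the text.

```python
# Hypothetical sketch of Law 1 and Law 2 (names and data layout are assumed).
# Categories: D = determiner, Y = pronoun, F = noun, V = verb.

def apply_contextual_laws(tagged):
    """Take a list of (form, set_of_categories) and return a disambiguated copy."""
    result = [(form, set(cats)) for form, cats in tagged]
    for i in range(1, len(result)):
        _, previous = result[i - 1]
        form, cats = result[i]
        if cats == {"F", "V"}:
            if previous == {"D"}:
                # Law 1: Determiner + (Noun or Verb) -> Determiner + Noun
                result[i] = (form, {"F"})
            elif previous == {"Y"}:
                # Law 2: Pronoun + (Noun or Verb) -> Pronoun + Verb
                result[i] = (form, {"V"})
    return result


print(apply_contextual_laws([("the", {"D"}), ("address", {"F", "V"})]))
print(apply_contextual_laws([("I", {"Y"}), ("address", {"F", "V"})]))
```

A form whose left neighbour is neither an unambiguous determiner
nor an unambiguous pronoun is left untouched, which is why such
laws are only a partial remedy.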
2.2.2 Interactions for disambiguation:
In some cases, the interactions between different modules allow a
faster disambiguation. Indeed, an agent can use the knowledge of
another agent when needed.
For example, during the morphological and syntactic analysis of the
sentence "I(Y) want(V) the(D) e-mail(F) address(F or V)",
interactions between MORPH and SYNT are useful. MORPH will send all
the sure morphological information to SYNT, and will propose the
two morphological interpretations for "address". SYNT will
immediately reject the "address" = Verb solution in this sentence,
thanks to its own knowledge.
In other cases, cooperation between agents is needed when two
agents produce different solutions for the same problem. For
example, the form "and" can be viewed as a syntagm (phrase)
coordinator or as a proposition (clause) coordinator.
3 Distributed approach for 
Natural Language Processing 
Blackboards have been applied in linguistics to Speech
Understanding Systems [Erman 80] and more recently to the analysis
of written French (HELENE [Zweigenbaum 89], CARAMEL [Sabah 90]) and
documentary research [Mekaouche 91]. The global control of these
systems is fully centralized: distributing the reasoning
capabilities requires maintaining a coherent global representation,
and thus the use of belief revision mechanisms. Architectures based
on direct communication between agents allow complete distribution
of knowledge, of control and of partial results.
We briefly review the concepts of agent and of agent society as
they are defined in [Stefanini 93].
A linguistic agent can be divided into two main parts: its
knowledge representation and its knowledge processing. Knowledge
and goals can be given or acquired through communication with
other agents.
At present, the society in the TALISMAN application is made up of
the following linguistic agents: PRET for preprocessing, MORPH for
morphological analysis, SEGM for splitting into clauses [Maegaard &
Spang Hanssen 78], SYNT for syntactic analysis, TRANSF for
transformations of utterances (interrogatives, imperatives, etc.)
into declarative clauses, COORD for coordinations, NEGA for
negations and ELLIP for ellipses. These agents are described in
detail in [Stefanini 93]. There are different types of
decomposition: knowledge decomposition by abstraction (PRET, MORPH,
SEGM, SYNT...), task decomposition by type of input (COORD, NEGA),
task decomposition by type of output (ELLIP).
The TALISMAN system is based on direct communication between agents
and thus uses mailboxes for sending messages with an asynchronous
mode of communication. Speech acts [Searle 69] are usually used to
communicate in a Multi-Agent System. The intentions of the sender
are expressed in a common communication language. The possible
interactions between agents during a conversation have to be
regulated; this is done by means of interaction protocols.
In the TALISMAN system, the communication language and the
interaction protocols are based on the work of Sian [Sian 90].
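The mailbox mechanism can be pictured with a short Python sketch.
It is given under stated assumptions and does not reproduce the
actual Prolog implementation: the class Agent and its methods send
and read_all are hypothetical names, and a standard FIFO queue
stands in for a mailbox.

```python
import queue


class Agent:
    """Hypothetical agent owning one mailbox (a FIFO queue)."""

    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()

    def send(self, receiver, message):
        # Asynchronous sending: the message is deposited in the receiver's
        # mailbox and the sender continues without waiting for an answer.
        receiver.mailbox.put((self.name, message))

    def read_all(self):
        # Empty the mailbox and return the pending (sender, message) pairs.
        received = []
        while True:
            try:
                received.append(self.mailbox.get_nowait())
            except queue.Empty:
                return received


morph, synt = Agent("MORPH"), Agent("SYNT")
morph.send(synt, '"the"=D')
morph.send(synt, '"paper"=F')
print(synt.read_all())
```

Because the sender never blocks, each agent can decide when to
consult its mailbox, which is what allows the order of exchanges to
vary from one run to another.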
3.1 Messages 
In the system, an agent willing to send a message will use the
following message format:
((sender, receiver(s)), (performative, force), content).
The name of the sending agent helps the receiver to understand the
message and to address its answer. The sender should determine the
addressee agent(s) with the help of its knowledge about the other
agents; if it has none, it will send the message to every agent in
the system.
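The message format and the choice of addressees can be sketched as
follows in Python. This is an illustrative sketch only: the names
Message, make_message and AGENTS are assumptions, the list of
agents is the one given earlier in this section, and excluding the
sender from a broadcast is our own reading of "every agent in the
system".

```python
from dataclasses import dataclass

# Agent names taken from the TALISMAN society described in this section.
AGENTS = ["PRET", "MORPH", "SEGM", "SYNT", "TRANSF", "COORD", "NEGA", "ELLIP"]


@dataclass(frozen=True)
class Message:
    """((sender, receiver(s)), (performative, force), content)."""
    sender: str
    receivers: tuple
    performative: str  # e.g. "Inform", "Request" or "Answer"
    force: str         # e.g. "Assert", "Propose"
    content: str


def make_message(sender, known_receivers, performative, force, content):
    # If the sender knows no suitable addressee, broadcast to all others
    # (assumption: the sender does not send the message to itself).
    receivers = tuple(known_receivers) or tuple(a for a in AGENTS if a != sender)
    return Message(sender, receivers, performative, force, content)


print(make_message("MORPH", ["SYNT"], "Request", "Propose", '"address"=F,V'))
print(make_message("TRANSF", [], "Inform", "Assert", "Sentence=...").receivers)
```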
The performative of the message is either a simple sending of
information, a request or a reply. However, these types of messages
do not suffice to express all the intentions agents may have. We
have used the communication language developed by Sati Sian because
it is adapted to the communication needs of the system. This
communication language defines 9 forces: propose, modify, assert,
agree, disagree, noopinion, confirm, accept and withdraw.
We do not use the force "accept", which requires the agreement of
every agent. Nor do we use the forces "agree" and "disagree",
because our agents only have reliable information.
The propositional content is formulated in the knowledge
representation language of the agent.
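For reference, the nine forces and the subset retained in TALISMAN
can be written down as a small Python enumeration. The names Force,
UNUSED and USED are hypothetical, and the sketch assumes that the
two forces left out besides "accept" are "agree" and "disagree".

```python
from enum import Enum


class Force(Enum):
    """The nine forces of the communication language of [Sian 90]."""
    PROPOSE = "propose"
    MODIFY = "modify"
    ASSERT = "assert"
    AGREE = "agree"
    DISAGREE = "disagree"
    NOOPINION = "noopinion"
    CONFIRM = "confirm"
    ACCEPT = "accept"
    WITHDRAW = "withdraw"


# Forces that the text says are not used in TALISMAN (assumed mapping).
UNUSED = {Force.ACCEPT, Force.AGREE, Force.DISAGREE}
USED = [force for force in Force if force not in UNUSED]

print([force.value for force in USED])
```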
3.2 Communication protocols
An interaction protocol is a set of rules containing the possible
interactions during a conversation; it provides strategies for
problem solving due to the co-existence of several agents in the
same system. For the cooperation between agents, we have adapted
the protocol of Sian to the needs of a natural language processing
system for written French.
Our protocols will use the communication language defined above.
Sian's protocol will be simplified and decomposed, for better
understanding, into three protocols:
- an assertion protocol: this protocol allows agents to send
partial or complete results to the concerned agents; it is used
when an agent has only one solution or when the work of an agent is
finished.
- an information request protocol: this protocol allows an agent to
ask a precise question to one or more agents. If the receiver can
answer, it will send an "Answer(Assert)", otherwise an
"Answer(Noopinion)" (i.e. if the agent cannot answer or does not
understand the question).
- a cooperation request protocol: this protocol allows an agent to
ask one or more agents to cooperate with it in order to solve the
conflict it has created: it has produced several solutions for the
same problem and the other agents have to confirm or reject its
hypothesis. An agent will answer noopinion if it cannot answer or
if it does not understand the question; it will confirm the
hypothesis if it obtains a positive evaluation of it and it will
withdraw it in case of negative evaluation. If the receiver agent
obtains a negative evaluation and has another hypothesis, it will
reply to the sender agent with an "Answer(Modify)" containing its
new hypothesis.
Note: when a hypothesis is confirmed and withdrawn by different
agents, the rejection of the hypothesis will be retained.
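The way the sender of a cooperation request might combine the
replies it receives can be sketched in Python. This is a sketch
under assumptions, not the protocol as implemented: the function
resolve_hypothesis, the reply format and the status labels are
hypothetical. We also assume that a "modify" reply counts as a
rejection, since it follows a negative evaluation, and that a
hypothesis stays undecided when every reply is "noopinion".

```python
def resolve_hypothesis(replies):
    """Combine replies to a cooperation request.

    replies maps an agent name to (force, new_hypothesis_or_None).
    Returns (status, list_of_alternative_hypotheses).
    """
    forces = [force for force, _ in replies.values()]
    alternatives = [hyp for force, hyp in replies.values() if force == "modify"]
    # Rule from the note above: a rejection overrides any confirmation.
    if "withdraw" in forces or alternatives:
        return ("rejected", alternatives)
    if "confirm" in forces:
        return ("confirmed", [])
    # Only "noopinion" answers (or no answer at all): nothing can be concluded.
    return ("undecided", [])


print(resolve_hypothesis({"SYNT": ("confirm", None), "COORD": ("withdraw", None)}))
print(resolve_hypothesis({"SYNT": ("modify", '"address"=V')}))
```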
4 Example of complex linguistic phenomena processing
The sentence to process is: "Should (V) I (Y) correct (Fadj/V) the
(D) paper (Fnoun/V) and (C(intra / inter)) address (Fnoun/V) him
(Y)?"
The processing of this interrogative sentence begins with the
sending of the sentence, preprocessed and transformed into an
affirmative form. The following messages will be sent:
SEND (Pret, Transf; Inform, Assert; [Sentence="Should I correct the
paper and address him?"])
SEND (Transf, Broadcast; Inform, Assert; [Sentence="I should
correct the paper and address him", QTO])
SEND (Pret, Broadcast; Inform, Assert; [Sentence="I should correct
the paper and address him", QTO])
Then the Morph agent carries out the disambiguation with the
linguistic contextual laws presented in the first part. It will
find: "I (Y) should (V) correct (V) the (D) paper (Fnoun) and
(C(intra / inter)) address (Fnoun,V) him (Y)?" The cooperation
between the morphological and syntactic levels can start; Morph
first sends all the sure information:
SEND (Morph, Synt; Inform, Assert; ["I=Y", "should=V", "correct=V",
"the=D", "paper=F", "and=C", "him=Y"])
Then, the coordinator "and" can be viewed by the segmentation agent
(Segm) as a proposition coordinator (inter-proposition
coordination) and by the coordination agent (Coord) as a nominal
syntagm (noted SN) coordinator (intra-proposition coordination).
But, after the disambiguation of "address", the Coord agent will
change its point of view. The messages can be sent as follows:
1- R1 SEND (Morph, (Coord, Segm); Request, /; ["and"=C(intra/inter)])
2- H1 SEND (Morph, Synt; Request, Propose; ["address"=F,V])
3- R2 SEND (Segm, Morph; Request, /; [nb_verb_conjug = ?])
4- R2 SEND (Morph, Segm; Answer, Assert; [nb_verb_conjug = 2])
5- R1 SEND (Segm, Morph; Answer, Assert; ["and"=C(inter)])
6- I1 SEND (Segm, (Synt, Coord, Ellip); Inform, Assert; ["and"=C(inter)])
7- I2 SEND (Synt, (Coord, Segm); Inform, Assert; ["the paper"=SN])
8- H1 SEND (Synt, Morph; Answer, Assert; ["address"=V,F])
9- I3 SEND (Synt, (Segm, Coord); Inform, Assert; ["address"=V,F])
10- I4 SEND (Ellip, (Synt, Segm); Inform, Assert; [ellips_subject="I"])
11- I5 SEND (Ellip, (Synt, Segm); Inform, Assert; [ellips_c.o.d="The paper"])
12- R1 SEND (Coord, Morph; Answer, Assert; ["and"<>C(intra)])
13- I6 SEND (Coord, (Segm, Ellip, Synt); Inform, Assert; ["and"<>C(intra)])
Legend:
Hi: hypothesis i, on which the agents have to work.
Ri: information request i, which the agents have to answer.
Ii: information message i.
This is only one possible unfolding of the interaction protocols
among the agents concerned with the coordination phenomenon:
because of pseudo-parallelism and the asynchronous sending of
messages, the messages may be exchanged in a different order.
5 Conclusion 
In this paper, we have proposed a method to resolve some
ambiguities and to process some complex linguistic phenomena in the
TALISMAN multi-agent system. To this end, we have developed
interaction protocols adapted to the needs of a natural language
processing system for written French. These protocols allow
cooperation between agents and the resolution of the conflicts that
arise in the system, particularly during the treatment of complex
linguistic phenomena.
Currently, we are integrating the prototype linguistic agents
(which implement different types of coordination, negation and
ellipsis) in order to validate the developed protocols. The
implementation is carried out in Prolog II+ on an IBM workstation.
Once it is complete, we will be able to evaluate and possibly
refine the cooperation and conflict resolution methods that have
been developed.
References 
Benoit P., Rincel P., Sabatier P., Vienne D. A user-friendly
natural language interface to Oracle. In Proceedings of the
European Oracle Users' Group Conference, Paris, 1988.
Berrendonner A., "Grammaire pour un analy- 
seur, aspects morphologiques", Les Cahiers du 
CRISS, N 15, Novembre 1990. 
Binot J.L., Demoen B., Hanne K., Solomon L., 
Vasiliou Y., von Hahn W., and Wachtel T. LO- 
QUI: a logic oriented approach to data and 
knowledge bases supporting natural language 
interaction. In Proceedings of ESPRIT 88 Con- 
ference, 1988. 
Aho A. V., Corasick M.J., "Efficient string matching: An aid to
bibliographic search", Communications of the ACM, Vol. 18, N 6,
June 1975.
Erli. SAPHIR : manuel de description du logiciel. Technical report,
Société Erli, 1987.
Esprit Project 2474. MMI2. Common meaning 
representation by D. Sedlock. Report Bim/13, 
October, 1989. 
Koning J.L., Stefanini M.H., Demazeau Y. "DAI 
Interaction protocols as Control Strategies in 
a Natural Language Processing System". Pro- 
ceedings of IEEE International Conference on 
Systems, man and Cybernetics, Vancouver, Oc- 
tober, 1995. 
Maegaard B., & Spang Hanssen E., "Segmentation automatique du
français écrit", Documents de linguistique quantitative, Dunod,
1978.
A. Mekaouche, J.C. Bassano, Analyseur linguistique multi-expert
pour la recherche documentaire. Avignon, Mai 1991.
F. Pereira, "The TEAM Natural-Language Inter- 
face System", Final Report, SRI International, 
Menlo Park, 1985. 
Thomson B., Thompson F.: "ASK is Transportable in Half a Dozen
Ways". ACM Transactions on Office Information Systems, Vol 3, N 2,
April 1985.
Searle J.R., "Speech Acts", Cambridge University Press, 1969.
Sian S.S., "Adaptation based on cooperative learning in multi-agent
systems". In Proceedings of the European Workshop on Modelling
Autonomous Agents in a Multi-Agent World, MAAMAW 1990.
M.H. Stéfanini, TALISMAN : un système multi-agent pour l'analyse du
français écrit, Thèse de Doctorat, Jan. 1993.
P. Zweigenbaum, Helene: Compréhension de compte-rendus
d'hospitalisation. Deuxième Ecole d'été sur le Traitement des
Langues Naturelles. L'ENSSAT, Lannion, Juillet 1989.
