<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2210">
  <Title>A DISTRIBUTED ARCHITECTURE FOR TEXT ANALYSIS IN FRENCH: AN APPLICATION TO COMPLEX LINGUISTIC PHENOMENA PROCESSING</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
A DISTRIBUTED ARCHITECTURE FOR TEXT
ANALYSIS IN FRENCH: AN APPLICATION TO
COMPLEX LINGUISTIC PHENOMENA PROCESSING
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Most Natural Language Processing systems use a sequential architecture embodying the classical linguistic layers. When one works with a general language rather than a sublanguage, ambiguities arise at each of these classical levels. In particular, when analysing complex language phenomena (coordination, ellipsis, negation...), it is difficult to take into account all the different types of these constructions with a general grammar. The drawback of this approach is the risk of a combinatorial explosion.</Paragraph>
    <Paragraph position="1"> We have therefore defined the TALISMAN architecture, which includes linguistic agents that correspond either to the classical levels of linguistics (morphology, syntax, semantics) or to the analysis of complex language phenomena.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The goal of this paper is to show that complex linguistic phenomena such as coordination, ellipsis or negation can be defined and processed in a distributed architecture. When processing a very large corpus, the problem is to find an approach that allows the best interaction between the different knowledge levels (morphological, syntactic, semantic...) in order to reduce the ambiguities generated by any general system of sequential analysis.</Paragraph>
    <Paragraph position="1"> Most NLP systems use a sequential architecture embodying the classical linguistic layers. Among them are systems for the analysis of English such as ASK [Thomson 85], LOQUI [Binot &amp; al. 85] and TEAM [Pereira 85], and systems for the analysis of French such as SAPHIR [Erli 87] or LEADER [Benoit &amp; al. 86]. Because the different modules need to cooperate, we turned to multi-agent system techniques to build the TALISMAN architecture [Stefanini 93]. This system also uses the linguistic models of the CRISTAL system [MMI2 89]. The TALISMAN architecture includes linguistic agents that correspond either to the classical levels of linguistics (morphology, syntax, semantics) or to the analysis of complex language phenomena (coordination, ellipsis, negation...).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="1151" type="metho">
    <SectionTitle>
2 Ambiguities in text analysis
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="1151" type="sub_section">
      <SectionTitle>
2.1 Examples of ambiguities at different levels in the CRISTAL system
</SectionTitle>
      <Paragraph position="0"> In NLP, when one works with a general language rather than a sublanguage, ambiguities arise at each of the classical levels.</Paragraph>
      <Paragraph position="1"> Preprocessing: the characters are standardized and the text is cut into forms. Punctuation marks can therefore be ambiguous. For example, a full stop can indicate an abbreviation or the end of a sentence. Example: M. Clavier (proper noun/common noun). Morphology: the text forms are processed individually by the morphological analyser [Aho &amp; Corasick 1975], which attributes one or more interpretations to each form as a pair (lexical entry, category).</Paragraph>
      <Paragraph position="2"> One of the difficulties is to find the verb in a homonymous sequence containing D/Y (determinant/preverbal) and F/V (noun/verb) forms. Such a sequence can be predicted to begin either a noun phrase (SN) or a verbal phrase (SV).</Paragraph>
      <Paragraph position="3"> Example: Pilots (1) like (2) flying (3) planes (1) can (1) be dangerous.</Paragraph>
      <Paragraph position="4"> (1) "Pilots", "planes" and "can" are either verbs (to pilot/to plane) or nouns (a pilot/a plane/a can of beer) (V/F). (2) "like" is either a verb (to like) or a preposition (like) (V/P). The cooperation between agents in the TALISMAN system is detailed in [Koning &amp; al 9q]. Syntax: a general grammar has rules that interfere with other rules. For example, the rule N" -&gt; N" N" builds an N" from the concatenation of two N" (where N is a noun or an adjective). This rule makes it possible to build juxtapositions of noun phrases. Example: Le lycée Louis (F(nom,ppr)) Le Grand. But this rule also applies in the following example: On associe à [chaque étudiant]SN [un numéro de carte]SN ("each student is assigned a card number").</Paragraph>
      <Paragraph position="5"> Semantics: there are notions of ambiguity and paraphrase. Modals can be paraphrased in a variety of ways. Example: he may come / it is possible that he comes / he will come / I permit (authorize, empower) him to come.</Paragraph>
    </Section>
    <Section position="2" start_page="1151" end_page="1151" type="sub_section">
      <SectionTitle>
2.2 Disambiguation methods
</SectionTitle>
      <Paragraph position="0"> An ambiguity appears when several solutions are possible for the same problem. These ambiguities are produced by a module or are the consequence of different analysis modules.</Paragraph>
      <Paragraph position="1">  Disambiguation: We advocate the use of local grammars to disambiguate among several solutions produced by a module. For example, contextual laws can be used for some morphological disambiguation. Indeed, the following laws are always valid for the analysis of written French:  These laws can be viewed as partial solutions to the combinatorial explosion.</Paragraph>
      <Paragraph position="2">  In some cases, the interactions between different modules allow a faster disambiguation. Indeed, an agent can use the knowledge of another agent when needed.</Paragraph>
      <Paragraph position="3"> For example, during the morphological and syntactic analysis of the sentence "I (Y) want (V) the (D) e-mail (F) address (F or V)", interactions between MORPH and SYNT are useful. MORPH sends all the certain morphological information to SYNT and proposes the two morphological interpretations of "address". SYNT immediately rejects the "address" = Verb solution for this sentence on the basis of its own knowledge.</Paragraph>
      <Paragraph position="4"> In other cases, cooperation between agents is needed because two agents produce different solutions for the same problem. For example, the form "and" can be viewed as a syntagm coordinator or as a proposition coordinator.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1151" end_page="1153" type="metho">
    <SectionTitle>
3 Distributed approach for Natural Language Processing
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1151" end_page="1151" type="sub_section">
      <SectionTitle>
Natural Language Processing
</SectionTitle>
      <Paragraph position="0"> Blackboards have been applied in linguistics to Speech Understanding Systems [Erman 80] and more recently to the analysis of written French (HELENE [Zweigenbaum 89], CARAMEL [Sabah 90]) and documentary research [Mekaouche 91].</Paragraph>
      <Paragraph position="1"> The global control of these systems is fully centralized: distributing the reasoning capabilities requires maintaining a coherent global representation, and thus requires belief revision mechanisms. Architectures based on direct communication between agents allow complete distribution of both knowledge and control, as well as distribution of partial results.</Paragraph>
      <Paragraph position="2"> We briefly describe the concepts of agent and agent society as they are defined in [Stefanini 93]. A linguistic agent can be divided into two main parts: its knowledge representation and its knowledge processing. Knowledge and goals can be given or acquired through communication with other agents.</Paragraph>
      <Paragraph position="3"> At present, the society in the TALISMAN application consists of the following linguistic agents: PRET for preprocessing, MORPH for morphological analysis, SEGM for splitting into clauses [Maegaard &amp; Spang Hanssen 78], SYNT for syntactic analysis, TRANSF for the transformation of utterances (interrogatives, imperatives, etc.) into declarative clauses, COORD for coordinations, NEGA for negations and ELLIP for ellipses. These agents are described in detail in [Stefanini 93]. There are different types of decomposition: knowledge decomposition by abstraction (PRET, MORPH, SEGM, SYNT...), task decomposition by type of input (COORD, NEGA) and task decomposition by type of output (ELLIP).</Paragraph>
      <Paragraph position="4"> The TALISMAN system is based on direct communication between agents and thus uses mailboxes for sending messages in an asynchronous mode of communication. Speech acts [Searle 69] are usually used to communicate in a Multi-Agent System. The intentions of the sender are expressed in a common communication language. The possible interactions between agents during a conversation have to be regulated; this is done by means of interaction protocols.</Paragraph>
      <Paragraph position="5"> In the TALISMAN system, the communication language and the interaction protocols are based on the work of Sian [Sian 90].</Paragraph>
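      <Paragraph position="6"> As an illustration only, the mailbox-based asynchronous communication described above could be sketched in Python as follows. The class Agent, its methods send and read_all, and the society dictionary are hypothetical names introduced for this sketch; they are not taken from the TALISMAN implementation.

```python
import queue


class Agent:
    """A hypothetical linguistic agent that owns one mailbox."""

    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()  # one mailbox per agent

    def send(self, society, receivers, message):
        # Asynchronous sending: drop the message into each receiver's
        # mailbox and return at once, without waiting for any answer.
        for receiver in receivers:
            society[receiver].mailbox.put((self.name, message))

    def read_all(self):
        # Drain whatever has arrived so far, without blocking.
        received = []
        while True:
            try:
                received.append(self.mailbox.get_nowait())
            except queue.Empty:
                return received
```

In such a sketch, the sender never blocks; each receiver reads its mailbox whenever it is scheduled, which is one simple way to obtain the asynchronous behaviour mentioned in the text.</Paragraph>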
    </Section>
    <Section position="2" start_page="1151" end_page="1152" type="sub_section">
      <SectionTitle>
3.1 Messages
</SectionTitle>
      <Paragraph position="0"> In the system, an agent willing to send a message uses the following message format: ((sender, receiver(s)), (performative, force), content).</Paragraph>
      <Paragraph position="1"> The name of the sending agent helps the receiver to understand the message and to address its answer. The sender should determine the addressee agent(s) with the help of its knowledge about the other agents; if it has none, it sends the message to every agent in the system.</Paragraph>
      <Paragraph position="2"> The performative of the message is either a simple sending of information, a request or a reply. However, these types of messages do not suffice to express all the intentions agents may have. We have used the communication language developed by Sati Sian because it is adapted to the communication needs of the system. This communication language defines nine forces: propose, modify, assert, agree, disagree, noopinion, confirm, accept and withdraw.</Paragraph>
      <Paragraph position="3"> We do not use the force "accept", which requires the agreement of every agent. We also do not use the forces "agree" and "disagree", because our agents only hold reliable information.</Paragraph>
      <Paragraph position="4"> The propositional content is formulated in the knowledge representation language of the agent.</Paragraph>
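      <Paragraph position="5"> The message format of this section can be illustrated by a minimal Python sketch. It is not the TALISMAN implementation: the names Message, as_tuple and address are hypothetical, the spellings of the three performatives (inform, request, answer) are assumptions based on the wording of the text, and excluding the sender from a broadcast is also an assumption.

```python
from dataclasses import dataclass

# The nine forces listed in the text for Sian's communication language.
FORCES = ("propose", "modify", "assert", "agree", "disagree",
          "noopinion", "confirm", "accept", "withdraw")
# Forces that the text says are not used in TALISMAN.
UNUSED = ("accept", "agree", "disagree")
# Assumed spellings for: sending of information, request, reply.
PERFORMATIVES = ("inform", "request", "answer")


@dataclass(frozen=True)
class Message:
    sender: str
    receivers: tuple
    performative: str
    force: str
    content: object

    def __post_init__(self):
        if self.performative not in PERFORMATIVES:
            raise ValueError("unknown performative: " + self.performative)
        if self.force not in FORCES or self.force in UNUSED:
            raise ValueError("force not usable: " + self.force)

    def as_tuple(self):
        # Mirrors ((sender, receiver(s)), (performative, force), content).
        return ((self.sender, self.receivers),
                (self.performative, self.force),
                self.content)


def address(sender, known_receivers, all_agents):
    # With no knowledge about who is concerned, send to every other agent.
    if known_receivers:
        return tuple(known_receivers)
    return tuple(a for a in all_agents if a != sender)
```

For instance, Message("MORPH", ("SYNT",), "inform", "assert", content) would stand for MORPH asserting some morphological information to SYNT.</Paragraph>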
    </Section>
    <Section position="3" start_page="1152" end_page="1153" type="sub_section">
      <SectionTitle>
3.2 Communication protocols
</SectionTitle>
      <Paragraph position="0"> An interaction protocol is a set of rules containing the possible interactions during a conversation; it provides strategies for problem solving due to the co-existence of several agents in the same system. For the cooperation between agents, we have adapted the protocol of Sian to the needs of a natural language processing system for written French.</Paragraph>
      <Paragraph position="1"> Our protocols use the communication language defined above. For better understanding, Sian's protocol is simplified and decomposed into three protocols: - an assertion protocol: this protocol allows agents to send partial or complete results to the concerned agents; it is used when an agent has only one solution or when the work of an agent is finished.</Paragraph>
      <Paragraph position="2"> - an information request protocol: this protocol allows an agent to ask a precise question to one or more agents. If the receiver can answer, it will send an "Answer(Assert)", otherwise an "Answer(Noopinion)" (i.e. if the agent cannot answer or does not understand the question).</Paragraph>
      <Paragraph position="3"> - a cooperation request protocol: this protocol allows an agent to ask one or more agents to cooperate with it in order to solve the conflict it has created: it has produced several solutions for the same problem and the other agents have to confirm or reject its hypothesis. An agent will answer noopinion if it cannot answer or if it does not understand the question; it will confirm the hypothesis if it obtains a positive evaluation of it and it will withdraw it in case of negative evaluation. If the receiving agent obtains a negative evaluation and has another hypothesis, it will reply to the sender agent with an "answer(modify)" containing its new hypothesis.</Paragraph>
      <Paragraph position="4"> Note: when a hypothesis is confirmed and withdrawn by different agents, the rejection of the hypothesis will be retained.</Paragraph>
      <Paragraph position="5"> 4 Example of complex linguistic phenomena processing: The sentence to process is: "Should (V) I (Y) correct (Fadj/V) the (D) paper (Fnoun/V) and (C(intra / inter)) address (Fnoun/V) him (Y)?" The processing of this interrogative sentence begins with the sending of the sentence, transformed into an affirmative form and preprocessed. The following messages will be sent: SEND (Pret, Transf; Inform, Assert; [Sentence="Should I correct the paper and address ... with the linguistic contextual laws presented in the first part. It will find: "I (Y) should (V) correct (V) the (D) paper (Fnoun) and (C(intra / inter)) address (Fnoun, V) him (Y)?" The cooperation between the morphological and syntactic levels can then start; Morph first sends all the certain information: SEND (Morph, Synt; Inform, Assert; ["I = Y", "should = V", "correct = V", "the = D", "paper = F", "and = C", "him = Y"]) Then, the coordinator "and" can be viewed by the segmentation agent (Segm) as a proposition coordinator (inter-proposition coordination) and by the coordination agent (Coord) as a coordinator of nominal syntagms (noted SN) (intra-proposition coordination). But after the disambiguation of "address", the Coord agent will change its point of view. The messages can be sent as follows:  Legend: Hi: the hypothesis i on which the agents have to work.</Paragraph>
      <Paragraph position="6"> Ri: the information request i that the agents have to answer.</Paragraph>
      <Paragraph position="7"> Ii: the information message i.</Paragraph>
      <Paragraph position="8"> This is only one possible unfolding of the interaction protocols among the agents concerned with the coordination phenomenon. Because of pseudo-parallelism and the asynchronous sending of messages, the messages may be exchanged in a different order.</Paragraph>
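      <Paragraph position="9"> The cooperation request protocol of this section, together with the rule that a rejection prevails over a confirmation, can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the function names reply_to_hypothesis and resolve are hypothetical, replies are represented as (performative, force, content) triples, and the handling of "modify" replies, which the text does not specify, is simply left to the caller.

```python
def reply_to_hypothesis(evaluation, alternative=None):
    """Choose the reply of an agent that received a cooperation request.

    evaluation is True (positive), False (negative) or None when the
    agent cannot answer or does not understand the question.
    """
    if evaluation is None:
        return ("answer", "noopinion", None)
    if evaluation:
        return ("answer", "confirm", None)
    if alternative is not None:
        # Negative evaluation, but the agent has another hypothesis.
        return ("answer", "modify", alternative)
    return ("answer", "withdraw", None)


def resolve(replies):
    """Combine the replies received for one hypothesis."""
    forces = [force for (_, force, _) in replies]
    # Rule from the text: if a hypothesis is both confirmed and
    # withdrawn by different agents, the rejection is retained.
    if "withdraw" in forces:
        return "rejected"
    if "confirm" in forces:
        return "confirmed"
    # Assumption: anything else (noopinion, modify) is left undecided.
    return "undecided"
```

With replies arriving asynchronously, resolve would be called once all the expected answers for a hypothesis Hi have been read from the mailbox.</Paragraph>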
    </Section>
  </Section>
</Paper>