File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-1074_metho.xml

Size: 23,378 bytes

Last Modified: 2025-10-06 14:12:06

<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1074">
  <Title>SAGE : a Sentence Parsing and Generation System</Title>
  <Section position="1" start_page="0" end_page="359" type="metho">
    <SectionTitle>
SAGE (Sentence Analysis and GEneration system)
</SectionTitle>
    <Paragraph position="0"> is an operational parsing and generating system. It is used as a Natural Language Frontend for Esprit project Eeteam-316, whose purpose is to advise a novice user through a cooperative dialogue.</Paragraph>
    <Paragraph position="1"> The aim of our system is to validate the use of a Lexice)n-Grammar (drawn from the LADL studies) ior sentence-parsing and generation, and to imple~ ~aent )~nguistic knowledge in a declarative way u~ing a formMism based upon Functional Descriptions (FD). We have ales devvloped the parser and the g~ueratio~ module so that they share informa~ tion~ and knowledge bases as much as possible: they work on the same semazLtic dictionary and the same linguistic knowledge bases, except th~ they kave their own graznmar. We have also ires p|emented a tracking of semantic objects that have been f~istantiated during a dialogue session: the so-called Token History is provided for semantic refvrence and anaphor resolution during parsing and for pronoun production during generation.</Paragraph>
    <Paragraph position="2"> After introducing to Esteam-316, this paper de,cribelJ the ~nguistic knowledge bases required by SAGE, and then foctmes on the Generation Modnleo Sv.ctlon 4 explains how pronouns are handled.</Paragraph>
    <Paragraph position="3"> The last section is a brief evaluation of our present WO1&amp;quot;k o :t ~introduction to the applicatioli of SAGE _ ne pro'sing and generating system described here ~ used as a Natural Language Frontend for F~., prlt project Esteam-316, which is an Advice-Gi~ ring system \[Decitre 871 \[Bruffaerts 86\]0 A cooperative interactive Man-Machine Interface carries out dialogue functionalities such as recognition of user queries, explanation of domaln~concepts, explanation of solutions proposed by the Problem Solver, etc. To describe it briefly, this Dialogue Manager handles the pragmatic level of a dialogue, wherea~ the Natural Language Frontend SAGE deals with linguistic inferencesdeg The chosen l~guage is }~nglish. null The Dialogue Manager and SAGS ~h~re the same lemantic objects, using a formalism b~ed upon Functional Descriptions (FD~) \[Kay 81\]. The Parser of SAGE extracts the met~uing of the user's query mad represents it with nested FDs. Ox~ the other hand, the Dialogue Manager sends the Generator FDs which describe the semantic conte, nts of the answer.</Paragraph>
    <Paragraph position="4"> Our previous work \[Lancel 86\] was based on a unique dictionary and a granunar shared by both a parser and a generation module. The grammar formalism required the mixing of syntactic and semantic informations in the same structure, which implied the complete rewriting of grautmar when changing from one application domain to another.</Paragraph>
    <Paragraph position="5"> It could not handle transformational processes such as interrogative and imperative transformationsdeg The system presented here fulfills the four follow- null ing requirements: 1. definition of linguistic knowledge bases quitdeg able for both parsing and generation; 2. integration of lexicon-grammar theory into the previous fortnalism, in order to provide precise syntactic information; 8. modulari~ation: a change of application  should not lead to a complete rewriting, but only to an extension of the semantic levels; 4. proper pronoun h~dling, both when parsing I reference resolution) and when generating pronoun synthesis). The section 2 describes the linguistic dictionaries of SAGE. The section 3 explains how those dictionaries are exploited by the generation module. In the section 4, we will detail what kind of processes are required by pronoun handling.</Paragraph>
  </Section>
  <Section position="2" start_page="359" end_page="361" type="metho">
    <SectionTitle>
2 Linguistic knowledge base for parsing and generation
</SectionTitle>
    <Paragraph position="0"> for parsing and generation There are three linguistic levels handled by our sys~ tern: morphological, syntactic, and lastly semantic. The first one will not be explained here, since the ost innovative aspects of our linguistic knowledge es are provided by the two other levels: we are able to take into account a wide range of constructions of a given language using the lexicon~ grammar and we use a totally declarative formalism. null</Paragraph>
    <Section position="1" start_page="359" end_page="359" type="sub_section">
      <SectionTitle>
2.1 Parsing versus generation
</SectionTitle>
      <Paragraph position="0"> The main feature of SAGE is that the Parsing and Generation processed are carriedout using the same dictionaries.</Paragraph>
      <Paragraph position="1"> These dictionaries are interpreted by two separate grammars, one for parsing and one for generation, both of them being language-dependent but not lencies, with differeht levels of correctne~ for pa~ sing and generation. This allows a very wide ra~tge of sentence structures in generation, and semantic inferences to avoid ambiguities. The LADL Le~io con-Grammar covers nearly all French constr~c~ tione. As far as we know, an equlvaIent amount of work is still not available for English. There~ fore, we developed a Lexicon~Grammar cont~mg a few English verbs and nouns. The corresponding constructions are drawn from \[LONGMAN 81\].</Paragraph>
      <Paragraph position="2"> To give an idea of this lexicon-grammar, we presen~ below the information stored for the verb tuanto  o The subject must be a bureau being; theredeg fore, it is a~lowed to be a noun group, but not clause or a verb phrase; , The direct object may be a human being ~a *The mother wants her ch~d ~, a non-human eno tity as in &amp;quot;He wants tim~, or a thai-clause ~ in &amp;quot;Mary wants that John settles down in PariS; (r) The ~hat-clau~e can be reduced in the follow~ ing forms: - \]Noun group -~ Adjective\] or NAd~ if the verb hf be (e.g &amp;quot;The teacher wants the ~ze~is~ ready /or tomorroef ); - \[Verb at the complete infinitive form -~ complements\] or To Vin/O if the concept of the subdeg ject of this clause is the same as that of the subo ject of want (e.g. ~Mary wants to settle down in - \[Noun group ~ Verb at the complete infinio tive form -\[- complements\] or NTo Vinf when the two subjects are different (e.g. &amp;quot;Mary wants h~r friend~ to settle down in Pari~.~); (r) The whole cl~nse may be transformed into  the passive form.</Paragraph>
      <Paragraph position="3"> domain-dependent. This is a major conclusion drawn ~ .. - . . for she sake of read~bifity and maintenaace, vexbs from our studies: a parser and a generatton moo- are so,ted into different tables. One table epecio ule can hardly share the same grammar rules, for the heuristics required by these two processes are fundamentally different. Unlike parsing, a generation process has nothing to do with a sequence of &amp;quot;left-to-right ~ procedures \[Danlos 87a, Danloe 87b\]. Moreover, a given heuristic of clause tran~ formation is strongly dedicated to a parsing or to a generation process (see section 3).</Paragraph>
    </Section>
    <Section position="2" start_page="359" end_page="359" type="sub_section">
      <SectionTitle>
2.2 Syntactic Knowledge Base
</SectionTitle>
      <Paragraph position="0"> ties several standard features, syntactic construco tions as well as valencies that are common to evo ery verb of the same table. For inJtaace, in ot~' lexicon-grammar, want belongs to the table t~o ble_NV8 whose standard construction is ~ hur~t~ subject in a noun group, with a non-humu direct object in a noun group, construction of which may be transformed into the paasive form.</Paragraph>
      <Paragraph position="1"> Here is how one construction of the verb stunt i~ coded, using Fenctional Descriptions (FD): This syntactic level is domain-independent: constructions of verbs, predicative nouns and adjectives along with their corresponding valency are listed in a lexicon-grammar.</Paragraph>
      <Paragraph position="2"> This lexicon-grammar is based on the theory deveL oped by \[Gross 75, Gross 86\] and the studies carried out by the LADL on French constructions. It pro~ rides accurate specifications of the acceptable va~</Paragraph>
      <Paragraph position="4"> The texical codes (NTo Vin/, To VinfO and NAd3) are specified in a FD, stating the conditions of validity of the code and the consequences on the com~ ponez~Lts: To VinfO should be chosen if the subject of the current sentence and that of the main clause represent the same concept; in this case the subject shouM be omitted and the verb should be in the infinitive form.</Paragraph>
      <Paragraph position="5"> The ~umeric values ranging from 0 up to 100, is a coefficient on the correctness of the corresponding constl~ctions. When generating a syntactic component, lexical codes that axe allowed are the ones .4 with a coefficient greater than a certain value, 70 in our implementation. When parsing sentences, accepted constructions would be of a coefficient greater than another milestone, 30 for instance.</Paragraph>
      <Paragraph position="6"> The values 0, 30, 70, 100 are of course quite arbitrary. But they allow the parsing of constructions that ~xe often understood by most of the people but m'e syntactically incorrect: the corresponding lexica~ codes will have a coefllcient between 30 and 70.</Paragraph>
    </Section>
    <Section position="3" start_page="359" end_page="359" type="sub_section">
      <SectionTitle>
2.3 Semantic knowledge base
</SectionTitle>
      <Paragraph position="0"> The semantic level is highly domain-dependent since it dents with concepts. The application domain chose~ by the Esteam-316 project is financial advice-giving for non-expert users. Therefore, the Man-Machiue interface handles intention concepts such as *serface_requeet and *surface.inform which are the intention o/asklng/or something and the intention o/stating something, financial concepts such as *emergency_fend which is a certain amount of money available at ans/ time and provided for emergency c~ee, and lastly domain-independent concepts such as *t0ant ~.</Paragraph>
      <Paragraph position="1"> Those concepts are organised in a semantic network, using the links/s_a and ezample. Moreover, the semantic sub-items are specified in a aehemo.</Paragraph>
      <Paragraph position="2"> For instance, the concept *want is specified by:</Paragraph>
      <Paragraph position="4"> The chosen convention is to put a star at the beginning of a concept identifier, but this is purely for the sake of readability.</Paragraph>
      <Paragraph position="6"> As seen in section 2.1, the semantic objects actually handled by the user and the system during a dialogue are called tokens. Inside the system, tokens are instances of concepts, or more precisely of schemata.</Paragraph>
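A hedged sketch of what such a dictionary entry might look like as data, with an is_a link into the network and a schema of sub-items; the slot names and the *mental_state super-concept are our assumptions, since the paper's figure is not reproduced here.

CONCEPTS = {
    "*want": {
        "is_a": "*mental_state",
        "schema": {"agent": "*human", "object": "*situation"},
    },
    "*emergency_fund": {
        "is_a": "*amount_of_money",
        "schema": {"agent": "*human", "amount": "*money", "delay": "*duration"},
    },
}

def is_a_chain(concept, network=CONCEPTS):
    """Follow the is_a links upward; a token is an instance of such a schema."""
    chain = [concept]
    while concept in network and network[concept].get("is_a"):
        concept = network[concept]["is_a"]
        chain.append(concept)
    return chain

print(is_a_chain("*want"))  # ['*want', '*mental_state']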
    </Section>
    <Section position="4" start_page="359" end_page="361" type="sub_section">
      <SectionTitle>
2.4 Link between concepts and syntactic structures
</SectionTitle>
      <Paragraph position="0"> tactic structures Mapping between semantic schemata and syntactic structures is ,pecified in FDs named llnguiatlc def. initions, This is am important feature of our KB: it is the linguistic definitions that make explicit the correspondance between token slots and syntactic components of sentences, clauses and noun groupsdeg Using them, the same token may be synthesised as a noun phrase or a clause, according to syntactic constraints. A noun phrase or a clause require different grammar rules in the generation process.</Paragraph>
      <Paragraph position="1"> For instance, let us consider the following token :</Paragraph>
      <Paragraph position="3"> The last two FDs shown above are the syntactic structures produced by two different linguistic definitions linked to the same concept *transaction. The choice between the two is made by the generation module either under semantic constraints declared in the semantic dictionary, or under linguistic restrictions specified by the generation grammar, or by the lexicon-grammar.</Paragraph>
      <Paragraph position="4"> Linguistic definitions do not only allow the synthesis of totally different schemata using the same generation grammar rules, but also provide the parser with extended capacities for handling complex noun phrases or sentences and for extracting a specific meaning, with the specific slot identifiers (buyer, object, year), out of a standard syntactic construction of the noun- or verb-predicate.</Paragraph>
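The following sketch illustrates the idea under obvious simplifications: two assumed linguistic definitions for *transaction, one realised as a clause and one as a noun group, both filling their syntactic components from the same token slots (buyer, object, year). The structures are invented for illustration.

LINGUISTIC_DEFINITIONS = {
    "*transaction": [
        {   # "X buys Y in Z"
            "syntax": "clause", "verb": "buy",
            "map": {"subject": "buyer", "direct_object": "object", "adverbial": "year"},
        },
        {   # "the purchase of Y by X in Z"
            "syntax": "noun_group", "head_noun": "purchase",
            "map": {"of_complement": "object", "by_complement": "buyer", "adverbial": "year"},
        },
    ],
}

def realise(token, wanted_syntax):
    """Pick the definition matching the syntactic constraint and fill its
    components from the token slots."""
    for definition in LINGUISTIC_DEFINITIONS[token["concept"]]:
        if definition["syntax"] == wanted_syntax:
            return {comp: token[slot] for comp, slot in definition["map"].items()}
    return None

token = {"concept": "*transaction", "buyer": "I", "object": "a car", "year": "in five years"}
print(realise(token, "clause"))      # {'subject': 'I', 'direct_object': 'a car', ...}
print(realise(token, "noun_group"))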
    </Section>
  </Section>
  <Section position="3" start_page="361" end_page="363" type="metho">
    <SectionTitle>
3 Generation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="361" end_page="363" type="sub_section">
      <SectionTitle>
3.1 General heuristic of the Generation Module
</SectionTitle>
      <Paragraph position="0"> The generation process is top-down, with backtracking. The generation algorithm consists of building a complex object of several nested FDs recursively. null The highest level deals with the surface syntactic form: assertion, question, order. This level corresponds to the intention concepts like *surface_request Then comes the inner structure of the sentence: generally speaking, a subject, a verb and objects with several optional adverbials. This corresponds to doma~n-~oncepts (e.g, *smerfen. cy_/~nd} or general concepts (e.g. *~ant). LMtly there is the noun group structure with preposition, determiner, noun. There is ~ specific grammar rule for each level.</Paragraph>
      <Paragraph position="1"> Briefly, a gr~nmax rule specifies under what con~ ditions a given rule may be applied, what kinds of rules are to be chosen for the synthesis of each Syntactic Component, and what actions are to be carried out on the structure (such as choosing the number and person of a verb according to the sub-ject within a sentence}.</Paragraph>
      <Paragraph position="2"> # * The current level is built in a loop starting from its semantic contents (a token}: through the concept corresponding to the token, the interpreter chooses a linguistic definition, then a syntactic structure in the lexicon-grammar.</Paragraph>
      <Paragraph position="3"> These FDs plus the corresponding grammar rule are functionally unified ~ with the current object. Then, one syntactic code such as To Vin\]O or ~ou~- phrase is chosen according to the grmmmar rule and the validity condition of the code. The FD of th~ code is unified with the current object.</Paragraph>
      <Paragraph position="4"> This is where our declarative KBs baaed on Functional Descriptions prove to be ei~cient. The smn~ heuristic based on functional unification is used for totally different structures such as noun phrase or clause. Therefore, this loop is allowed to be totally recursive.</Paragraph>
      <Paragraph position="5"> ~n the meaning of yunctional ~nificai~on \[Kay 81\]. At this stage of the process, the generation modulo may add several modifiers to the current level~ that are adverbi~d~ in sentences, or adjectives in ~oun groups: this adjunction is a\]~o carried through f~u~co tional unification since the modifiers ~e also d~ scribed in a FD just like any grammsx rule or le)~ ical code.</Paragraph>
      <Paragraph position="6"> For instance, after functional unifications, the current syntactic component corresponding to "a car" is:</Paragraph>
      <Paragraph position="8"> Then transformations are processed whenever they are needed, such as for questions (which puts the verb in the interrogative form and inserts an auxiliary verb before the subject), or negations or passive transformations. Transformations are specified in FDs similar to grammar rules, with validity conditions and actions, but also with a specific slot stating whether they must be applied before or after the standard grammar rules.</Paragraph>
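A transformation could thus be pictured as the following kind of data (again a sketch with assumed field names and an assumed trigger): validity conditions, actions, and a slot saying when it is applied relative to the standard rules.

YES_NO_QUESTION = {
    "name": "yes_no_question",
    "when": "before_standard_rules",            # or "after_standard_rules"
    "applies_if": lambda fd: fd.get("intention") == "*surface_request",
    "actions": [
        lambda fd: fd.update({"verb_form": "interrogative"}),
        # insert an auxiliary before the subject
        lambda fd: fd.update({"order": ["auxiliary", "subject", "verb", "objects"]}),
    ],
}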
      <Paragraph position="9"> This synthesis loop is carried out on every syntactic sub-component, that is, for instance, on the subject, verb and objects of a clause.</Paragraph>
      <Paragraph position="10"> If every sub-component is correctly synthesised in turn, the actions of the global rule are applied on the current component.</Paragraph>
      <Paragraph position="11"> Other transformations may be carried out, leading only to the reordering of objects in a clause, which may depend on whether the objects are expressed through pronouns. A ditransitive/dative transformation is a perfect example: starting from a sentence whose meaning is "The postman gives Mary the letter", the final sentence may become "The postman gives her the letter" or "The postman gives it to Mary" or "The postman gives it to her".</Paragraph>
      <Paragraph position="13"> There ends the body of the loop. If a failure occurs during this loop, backtracking chooses another linguistic definition and/or another grammar rule. As an example of sentence generation, the following token, made of several nested tokens, is synthesised as "Under what delay do you want your emergency fund available?":</Paragraph>
      <Paragraph position="15"> The combination of the tokens *surface_request and *inform_ref produces a Wh-question. The question focuses on the delay of the token *emergency_fund, indicated by the special object *unknown: this transforms the adverbial of delay of the structure "the emergency fund is available at dd days" into the compound interrogative pronoun "under what delay". This interrogative pronoun is moved to the beginning of the sentence, coming from the nested clause "emergency fund available ...". As the verb of the clause expressing the token *emergency_fund is to be, the construction adopted for the direct object of the verb want is NAdj: the verb to be is removed. The possessive your is synthesised from the slot agent of the token *emergency_fund.
4 Pronoun handling in parsing and generation
Pronoun handling requires the recording of all the semantic objects (tokens). A token may be an entity (e.g. an instance of the concept *car) or a relation between entities. For instance, in "You made two wrong investments", the tokens are the wrong investments, the relation introduced by wrong, and the relation corresponding to the whole sentence. During a dialogue, the system records these tokens in a Token History. Besides the tokens themselves, linguistic information is needed for reference resolution.
4.1 Reference resolution
The characteristics stored for each token are the turn number within the dialogue (a turn is over whenever one of the two locutors has finished speaking), the sentence number within the turn, the locutor (during parsing, the locutor is the user, whereas during generation, it is the system), the type of the token (entity or relation), and the linguistic expression (noun phrase, pronoun, demonstrative pronoun, clause).</Paragraph>
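A minimal sketch of a Token History record holding exactly these characteristics; the class and field names are ours, not the paper's.

from dataclasses import dataclass

@dataclass
class TokenRecord:
    token_id: str
    turn: int            # turn number within the dialogue
    sentence: int        # sentence number within the turn
    locutor: str         # "user" when parsing, "system" when generating
    kind: str            # "entity" or "relation"
    expression: str      # "noun_phrase", "pronoun", "demonstrative_pronoun" or "clause"

history = [TokenRecord("tok-1", turn=1, sentence=1, locutor="user",
                       kind="entity", expression="noun_phrase")]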
      <Paragraph position="16"> The Token History is updated by three processes: the parsing module, the application (here the Esteam-316 Dialogue Manager) and the generation module. Of course, it is very important for the Dialogue Manager that if one token is produced by the parsing of one sentence, then the generation module would synthesise the same sentence from the same token.</Paragraph>
      <Paragraph position="17"> After analysis of the user's sentence, the History is updated with the tokens of the sentence, which are all first considered as new. The Dialogue Manager receives the new tokens, sometimes with a list of former tokens to which a given new one may refer: a typical case is when a pronoun is found in the user's sentence; then the parser has to resolve references on morphological, syntactic and semantic grounds, in order to prepare the dialoguer's pragmatic inference. It is the Dialogue Manager which is in charge of defining the final status of each new token through pragmatic inferences: when it corresponds to a pronoun, the token to which it refers; otherwise, whether it is a redefinition of a token previously used, or a totally new one.</Paragraph>
      <Paragraph position="18"> If a sentence generation succeeds, the generation module updates the History with the linguistic information of the synthesised tokens.</Paragraph>
      <Paragraph position="19"> 4.2 Pronoun synthesis
The generation grammar checks whether each item to be generated may be synthesised by a pronoun. The first step is to choose the appropriate pronoun. The second step consists of verifying that the chosen pronoun will not be ambiguous for the user, according to the History of Tokens. The computing of the morphological form of the pronoun and the checking of ambiguity are very complex and require the handling of semantic, syntactic and morphological constraints. For precise explanations and comparison with other studies, see [Danlos 88].</Paragraph>
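For illustration, the two steps could be sketched as follows, under the assumption that tokens carry gender and number features; the pronoun table and the fallback to a full noun phrase are simplifications, not the system's actual treatment (see [Danlos 88]).

def pronoun_for(token):
    features = (token.get("gender", "neuter"), token.get("number", "singular"))
    return {("feminine", "singular"): "she",
            ("masculine", "singular"): "he",
            ("neuter", "singular"): "it"}.get(features, "they")

def synthesise_pronoun(token, history):
    """Step 1: choose a pronoun. Step 2: reject it if another recent token in
    the History could be referred to by the same pronoun (ambiguity)."""
    candidate = pronoun_for(token)
    competitors = [t for t in history if t is not token and pronoun_for(t) == candidate]
    return candidate if not competitors else None   # None: fall back to a noun phrase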
    </Section>
  </Section>
  <Section position="4" start_page="363" end_page="363" type="metho">
    <SectionTitle>
5 Evaluation of SAGE and its Generation Module
</SectionTitle>
    <Paragraph position="0"> generation Module The parsing and generation grammar formalism are intended to support a changing from English to French. For instance, both the order of synthesis of the syntactic components of a clause or a noun phrase and the pronoun synthesis control are specified declaxatively. This allows the reusability and adaptability of this Natural Language Fron.</Paragraph>
    <Paragraph position="1"> tend through the creation of an adapted semantic dictionary and the extension of grammars, provided that the application is able to make inferences on semantic, or even pragmatical levels (which is the case of Eeteam-316 Dialogue Manager).</Paragraph>
    <Paragraph position="2"> SAGE runs on Sun workstations. It is able to parse complex assertions (I want to buy a car in rise years.), Yes/No questions (Could I put 500 dollars into my emergency-fund~), and ar.knowlegemen~ expressions ( Yes. No. OK.~.</Paragraph>
    <Paragraph position="3"> rt c~m synthesise complex assertions with infinitive clauses and adverbials, imperative sentences, Yes/No-questions, and Wh-questlons. The interrogative pronouns of Wh-questions may stem either from the main clause (as in What do you buyf) or from nested clauses (as in How much do you want to investf). As far as we know in the generation realm, it seems that the most similar work is the synthesis system PHRED citejacobs. Sentence production in PHRED is a. recursive process divided into three phases: 1) pattern-concept fetch-Lug, 2) pattern restriction, and 3) pattern interpretation. Their objectives axe similar to 1) the choice of a linguistic definition, 2) the verification of semantic distribution and the application of a lexical code on the Syntactic Component, 3) the generation of the syntact sub-components. Other studies (Danlos, McKeown, Appelt) are more related to the strategies for text production than to sentence generation heuristics.</Paragraph>
    <Paragraph position="4"> It can also synthesise complex assertions with infinitive clauses and adverbials, imperative sentences, Yes/No-questions, and Wh-questions. The interrogative pronouns of Wh-questions may stem either from the main clause (as in What do you buyf} or from nested clauses (as in How much do you t~ant to investf).</Paragraph>
    <Paragraph position="5"> Pronoun handling is currently developed.</Paragraph>
  </Section>
</Paper>