<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1514"> <Title>An Object-Oriented Linguistic Engineering Environment using LFG (Lexical Functional Grammar) and CG (Conceptual Graphs)</Title> <Section position="4" start_page="99" end_page="101" type="metho"> <SectionTitle> 2 The LFG Environment </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="99" end_page="99" type="sub_section"> <SectionTitle> 2.1 Foundation: an LFG parser </SectionTitle> <Paragraph position="0"> According to the principles of lexical functional grammars, parsing a sentence is decomposed into the construction of a constituent structure (c-structure), on which a functional structure (F-Structure) is then built. C-structure construction is based on a chart parser, which allows the system to represent syntactic ambiguities (Kay, 1967; Winograd, 1983). To be used within an LFG parser, a classical chart has to be complemented with a new distinction: completed arcs (which represent a whole syntactic structure) have to be divided into those linked to a correct F-Structure and those linked to an F-Structure that cannot be unified or that does not respect the well-formedness principles.</Paragraph> </Section> <Section position="2" start_page="99" end_page="99" type="sub_section"> <SectionTitle> 2.2 Visualising the Chart </SectionTitle> <Paragraph position="0"> In the Chart interface, words are separated by nodes, numbered from 1 to numberOfWords + 1.</Paragraph> <Paragraph position="1"> Each arc is represented by a three-segment polygon (for readability, wider arcs are drawn above narrower ones).</Paragraph> <Paragraph position="2"> Active arcs are grey and positioned under the words. Completed arcs with incorrect F-Structures are red and also placed under the words. Completed arcs with correct F-Structures are blue and above the words. 
Lastly, completed arcs with F-Structures that do not respect the well-formedness principles are grey and above the words. The user can select the kind of arc of interest. Clicking on an arc with the left button turns the arc and all its daughters green, thus showing the syntactic hierarchy. Clicking with the middle button opens a</Paragraph> <Paragraph position="4"> menu in which one can choose to examine the applied rule or the F-Structures (see below for the corresponding interface).</Paragraph> </Section> <Section position="3" start_page="99" end_page="99" type="sub_section"> <SectionTitle> 2.3 Visualising F-Structures </SectionTitle> <Paragraph position="0"> As shown in Figures 3 and 4, F-Structures are represented by attribute-value pairs (a value may itself be an F-Structure). In addition to this graphical representation, a linear representation (more suitable for storing data in files or printing them) has been developed, and it is possible to switch from one to the other. This allows us to keep track of previous results and to use them for testing the evolution of the system.</Paragraph> </Section> <Section position="4" start_page="99" end_page="101" type="sub_section"> <SectionTitle> 2.4 Lexicon and lexicon management </SectionTitle> <Paragraph position="0"> Since LFG is a &quot;lexical&quot; grammar, it is important to have powerful and easy-to-use lexicon management tools. To be as flexible as possible, we have chosen to use several lexica at the same time in the same analyser. The lexicon manager contains a list of lexica ordered by access priority. 
For each word analysed, the list is searched, and the first analysis encountered is returned.</Paragraph> <Paragraph position="1"> Two kinds of lexica are currently used; this organisation is quite flexible: * if the user works with a big lexicon but wants to redefine a few items for his own needs, he just has to define a new small lexicon containing the modified items and give it a high priority.</Paragraph> <Paragraph position="2"> * if the user has a big lexicon with slow access, access can be optimised by putting the</Paragraph> <Paragraph position="4"> frequently used words in a direct access lexicon stored in memory.</Paragraph> <Paragraph position="5"> Our lexicon currently contains 7000 verbs, all the closed-class words (e.g., prepositions, articles, conjunctions), 12000 nouns and about 2500 adjectives. To compensate for gaps in this lexicon, a set of subcategorisation frames (3000 frames) is associated with it independently. The user may also define a direct access lexicon, whose equations are written in a formalism close to the standard LFG formalism. Dedicated interfaces have been developed for editing these lexica, with syntactic and coherence checking.</Paragraph> <Paragraph position="7"/> </Section> <Section position="5" start_page="101" end_page="101" type="sub_section"> <SectionTitle> 2.5 Tracking failure causes </SectionTitle> <Paragraph position="0"> A specific feature (&quot;Error&quot;) allows the system to keep a value that makes explicit the reason why unification has failed. Possible situations are listed below: 1. Unification failure. The values of a given feature differ between the two F-Structures to be unified. The generated F-Structure contains the feature Error, whose value is an association of the two incompatible values. Example: Num = sing → plur.</Paragraph> <Paragraph position="1"> 2. 
A feature present in an equation has no value in either of the two F-Structures to be unified. Example: with the equation &quot;↑ Suj Num = ↓ Num&quot; and two F-Structures without the Num feature, the generated F-Structure contains &quot;Num = nil → nil&quot;. 3. During a constrained unification (e.g., ↓ Num =c sing), a feature does not exist. We obtain: Num = sing → nil.</Paragraph> <Paragraph position="2"> 4. An obligatory feature is absent. Example: Num = obligatoire.</Paragraph> <Paragraph position="3"> 5. A forbidden feature is present. The forbidden state for a feature is represented by adding the value &quot;tilde&quot; to the feature (e.g., Num = &quot;tilde&quot;). Therefore, this is the same situation as simple unification: a failure results when an F-Structure contains this feature. Example: Num = sing → &quot;tilde&quot;. 6. A feature has a forbidden value. Example: Num = &quot;tilde&quot; sing.</Paragraph> <Paragraph position="4"> 7. When a disjunction of constraints is the reason for the failure, the block itself is set as the value of the &quot;Error&quot; feature in the resulting F-Structure. These errors can be retrieved through the interface (errors are highlighted in the representation), which allows the user to track them easily. Moreover, these well-defined categories make it easy to find the real cause of the error and to correct the grammar and the lexicon.</Paragraph> </Section> <Section position="6" start_page="101" end_page="101" type="sub_section"> <SectionTitle> 2.6 Structure of the rules </SectionTitle> <Paragraph position="0"> Smalltalk-80 specific features (mainly the notions of &quot;image&quot; and incremental compilation) have been heavily exploited in the definition of the internal structure of the grammar rules. Basically, a rule is defined as the rewriting of a given constituent (the left part of the rule), with equations attached to the constituents of the right part. 
Each non-terminal constituent of the grammar is then defined as a Smalltalk class, whose instance methods are the rules whose left part is this constituent (e.g., NP is a class, and NP → ProperNoun and NP → Det Adj* Noun are instance methods of this class).</Paragraph> <Paragraph position="1"> The Smalltalk compiler has been redefined on these classes so that it handles LFG syntax. Therefore, all the standard tools for editing, searching and replacing (Browsers) may be used in a very natural way. A specific interface may also be used to consult the rules and to define the sets of rules to be used in the parser.</Paragraph> <Paragraph position="2"> A major benefit of this configuration is that the user can define his own (sub)sets of rules by defining subclasses of a category when he wants different rules for this category (since a method with a given name cannot have two different definitions).</Paragraph> <Paragraph position="3"> On the use of the Envy/Manager source code manager to maintain the syntactic rules base. Envy/Manager is a source code manager for team programming in Smalltalk, offered by OTI.</Paragraph> <Paragraph position="4"> It is based on a client-server architecture in which the source code is stored in a common database accessible to all the developers. Envy stores all the successive versions of classes and methods, and provides tools for managing the history. Applications are defined as sets of classes, methods, and extensions of classes that can be independently edited and versioned. Very fine-grained ownership and access rights can be defined on the software components. The organisation of our syntactic rules base enables us to benefit directly from these functionalities, and hence to manage versions, access rights, comparisons of versions (Figure 5), etc. for all our linguistic data.</Paragraph> <Paragraph position="5"> Content of the rules. 
The current grammar contains about 250 rules that cover most of the classical syntactic structures of French simple sentences. They have been tested on data from the European TSNLP project. In addition to these simple sentences, difficult problems are also handled: clitics, complex determiners, complement clauses, various forms of questions, extraction and unbounded dependencies, coordinations, comparatives. Some extensions are currently under development, including negation, support verbs, adverbial subordinate clauses and ellipses.</Paragraph> </Section> </Section> <Section position="5" start_page="101" end_page="102" type="metho"> <SectionTitle> 3 Conceptual graphs </SectionTitle> <Paragraph position="0"> Conceptual graphs (Sowa, 1984) form the basis of the semantic and encyclopedic representations used in our system. Conceptual graphs are bipartite graphs composed of concepts and relations. A conceptual graph database is generally composed of the following subparts: * a lattice of concepts and relation types * a set of canonical graphs, associated with concepts and relation types, used for example to express the selectional restrictions on the arguments of semantic relations.</Paragraph> <Paragraph position="1"> * a set of definitions, associated with concepts and relation types, used to define the meaning of concepts.</Paragraph> <Paragraph position="2"> * a set of schemas and prototypes.</Paragraph> <Paragraph position="3"> * a set of operations, such as join, contraction, expansion, projection...</Paragraph> <Paragraph position="4"> * a database containing the description of a situation in terms of conceptual graphs.</Paragraph> <Paragraph position="5"> The framework we describe here aims at managing all this information in a coherent manner, and at facilitating the association with the linguistic processes described above.</Paragraph> <Paragraph position="6"> Graphs can be visualized, modified, saved and searched through different interfaces, using 
graphical or textual representations. Operations can be performed programmatically or using the interface shown in Figure 7.</Paragraph> <Paragraph position="7"> The lattice, and the different items of information associated with concepts and relation types, can be visualized, modified, searched and saved using graphical or textual representations (Figure 10). An &quot;individual referents inspector&quot; allows the user to inspect the cross-references between referents, concepts and graphs.</Paragraph> </Section> <Section position="6" start_page="102" end_page="103" type="metho"> <SectionTitle> 4 Analysing a sentence </SectionTitle> <Paragraph position="0"> The process of analysis from sentence to semantic representation can be separated into three subprocesses. After the sentence has been segmented, we obtain the lexical items in LFG-compliant form via the lexicon manager. After parsing, we obtain some edges with their respective F-Structures. Delmonte (1990) has developed a parser which uses basic entries with mixed morphological, functional and semantic information. The rules use information from different levels. We propose to map the semantic structure onto the syntactic one in a manner that avoids too many interdependencies. We use an intermediate structure (named &quot;syntax-semantic table&quot;) that expresses the mapping between the value of an LFG Pred and a concept, as well as connected concepts and relations. Semantic data in the lexical knowledge base are defined by using conceptual graphs, as shown in Section 4.1 below for some example verbs. 
Selectional restrictions defined with canonical graphs are then used to filter the graphs when more than one is obtained at this level.</Paragraph> <Section position="1" start_page="102" end_page="102" type="sub_section"> <SectionTitle> 4.1 Semantic verb classification in the lexical knowledge base </SectionTitle> <Paragraph position="0"> The lexical knowledge base is based on a hierarchical representation of French verbs. We have developed a systematic and comprehensive representation of verbs in a hierarchical structure, with data coming from the French dictionary &quot;Robert&quot;. Our method relies on the classification method proposed by (Talmy, 1985), (Miller, Fellbaum and Gross, 1989) and (Miller and Fellbaum, 1991). We chose a description whose structure is composed of a basic action (the first of the most general superclasses; e.g., stroll and run can be associated with walk as a basic action, and walk, ride and pass point to moving, which is a step further in generality) associated with thematic roles that specify it (i.e., object, means, manner, goal, and method). The basic actions are in turn defined with the same structure, based on a more general basic action.</Paragraph> <Paragraph position="1"> The hierarchy of verbs depends on the thematic relations associated with them. A verb V1 is the hyperonym (respectively a hyponym) of a verb V2 (which is noted V1 ≻ V2, respectively V1 ≺ V2) if they share a common basic action and if, in the thematic relations structure associated with it, we have: * absence (for the hyperonym) or presence (for the hyponym) of a particular thematic relation: e.g., for the pair divide/cut, to cut is to divide using a sharp instrument, thus divide ≻ cut * presence of a generic value for a thematic relation vs. 
a specific value (example: cut (object is generic: solid object) ≻ behead (object is a head)). For every verb: * the semantic description outlined above is coded in the lexical knowledge base as a definitional graph.</Paragraph> <Paragraph position="2"> type cut (*x) is [divide: *x]-</Paragraph> </Section> <Section position="2" start_page="102" end_page="103" type="sub_section"> <SectionTitle> 4.2 An example </SectionTitle> <Paragraph position="0"> Below, we give an example for the sentence &quot;Un avocat vole une pomme&quot; (a lawyer steals an apple), where &quot;avocat&quot; is ambiguous and refers to a lawyer or to an avocado. A semantic representation of this sentence is derived from its non-ambiguous F-Structure. The entries in the translation table (from LFG Pred [in French] to conceptual graph types [in English]) are as follows:</Paragraph> <Paragraph position="2"> Explanations: the first item between quotes is the Pred value, followed by a list of types of concepts (or types of relations) and their mapping definition structure in the F-Structure. ↓ represents the local F-Structure. ↑ represents the F-Structure that contains the local F-Structure. For example, Agent → ↓ Suj means that a concept of type &quot;Steal&quot; is connected to a concept that can be found in the F-Structure of the feature &quot;Suj&quot; in the local F-Structure. From these data, the following graphs (Figure 8) are obtained.</Paragraph> <Paragraph position="3"> The &quot;Def&quot; feature of the F-Structure gives us information about the referents of concepts. For example, the F-Structure for 'apple' contains &quot;Def = indefini&quot;, which implies the use of a generic referent for the concept (corresponding to an apple, indicated by a star in Figure 8). Then, since canonical graphs express selectional restrictions, they are used to filter the results through the join operation. For ex-
For ex-</Paragraph> </Section> </Section> <Section position="7" start_page="103" end_page="103" type="metho"> <SectionTitle> 2) Avocado'.* Agent SteaJ Object Apple:' </SectionTitle> <Paragraph position="0"> These principles are the bases of the system currently available, but we are working on improvements and extensions. We want to address the issue of adjunct processing, prepositional comple-</Paragraph> </Section> <Section position="8" start_page="103" end_page="104" type="metho"> <SectionTitle> IL </SectionTitle> <Paragraph position="0"/> </Section> class="xml-element"></Paper>