<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1029"> <Title>CONVERSION OF A FRENCH SURFACE EXPRESSION INTO ITS SEMANTIC REPRESENTATION ACCORDING TO THE RESEDA METALANGUAGE</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> THE RESEDA METALANGUAGE </SectionTitle> <Paragraph position="0"> The biographical information which constitutes the system's database is organized in the form of units called &quot;planes&quot;. There are several different types of plane ; the &quot;predicative planes&quot;, the most important, correspond to a &quot;flash&quot; which illustrates a particular moment in the &quot;life story&quot; of one or more personages. A predicative plane is made up of one of five possible &quot;predicates&quot; (BE-AFFECTED-BY, BEHAVE, BE-PRESENT, MOVE, PRODUCE) ; to each predicate, one or more &quot;modulators&quot; may be attached. The modulator's function is to specify and delimit the semantic role of the predicate. Each predicate is accompanied by &quot;case slots&quot; which introduce the predicative arguments ; the date and spatial location are also given within a predicative plane, as is the bibliographic authority for the statement.</Paragraph> <Paragraph position="1"> Predicative planes can be linked together in a number of ways ; one way is to use explicit links of &quot;coordination&quot;, &quot;alternative&quot;, &quot;causality&quot;, &quot;finality&quot;, &quot;condition&quot;, etc. (Zarri et al. 1977).</Paragraph> <Paragraph position="2"> For example, the data &quot;André Marchant was named provost of Paris by the King's Council on 22nd September 1413 ; he lost his post on 23rd October 1414, to the benefit of Tanguy du Châtel, who was granted this office&quot; will be represented in three planes : the nomination of André Marchant, his dismissal, and the nomination of Tanguy du Châtel. 
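As an illustration only, the components of a predicative plane listed above can be sketched as a small record type. This is a hypothetical in-memory form, not RESEDA's external or internal coding: the class name, the field names and the ISO date string are assumptions, while the slot names SUBJ, OBJ and SOURCE are borrowed from the pattern notation used in the description of the method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five predicates named in the text.
PREDICATES = {"BE-AFFECTED-BY", "BEHAVE", "BE-PRESENT", "MOVE", "PRODUCE"}


@dataclass
class PredicativePlane:
    """Hypothetical record for one predicative plane (field names are assumptions)."""
    predicate: str
    modulators: List[str] = field(default_factory=list)
    case_slots: Dict[str, str] = field(default_factory=dict)
    date1: Optional[str] = None      # start of the episode
    date2: Optional[str] = None      # left empty for punctual events
    location: Optional[str] = None   # spatial location
    bibl: Optional[str] = None       # bibliographic authority, not given here

    def __post_init__(self):
        # Reject anything that is not one of the five predicates.
        if self.predicate not in PREDICATES:
            raise ValueError("unknown predicate: " + self.predicate)


# First of the three planes: the nomination of André Marchant.
nomination = PredicativePlane(
    predicate="BE-AFFECTED-BY",
    modulators=["begin"],
    case_slots={
        "SUBJ": "André-Marchant",
        "OBJ": "provost",
        "SOURCE": "king's-council",
    },
    date1="1413-09-22",
    location="Paris",
)
```

The second date is left unset because the modulator marks a punctual change of state; the dismissal and the second nomination would be two further records of the same shape.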
The coding of information must be made on two distinct levels : an &quot;external&quot; coding, realized manually by the analyst, gives rise to a first type of representation, formalized according to the categories of the RESEDA metalanguage ; a second, automatic step results in the &quot;internal&quot; numeric code. The external &quot;manual&quot; coding of the three events just stated will be the following: The code in capital letters indicates a predicate and its associated &quot;case slots&quot;. Every predicative plane is characterized by a pair of &quot;time references&quot; (date1, date2) which give the duration of the episode in question. In these three planes, the second date slot (date2) is empty, because their modulators (begin, end) specify a change of state associated with a punctual event. &quot;André-Marchant&quot; and &quot;Tanguy-du-Châtel&quot; are historical personages known to the system ; &quot;provost&quot;, &quot;king's-council&quot; and &quot;letters-of-nomination&quot; are terms of RESEDA's lexicon. The classifications associated with terms of the lexicon provide the major part of the system's socio-historical knowledge of the period. &quot;Paris&quot; is the &quot;location of the object&quot;. If the historical sources analysed gave us the exact causes of these events, we would introduce into the database the corresponding planes and associate them with these three planes by an explicit link of type &quot;CAUSE&quot;.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> DESCRIPTION OF THE METHOD USED </SectionTitle> <Paragraph position="0"> In the field of the application of Artificial Intelligence techniques to natural language processing, from the very beginning, stress was put on the importance of semantic and pragmatic components. 
In this framework, creating a formal representation of the message carried by a surface expression is usually achieved by one of two methods.</Paragraph> <Paragraph position="1"> The first, and most traditional, respects the usual progression of the three levels of analysis (morphological, syntactic and semantic) whilst combining their results in a final interpretation ; for discussion, see for example Winograd (1972), Woods (1973), Marcus (1979), etc.</Paragraph> <Paragraph position="2"> Schank and Wilks, on the contrary, put forward the idea, which was subsequently taken up by many researchers, that a predominantly semantic analysis, with syntax relegated to a secondary role, was possible. The deep structure representation that is being created is thus used to make appropriate predictions about the logico-semantic function of the elements ; these expectations are progressively met during the examination of the surface structure representation ; see Schank (1975), Wilks (1975), etc.</Paragraph> <Paragraph position="3"> The hypothesis adopted for our project draws more from this second method, in that the structures of RESEDA's internal representation provide, beforehand, a very complete framework of the predictions which are to be a guide in scanning the text to be translated into the system's metalanguage.</Paragraph> <Paragraph position="4"> To describe our approach, we will use the above example. The initial text in natural language is first (pre)processed to obtain its constituent structure. For this purpose, we have used the French surface grammar implemented in DEREDEC, a software package developed at the University of Quebec at Montreal by Pierre Plante (1980a, 1980b). This system, comparable to an ATN parser, permits a breakdown of the surface text into its syntactic constituents, and establishes, between these constituents, syntagmatic relationships of the type topic-comment, determination and coordination. 
This preliminary analysis provides a context for subsequent processing, without necessarily removing all the ambiguities ; in the same way, see Boguraev and Sparck Jones (1982).</Paragraph> <Paragraph position="5"> The specific tools that we intend to develop for this project are of two types : a general procedure which can be likened to a sort of semantic parsing, and a system of heuristic rules.</Paragraph> <Paragraph position="6"> Semantic parsing. The first stage of the general procedure consists of marking the &quot;triggers&quot;, defined as lexical units which call one or more of the predicative patterns allowed for in RESEDA's metalanguage. Thus we do not take into consideration every one of the lexical items met in the surface text, retaining only those directly pertaining to the &quot;translation&quot; to be done ; this is not without similarity to the &quot;skimming&quot; found in DeJong (1979a, 1979b).</Paragraph> <Paragraph position="7"> However, we do not limit ourselves to a simple keyword approach. Certain lexical items are potential triggers, but their actual triggering in a given context depends on rules using both the morpho-syntactic analysis provided by DEREDEC and the socio-historical knowledge stored in the RESEDA system. These rules intervene at this stage to decide whether triggering should take place and to choose the predicative patterns. In the sentence given, the list of potential triggers comprises the verbal forms &quot;named&quot;, &quot;lost&quot;, &quot;granted&quot; ; terms pertaining directly to the metalanguage : &quot;office&quot;, synonymous with &lt;post&gt; in RESEDA, and its specification &quot;provost&quot; ; date elements : &quot;september&quot;, &quot;october&quot;. After applying the rules, the following patterns have been triggered : J. 
LÉON et al.</Paragraph> <Paragraph position="8"> was named begin+(soc+)BE-AFFECTED-BY SUBJ &lt;personage&gt;-surface subject of the The second stage of this general procedure consists of examining the triggers belonging to the same morpho-syntactic environment. If there are several predicate triggers in the same environment, and if the predicates triggered are the same - which means that the predicates and case slots must be the same and that the modulators, dates and the space location information must be compatible - then it can be said that the triggers refer to the same situation. As a result, the predicative patterns are merged so as to obtain the most complete description possible ; the predictions about filling the slots linked with the cases of the resulting patterns together govern the search for fillers in the surface expression.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> CONVERSION INTO THE RESEDA METALANGUAGE </SectionTitle> <Paragraph position="0"> Thus, the first three triggers of the example, recognized as relevant to the same environment, are combined in the following formula : begin+(soc+)BE-AFFECTED-BY SUBJ &lt;personage&gt;-surface subject of &quot;is named&quot; OBJ &lt;post&gt;-&quot;provost&quot; (SOURCE &lt;personage&gt;|&lt;social-body&gt;-surface complement of the agent of &quot;is named&quot;) ; date1 : date a ; date2 : prohibited ; bibl. : obligatory. The units of the surface expression corresponding to the predictions of the pattern obtained are then retrieved and standardized according to RESEDA's categories imposed by the pattern (André Marchant : André-Marchant, personage ; provost : provost, post ; King's Council : king's-council, social-body, etc.). Eventually, we obtain plane 1 in André Marchant's biography. The example we have shown illustrates a particularly simple case, in which it is not necessary to establish links between the planes created. 
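The merging stage just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: patterns are plain dictionaries, an unfilled slot or attribute is None, "compatible" is read as "equal or unknown on one side", and the only modulator clash modelled is begin against end. The function names and the sample patterns are hypothetical.

```python
# Assumed pairs of modulators that cannot describe the same situation.
EXCLUSIVE = [{"begin", "end"}]


def compatible(a, b):
    """Two values agree when either is unknown (None) or they are equal."""
    return a is None or b is None or a == b


def modulators_compatible(m1, m2):
    both = set(m1).union(m2)
    return not any(pair.issubset(both) for pair in EXCLUSIVE)


def can_merge(p, q):
    """Same predicate, no conflicting slot filler, compatible modulators, date, place."""
    if p["predicate"] != q["predicate"]:
        return False
    if not modulators_compatible(p["modulators"], q["modulators"]):
        return False
    shared = set(p["slots"]).intersection(q["slots"])
    if not all(compatible(p["slots"][s], q["slots"][s]) for s in shared):
        return False
    return all(compatible(p.get(k), q.get(k)) for k in ("date1", "location"))


def merge(p, q):
    """Combine two mergeable patterns into the most complete description."""
    merged = {"predicate": p["predicate"], "slots": dict(p["slots"])}
    for name, filler in q["slots"].items():
        if merged["slots"].get(name) is None:
            merged["slots"][name] = filler
    merged["modulators"] = sorted(set(p["modulators"]).union(q["modulators"]))
    for k in ("date1", "location"):
        merged[k] = p.get(k) if p.get(k) is not None else q.get(k)
    return merged


# Hypothetical patterns for three triggers of the example sentence.
named = {"predicate": "BE-AFFECTED-BY", "modulators": ["begin"],
         "slots": {"SUBJ": None, "SOURCE": None}, "date1": None, "location": None}
provost = {"predicate": "BE-AFFECTED-BY", "modulators": [],
           "slots": {"OBJ": "provost"}, "date1": None, "location": None}
lost = {"predicate": "BE-AFFECTED-BY", "modulators": ["end"],
        "slots": {"SUBJ": None}, "date1": None, "location": None}

combined = merge(named, provost) if can_merge(named, provost) else None
```

With these inputs the patterns for "named" and "provost" merge into a single pattern whose SUBJ and SOURCE slots are still to be filled from the surface expression, while the pattern for "lost" is kept apart because its modulator clashes with "begin".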
If we had to process the sentence &quot;Philibert de St Léger is nominated seneschal of Lyon on the 30th of July 1412, in lieu of the late A. de Viry&quot;, three planes should be generated : one for the nomination of Philibert de St Léger, one for the death of A. de Viry, and another one establishing a weak causality link (&quot;CONFER&quot;, in our metalanguage) between the first two planes. Surface items such as conjunctions, prepositions and sentential adverbs can be used to infer links between planes : causality, finality, coordination, etc. More precisely, in the last example, &quot;in lieu of&quot; is a potential trigger according to the following rule : if the main noun group of the surface prepositional phrase contains a trigger, this phrase constitutes a plane environment and CONFER introduces the plane created.</Paragraph> <Paragraph position="1"> Heuristic rules. The process we have sketched so far requires a corpus of heuristic rules, to solve ambiguities which are left aside by the prediction system - which cannot go beyond the capabilities of RESEDA's predicative patterns.</Paragraph> <Paragraph position="2"> We shall say just a few words about the heuristic rules designed to solve cases of anaphora (as in our first example, &quot;he&quot;, &quot;this office&quot;, &quot;who&quot;). In the approach that we propose, marks of anaphora are identified during the general analysis procedure with unassumed predictions, triggering the appropriate heuristic rules. The actual solving, after validation of the marks, brings into play a number of criteria from simple pairing off and morphological agreement to more subtle criteria, like contextual proximity, persistence of theme, etc. 
Thus, morphological agreement and contextual proximity are used to replace &quot;who&quot; by &quot;Tanguy du Châtel&quot; in our first example ; persistence of theme enables us to make up for the missing date of Tanguy du Châtel's posting by date b in the list of triggers.</Paragraph> <Paragraph position="3"> We would like to integrate this approach, which has been purely empirical up to now, into the framework of a more general theory. Two directions of enquiry seem particularly interesting in order to develop our own philosophy of the subject. The PAL system of Candace Sidner is a top-down anaphora resolution method which makes use of the notion of focus (likened to the theme of the discourse). By searching in the text for &quot;focuses&quot; which refer to a system of representation organized as a series of &quot;frames&quot;, it is able to solve references. If the reference is not found by using the frames themselves, it is inferred from other frames contained in the database (Sidner 1978, 1979).</Paragraph> <Paragraph position="5"> The interest for our study lies in the fact that RESEDA already has, as permanent data, a certain amount of general knowledge organized in a form very similar to that of frames. Thus, in our example, the nomination and dismissal of André Marchant refer to the context of the &quot;civil war at the beginning of the 15th century&quot;, which is one of those frames.</Paragraph> <Paragraph position="6"> The approach used by Klappholz and Lockman depends on the hypothesis that there is a strong link between coreference and the cohesive links of a discourse. These links, when marked progressively in the text, become the indices of a structure of the discourse, organized as a tree structure and created dynamically (Lockman 1978). These cohesive links (effect, cause, syllogism, exemplification, etc.) 
are very similar to the logical connections between planes in RESEDA (causality, finality, condition, etc.).</Paragraph> </Section> </Paper>