<?xml version="1.0" standalone="yes"?>
<Paper uid="E83-1020">
  <Title>A FLEXIBLE NATURAL LANGUAGE PARSER BASED ON A TWO-LEVEL REPRESENTATION OF SYNTAX</Title>
  <Section position="3" start_page="114" end_page="117" type="metho">
    <SectionTitle>
AN EXAMPLE OF THE PARSER'S RESULT
</SectionTitle>
    <Paragraph position="0"> Before describing the parser control structure, it is worth having a look at the final representation of the input sentence which is produced by the parser. It consists of a tree which represents the relationships existing among the constituents of the input sentence according to the &quot;head and modifier&quot; approach (Winograd 83, p. 73)°. An example of such a tree is reported in fig.1.</Paragraph>
    <Paragraph position="1"> It may be noticed that the tree is a case representation of the sentence: in the verbal nodes [° This structure might be related to the &quot;syntactic/semantic shape&quot; representation of RUS (Sidner et al. 81), but we are not sure.]</Paragraph>
    <Paragraph position="2">  All the slots appearing in fig.2a are atomic and their possible contents are exemplified in the slot (LINKUP is the upward pointer which allows the tree to be traversed bottom-up; this link is not depicted in fig.1); the only exceptions are the ROLEs, which correspond to the links shown in fig.1 and whose structure is shown in fig.2b. For the meaning of the different fields refer to the example of fig.3. The TRANSL slot refers to the interpretation (in terms of data base operations) of the constituent headed by the node (see explanations in the text).</Paragraph>
    <Paragraph position="3">  Actual contents of the node REL2 (SOSTENERE) of fig.1. Five ROLEs appear in this instance of REL. In the first, fourth and fifth ROLE the ROLETYPE is &quot;CASE&quot;, because they refer to actual cases of the verb; the syntactic function of each case is reported in the fourth field (SYNTFUN). The second and third ROLE have the sole function of marking the position in the sentence of the auxiliary (hanno - have) and of the verbal head (sostenuto - passed).</Paragraph>
    <Paragraph position="4"> The SPECIAL field is used to mark cases filled by interrogatives, reflexive pronouns, etc. (RELPRON means RELative PRONoun). Notice that the AUX slot is used to signal whether or not the head of the verb is an auxiliary.</Paragraph>
    <Paragraph position="5">  the name (actual and extended); the second one contains the classical syntactic categories associated with the node type.</Paragraph>
    <Paragraph position="6"> (RELation) each pointer corresponds to a syntactic case associated with the verb; in the REF nodes, which roughly correspond to nouns and pronouns, the dependent structures represent the specifications of the node. The H field indicates the position of the constituent's head (i.e. the verb or noun) in the surface sentence and the A fields are used in the REL nodes to indicate the position of the possible auxiliaries. The actual structure of the nodes appearing in the figure is much more complex; for example, the prototype description of the REL nodes is reported in fig.2. In fig.3 the actual structure of the node REL2 (SOSTENERE) is reported. A number of remarks are required: - when a REL node is instantiated it does not contain any ROLE slot. Whereas the other slots are &quot;filled&quot; when the needed piece of information is available (normally this happens when the head of the verb is scanned), the ROLE slots are dynamically created when a given constituent is attached to the REL node (with the exception of AUX and H); - some slots are redundant, since their contents can be deduced by traversing the tree. For example, the contents of the slot DEPEND and of the field SPECIAL of the ROLE slot can be obtained on the basis of the LINKUP node and of the first case of the clause respectively. They have been included for the sake of efficiency; - the sole input word of the example sentence which does not appear in a node of fig.1 is the auxiliary &quot;hanno&quot;. Auxiliaries have been considered as components of the verb, so that their presence is signalled only by means of an AUX role. The actual auxiliary, its tense, its number, etc. are deducible from the contents of the other slots of the REL node.</Paragraph>
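    <Paragraph> As a rough illustration of the node layout described above, the following Python sketch models a node whose ROLE slots are created only when a constituent is attached. Only the slot names LINKUP, TRANSL, ROLETYPE, SYNTFUN and SPECIAL come from the text; the class names, the "filler" and "position" fields, the attach_role helper and all the values used at the end are assumptions made for this example, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Role:
    # ROLETYPE, SYNTFUN and SPECIAL are field names taken from the text;
    # "filler" and "position" are hypothetical names for the remaining fields.
    roletype: str                      # e.g. "CASE", "AUX" or "H"
    filler: Optional["Node"] = None    # dependent node, if any
    position: Optional[int] = None     # word position, used by AUX and H roles
    syntfun: Optional[str] = None      # syntactic function of a case
    special: Optional[str] = None      # e.g. "RELPRON"


@dataclass
class Node:
    node_type: str                     # "REL", "REF", "CONN", ...
    head: Optional[str] = None         # normalized head word, once scanned
    linkup: Optional["Node"] = None    # LINKUP: upward pointer
    transl: Optional[str] = None       # TRANSL: filled on completion
    roles: list = field(default_factory=list)  # empty at instantiation

    def attach_role(self, role: Role) -> None:
        """Create a ROLE slot dynamically and set the child's upward pointer."""
        if role.filler is not None:
            role.filler.linkup = self
        self.roles.append(role)


# Illustrative values only: "SUBJ" and the position 6 are invented.
rel = Node("REL")
assert rel.roles == []                 # a fresh REL node has no ROLE slot
subject = Node("REF", head="studente")
rel.attach_role(Role("CASE", filler=subject, syntfun="SUBJ"))
rel.attach_role(Role("AUX", position=6))
rel.head = "sostenere"
```

    Keeping the upward pointer on the child mirrors the remark that some slots are redundant: the parent could also be recovered by searching the tree, at a higher cost.</Paragraph>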
    <Paragraph position="7"> The different types of nodes which have been defined are listed in Table 1.</Paragraph>
    <Paragraph position="8"> As stated in the introduction, the system should act as a natural language front-end for a relational data base. The structure reported in fig.1 is the basis for performing the semantic checks and for translating the sentence into a relational algebra expression (Date 81) which corresponds to the input query. As will be described in the following sections, neither the semantic checks nor the actual translation of the query are done at the end of the syntactic analysis; in fact the semantic checks are performed when a node is filled with a content word and the translation is obtained in an incremental way from the constituents occurring in the tree. For instance, the semantic check procedures will be triggered when the word &quot;sesso&quot; (sex) is encountered and the corresponding REF node is created, linked and filled, to verify that the students have a sex (or, more precisely, that the sequence &quot;studente di sesso&quot; is acceptable).</Paragraph>
    <Paragraph position="9"> As regards the translation, it is worth noticing that it does not represent the interpretation of the given node, but the data base interpretation of the whole constituent headed by that node; for this reason it is obtained by combining the translations of all depending constituents. Let us consider, for example, the node REF2. The translation associated with CONN3 is [...] A detailed description of the way this translation is obtained is reported in (Lesmo, Siklossy, Torasso 83). However, for the sake of clarity it is important to say that &amp;student is the unary relation whose unique attribute is $student and which contains the names of all the students whose data are stored in the data base; &amp;sex is a binary relation (attributes $person and $sex) containing the sex of all the persons known to the system; finally &amp;pass is the relation (attributes $student, $course, $grade, $date) in which the results of the tests passed by the students are stored. The translations which have been shown are stored in the TRANSL slot of the associated nodes.</Paragraph>
  </Section>
  <Section position="4" start_page="117" end_page="117" type="metho">
    <SectionTitle>
THE CONSTRUCTION PROCESS
</SectionTitle>
    <Paragraph position="0"> The tree described in the previous section is built by means of a set of rules of the condition-action form. With each syntactic category a subset of these rules is associated: when an input word of the given category is encountered in the input sentence, the subset of rules associated with that category is activated and the conditions are evaluated. The conditions involve tests on the current structure of the tree (i.e. the &quot;status&quot; of the analysis) and may request a one-word lookahead.</Paragraph>
    <Paragraph position="1"> If just one rule is selected (i.e. all other conditions evaluate to false), its action part is executed. An action consists of the construction of new nodes, their filling with particular values (normally depending on the input word) and their attachment to the already existing tree. Table 2 reports, as an example, some of the rules of the packet associated with the category ADJECTIVE. The rules which are not reported handle the cases of predicative adjectives and adjectives preceded by adverbs. In some of the rules a one-word lookahead is used; it allows the parser to build the right structure in virtually all simple cases. In fact, even if the semantic knowledge source does not affect the choice of the rule, it can trigger the natural changes, which modify the tree; these changes replace the backup in many of the cases where the hypothesized syntactic structure does not satisfy the semantic constraints.</Paragraph>
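    <Paragraph> The selection of a rule from the packet of a category can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the Rule tuple, the dictionary used as analysis state, and the two rules in the packet are hypothetical (the actual ADJECTIVE rules are those of table 2). The action is executed only when exactly one condition holds; in every other case the names of the matched rules are simply returned to the caller.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    condition: Callable[[dict], bool]   # test on the analysis state
    action: Callable[[dict], None]      # builds, fills and attaches nodes


def select_rules(packets: dict, category: str, state: dict) -> list:
    """Return the rules of the packet for `category` whose condition holds."""
    return [rule for rule in packets.get(category, []) if rule.condition(state)]


def step(packets: dict, category: str, state: dict) -> list:
    """Execute the action only when exactly one rule is selected."""
    matched = select_rules(packets, category, state)
    if len(matched) == 1:
        matched[0].action(state)
    return [rule.name for rule in matched]


# Hypothetical two-rule packet; the real ADJECTIVE rules are those of table 2.
packets = {
    "ADJECTIVE": [
        Rule("adj-before-noun",
             lambda s: s["next_cat"] == "NOUN",
             lambda s: s["log"].append("attach ADJ to a new REF")),
        Rule("adj-after-noun",
             lambda s: s["current"] == "REF" and s["next_cat"] != "NOUN",
             lambda s: s["log"].append("attach ADJ to the current REF")),
    ]
}

state = {"current": "REF", "next_cat": "NOUN", "log": []}
fired = step(packets, "ADJECTIVE", state)
```

    Here the one-word lookahead is represented by the "next_cat" entry of the state, so that the two conditions are mutually exclusive and a single rule fires.</Paragraph>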
    <Paragraph position="2"> An example of a sentence portion which can be disambiguated only by inspecting the semantic constraints is the following: ... - Determiner - Noun - Adjective - Noun - ...</Paragraph>
    <Paragraph position="3"> In this case the adjective may modify either the preceding or the following noun. Consider the sentences S4 and S5°: Per le persone anziane bevande ghiacciate sono dannose (S4) (For old people icy-cold drinks are harmful) Si arrampicano sulle montagne agili scalatori (S5) (Agile cragsmen scramble up the mountains) The strategy adopted by the parser is to attach the node representing the adjective to a newly created REF node which will be filled when the second noun is analyzed (see the action part of Rule 4 in tab. 2). In case the semantics rejects this choice (sentence S4) a natural change is triggered; it disconnects the adjectival node and moves it back to the REF node which represents the first noun.</Paragraph>
    <Paragraph position="4"> ° The sequence of categories given in the text corresponds to &quot;... le persone anziane bevande ...&quot; in S4 and to &quot;... le montagne agili scala  tactic category ADJECTIVE.</Paragraph>
    <Paragraph position="5"> The predicates used in the conditions are  CURRENT X: TRUE if the current node is of type X.</Paragraph>
    <Paragraph position="6"> UNFILLED X: TRUE if the current node or the node above is of type X and it is not filled yet.</Paragraph>
    <Paragraph position="7"> CURFILL X: TRUE if the current node is of type X and is filled.</Paragraph>
    <Paragraph position="8"> NEXT CAT: is a lookahead function which  returns TRUE if the category of the next word in the input string is CAT.</Paragraph>
    <Paragraph position="9"> The structure-building functions used in the actions are CRLINK X1 X2: creates a new node of type X1 and links it to a node of type X2.</Paragraph>
    <Paragraph position="10"> The node which must be used is located by moving up on the rightmost branch of the tree.</Paragraph>
    <Paragraph position="11"> FILL X VAL: a node of type X (located as in CRLINK) is filled with the value VAL (~ denotes the normalized form of the current word).</Paragraph>
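    <Paragraph> The predicates and structure-building functions listed above can be sketched in Python as below. Several points are assumptions made for this illustration and are not stated in the text: the "current node" is taken to be the most recently created node, it becomes the new node after CRLINK, the search for a node of a given type starts from the current node and moves up through the parents, and a node counts as filled when its value is not None. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, node_type, parent=None):
        self.type, self.parent, self.value, self.children = node_type, parent, None, []


class Tree:
    """Minimal tree with a `current` node at the bottom of the rightmost branch."""

    def __init__(self, root_type, categories):
        self.current = Node(root_type)
        self.categories = categories    # categories of the input words
        self.pos = 0                    # index of the word being analysed

    def _find_up(self, node_type):
        """Locate a node of the given type by moving up from the current node."""
        node = self.current
        while node is not None and node.type != node_type:
            node = node.parent
        return node

    def current_is(self, x):            # CURRENT X
        return self.current.type == x

    def unfilled(self, x):              # UNFILLED X
        candidates = [self.current, self.current.parent]
        return any(n is not None and n.type == x and n.value is None
                   for n in candidates)

    def curfill(self, x):               # CURFILL X
        return self.current.type == x and self.current.value is not None

    def next_is(self, cat):             # NEXT CAT (one-word lookahead)
        nxt = self.pos + 1
        return len(self.categories) > nxt and self.categories[nxt] == cat

    def crlink(self, x1, x2):           # CRLINK X1 X2
        parent = self._find_up(x2)
        if parent is None:
            raise ValueError("no node of type " + x2 + " on the rightmost branch")
        node = Node(x1, parent)
        parent.children.append(node)
        self.current = node
        return node

    def fill(self, x, value):           # FILL X VAL
        node = self._find_up(x)
        if node is None:
            raise ValueError("no node of type " + x + " on the rightmost branch")
        node.value = value
        return node


tree = Tree("REL", ["DET", "NOUN", "ADJ"])
tree.crlink("REF", "REL")               # new REF below the root REL
tree.pos = 1                            # the noun is the current word
tree.fill("REF", "persona")
```

    After these calls the current node is a filled REF, its parent is a still unfilled REL, and the lookahead sees an adjective as the next word.</Paragraph>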
    <Paragraph position="12"> In general, however, it is not possible to avoid the use of backup. The backup mechanism is needed when more than one of the conditions of the rules associated with a particular category is matched, but this case is actually restricted to very complex (and unusual) relative clauses. More often, the backup is required when the input word is ambiguous, i.e. it belongs to more than one syntactic category. In this case all conditions associated with the different categories are evaluated and in some cases more than one of them is matched. In all these cases the status of the analysis (i.e. the current tree) is saved together with the identifiers of the matched rules and a pointer to the input sentence.</Paragraph>
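    <Paragraph> One possible way of saving the status of the analysis is sketched below in Python. The text only says that the tree, the identifiers of the matched rules and a pointer to the input sentence are saved; the stack discipline, the deep copy of the tree and the idea of storing only the alternatives not yet tried are assumptions of this sketch, and the rule names are invented.

```python
import copy


class BackupStack:
    """Stack of saved analysis states, restored in last-in-first-out order."""

    def __init__(self):
        self._saved = []

    def save(self, tree, pending_rules, word_index):
        # The tree is deep-copied so that later restructuring cannot alter it.
        self._saved.append((copy.deepcopy(tree), list(pending_rules), word_index))

    def restore(self):
        """Return (tree, next rule to try, word index), or None if exhausted."""
        while self._saved:
            tree, pending, word_index = self._saved[-1]
            if not pending:
                self._saved.pop()
                continue
            rule = pending.pop(0)
            return copy.deepcopy(tree), rule, word_index
        return None


# Hypothetical state: the tree is a plain dict here, and the rule names are invented.
stack = BackupStack()
tree = {"type": "REL", "children": []}
stack.save(tree, ["che-as-conjunction", "che-as-adjective"], word_index=4)
tree["children"].append("REF")          # the parse goes on and later fails

first = stack.restore()
second = stack.restore()
third = stack.restore()
```

    Each call to restore hands back a fresh copy of the saved tree, so a failed alternative cannot corrupt the state from which the next alternative starts.</Paragraph>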
    <Paragraph position="13"> As an example of sentences in which the backup mechanism is used consider the sentences S6-S8; in them there is a lexical ambiguity for the word &quot;che&quot; (it acts as a relative pronoun in S6, as a conjunction in S7 and as an adjectival modifier in S8); moreover in S6 and S7 &quot;pesca&quot; is a form of the verb &quot;pescare&quot; (to fish) whereas in S8 it is a noun (the fishing).</Paragraph>
    <Paragraph position="14"> Di a quel ragazzo che pesca di andarsene (S6) (Tell that boy who is fishing to go away) Di a quel ragazzo che pesca male (S7) (Tell that boy that he is fishing badly) Di a quel ragazzo che pesca fantastica hai fatto (S8) (Tell that boy what a marvellous fishing you have done).</Paragraph>
  </Section>
  <Section position="5" start_page="117" end_page="117" type="metho">
    <SectionTitle>
THE VERIFICATION PROCESS
</SectionTitle>
    <Paragraph position="0"> When a node is filled, it is supposed to be already attached to the tree. The filling operation triggers some procedures associated with the type of the node which is being filled. Among them, the AGREEMENT procedures have the task of checking person, number and gender agreement between a node and its dependants. Particularly important is the agreement procedure associated with the REL node type, because it selects the REF node which can act as syntactic subject of the sentence (this suggestion may be overridden later by virtue of semantic considerations). If the agreement constraints are violated, then the natural changes are attempted; if no restructuring of the tree is successful, then the initial status is maintained without changes and a warning message is issued.</Paragraph>
    <Paragraph position="1"> Among the procedures triggered by the filling of a node, the one which perhaps has the most dramatic effects on the subsequent behavior of the system is the semantic check procedure. In fact, if the semantic check procedure reports the non-admissibility of an attachment, the parser is forced to find another alternative. This is done by first applying the natural changes and then, if all of them fail, by performing a backup.</Paragraph>
    <Paragraph position="2"> A semantic procedure refers to the semantic knowledge of the domain under consideration, which is stored in the form of a two-level network (Lesmo, Siklossy &amp; Torasso 83); the external level allows the checks to be performed, whereas the internal level carries the information necessary to perform the translation.</Paragraph>
    <Paragraph position="3"> Different checks are done depending on the type of the node. When an ADJ node is attached to a REF node, the system has to verify that the adjective is an acceptable linguistic description of the noun stored in the REF node. In case two REF nodes are attached (this case occurs in Italian only when the lower REF contains a proper noun) the system has to verify that the lower REF contains a possible identifier of the class represented by the noun stored in the upper REF. When two REFs are attached via a CONN node, the constituent headed by the lower REF has the purpose either of specifying a subset of the class identified by the noun stored in the upper REF or of referring to a property of a given object. An example of the first kind is &quot;the professors of the department X&quot; and an example of the second kind is &quot;the sex of the professors ...&quot;. In this case the semantic procedure accesses the net to reject incorrect specifications of the form &quot;the sex of the department X&quot;. A quite different behavior characterizes the attachment of a role to a verb (a REF node to a REL node via a CONN node); of course, the attachment of a new case cannot trigger a simple case check, but must also take into account all the cases attached before. A side effect of this process is the binding of the actual cases to the cases predicted in the net; this can be useful when there are two cases which have the same marker (or which are both unmarked) to determine, by using the selectional restrictions stored in the net, the actual role of the filler of each case (e.g. syntactic subject or syntactic object).</Paragraph>
    <Paragraph position="4"> The completion of a constituent triggers the last set of syntactic rules; they verify whether the constituent (that is, the node itself and its descendants) respects the ordering constraints. In case those constraints are violated (e.g. &quot;belli i bambini sono&quot; - nice the babies are) a warning message is issued but the sentence is considered as interpretable.</Paragraph>
    <Paragraph position="5"> A word is due to explain the meaning of the term &quot;complete&quot;. The constituent headed by the node n_i is considered complete when a new node n_j is attached to a node n_k which is an ancestor of n_i; all constituents headed by the nodes belonging to the rightmost path of the tree are considered complete when the system encounters the end of the sentence. The concept of &quot;completion&quot; of a constituent is particularly important because only when the constituent headed by the node n_i is complete does the system translate the constituent, using the different pieces of information gathered by the semantic procedures, and store the translation in the TRANSL slot of the node n_i.</Paragraph>
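    <Paragraph> The definition of completion given above can be illustrated with a small Python sketch. It assumes, beyond what the text states, that new nodes are always attached to a node lying on the rightmost path of the tree; the class, the function names and the example tree are hypothetical, and the translation step which would fill the TRANSL slot is not modelled.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children, self.complete = name, parent, [], False
        if parent is not None:
            parent.children.append(self)


def rightmost_path(root):
    """Nodes from the root down to the last leaf, following last children."""
    path = [root]
    while path[-1].children:
        path.append(path[-1].children[-1])
    return path


def attach(root, new_name, target):
    """Attach a new node to `target`; return the names of the constituents closed."""
    path = rightmost_path(root)
    below = path[path.index(target) + 1:]   # open nodes having target as ancestor
    for node in below:
        node.complete = True
    Node(new_name, target)
    return [node.name for node in below]


def end_of_sentence(root):
    """Close every constituent still open on the rightmost path."""
    closed = [n.name for n in rightmost_path(root) if not n.complete]
    for node in rightmost_path(root):
        node.complete = True
    return closed


# Hypothetical tree: REL1 -> REF1 -> CONN1 -> REF2, then a node joins REF1.
rel1 = Node("REL1")
ref1 = Node("REF1", rel1)
conn1 = Node("CONN1", ref1)
ref2 = Node("REF2", conn1)
closed_by_attach = attach(rel1, "CONN2", ref1)
closed_at_end = end_of_sentence(rel1)
```

    Attaching CONN2 to REF1 closes CONN1 and REF2, which have REF1 as an ancestor; REL1, REF1 and CONN2 stay open until the end of the sentence.</Paragraph>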
  </Section>
  <Section position="6" start_page="117" end_page="120" type="metho">
    <SectionTitle>
NATURAL CHANGES VERSUS BACKUP
</SectionTitle>
    <Paragraph position="0"> The natural changes have the purpose of restructuring the tree by moving constituents around without requiring backup. They are represented as pattern-action rules, where the pattern part is used to select the rules which can be applied, whereas the action part implements the transformation of the tree. The natural changes currently implemented are of two main types: - MOVE UP (the easiest and most common): it attaches a constituent (i.e. a subtree) to a higher node (whose type is specified in the rule) of the current branch of the tree. [Fig.4: ... change. The semantic procedure associated with the REL node type detects that &quot;sesso&quot; cannot fill any of the cases of &quot;sostenere&quot; (a), so that the constituent headed by &quot;sostenere&quot; is MOVEd UP to &quot;studente&quot; (b).]</Paragraph>
    <Paragraph position="1"> - MOVE BACK: it attaches a constituent to the rightmost leaf of the preceding branch of the tree.</Paragraph>
    <Paragraph position="2"> For example, a MOVE UP rule is used to build the tree shown in fig.1: the relative clause &quot;che hanno sostenuto ...&quot; is first attached to the nearest REF node (&quot;sesso&quot;); when the verb is found the node REL2 is filled (fig.4a), the agreement and semantic check procedures are triggered and the latter reports that &quot;sesso&quot; cannot fill an unmarked case of &quot;sostenere&quot;, so that the partially built relative clause is moved up to REF2 (&quot;studente&quot; - fig.4b); this new hypothesis is validated by the agreement and semantic procedures. An example of the application of a MOVE BACK rule has been given in the third section, in connection with the problem of attaching the adjectival nodes (see fig.5).</Paragraph>
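    <Paragraph> The MOVE UP example just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree is a simplified version of fig.1, the move_up function and its arguments are hypothetical, and the semantic check is replaced by a toy function which accepts only &quot;studente&quot; as a case filler of &quot;sostenere&quot;.

```python
class Node:
    def __init__(self, node_type, head=None):
        self.type, self.head, self.parent, self.children = node_type, head, None, []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def move_up(constituent, target_type, acceptable):
    """Re-attach `constituent` to the nearest higher node of `target_type`
    accepted by the `acceptable` check; return that node, or None."""
    old_parent = constituent.parent
    node = old_parent.parent
    while node is not None:
        if node.type == target_type and acceptable(node, constituent):
            old_parent.children.remove(constituent)
            node.add(constituent)
            return node
        node = node.parent
    return None                          # tree left unchanged


# Toy stand-in for the semantic check: only "studente" can fill a case of "sostenere".
def toy_check(ref, rel):
    return (ref.head, rel.head) == ("studente", "sostenere")


studente = Node("REF", "studente")
conn = studente.add(Node("CONN", "di"))
sesso = conn.add(Node("REF", "sesso"))
rel2 = sesso.add(Node("REL", "sostenere"))   # first hypothesis: attached to "sesso"

new_parent = move_up(rel2, "REF", toy_check)
```

    When no higher node of the requested type passes the check, the function returns None and leaves the tree untouched, which corresponds to the case where another natural change or the backup has to be tried.</Paragraph>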
    <Paragraph position="3"> As stated in the previous section, the natural changes do not replace the backup mechanism in all cases; the backup is strictly connected with the concept of &quot;garden path&quot;. [Fig.5: ... the word &quot;bevande&quot; (drinks) is scanned the node ADJ1 is MOVED BACK from REF2 (a) to the last REF node of the previous branch of the tree, i.e. REF1 (b).] PARSIFAL (Marcus 80)</Paragraph>
    <Paragraph position="4"> is able to parse sentences in a deterministic way when they are not garden paths. However it has been shown (Milne 82) that: - For a pair of potential garden path sentences, it is not possible to uniquely determine which is a garden path and which is not (different people may choose in different ways).</Paragraph>
    <Paragraph position="5"> - The choice of having an n-constituent lookahead (as in PARSIFAL) does not make it possible to decide whether a sentence is a potential garden path in a psychologically plausible way.</Paragraph>
    <Paragraph position="6"> - The semantic knowledge plays a fundamental role in choosing a particular analysis.</Paragraph>
    <Paragraph position="7"> Milne argues that a one-word lookahead, with the substantial help of semantic information, is what is needed to provide a model of N.L. which is psychologically sound (one-word lookahead plus semantics is also advocated in RUS - Brachman et al. 79).</Paragraph>
    <Paragraph position="8"> We think that the approach adopted in our parser basically agrees with this position. In a rather vague sense, the non-complete nodes of our tree correspond to the Active Node Stack, i.e. to the not yet completed constituents of the sentence. The natural changes make it possible to operate on these nodes on the basis of semantic information. However, there is a fundamental difference: our parser has at its disposal the whole structure built previously. An example of the possibility of using non-active constituents is given by the MOVE BACK natural changes, where a previous constituent (already completed) is used to attach a node (see REF1 in fig.5). This greater flexibility has the disadvantage of not giving any cue for deciding a priori what is a valid natural change and what is not (it is possible to devise natural changes for all possible kinds of restructuring of the tree); however, it allows the choice of heuristics which are in agreement with the actual behavior of humans and which fit in a simple way into the proposed model.</Paragraph>
    <Paragraph position="9"> As regards the use of backup, the cited works do not give an account of what happens in the parser when an analysis fails due to a garden path (see, however, Marcus 80, pp.202-220). Our provisional solution is to use the backup, a computational tool heavier than the natural changes: it should correspond to the situation when &quot;the user must consciously undo this previous choice after detecting an inconsistency&quot; (Woods 73, p. 133). We acknowledge the problems associated with this choice, e.g. the need to save the status of the analysis at certain times, the possibility of interference with the natural changes, etc., but the backup is used parsimoniously (due to the condition part of the syntactic rules) and, anyway, we do not believe it is the final solution to this problem.</Paragraph>
  </Section>
</Paper>