<?xml version="1.0" standalone="yes"?> <Paper uid="P85-1023"> <Title>ANALYSIS OF CONJUNCTIONS IN A RULE-BASED PARSER</Title> <Section position="4" start_page="180" end_page="181" type="metho"> <SectionTitle> ORGANIZATION OF THE PARSER </SectionTitle> <Paragraph position="0"> In this section we give an overview of the principles that lie at the root of the syntactic analysis in FIDO. We try to focus the discussion on the issues that guided the design of the parser, rather than giving all the details of its current implementation. We hope that this approach will enable the reader to realize why the system is so easily extendible. For a more detailed presentation, see (Lesmo & Torasso, 1983) and (Lesmo & Torasso, 1984).</Paragraph> <Paragraph position="1"> The first issue concerns the interaction between the concepts of &quot;structured representation of a sentence&quot; and &quot;status of the analysis&quot;. These two concepts have usually been considered as distinct: in ATNs, to consider a well-known example, the parse tree is held in a register, but the global status of the parsing process also includes the contents of the other registers, a set of states identifying the current position in the various transition networks, and a stack containing the data on the previous choice points. In logic grammars (Definite Clause Grammars (Pereira & Warren, 1980), Extraposition Grammars (Pereira, 1981),</Paragraph> <Section position="1" start_page="180" end_page="181" type="sub_section"> <SectionTitle> Modifier Structure Grammars (Dahl & McCord, 1983)) </SectionTitle> <Paragraph position="0"> this book-keeping need not be completely explicit, but the interpreter of the language (usually a dialect of PROLOG) has to keep track of the binding of the variables, of the clauses that have not been used (but could be used in case of failure of the current path), and so on. 
On the contrary, we tried to organize the parser in such a way that the two concepts mentioned above coincide: the portion of the tree that has been built so far &quot;is&quot; the status of the analysis. The implicit assumption is that the parser, in order to go on with the analysis, does not need to know how the tree was built (what rules have been applied, what alternatives there were), but just what the result of the previous processing steps is [1].</Paragraph> <Paragraph position="1"> Of course, this assumption implies that all information present in the input sentence must also be present in its structured representation; actually, what happens is that new pieces of information, which were implicit in the &quot;linear&quot; input form, are made explicit in the result of the analysis. These pieces of information are extracted using the syntactic knowledge (how the constituents are structured) and the lexical knowledge (inflectional data).</Paragraph> <Paragraph position="2"> [1] We must confess that this assumption has not been pushed to its extreme consequences. In some cases (see (Lesmo & Torasso, 1983) for a more detailed discussion) the backtracking mechanism is still needed, but, although we are now unable to provide experimental evidence, we believe that it could be substituted by diagnostic procedures of the type discussed, with different purposes and within a different formalism, in (Weischedel & Black, 1980).</Paragraph> <Paragraph position="3"> The main advantage of such an approach is that the whole interpretation process is centered around a single structure: the dependency structure of the constituents composing the sentence. 
This enhances the modularity of the system: the mutual independence of the various knowledge sources can be stated clearly, at least as regards the pieces of knowledge contained in each of them; on the contrary, the control flow can be designed in such a way that all knowledge sources contribute, by cooperating in a more or less synchronized way, to the overall goal of comprehension (see fig. 1).</Paragraph> <Paragraph position="4"> A side-effect of the independence of knowledge sources mentioned above is that there is no strict coupling between syntactic analysis and semantic interpretation, contrary to what happens, for instance, in Augmented Phrase Structure Grammars (Robinson, 1982). This means that there is no one-to-one association between syntactic and semantic rules, a further advantage if we succeed in making the structured representation of the sentence reasonably uniform. This result has been achieved by distinguishing between &quot;syntactic categories&quot;, which are used in the syntactic rules to build the tree, and &quot;node types&quot;, whose instantiations are the elements the tree is built of [2]. Since the number of syntactic categories (and of syntactic rules) is considerably larger than the number of node types (6 node types, 22 syntactic categories, 61 rules), some general constraints and interpretation rules may be expressed in a more compact form.</Paragraph> <Paragraph position="5"> Without entering into a discussion on semantic interpretation, we can give an example using the rules that validate the tree from a syntactic point of view (SYNTACTIC RULES 2 in fig. 1). One of these rules specifies that the subject and the verb of the sentence must agree in number. 
On the other hand, the subject can be a noun, a pronoun, an interrogative pronoun, or a relative pronoun: each of them is associated with a different syntactic category, but all of them will finally be stored in a node of type REF (standing for REFerent); independently of the category, a single rule is used to specify the agreement constraint mentioned above.</Paragraph> <Paragraph position="6"> Let us now have a look at the box in fig. 1 labelled &quot;SYNTACTIC RULES 1: EXTENDING THE TREE&quot;. [2] Six node types have been introduced (each node is actually a complex data structure): REL (RELations, mainly verbs), REF (REFerents: nouns, pronouns, etc.), CONN (CONNectors, e.g. prepositions), DET (DETerminers), ADJ (ADJectives), and MOD (MODifiers, mainly adverbs). Beyond these six types, a special node (TOP) has been included to identify the main verb(s) of the sentence.</Paragraph> <Paragraph position="7"> Fig. 1: A single structure is the basis of the whole interpretation process.</Paragraph> <Paragraph position="8"> The rules that are logically contained in that box are the primary tool for performing the syntactic analysis of a sentence. Each of them has the form:</Paragraph> </Section> </Section> <Section position="5" start_page="181" end_page="181" type="metho"> <SectionTitle> PRECONDITION ---> ACTION </SectionTitle> <Paragraph position="0"> where PRECONDITION is a boolean expression whose terms are elementary conditions; their predicates allow the system to inspect the current status of the analysis, i.e. 
the tree (for instance: &quot;What is the type of the current node?&quot;, &quot;Is there an empty node of type X?&quot;); a look-ahead can also be included in the preconditions (maximum 2 words).</Paragraph> <Paragraph position="1"> The right-hand side of a rule (ACTION) consists of a sequence of operations; there are two operators:</Paragraph> </Section> <Section position="6" start_page="181" end_page="182" type="metho"> <SectionTitle> CRLINK (X,Y) </SectionTitle> <Paragraph position="0"> which creates a new instance of the type X and links it to the nearest node of type Y existing in the rightmost path of the tree (and moving only</Paragraph> <Paragraph position="2"> which fills the nearest node (see above) of type X with the value V (which in most cases coincides with the lexical data about the current input word).</Paragraph> <Paragraph position="3"> The rules are grouped in packets, each of which is associated with a lexical category. It is worth noting that the choice of the rule to fire is non-deterministic, since different rules can be executed at a given stage. On the other hand, the non-determinism has been reduced by making the preconditions of the rules belonging to the same packet mutually exclusive; consequently, the status is saved on the stack only (but not always) if the input word is syntactically ambiguous. Note that nothing prevents there being exceptions to this rule. For example, in English the past indicative and the past participle usually have the same form: in this case, two different rules of the VERB packet could be activated if the context allows for both interpretations.</Paragraph> <Paragraph position="4"> Currently, the syntactic categories of an ambiguous word are ordered manually in the lexicon; since the &quot;first&quot; rule is determined by that order, the selection of the rule to execute depends only on the choices made by the designer of the lexicon. 
Some experiments have been made to include a weighting mechanism, which should depend both on the syntactic context and on the semantic knowledge (Lesmo & Torasso, 1985).</Paragraph> <Paragraph position="5"> A second &quot;syntactic&quot; box appears in fig. 1. It refers to rules that are, in a sense, weaker than the rules of the set discussed above. The rules of the first set are aimed at defining acceptable syntactic structures, where &quot;acceptable&quot; is used to mean that the resulting structure is semantically interpretable (for instance, a determiner cannot be used to modify an adjective). On the contrary, the rules of the second set specify which of the meaningful sentences are well formed; in particular, they are used to check gender and number agreement and the ordering of constituents (e.g. the fact that in English an adjective should occur before the noun it refers to, whereas this is not always the case in Italian). The separation between the rules of the two sets is the feature that makes the system robust from a syntactic point of view (see (Lesmo & Torasso, 1984) for further details).</Paragraph> <Paragraph position="6"> It may be noticed that, in fig. 1, both the second set of syntactic rules we have just discussed and a part of the semantic knowledge have the purpose of &quot;validating the tree&quot;. Independently of the fact that the second-level syntactic constraints can be broken (they are &quot;weak&quot; constraints), whilst the semantic constraints cannot (they are &quot;strong&quot; constraints), some action must be performed when the structure hypothesized by the first-level rules does not match those constraints.</Paragraph> <Paragraph position="7"> The task of the rules called &quot;natural changes&quot; (see fig. 1) is to restructure the tree in order to provide the parser with a new, &quot;correct&quot; structure. 
We will not go into further details here, since the natural changes (in particular the one concerning the treatment of conjunctions) will be discussed in a following section; however, in order to give a complete picture of the behavior of the parser, we must point out that the natural changes can fail (no correct structure can be built). In this case, the parser returns to the original structure and issues a warning message, if the trigger of the natural changes was a weak constraint; otherwise (semantic failure) it backtracks to a previous choice point.</Paragraph> </Section> </Paper>