<?xml version="1.0" standalone="yes"?> <Paper uid="W91-0106"> <Title>REVERSIBLE NLP BY DERIVING THE GRAMMARS FROM THE KNOWLEDGE BASE</Title> <Section position="4" start_page="40" end_page="41" type="metho"> <SectionTitle> A SIMPLE EXAMPLE </SectionTitle> <Paragraph position="0"> To be concrete, consider the example in Figure One below, a simple definition of the object class (category) for generic months. This expression says that a month is a kind of time stuff that can be viewed either as a point or an interval; that it has a specific number of days, a position within the year, and, especially, that it has a name and abbreviations-the primary options for realizing references to the month in natural language.</Paragraph> <Paragraph position="1"> In our system, CTI-1, the evaluation of this expression causes a number of different things to be constructed: the object representing the category, indexing and printing functions for objects with that category (instances of the class), and a defining form. One then uses the form to create objects for the twelve months, as shown in Figure Two for AS a result of evaluating this form, we get the object for December (Figure Two). When referring to this object in generation we will look it its name field and use the word object there, or perhaps the abbreviated word.</Paragraph> <Paragraph position="2"> When parsing, we will see an instance of the word &quot;December&quot; or the phrase &quot;Dec.&quot; and want to know what object it the application program's model it refers to. In CTI-1 this is done by the phrase structure rules in Figure Three. These rules were written automatically (compiled) as one of the side-effects of defining the object for December; the code for constructing the rules was incorporated into the Define-month form by following the annotation in the expression that defined the category month, the same annotation that control where the the generator looks when it wants to realize a month object.</Paragraph> <Paragraph position="3"> These parsing rules are part of CTI-I's semantic grammar. They are rewrite rules. When the word &quot;December&quot; or the two word phrase &quot;Dec .... .&quot; is scanned, the text segment is spanned with an edge of the chart (a parse node), and the edge receives a three part label: (1) the category &quot;month&quot;, which participates in the semantic grammar, (2) the &quot;form&quot; category &quot;proper-noun&quot;, which is available to the syntactic grammar, and (3) the referent the edge picks out in the application model, i.e. the very object #<month December> that was defined by the form in</Paragraph> </Section> <Section position="5" start_page="41" end_page="41" type="metho"> <SectionTitle> SUMMARY OF THE APPROACH </SectionTitle> <Paragraph position="0"> Before going into a more elaborate example we can briefly summarize the reversible NIP architecture we have adopted. The grammar is developed on the generation side by the linguist/semantic modeler as part of defining the classes and individuals that comprise the application's domain model. They include with the definitions annotations about how such objects can be realized in natural language.</Paragraph> <Paragraph position="1"> A side-effect of definition is the automatic inversion of the generation rules specified by the annotation to construct the equivalent set of parsing rules. 
SUMMARY OF THE APPROACH

Before going into a more elaborate example we can briefly summarize the reversible NLP architecture we have adopted. The grammar is developed on the generation side by the linguist/semantic modeler as part of defining the classes and individuals that comprise the application's domain model. Along with the definitions, the modeler includes annotations about how such objects can be realized in natural language. A side-effect of definition is the automatic inversion of the generation rules specified by the annotation to construct the equivalent set of parsing rules. Parsimony and uniformity of coverage, the practical goals of reversible systems, are achieved by having the parsing grammar constructed automatically from the original forms that the linguist enters rather than having them redundantly entered by hand.

Note that what we are projecting from as we invert "the generator's rules" is the generator's representation of the form-meaning relationship: its rules for mapping from specific objects in the underlying application's domain model to their (set of) surface linguistic forms by warrant of how the model has characterized them semantically. This is not the same as a representation of the principles that constrain the valid compositional forms of the language: the constraints on how individual lexical items and syntactic constructions can be combined, what elements are required if others are present, a formal vocabulary of linguistic categories, and so on. That representation provides the framework in which the form-meaning relationship is couched, and it is developed by hand. For CTI-1 the design choices as to its categories and relations are taken from the theory of Tree Adjoining Grammar (Joshi 1985).

The simplicity and immediacy of the automatic inversion is possible because in our approach the task of parsing (determining a text's form) has been integrated with the task of understanding/semantic interpretation (determining the denotations of the text and its elements in some model). This integration is brought about by using a semantic grammar. A semantic grammar brings the categories of analysis used by the parser into the same realm as those used by the generator, namely the categories of the application domain (in the present case personnel changes), for example people, companies, dates, ages, job titles, relations such as former, new, or has-title, and event types such as appoint, succeed, retire, etc.

If the parser had been intended only to produce syntactic structural descriptions of the text, then projecting its rules from the generator would have been either impossible or trivial. An application supports a potentially vast number of categories; the syntactic categories of natural languages are fixed and relatively small. Collapsing the different kinds of things that can be realized as noun phrases down to that single category would lose the epistemological structure of the application's model and provide only minimal information to constrain or define the grammar.
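The inversion itself can be pictured with a minimal sketch, again ours rather than the system's code, with invented names and far less structure than the real annotations carry: each surface form the annotation licenses the generator to choose is read in the opposite direction, yielding a rule from scanned string to semantic category and referent.

    ;; Minimal sketch of inverting a realization annotation, assuming the
    ;; annotation is just a flat list of surface strings; the real
    ;; annotations also carry form categories and tree-family information.
    (defvar *parsing-rules* (make-hash-table :test #'equal))

    (defun invert-annotation (category realizations referent)
      "Each string the generator may choose for REFERENT becomes a
       parsing rule mapping that string to (CATEGORY REFERENT)."
      (dolist (surface realizations)
        (setf (gethash surface *parsing-rules*)
              (list category referent))))

    ;; One annotation, two directions: generation picks a string from the
    ;; list; parsing looks a scanned string up to recover the object.
    (invert-annotation 'month '("December" "Dec.") 'december)
    (gethash "December" *parsing-rules*)   ; => (MONTH DECEMBER)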
TREES FOR GENERATION, BINARY RULES FOR PARSING

Consider the definition of the event type "appoint-to-position", shown in Figure Four. Its linguistic annotation amounts to the specification of a tree family in a TAG. The features given in the annotation are consulted to establish what trees the family should contain, building on the basic subcategorization frame of a verb that takes a subject and two NP complements, e.g. that it includes a passivized tree, one in participial form without its subject, and so on. The annotation is equivalent to binding a specific lexeme to the verb position in the trees of the family (this is a lexicalized TAG), as well as restrictions on the denotations of the phrases that will be substituted for the other constituents of the clause, e.g. that the subject picks out a company, the first object a person, etc.

A tree family plus its bindings is how this annotation looks from the generator's perspective. For the parser, this same information is represented quite differently, i.e. as a set of binary phrase structure rules. Such rules are the more appropriate representation for parsing (given the algorithm in CTI-1) since parsing is a process of serial scanning rather than the top-down refinement done in generation. During the scan, constituents will emerge successively bottom up, and the parser's most frequent operation and reason for consulting the grammar will be to judge whether two adjacent constituents can compose to form a phrase. (The rules are binary for efficiency concerns: CTI-1 does the multiplication operation for determining whether two adjacent constituents form a phrase in constant time regardless of the size of the grammar.)

The tree family defines a set of rules that are applicable to any verb and semantic bindings that share the same subcategorization frame, such as "name" or "elect". In projecting the annotation on the definition of appoint-to-position into parsing rules, the compilation process will create the rules of the family if it does not already exist, and also create a set of unary rules for the immediate non-terminals of the verb, one for each different morphological variant. One of these rules is shown in Figure Five, along with the general rule for the object-promotion aspect of passivization.

The first phrase structure rule, part of the semantic grammar, ties the past participial form of the verb into the family of rules. The long category name is a convenient mnemonic for the family of rules, since it shows by its spelling what semantic categories of constituents are expected as sibling constituents in the clause as a whole.

The object promotion rule is a syntactic ("form") rule that makes reference to the form label on an edge rather than its semantic label. The rule for "appointed" has a form label showing that it is a main verb in past participle form, which is what the syntactic rule is looking for. When a segment like "was appointed" is scanned, the label "be" on the edge spanning "was" will be checked against the label "main-verb/-ed" on the edge over "appointed" and the resulting edge will carry the semantic label and referent of the phrase's head, i.e. the main verb.

Figure Six shows some of the other rules in the family so that one can get an idea about how the parsing of the whole clause will be done. The rules are given just by their print forms.
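The constant-time composition check mentioned above can be illustrated with a small sketch, ours rather than CTI-1's implementation: if the binary rules are indexed by the pair of adjacent edge labels, deciding whether two constituents compose is a single table lookup, independent of the size of the grammar.

    ;; Sketch of the constant-time check behind binary rules: the grammar
    ;; is a table keyed on pairs of adjacent labels, so "can these two
    ;; edges form a phrase?" is one lookup however large the grammar is.
    (defvar *binary-rules* (make-hash-table :test #'equal))

    (defun add-binary-rule (left right parent)
      (setf (gethash (list left right) *binary-rules*) parent))

    (defun compose-edges (left right)
      "Label of the phrase formed by two adjacent edges, or NIL."
      (gethash (list left right) *binary-rules*))

    ;; e.g. the object-promotion step over "was appointed": the edge
    ;; labeled BE plus the participial main verb yields an edge carrying
    ;; the semantic label of its head (collapsing detail for brevity).
    (add-binary-rule 'be 'main-verb/-ed 'appoint-to-position)
    (compose-edges 'be 'main-verb/-ed)   ; => APPOINT-TO-POSITION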