File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/90/w90-0123_metho.xml
Size: 19,710 bytes
Last Modified: 2025-10-06 14:12:37
<?xml version="1.0" standalone="yes"?> <Paper uid="W90-0123"> <Title>Relational-Grammar-Based Generation in the JETS Japanese-English Machine Translation System</Title> <Section position="3" start_page="174" end_page="176" type="metho"> <SectionTitle> 2 For discussion of parsing in JETS, see Maruyama, Watanabe and Ogino (1989). </SectionTitle> <Paragraph position="0"> Japanese CS for (1): rashii; itta; karera (topic . wa); Tookyoo (pp . e). English CS for (2) &amp; (3): seem; go (tense . past); they (topic . T); Tokyo (prep . to). Given the English CS, it is up to the GENIE English generator to generate either (2) or (3). Based on the information that they in the English CS is marked as the topic of the sentence, GENIE will map the CS into the superficial (unordered) relational structure shown in Figure 2 via the relational rule of Subject-to-Subject Raising (so-called A-raising). Subsequent rules of Tense-Spelling and Linearization (including the spelling out of verbal forms and prepositions) will result in the string They seem to have gone to Tokyo, as shown in the accompanying figure (Linearization, etc. ---> They seem to have gone to Tokyo). Note that &quot;6&quot; means &quot;complement&quot;. As illustrated above, RG, like TG, is a &quot;multistratal&quot; theory, i.e., clauses typically have more than one level of syntactic analysis, and these levels/strata are mediated by clause-level rules. In the case of TG, the structures are phrase-structure trees, and transformations map trees into trees; in the case of RG, the structures are edge-labelled trees (called relational structures (RS)), where the edge labels represent primitive relations, and the rules map RSs into RSs.</Paragraph> <Paragraph position="1"> The use of multiple strata sets RG apart from functional frameworks such as FUG (Kay 1979) and LFG (Bresnan 1982), which also use primitive relations (functions), and from all other monostratal frameworks such as GPSG (Gazdar et al. 1985), whether functional or not. 
The manipulation of explicitly marked relations in unordered relational structures sets RG apart from TG. In our work on Japanese-English MT, the RG concept of multiple relational strata has proven to be of significant practical use -- facilitating the design and development of a limited transfer component and a robust generation component, enhancing modularity, and allowing the linguistic processing to be conceptualized in a uniform fashion.</Paragraph> </Section> <Section position="4" start_page="176" end_page="176" type="metho"> <SectionTitle> 3- The RG Rule Writing Language: GEAR </SectionTitle> <Paragraph position="0"> One key aspect of our implementation of an RG generator is the GEAR rule-writing language. GEAR permits a grammar developer to write computationally powerful RG rules in a linguistically natural manner. GEAR rules identify grammatical objects via path specifications, of which there are two types: (1) node-specifier, consisting of a sequence of one or more relation names, and (2) property-specifier, consisting of a node-specifier followed by a property name. For instance, 1:1 indicates a node that is the subject of a node that is the subject of the node currently being processed (the focus), and 2.tense denotes the value of the property tense of a node that is the direct object of the focus.</Paragraph> <Paragraph position="1"> GEAR path expressions are superficially similar to the expressions used in unification-based frameworks such as FUG and PATR (Shieber et al. 1983). However, GEAR is not unification-based; rather, it provides a number of procedural operations, including node deletion and node creation.</Paragraph> <Paragraph position="2"> Each rule consists of a sequence of statements, of which there are several types, e.g., IF-THEN-ELSE, CALL, ON and restructuring statements. IF-THEN-ELSE statements control the rule-internal processing flow. CALL statements are used to invoke rules by name. 
An ON statement invokes a specified rule on a node reachable from the focus via a node-specifier.</Paragraph> <Paragraph position="3"> There are several types of restructuring statement, e.g., ASSIGN, CREATE, DELETE and COPY.</Paragraph> <Paragraph position="4"> An ASSIGN statement is used to alter the relations of a node identified via a node-specifier; the new relation is also specified by a node-specifier. The core of GENIE's A-raising rule, whose relational changes are illustrated in Figure 2 above, is (using 6 for &quot;complement&quot;):</Paragraph> <Paragraph position="6"> The complete rule is shown in Figure 4.</Paragraph> <Paragraph position="7"> Creation, copying and deletion of nodes are also specifiable but space limitations preclude discussion.</Paragraph> </Section> <Section position="5" start_page="176" end_page="180" type="metho"> <SectionTitle> 4- The GENSHELL generator shell </SectionTitle> <Paragraph position="0"> Building on our experience with an earlier prototype developed by Schindler (1988), we have developed an NL-independent generator shell, GENSHELL, to facilitate the development of RG generators. For any given generator, grammar developers need only specify the designated grammatical relations, parts of speech, a part-of-speech hierarchy, dictionaries and grammars.</Paragraph> <Paragraph position="1"> GENSHELL takes this information and constructs a runtime generator.</Paragraph> <Paragraph position="2"> One of the distinctive aspects of GENSHELL, due to Schindler (1988), is the concept of category-driven processing. In category-driven processing, parts of speech are represented as categories in a category hierarchy (POSH) and nodes in RSs are represented as objects which are instances of categories and thus can inherit properties via the POSH. Among the inheritable properties are grammar rules. 
For instance, the rules for Passive and Subject-to-Object Raising (so-called B-Raising; discussed later) would be associated with the class Transitive Verb, A-raising would be associated with the class Intransitive Verb, and Subject-Verb Agreement would be associated with the superordinate class Verb.</Paragraph> <Paragraph position="3"> In our implementation, all rules are defined with respect to named rule bundles, and rule bundles are associated either with categories in the POSH, the general/default cases, or with lexical entries, the special cases. Rule definitions have the form:</Paragraph> <Paragraph position="5"> (As shown in Figure 4 above, a default rule bundle associated with a POS class is given the same name as that class.) When a node N associated with category C and lexical entry L is being processed, the rule search routine, given a rule named R -- the latter comes from so-called agenda rules, which are also associated with C -- uses inheritance to first search for R among any rule bundles named in L, then searches for R among C's rules, then C's parent's rules and so on up to the top of the hierarchy until either some rule named R is found or the top category is reached and the process fails. In short, in category-driven processing, the grammar invoked on N is constructed as appropriate at processing time on the basis of lexically activated rules and the rules accessible to N's category using the POSH and inheritance.</Paragraph> <Paragraph position="6"> One example is the ordering of adjectives and nouns. The class Noun is associated with a general/default linearization rule which orders adjectives before nouns, generating phrases like tall woman. Nouns like someone, anyone, etc. are associated with a lexically triggered linearization rule which places the adjective after the head noun. These two rules are both named Linearize. 
Thus, if the focus is someone and it is modified by tall, the search routine, looking for Linearize, will first find the special rule, correctly generating someone tall.</Paragraph> <Paragraph position="7"> A category-driven system has two advantages over more conventional rule systems: (i) it provides a natural mechanism for dealing with special cases triggered by lexical items, while providing a fail-soft mechanism in the form of the general rules inherited from the POSH and (ii) only rules that in principle could be relevant to processing a given node in an RS will be tested for application. That is, the POSH provides a linguistically motivated means for organizing a large grammar into subgrammars. 3 5- GENIE: the English generator Generating from CSs requires a robust generation grammar of the target language, as well as a decision-making component that decides which surface form is to be generated. The generation grammar employed in GENIE is a (deterministic) relational grammar having a substantial number of clause-level rules which alter grammatical relations, e.g., Passive, A-raising and B-raising, as well as minor rules such as Tense-Spelling and Linearization (the latter of which does not alter grammatical relations).</Paragraph> <Paragraph position="8"> As illustrated in Figure 1 above, CSs typically do not correspond directly to grammatical sentences.</Paragraph> <Paragraph position="9"> Further, any given CS typically constitutes the basis for the generation of a number of superficial forms, e.g., (2) and (3) above. This control problem has been addressed by splitting generation into two phases: a syntax planning phase and an execution phase. The function of GENIE's planner is quite different from that of other generators. Typically, generator planners decide &quot;what to say&quot;, constructing some sort of internal representation that is then processed by a realization component. 
Typical planners will be concerned with chunking into sentences, topic selection and word choice (see, e.g., Appelt (1985), Danlos (1984), Hovy (1985), Kukich (1983), McKeown (1985), McDonald (1984) and Mann (1983)).</Paragraph> <Paragraph position="10"> In the case of JETS, however, since we are in the domain of transfer-based MT, all of these &quot;high level&quot; considerations are decided by the analysis and transfer components. In GENIE's case, the planner must, on the basis of a given CS, deal with a myriad of low-level syntactic conditions and their interactions (most of which have not been discussed or even recognized in the generation literature). Internal to GENIE, this means deciding which of the rules in the deterministic execution grammar should be applied. For instance, CSs with seem have a disjunctive grammatical condition: they must either be raised, yielding the pattern NP seem to VP (as in (2) above), or extraposed, yielding the pattern It seems that S (as in (3) above). 3 Earlier work using a lexical hierarchy and inheritance in natural language processing includes Wilensky (1981), Jacobs (1985) and Zernik and Dyer (1987). These works make heavy use of phrasal patterns (so-called pattern-concept pairs) and so the conception of grammar and lexicon and hence the notion of what is inherited in these works differ greatly from ours, which is part of the generative-linguistic tradition.</Paragraph> <Paragraph position="11"> Failure to apply either A-raising or so-called It-Extraposition would result in the ungrammatical pattern *That S seems (in the case of Figure 1 above: *That they went to Tokyo seems). The decision to apply A-raising in the above example is stylistic (&quot;make the topic the main clause subject, if possible&quot;), but the disjunctive requirement (&quot;apply either A-raising or It-Extraposition&quot;) is grammatical. 
Having no control over &quot;what to say&quot;, GENIE's planner is conceptually part of the realization phase and not part of the typical &quot;planning phase&quot;. GENIE's planner communicates to the execution grammar which rules should be applied via a set of so-called rule switches, which are simply binary-valued properties whose property names are the names of execution rules, e.g., (A-raise . Yes), (Passive . No). As shown in Figure 4 above, IF statements are often used to test for a rule-switch value, which is either set by a planning rule or comes from a lexical entry. Rule switches are a generalization of the earlier concept of transformational rule features (cf. Lakoff 1970); the generalization is that rule switches can be dynamically set by planning rules, based on lexical, syntactic, semantic and stylistic considerations (see Johnson 1988a for more examples and further discussion). 4 For example, in (1) above, based on the information that they is the topic (this information comes from transfer), a syntax planning rule which is partly responsible for making topics surface subjects sets the switch (A-raise . Yes), turning on A-raising, and the switch (It-Extra . No), turning off It-Extraposition, resulting in (2) rather than (3). GENIE's architecture is shown in Figure 5.</Paragraph> <Paragraph position="12"> Planning rules ensure that a multitude of lexico-syntactic and stylistic conditions are met, e.g., that clauses with modals do not undergo A-raising, preventing the generation of, e.g., *They seem to can swim; that clauses with verbs like force have passivized subordinate clauses where required to meet coreferential deletion conditions (cf. She forced him to be examined by the doctor, *She forced him (for) the doctor to examine him); and that verbs like teach undergo dative alternation if there is no specified direct object, generating He taught her rather than *He taught to her (cf. 
sing, which has the opposite condition - He sang to her but *He sang her).</Paragraph> <Paragraph position="13"> It is also the responsibility of the planner to make sure island constraints are not violated. For instance, if a wh-nominal is in a sentential subject, then planning rules turn on execution rules such as A-raising, resulting in sentences like Who is likely to win? (via A-Raising) rather than *Who is to win likely? or the stylistically marginal ?Who is it likely (that) will win?. This heuristic planning rule also ensures that in the case of so-called Tough-Movement sentences, GENIE will generate sentences like Who is easy to please? (via Tough-Movement) rather than either *Who is to please easy? or ?Who is it easy to please?.</Paragraph> <Paragraph position="14"> contains the agenda rules and the default planning and execution rules organized by POS.</Paragraph> <Paragraph position="15"> 4 After completing this work, we discovered that Bates and Ingria (1981) also used a mechanism similar to our &quot;rule switches&quot; to control generation within a TG framework. Their transformational constraints, however, were set by a human who wished to test what a given set of constraints would produce. That is, their system had no syntax planner which would evaluate a given base structure via a set of planning rules and set constraints ensuring the generation of only grammatical sentences.</Paragraph> <Paragraph position="16"> Execution rules are turned on (or off) either by syntax planning rules or by lexical entries. 
To illustrate the use of lexical rule-switches, consider the following example from JETS involving verbs of prevention: (4) kanojo wa kare ga iku no o habanda (gloss: she top he pp go nm pp prevent) (5) She prevented him from going.</Paragraph> <Paragraph position="17"> On the Japanese side, the postposition ga marks the subject of the embedded clause kare ga iku, which has been nominalized with the dummy noun no, which carries the direct object marker o. 
Following the arguments given in Postal (1974), we assume that prevent is a so-called B-raising trigger (B-raising is the controversial rule which relates sentences such as He believes that she knows (not raised) and He believes her to know, in which her is raised up as direct object of believe). The CS for (5) is as shown to the right in Figure 6 and the CS of the Japanese sentence (4) is shown to the left: 5 Figure 6. Canonical Structures for (4) and (5). GENIE's rule of B-raising, given in Figure 7, maps the English CS into a superficial RS, as shown in Figure 8.</Paragraph> <Paragraph position="18"> As shown in Figure 6, the English and the Japanese CSs are isomorphic, i.e., there are no structural changes in transfer.</Paragraph> <Paragraph position="19"> To produce (5) from the English CS in Figure 6, as illustrated in Figure 8, merely requires the dictionary entry depicted in Figure 9.</Paragraph> <Paragraph position="20"> This lexical entry states that prevent is a transitive verb, hence has access to the rules defined for transitive verbs in the POSH, e.g., Passive and B-raising (and the rules of superordinate classes), and that among its properties are the rule switch setting (B-Raise . Yes), which triggers Subject-to-Object raising, the feature (ccomp . from), which determines that the complement clause (fragment) will be flagged with from via a general rule, and the feature (cvform . ing), which Make-Infinitive will use when called by B-Raising to determine the verb form going in the example. Postal's English-internal arguments were based on the fact that the direct object of prevent could be existential there, weather it and idiom chunks (cf. She prevented there from being a riot/it from raining/the cat from being let out of the bag).</Paragraph> <Paragraph position="21"> 
Prevent has no rep(lacement)-lexical-form, which is used, e.g., to map a single input form such as look-up into a verb look and a particle up, or more generally to map senses into lexical strings. &quot;Rep-cat&quot;, also nil here, can be used to map one category system into another (not used in GENIE).</Paragraph> <Paragraph position="22"> &quot;Additional-rule-sets&quot;, also nil, is the repository for the names of any rule bundles associated with a lexical entry (e.g., easy, hard, etc. would have the additional-rule-set name tough-movement, which contains the Tough Movement rule and the planning rule that turns Tough Movement on).</Paragraph> <Paragraph position="23"> As depicted in Figure 5 above, the execution component consists of three relation-changing phases, called &quot;pre-cycle&quot;, &quot;cycle&quot; and &quot;post-cycle&quot;, in which execution rules are applied bottom-to-top, followed by a top-down linearization phase, which builds an output string that is then sent to the morphological component (not shown). Each phase has its own set of agenda rules, whose functions are to either call grammatical rules or shift control, i.e., agenda rules are a sequence of CALL statements. Agenda rules, like grammatical rules, are defined for classes, so that, e.g., the cyclic agendas for adjectives, nouns and verbs are different. For instance, part of the agenda for the cyclic phase of transitive verbs is: ... (Call B-raising) (Call Dative) (Call Passive) ... but none of these rules are relevant to adjectives, nouns or intransitive verbs. It should be noted that rules called by a particular agenda might be accessed via inheritance. 
E.g., Reflexivization is called in the cyclic agenda for transitive verbs, but it is associated with the class Predicate so that it is available to adjectives in cases like He is proud of himself (it is assumed that Reflexivization is executed on the proud clause before A-Raising applies on be).</Paragraph> <Paragraph position="24"> The grammar implemented in GENIE to date includes many of the important rules for English clause structure, including Yes/No questions, Wh-questions, relative clauses, subordinate clauses of various types, verb-particle combinations, raisings of various sorts, passives, and extrapositions.</Paragraph> </Section></Paper>
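The rule search described for category-driven processing (lexically named rule bundles first, then the node's category, then each ancestor in the POSH) can be illustrated with a small Python sketch. This is not GENSHELL or GEAR code: the class names, the find_rule and linearize functions, and the representation of a rule as a Python function are all assumptions made for illustration. Only the search order and the tall woman / someone tall example come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical representation: a rule is a function from a node to a string.
Rule = Callable[["Node"], str]


@dataclass
class Category:
    """A part-of-speech class in the hierarchy (POSH); names are illustrative."""
    name: str
    parent: Optional["Category"] = None
    rules: Dict[str, Rule] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    """A dictionary entry that may name extra rule bundles (the special cases)."""
    form: str
    rule_bundles: List[Dict[str, Rule]] = field(default_factory=list)


@dataclass
class Node:
    """A node in a relational structure, reduced to what this sketch needs."""
    entry: LexicalEntry
    category: Category
    modifier: Optional[str] = None


def find_rule(node: Node, rule_name: str) -> Optional[Rule]:
    """Search lexical bundles first, then the category chain up to the top."""
    for bundle in node.entry.rule_bundles:
        if rule_name in bundle:
            return bundle[rule_name]
    category = node.category
    while category is not None:
        if rule_name in category.rules:
            return category.rules[rule_name]
        category = category.parent
    return None  # top of the hierarchy reached: the search fails


def adjective_first(node: Node) -> str:
    # Default rule for the class Noun: adjective before the head noun.
    return f"{node.modifier} {node.entry.form}"


def adjective_last(node: Node) -> str:
    # Lexically triggered rule for nouns like "someone".
    return f"{node.entry.form} {node.modifier}"


def linearize(node: Node) -> str:
    rule = find_rule(node, "Linearize")
    if rule is None:
        raise LookupError("no rule named Linearize is accessible to this node")
    return rule(node)


word = Category("Word")
noun = Category("Noun", parent=word, rules={"Linearize": adjective_first})

woman = Node(LexicalEntry("woman"), noun, modifier="tall")
someone = Node(
    LexicalEntry("someone", rule_bundles=[{"Linearize": adjective_last}]),
    noun,
    modifier="tall",
)

print(linearize(woman))    # tall woman
print(linearize(someone))  # someone tall
```

Because the lexical bundles are consulted before the category chain, the special rule for someone shadows the default rule of the same name, while woman falls back on the rule inherited from Noun. A lookup for a rule that no accessible bundle defines returns None, which corresponds to the search failing at the top category.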