<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1057">
  <Title>Analysis Grammar of Japanese in the Mu-Project - A Procedural Approach to Analysis Grammar -</Title>
  <Section position="3" start_page="0" end_page="267" type="metho">
    <SectionTitle>
2. Procedural Grammar
</SectionTitle>
    <Paragraph position="0"> There has been a prominent tendency in recent computational linguistics to re-evaluate CFG and use it directly or augment it to analyze sentences [3,4,5]. In these systems (frameworks), CFG rules independently describe constraints on single linguistic structures, and a universal rule application mechanism automatically produces a set of possible structures which satisfy the given constraints. It is well known, however, that such sets of possible structures often become unmanageably large.</Paragraph>
    <Paragraph position="1"> Because two separate rules such as  NP ---&gt; NP PREP-P  and  VP ---&gt; VP PREP-P  are usually prepared in CFG grammars in order to analyze noun and verb phrases modified by prepositional phrases, CFG grammars provide two syntactic analyses for "She was given flowers by her uncle."</Paragraph>
    <Paragraph position="2"> Furthermore, the ambiguity of the sentence is doubled by the lexical ambiguity of "by", which can be read as either a locative or an agentive preposition. Since the two syntactic structures are recognized by completely independent rules and the semantic interpretations of "by" are given by independent processes in the later stages, it is difficult to compare these four readings during the analysis in order to give preference to one of them.</Paragraph>
    <Paragraph position="3"> A rule such as "If a sentence is passive and there is a "by"-prepositional phrase, it is often the case that the prepositional phrase fills the deep agentive case (try this analysis first)" seems reasonable and quite useful for choosing the most preferable interpretation, but it cannot be expressed by refining the ordinary CFG rules. This kind of rule is quite different in nature from a CFG rule. It is not a rule of constraint on a single linguistic structure (in fact, the above four readings are all linguistically possible), but it is a "heuristic" rule concerned with preference of readings, which compares several alternative analysis paths and chooses the most feasible one. Human translators (or humans in general) have many such preference rules based on various sorts of cues such as morphological forms of words, collocations of words, text styles, word semantics, etc. These heuristic rules are quite useful not only for increasing efficiency but also for preventing proliferation of analysis results. As Wilks [6] pointed out, we cannot use semantic information as constraints on single linguistic structures, but just as preference cues to choose the most feasible interpretations among linguistically possible interpretations. We claim that many sorts of preference cues other than semantic ones exist in real texts which cannot be captured by CFG rules. We will show in this paper that, by utilizing various sorts of preference cues, our analysis grammar of Japanese can work almost deterministically to give the most preferable interpretation as the first output, without any extensive semantic processing (note that even "semantic" processing cannot disambiguate the above sentence. The four readings are semantically possible. It requires deep understanding of contexts or situations, which we cannot expect in a practical MT system).</Paragraph>
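The difference between enumerating readings and preferring one of them can be sketched in a few lines of Python. This is an illustration only, not GRADE code: the labels, the function names, and the assumption that the agentive reading attaches to the verb phrase are all hypothetical.

```python
from itertools import product

# Hypothetical labels for the two attachment sites and the two senses of "by".
ATTACHMENTS = ("noun-phrase", "verb-phrase")
BY_SENSES = ("locative", "agentive")


def enumerate_readings():
    """Return every attachment/sense combination (2 x 2 = 4 readings)."""
    return list(product(ATTACHMENTS, BY_SENSES))


def prefer(readings, is_passive):
    """Order readings so that the heuristic's favourite is tried first.

    Heuristic from the text: in a passive sentence a "by"-phrase often
    fills the deep agentive case, so that analysis is tried first.
    Attaching it to the verb phrase is an assumption of this sketch.
    """
    def rank(reading):
        attachment, sense = reading
        favoured = (
            is_passive
            and attachment == "verb-phrase"
            and sense == "agentive"
        )
        return 0 if favoured else 1

    # sorted() is stable, so the remaining readings keep their order.
    return sorted(readings, key=rank)


readings = prefer(enumerate_readings(), is_passive=True)
print(readings[0])
```

The other three readings are not discarded; they simply come later, which mirrors the idea of a preference rule rather than a constraint.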
    <Paragraph position="4"> In order to integrate heuristic rules based on various levels of cues into a unified analysis grammar, we have developed a programming language, GRADE. GRADE provides us with the following facilities.</Paragraph>
    <Section position="1" start_page="267" end_page="267" type="sub_section">
      <SectionTitle>
Explicit Control of Rule Applications :
</SectionTitle>
      <Paragraph position="0"> Heuristic rules can be ordered according to their strength (see 4-2).</Paragraph>
      <Paragraph position="1"> - Multiple Relation Representation : Various levels of information including morphological, syntactic, semantic, logical etc. are expressed in a single annotated tree and can be manipulated at any time during the analysis. This is required not only because many heuristic rules are based on heterogeneous levels of cues, but also because the analysis grammar should perform semantic/logical interpretation of sentences at the same time and the rules for these phases should be written in the same framework as syntactic analysis rules (see 4-2, 4-4).</Paragraph>
      <Paragraph position="2"> - Lexicon Driven Processing : We can write heuristic rules specific to a single or a limited number of words, such as rules concerned with collocations among words. These rules are strong in the sense that they almost always succeed. They are stored in the lexicon and invoked at appropriate times during the analysis without decreasing efficiency (see 4-1).</Paragraph>
      <Paragraph position="3"> - Explicit Definition of Analysis Strategies : The whole analysis phase can be divided into steps. This makes the whole grammar efficient, natural and easy to read. Furthermore, strategic consideration plays an essential role in preventing undesirable interpretations from being generated (see 4-3).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="267" end_page="267" type="metho">
    <SectionTitle>
3. Organization of Grammar
</SectionTitle>
    <Paragraph position="0"> In this section, we will give the organization of the grammar necessary for understanding the discussion in the following sections. The main components of the grammar are as follows.</Paragraph>
    <Paragraph position="1">  (1) Post-Morphological Analysis (2) Determination of Scopes (3) Analysis of Simple Noun Phrases (4) Analysis of Simple Sentences (5) Analysis of Embedded Sentences (Relative Clauses) (6) Analysis of Relationships of Sentences (7) Analysis of Outer Cases (8) Contextual Processing (Processing of omitted case elements, Interpretation of 'Ha', etc.) (9) Reduction of Structures for Transfer Phase  Each component consists of from 60 to 120 GRADE rules.</Paragraph>
    <Paragraph position="2"> 47 morpho-syntactic categories are provided for Japanese analysis, each of which has its own lexical description format. 12,000 lexical entries have already been prepared according to the formats. In this classification, Japanese nouns are categorized into 8 sub-classes according to their morpho-syntactic behaviour, and 53 semantic markers are used to characterize their semantic behaviour. Each verb has a set of case frame descriptions (CFD) which correspond to different usages of the verb. A CFD gives mapping rules between surface case markers (SCM - postpositional case particles are used as SCM's in Japanese) and their deep case interpretations (DCI - 33 deep cases are used). The DCI of an SCM often depends on verbs, so the mapping rules are given to the CFD's of individual verbs. A CFD also gives a normal collocation between the verb and SCM's (postpositional case particles). Detailed lexical descriptions are given and discussed in another paper [7].</Paragraph>
    <Paragraph position="3"> The analysis results are dependency trees which show the semantic relationships among input words.</Paragraph>
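One way to picture a case frame description is as a per-usage mapping from surface case markers to deep cases. The Python sketch below is an assumption-laden illustration: the class name, the lexicon entry for TSUKAU (to use), and the deep case names AGENT and OBJECT are hypothetical and do not reproduce the actual lexical formats or the 33 deep cases of the system.

```python
from dataclasses import dataclass, field


@dataclass
class CaseFrame:
    """One case frame description (CFD): one usage of a verb."""
    usage: str
    # surface case marker (postpositional particle) mapped to a deep case
    scm_to_deep_case: dict = field(default_factory=dict)


# Hypothetical lexicon entry; deep case names are illustrative only.
LEXICON = {
    "TSUKAU": [  # "to use"
        CaseFrame("use a tool", {"GA": "AGENT", "WO": "OBJECT"}),
    ],
}


def interpret(verb, particle):
    """Return the deep cases that the verb's CFDs allow for a particle."""
    return [
        frame.scm_to_deep_case[particle]
        for frame in LEXICON.get(verb, [])
        if particle in frame.scm_to_deep_case
    ]


print(interpret("TSUKAU", "WO"))
```

Because the mapping lives with the individual verb, the same particle can receive a different deep case under another verb, which is the point made in the paragraph above.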
  </Section>
  <Section position="5" start_page="267" end_page="267" type="metho">
    <SectionTitle>
4. Typical Steps of Analysis Grammar
</SectionTitle>
    <Paragraph position="0"> In the following, we will take some sample rules to illustrate our points of discussion.</Paragraph>
  </Section>
  <Section position="6" start_page="267" end_page="273" type="metho">
    <SectionTitle>
4-1 Relative Clauses
</SectionTitle>
    <Paragraph position="0"> Relative clause constructions in Japanese express several different relationships between modifying clauses (relative clauses) and their antecedents. Some relative clause constructions cannot be translated as relative clauses in English. We classified Japanese relative clauses into the following four types, according to the relationships between clauses and their antecedents.</Paragraph>
    <Paragraph position="1"> (1) Type 1 : Gaps in Cases  One of the case elements of the relative clause is deleted and the antecedent fills the gap. (2) Type 2 : Gaps in Case Elements  The antecedent modifies a case element in the clause. That is, a gap exists in a noun phrase in the clause.</Paragraph>
    <Paragraph position="2">  (3) Type 3 : Apposition  The clause describes the content of the antecedent, as the English "that"-clause does in 'the idea that the earth is round'.</Paragraph>
    <Paragraph position="3"> (4) Type 4 : Partial Apposition  The antecedent and the clause are related by certain semantic/pragmatic relationships. The relative clause of this type doesn't have any gaps. This type cannot be translated directly into English relative clauses. We have to interpolate in English appropriate phrases or clauses which are implicit in Japanese, in order to express the semantic/pragmatic relationships between the antecedents and relative clauses explicitly. In other words, gaps exist in the interpolated phrases or clauses.</Paragraph>
    <Paragraph position="4"> Because the above four types of relative clauses have the same surface forms in Japanese ......... (verb) (noun),</Paragraph>
    <Section position="1" start_page="268" end_page="269" type="sub_section">
      <SectionTitle>
Relative Clause Antecedent
</SectionTitle>
      <Paragraph position="0"> careful processing is required to distinguish them (note that the "antecedents" - modified nouns - are located after the relative clauses in Japanese). A sophisticated analysis procedure has already been developed, which fully utilizes various levels of heuristic cues as follows.</Paragraph>
      <Paragraph position="1"> (Rule 1) There are a limited number of nouns which are often used as antecedents of Type 3 clauses.</Paragraph>
      <Paragraph position="2"> (Rule 2) When nouns with certain semantic markers appear in the relative clauses and those nouns are followed by one of specific postpositional case particles, there is a high possibility that the relative clauses are Type 2. In the following example, the word "SHORISOKUDO" (processing speed) has the semantic marker AO (attribute).</Paragraph>
      <Paragraph position="3">  (Rule 3) Nouns such as "GEN-IN" (reason), "SHUDAN" (method) etc. express deep case relationships by themselves, and, when these nouns appear as antecedents, it is often the case that they fill the gaps of the corresponding deep cases in the relative clauses.</Paragraph>
      <Paragraph position="4">  The purpose for which (someone) used this device  The purpose of using this device  (Rule 4) There is a limited number of nouns which are often used as antecedents in Type 4 relative clauses. Each of such nouns requires a specific phrase or clause to be interpolated in English.</Paragraph>
      <Paragraph position="5">  The result which was obtained by using this device  In the above example, the clause "the result which someone obtained (the result : gap)" is omitted in Japanese, which relates the antecedent "KEKKA" (result) and the relative clause "KONO SOUCHI O TSUKAT_TA" (someone used this device).  A set of lexical rules is defined for "KEKKA" (result), which basically works as follows : it examines first whether the deep object case has already been filled by a noun phrase in the relative clause. If so, the relative clause is taken as Type 4 and an appropriate phrase is interpolated as in [ex-3]. If not, the relative clause is taken as Type 1, as in the following example where the noun "KEKKA" (result) fills the gap of the object case in the relative clause.</Paragraph>
      <Paragraph position="6">  The result which this experiment used  Such lexical rules are invoked at the beginning of the relative clause analysis by a rule in the main flow of processing. The noun "KEKKA" (result) is given a mark as a lexical property which indicates that the noun has special rules to be invoked when it appears as an antecedent of a relative clause. All the nouns which require special treatment in the relative clause analysis are given the same marker. The rule in the main flow only checks this mark and invokes the lexical rules defined in the lexicon. (Rule 5) Only the cases marked by postpositional case particles 'GA', 'WO' and 'NI' can be deleted in Type 1 relative clauses, when the antecedents are ordinary nouns. Gaps in Type 1 relative clauses can have other surface case marks only when the antecedents are special nouns such as described in Rule (3).</Paragraph>
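The lexicon-driven dispatch described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GRADE implementation: the flag name has_relative_clause_rule, the deep case label OBJECT, and the representation of a clause as a set of already filled deep cases are all hypothetical.

```python
def kekka_rule(filled_deep_cases):
    """Lexical rule for KEKKA (result), following the description in the text."""
    if "OBJECT" in filled_deep_cases:
        # The object case is already filled, so no gap is left for the
        # antecedent: treat the clause as Type 4 and interpolate a phrase.
        return "Type 4"
    # Otherwise the antecedent itself fills the object gap.
    return "Type 1"


# Hypothetical lexicon: nouns with special relative clause rules carry a mark.
LEXICON = {
    "KEKKA": {"has_relative_clause_rule": True, "rule": kekka_rule},
    "SOUCHI": {"has_relative_clause_rule": False},
}


def classify_relative_clause(antecedent, filled_deep_cases):
    """Main-flow rule: only check the mark, then delegate to the lexicon."""
    entry = LEXICON.get(antecedent, {})
    if entry.get("has_relative_clause_rule"):
        return entry["rule"](filled_deep_cases)
    return None  # the general rules would apply here (not modelled)


# "KONO SOUCHI O TSUKAT_TA KEKKA": the object (the device) is already filled.
print(classify_relative_clause("KEKKA", {"AGENT", "OBJECT"}))
```

The main-flow function knows nothing about KEKKA itself; adding another special noun would only require a new lexicon entry carrying the same mark.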
    </Section>
    <Section position="2" start_page="269" end_page="271" type="sub_section">
      <SectionTitle>
4-2 Conjuncted Noun Phrases
</SectionTitle>
      <Paragraph position="0"> Conjuncted noun phrases often appear in abstracts of scientific and technological papers.</Paragraph>
      <Paragraph position="1"> It is important to analyze them correctly, especially to determine the scopes of conjunctions correctly, because they often lead to proliferation of analysis results. The particle "TO" plays almost the same role as the English "and" to conjunct noun phrases. There are several heuristic rules based on various levels of information to determine the scopes.</Paragraph>
      <Paragraph position="2">  there are two possible interpretations, one in which "TO" is a case particle and "noun TO adjective (verb)" forms a relative clause that modifies the second noun, and the other one in which "TO" is a conjunctive particle to form a conjuncted noun phrase. However, it is very likely that the particle 'TO' is not a conjunctive particle but a postpositional case particle, if the adjective (verb) is one of the adjectives (verbs) which require case elements with the surface case mark 'TO' and there are no extra words between 'TO' and the adjective (verb). In the following example,</Paragraph>
      <Paragraph position="3"> "KOTONARU" (to be different) is an adjective which is often collocated with a noun phrase followed by the case particle "TO".</Paragraph>
      <Paragraph position="4">  the right boundary of the scope of the conjunction is almost always Noun-2. The second 'TO' plays the role of a delimiter which delimits the right boundary of the conjunction. This 'TO' is optional, but in real texts one often places it to make the scope unambiguous, especially when the second conjunct is a long noun phrase and the scope is highly ambiguous without it. Because the second 'TO' can be interpreted as a case particle (not as a delimiter of the conjunction) and 'NO' following a case particle turns the preceding phrase into a modifier of a noun, an interpretation in which "NOUN-2 TO NO" is taken as a modifier of NOUN-3 and NOUN-3 is taken as the head noun of the second conjunct is also linguistically possible. However, in most cases, when two 'TO' particles appear in the above position, the second 'TO' is just a delimiter of the scope (see [ex-6]).</Paragraph>
      <Paragraph position="5">  if Noun-1 and Noun-2 are not exactly the same but nouns with the same morphemes, the right boundary is often Noun-2. In [ex-7] above, both of the head nouns of the conjuncts, JISSOKU-CHI (actual value) and YOSOKU-CHI (predicted value), have the same morpheme "CHI" (which means "value"). Thus, this rule can correctly determine the scope, even if the special word "KANKEI" (relationship) does not exist. (Rule 6) If some special words (like 'SONO', 'SORE-NO' etc., which roughly correspond to 'the', 'its' in English) appear in the position: [phrases which modify noun phrases] Noun-1 'TO' &lt;special word&gt; Noun-2, the modifiers preceding Noun-1 modify only Noun-1 but not the whole conjuncted noun phrase.</Paragraph>
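The shared-morpheme cue can be illustrated with a small Python check. This is a sketch under a strong simplifying assumption: morpheme boundaries are taken from hyphens in the romanized form, whereas the real system would obtain them from morphological analysis. The function names are hypothetical.

```python
def morphemes(noun):
    """Split a hyphen-segmented romanized noun into morphemes (an assumption)."""
    return noun.split("-")


def shares_morpheme(noun_1, noun_2):
    """True if the nouns differ but have at least one morpheme in common."""
    if noun_1 == noun_2:
        return False
    common = set(morphemes(noun_1)).intersection(morphemes(noun_2))
    return bool(common)


# Both head nouns contain the morpheme CHI (value).
print(shares_morpheme("JISSOKU-CHI", "YOSOKU-CHI"))
```

When the check succeeds, the heuristic would propose Noun-2 as the right boundary of the conjunction; the check itself says nothing about how strong that proposal is relative to other rules.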
      <Paragraph position="6"> (Rule 7) In ...... Noun-1 'TO' ............ Noun-2,</Paragraph>
      <Paragraph position="7"> if Noun-1 and Noun-2 belong to the same specific semantic categories, like action nouns, abstract nouns etc., the right boundary is often Noun-2.</Paragraph>
      <Paragraph position="8"> (Rule 8) In most conjuncted noun phrases, the structures of conjuncts are well-balanced.</Paragraph>
      <Paragraph position="9"> Therefore, if a relative clause precedes the first conjunct and the length of the second conjunct (the number of words between 'TO' and Noun-2) is short, like [Relative Clause] Noun-1 'TO' ........ Noun-2, the relative clause modifies both conjuncts, that is, the antecedent of the relative clause is the whole conjuncted phrase.</Paragraph>
      <Paragraph position="10"> These heuristic rules are based on different levels of information (some are based on surface lexical items, some on morphemes of words, some on semantic information) and may lead to different decisions about scopes. However, we can distinguish strong heuristic rules (i.e. rules which almost always give correct scopes when they are applied) from others. In fact, there exists some ordering of heuristic rules according to their strength. Rules (1), (2), (3), (4) and (6), for example, almost always succeed, and rules like (7) and (8) often lead to wrong decisions. Rules like (7) and (8) should be treated as default rules which are applied only when the other stronger rules cannot decide the scopes. We can define in GRADE an arbitrary ordering of rule applications. This capability of controlling the sequences of rule applications is essential in integrating heuristic rules based on heterogeneous levels of information into a unified set of rules.</Paragraph>
      <Paragraph position="11"> Note that most of these rules cannot be naturally expressed by ordinary CFG rules. Rule (2), for example, is a rule which blocks the application of an ordinary CFG rule such as NP ---&gt; NP &lt;case-particle&gt; NO N when the &lt;case-particle&gt; is 'TO' and a conjunctive particle 'TO' precedes this sequence of words.</Paragraph>
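The idea of ordering heuristics by strength, with weak rules acting only as defaults, can be sketched in Python as below. This is not GRADE syntax: the numeric strengths, the rule names, and the dictionary keys describing a phrase are hypothetical choices made for the illustration.

```python
def delimiter_rule(phrase):
    # Strong cue: a second 'TO' after Noun-2 delimits the conjunction.
    return "Noun-2" if phrase.get("second_to_after_noun_2") else None


def same_category_rule(phrase):
    # Weak (default) cue: both nouns share a semantic category.
    return "Noun-2" if phrase.get("same_semantic_category") else None


# (strength, name, rule); the strength values are made up for this sketch.
RULES = [
    (1, "same-category default", same_category_rule),
    (10, "delimiter TO", delimiter_rule),
]


def decide_scope(phrase, rules):
    """Apply rules from strongest to weakest; the first decision wins."""
    ordered = sorted(rules, key=lambda entry: entry[0], reverse=True)
    for _strength, name, rule in ordered:
        decision = rule(phrase)
        if decision is not None:
            return name, decision
    return None, None


print(decide_scope({"same_semantic_category": True}, RULES))
```

A default rule is consulted only after every stronger rule has declined to decide, so adding a strong rule never requires editing the weak ones.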
    </Section>
    <Section position="3" start_page="271" end_page="273" type="sub_section">
      <SectionTitle>
4-3 Determination of Scopes
</SectionTitle>
      <Paragraph position="0"> Scopes of conjuncted noun phrases often overlap with scopes of relative clauses, which makes the problem of scope determination more complicated. For a surface sequence of phrases like NP-1 'TO' NP-2 &lt;case-particle&gt; ..... &lt;verb&gt; NP-3 there are two possible relationships between the scopes of the conjuncted noun phrase and the relative clause.  This ambiguity, together with genuine ambiguities in scopes of conjuncted noun phrases in 4-2, produces combinatorial interpretations in CFG grammars, most of which are linguistically possible but practically unthinkable. It is not only inefficient but also almost impossible to compare such an enormous number of linguistically possible structures after they have been generated. In our analysis grammar, a set of scope decision rules is applied in the early stages of processing in order to block the generation of combinatorial interpretations. In fact, the structure (2), in which a relative clause exists within the scope of a conjuncted noun phrase, is relatively rare in real texts, especially when the relative clause is rather long. Such constructions with long relative clauses are a kind of garden path sentence.</Paragraph>
      <Paragraph position="1"> Therefore, unless strong heuristic rules like (2), (3) and (4) in 4-2 suggest the structure (2), the structure (1) is adopted as the first choice (note that, in [ex-7] in 4-2, the strong heuristic rule [rule (3)] suggests the structure (2)). Since the result of such a decision is explicitly expressed in the tree (SCOPE-OF-CONJUNCTION) and the grammar rules in the later stages of processing work on this structure, the other interpretations of scopes will not be tried unless the first choice fails at a later stage for some reason or alternative interpretations are explicitly requested by a human operator. Note that a structure like</Paragraph>
      <Paragraph position="3"> conjuncted noun phrase, which is linguistically possible but extremely rare in real texts, is naturally blocked.</Paragraph>
      <Paragraph position="4"> 4-4 Sentence Relationships and Outer Case Analysis  Corresponding to English sub-ordinators and co-ordinators like 'although', 'in order to', 'and' etc., we have several different syntactic constructions as follows.</Paragraph>
      <Paragraph position="5">  constructions, and (2) and (3) to English sub-ordinate constructions. However, the correspondence between the forms of Japanese and English sentence connections is not so straightforward. Some postpositional particles in (2), for example, are used to express several different semantic relationships between sentences, and therefore should be translated into different sub-ordinators in English according to the semantic relationships. The postpositional particle 'TAME' expresses either 'purpose-action' relationships or 'cause-effect' relationships. In order to disambiguate the semantic relationships expressed by 'TAME', a set of lexical rules is defined in the dictionary of 'TAME'. The rules are roughly as follows.</Paragraph>
      <Paragraph position="6"> (1) If S1 expresses a completed action or a stative assertion, the relationship is 'cause-effect'.</Paragraph>
      <Paragraph position="7"> (2) If S1 expresses neither a completed event nor a stative assertion and S2 expresses a controllable action, the relationship is 'purpose-action'.  In order to go to Tokyo, I bought a ticket.</Paragraph>
      <Paragraph position="8"> Note that whether S1 expresses a completed action or not is determined in the preceding phases by using rules which utilize aspectual features of verbs described in the dictionary and aspect formatives following the verbs (the classification of Japanese verbs based on their aspectual features and related topics are discussed in [8]). We have already written rules (some of which are heuristic ones) for 57 postpositional particles for conjunctions of sentences like 'TAME'.</Paragraph>
      <Paragraph position="9"> Postpositional particles for cases, which follow noun phrases and express case relationships, are also very ambiguous in the sense that they express several different deep cases. While the interpretation of inner case elements is directly given in the verb dictionary in the form of a mapping between surface case particles and their deep case interpretations, the outer case elements should be semantically interpreted by referring to semantic categories of noun phrases and properties of verbs. Lexical rules for 62 case particles have also been implemented and tested.</Paragraph>
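The two rules for 'TAME' amount to a small decision procedure, sketched here in Python. The function name and the three boolean features are hypothetical; in the system described, those features would come from the aspect analysis of the preceding phases rather than being passed in by hand.

```python
def tame_relationship(s1_completed, s1_stative, s2_controllable):
    """Disambiguate 'TAME' following the two rules quoted in the text."""
    if s1_completed or s1_stative:
        return "cause-effect"      # rule (1)
    if s2_controllable:
        return "purpose-action"    # rule (2)
    return None                    # the text gives no rule for this case


# "In order to go to Tokyo, I bought a ticket."
# S1 (go to Tokyo) is neither completed nor stative; S2 is controllable.
print(tame_relationship(False, False, True))
```

Returning None for the uncovered combination keeps the sketch honest about what the quoted rules actually decide.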
    </Section>
  </Section>
</Paper>