<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0907">
  <Title>Making Sense of Japanese Relative Clause Constructions</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Parameter description
</SectionTitle>
    <Paragraph position="0"> Features used in the interpretation of RCCs include a generalised case frame description, a verb class characterisation, head noun semantics, morphological analysis of the head verb, and various constructional templates. These combine to form the 49-feature parameter signature of each RCC. Unless otherwise mentioned, all features are binary.</Paragraph>
    <Paragraph position="1"> Case frames are used to determine which argument case slots are subcategorised by the head verb of the relative clause and instantiated (and hence unavailable for case-slot gapping) and, conversely, which case slots are subcategorised by the head verb but uninstantiated (and hence available for case-slot gapping). The range of argument case slots coincides exactly with the set of argument case-slot gapping RCC types from Section 2 (8 features in total).</Paragraph>
    <Paragraph position="2"> Argument case slot instantiation features are set by comparing a given case frame to the actual input, and aligning case slots between the two according to case marker correspondence. In the case frame dictionary, a single generalised case frame is given for each verb stem. Case frames were generated from the Goi-Taikei pattern-based valency dictionary (Ikehara et al., 1997) by manually merging the major senses for each distinct verb stem. In essence, case frames are simply a list of the argument case slots for the verb in question in their canonical ordering (case frames include no modifier case slots).</Paragraph>
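The alignment described above can be sketched as follows. This is a minimal illustration, assuming that a case frame is a list of (case marker, slot type) pairs and that the input clause is reduced to the case markers it overtly realises; the names `CaseSlot` and `align_case_slots` are hypothetical and do not reproduce the system's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseSlot:
    marker: str      # canonical case marker, e.g. "ga", "o", "ni"
    slot_type: str   # case slot type, e.g. "SUBJECT", "DIRECT OBJECT"


def align_case_slots(frame: list[CaseSlot],
                     input_markers: list[str]) -> tuple[list[str], list[str]]:
    """Align a case frame with the case markers realised in the input.

    Returns (instantiated, gapable): slot types matched by an overt case
    marker in the input (unavailable for gapping), and slot types left
    uninstantiated (available for case-slot gapping)."""
    remaining = list(input_markers)
    instantiated: list[str] = []
    gapable: list[str] = []
    for slot in frame:
        if slot.marker in remaining:
            remaining.remove(slot.marker)  # each input slot aligns at most once
            instantiated.append(slot.slot_type)
        else:
            gapable.append(slot.slot_type)
    return instantiated, gapable


# Hypothetical two-place frame; the input realises only an o-marked argument.
frame = [CaseSlot("ga", "SUBJECT"), CaseSlot("o", "DIRECT OBJECT")]
print(align_case_slots(frame, ["o"]))  # (['DIRECT OBJECT'], ['SUBJECT'])
```

Since slots are matched by marker membership rather than position, the sketch ignores the canonical ordering of the frame when aligning.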
    <Paragraph position="3"> Each case slot is marked for canonical case marking and case slot type.</Paragraph>
    <Paragraph position="4"> Case frames can contain lexicalised case slots, which must be overtly realised for that case frame to be triggered; such case frames describe fixed expressions. Examples of fixed expressions are ki-o tukeru (mind-ACC fix/attach) &amp;quot;to be careful/keep an eye out for (something)&amp;quot; and yume-o miru (dream-ACC see) &amp;quot;to dream&amp;quot;. We manually annotated each fixed argument for &amp;quot;gapability&amp;quot;, i.e. the potential for extraposition to the head NP position, as in the RCC kinō mita yume &amp;quot;the dream I had last night&amp;quot;. If a gapable fixed argument occurs (unmodified) in head NP position, we use the &amp;quot;gapped fixed argument head NP&amp;quot; feature to return the argument type of the gapped fixed argument (e.g. DIRECT OBJECT). The unique case frame description is complemented by verb classes. Verb classes are used to describe such effects as: (1) modifier case slot compatibility, e.g. PROXIMAL verbs such as kaeru &amp;quot;return&amp;quot; are compatible with target locative modifier case slots; (2) case slot interaction, e.g. INTER-PERSONAL verbs such as au &amp;quot;meet&amp;quot; have two co-indexed argument slots to indicate the interacting parties; and (3) potential for valency-modifying alternation, e.g. INCHOATIVE verbs such as kaisi-suru &amp;quot;start&amp;quot; are listed with the (unaccusative) intransitive case frame but undergo the causative-inchoative alternation to produce transitive case frames (Jacobsen, 1992). A total of 27 verb classes are used in this research, which incorporate a subset of the verbal semantic attributes (VSAs) of Nakaiwa and Ikehara (1997) as well as classes developed independently for the purposes of this research.</Paragraph>
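As an illustration of how a lexicalised case slot gates a fixed-expression case frame, and of how the "gapped fixed argument head NP" feature could be returned, consider the sketch below. The representation (`FixedSlot`, `match_fixed_expression`) is hypothetical, and the sketch assumes the head NP has already been checked to be unmodified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FixedSlot:
    lexeme: str      # the fixed argument, e.g. "yume" (dream)
    marker: str      # its case marker, e.g. "o"
    slot_type: str   # argument type, e.g. "DIRECT OBJECT"
    gapable: bool    # manually annotated potential for extraposition


def match_fixed_expression(slot: FixedSlot,
                           clause_args: dict[str, str],
                           head_noun: str) -> tuple[bool, Optional[str]]:
    """Return (triggered, gapped_fixed_argument_type).

    clause_args maps case markers to the nouns realised inside the relative
    clause; head_noun is the (assumed unmodified) head noun of the RCC."""
    if clause_args.get(slot.marker) == slot.lexeme:
        return True, None            # fixed argument realised inside the clause
    if slot.gapable and head_noun == slot.lexeme:
        return True, slot.slot_type  # fixed argument gapped to head NP position
    return False, None               # lexicalised slot unrealised: no trigger


yume_o_miru = FixedSlot("yume", "o", "DIRECT OBJECT", gapable=True)
# kinō mita yume: the adverb kinō fills no case slot, yume is the head noun.
print(match_fixed_expression(yume_o_miru, {}, "yume"))  # (True, 'DIRECT OBJECT')
```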
    <Paragraph position="5"> Head noun semantics are used to morphosemantically classify the head noun (of the head NP) into 14 classes (e.g. AGENTIVE, TEMPORAL, FIRST-PERSON PRONOUN), based on the Goi-Taikei noun taxonomy. Rather than attempting to disambiguate noun sense, the head noun semantic features are determined as the union of all senses of the head noun of the head NP. For coordinated head NPs, we take the intersection of the head noun feature vectors. One head noun semantic feature particular to RCCs is the class of functional nouns (e.g. riyū &amp;quot;reason&amp;quot;, kekka &amp;quot;result&amp;quot; and mokuteki &amp;quot;objective&amp;quot;) which generally give rise to attributive RCCs.</Paragraph>
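The union-over-senses and intersection-over-conjuncts steps can be sketched with sets of class labels. The toy sense inventory below is invented for illustration and does not reproduce Goi-Taikei; FUNCTIONAL is a hypothetical label for the functional-noun class.

```python
from functools import reduce

# Hypothetical toy lexicon: noun -> list of senses, each sense a set of
# head noun semantic classes (invented labels, NOT actual Goi-Taikei data).
SENSES: dict[str, list[set[str]]] = {
    "toki": [{"TEMPORAL"}, {"FUNCTIONAL"}],
    "watasi": [{"FIRST-PERSON PRONOUN", "AGENTIVE"}],
    "kare": [{"AGENTIVE"}],
}


def noun_features(noun: str) -> set[str]:
    """Union of the semantic classes over all senses (no sense disambiguation)."""
    return set().union(*SENSES.get(noun, []))


def head_np_features(conjuncts: list[str]) -> set[str]:
    """For a (possibly coordinated) head NP, intersect the conjuncts' features."""
    return reduce(lambda acc, feats: acc.intersection(feats),
                  (noun_features(n) for n in conjuncts))


print(sorted(noun_features("toki")))          # ['FUNCTIONAL', 'TEMPORAL']
print(head_np_features(["watasi", "kare"]))   # {'AGENTIVE'}
```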
    <Paragraph position="6"> In processing each unit relative clause, we carry out morphological analysis of the head verb of the relative clause, returning a listing of verb morphemes and tense/aspect affixes.</Paragraph>
    <Paragraph position="7"> For example, the verb okonawareteita &amp;quot;to have been held&amp;quot; is analysed as okona-ware-te-ita &amp;quot;to hold-PASSIVE-PROGRESSIVE-PAST&amp;quot;. This analysis is used in case frame transformation (e.g. passivisation), as trigger conditions in constructional templates, and in the resolution of case frame ambiguity. Case frame transformation is carried out prior to matching case slots between the input and the case frame, producing a description of the surface realisation of the case frame which reflects the voice, causality, etc. of the main verb. Case frame transformation can produce fan-out in the number of clause analyses, particularly in the case of the (r)are verb morpheme, which has passive, potential/spontaneous and honorific readings (Jacobsen, 1992). In this case, we produce all legal case frames and leave the selection of the correct verb interpretation to later processing. Note that the only morphological verb feature to appear as an independent feature is POTENTIALITY, as it combines with nominalised adjectives to produce COMPARATIVE RCCs such as tob-eru hirosa (jump-POT size) &amp;quot;(of) size big enough to jump (in)&amp;quot;.</Paragraph>
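The fan-out produced by (r)are can be illustrated by mapping one active case frame to one surface frame per reading. The marker rewrites below are simplified textbook descriptions of Japanese (direct passive: the agent moves from ga to ni and the patient from o to ga; potential: the object may surface with ga; honorific: case marking unchanged). They are assumptions for illustration, not the system's actual transformation rules, and `transform_rare` is a hypothetical name.

```python
# Each reading of the (r)are morpheme maps an active case marker to its
# surface marker (simplified, illustrative rules only).
RARE_READINGS: dict[str, dict[str, str]] = {
    "PASSIVE": {"ga": "ni", "o": "ga"},    # direct passive
    "POTENTIAL/SPONTANEOUS": {"o": "ga"},  # object may surface with ga
    "HONORIFIC": {},                       # case marking unchanged
}


def transform_rare(frame: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Return one transformed (marker, slot_type) frame per reading, leaving
    the choice between readings to later processing."""
    return {
        reading: [(rules.get(marker, marker), slot) for marker, slot in frame]
        for reading, rules in RARE_READINGS.items()
    }


active = [("ga", "SUBJECT"), ("o", "DIRECT OBJECT")]
for reading, surface in transform_rare(active).items():
    print(reading, surface)
```

All three surface frames are kept, mirroring the decision to produce every legal case frame rather than commit to one reading early.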
    <Paragraph position="8"> In addition to simple features, there are a number of constructional templates, namely two features for the attributive RCC types of EXCLUSIVE and INCLUSIVE, and one feature for idiomatic RCCs.</Paragraph>
    <Paragraph position="9"> The constructional template for EXCLUSIVE RCCs operates over the EXCLUDING verb class (which contains, for example, nozoku &amp;quot;to exclude&amp;quot;), and stipulates simple past or non-past main verb conjugation and the occurrence of only an accusatively-marked case slot within the relative clause. The satisfaction of these constraints results in the EXCLUSIVE RCC compatibility feature being set, as occurs for:  and modifiability of the head NP, verbal conjugation, case marker alternation and modifier case slots/adverbials. A total of 11 templates are utilised in the current system, which are mapped onto a single feature value.</Paragraph>
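The three constraints of the EXCLUSIVE template can be sketched as a predicate over a simplified clause record. The record layout, the conjugation labels and the class-membership set are hypothetical; only nozoku is taken from the text.

```python
from dataclasses import dataclass

EXCLUDING_VERBS = {"nozoku"}  # EXCLUDING verb class (illustrative, incomplete)


@dataclass
class Clause:
    verb_stem: str
    conjugation: str         # e.g. "PAST", "NON-PAST", "PROGRESSIVE-PAST"
    case_markers: list[str]  # markers of the case slots realised in the clause


def exclusive_template(clause: Clause) -> bool:
    """True iff all three template constraints hold, i.e. the EXCLUSIVE RCC
    compatibility feature would be set."""
    return (
        clause.verb_stem in EXCLUDING_VERBS
        and clause.conjugation in {"PAST", "NON-PAST"}  # simple (non-)past only
        and clause.case_markers == ["o"]                # only an accusative slot
    )


# Hypothetical input modelled on nitiyōbi-o nozoku mainiti "every day except Sunday"
print(exclusive_template(Clause("nozoku", "NON-PAST", ["o"])))  # True
```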
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Analytical ambiguity and disambiguation
</SectionTitle>
    <Paragraph position="0"> As with any NLP task, ambiguity occurs at various levels in the data. In this section, we outline sources of ambiguity and propose disambiguation methods for each.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Analytical ambiguity
</SectionTitle>
      <Paragraph position="0"> Analytical ambiguity arises when multiple clause analyses exist, as a result of verb homophony/homography or fixed expression compatibility. For the purposes of our system, verb homophony occurs when multiple verb entries in the case frame dictionary share the same kana content (and hence pronunciation), such that a kana-based orthography will lead to ambiguity between the different entries. Verb homography, on the other hand, occurs when multiple verb entries coincide in kanji content, leading to ambiguity for a kanji-based orthography. Both verb homophony and homography can be either full or partial, i.e. all forms of a given verb pair can be homophonous/homographic, or there can be partial overlap for particular types of verb inflection. For example, the verbs 変わる kawaru &amp;quot;change&amp;quot; and 代わる kawaru &amp;quot;replace&amp;quot; are fully homophonous, whereas 着る kiru &amp;quot;wear&amp;quot; and</Paragraph>
      <Paragraph position="2"> the simple non-past they diverge in kana orthography, producing kita and kitta, respectively). For verb homography, 止める tomeru &amp;quot;stop&amp;quot; and 止める yameru &amp;quot;quit&amp;quot; are fully homographic, whereas 行う okonau &amp;quot;carry out&amp;quot; and 行く iku &amp;quot;go&amp;quot; are partially homographic (with overlap produced for the simple past tense, e.g. in the form 行った, which can be read as either okonatta or itta). Such overlap in lexical form leads to multiple verb entries being triggered, producing independent analyses for the RCC input.</Paragraph>
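How orthographic overlap multiplies analyses can be sketched with an index from surface strings to dictionary entries. The entries and forms below are a hand-built toy subset taken from the homography examples in the text; a real system would generate inflected forms rather than enumerate them, and the names used here are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy dictionary: entry -> surface forms it can take.
ENTRY_FORMS: dict[str, list[str]] = {
    "tomeru (stop)": ["止める"],
    "yameru (quit)": ["止める"],
    "okonau (carry out)": ["行う", "行った"],
    "iku (go)": ["行く", "行った"],
}

# Invert to an orthographic index: surface form -> every entry it can realise.
FORM_INDEX: dict[str, list[str]] = defaultdict(list)
for entry, forms in ENTRY_FORMS.items():
    for form in forms:
        FORM_INDEX[form].append(entry)


def triggered_entries(surface: str) -> list[str]:
    """Each entry sharing the surface form yields an independent analysis."""
    return FORM_INDEX.get(surface, [])


print(triggered_entries("行った"))  # ['okonau (carry out)', 'iku (go)']
```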
      <Paragraph position="3"> Fixed expressions lead to analytical ambiguity as, in most cases, the main verb of the expression will also be compatible with productive usages, by way of a generalised case frame entry. For example, in addition to the fixed expression asi-o arau (foot-ACC wash) &amp;quot;quit&amp;quot;, arau &amp;quot;wash&amp;quot; has a (unique) non-lexicalised case frame entry, which will be compatible with any lexical context satisfying the lexical constraints on the fixed expression.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Resolving analytical ambiguity
</SectionTitle>
      <Paragraph position="0"> Here, we present a cascaded system of heuristics which resolves analytical ambiguity arising from multiple verb entries, producing a unique feature vector characterisation.</Paragraph>
      <Paragraph position="1"> We select between multiple analyses for a given relative clause in the first instance by preferring analyses stemming from fixed expressions over those conforming to constructional templates, and these in turn over those generated through generalised techniques. We define each such stratum as comprising a distinct expressional type, similarly to Ikehara et al. (1996).</Paragraph>
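The stratified preference amounts to keeping only the analyses of the most preferred expressional type present. A minimal sketch, assuming each analysis carries one of three type labels; the labels and the `Analysis` record are hypothetical.

```python
from dataclasses import dataclass

# Smaller rank = more preferred expressional type.
TYPE_RANK = {"FIXED": 0, "TEMPLATE": 1, "GENERAL": 2}


@dataclass(frozen=True)
class Analysis:
    entry: str
    expr_type: str  # one of the keys of TYPE_RANK


def prefer_expressional_type(analyses: list[Analysis]) -> list[Analysis]:
    """Keep only the analyses belonging to the most preferred stratum."""
    best = min(TYPE_RANK[a.expr_type] for a in analyses)
    return [a for a in analyses if TYPE_RANK[a.expr_type] == best]


candidates = [Analysis("asi-o arau", "FIXED"), Analysis("arau", "GENERAL")]
print(prefer_expressional_type(candidates))
```

Note that the fixed expression wins outright here regardless of context, which is exactly the weakness of giving fixed expressions absolute priority.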
      <Paragraph position="2"> Expressional type is, on the whole, a simple but powerful disambiguation mechanism; it is not, however, infallible. The main area in which it comes unstuck is in giving fixed expressions absolute priority over other analyses. Many fixed expressions can also be interpreted compositionally: e.g. asi-o arau (foot-ACC wash) &amp;quot;quit&amp;quot; can mean simply &amp;quot;wash (one's) feet&amp;quot;. In the case of asi-o arau, the case frame is identical between the fixed and generalised expression, but the verb classes are significantly different, potentially leading to unfortunate side-effects when trying to interpret an RCC involving the non-idiomatic sense of the verb.</Paragraph>
      <Paragraph position="3"> Fixed expressions and RCCs compatible with constructional templates tend to be relatively rare, so in most cases ambiguity is not resolved through expressional type preferences. In such cases, we apply a succession of heuristics of decreasing reliability until we produce a unique analysis and feature vector characterisation. These heuristics are, in order of application: minimum verb morpheme content, best case frame match and representational preference.</Paragraph>
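The cascade can be sketched generically: each heuristic scores the surviving analyses, only the top-scoring ones are kept, and processing stops as soon as a single analysis remains, with a random choice among any remaining ties as the final fallback. The scoring functions passed in are placeholders supplied by the caller, and `cascade` is a hypothetical name.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def cascade(analyses: Sequence[T],
            heuristics: Sequence[Callable[[T], float]],
            rng: random.Random) -> T:
    """Apply heuristics in order of decreasing reliability until one analysis
    remains; each heuristic keeps only the analyses with the highest score."""
    survivors = list(analyses)
    for score in heuristics:
        if len(survivors) == 1:
            break
        best = max(score(a) for a in survivors)
        survivors = [a for a in survivors if score(a) == best]
    return survivors[0] if len(survivors) == 1 else rng.choice(survivors)


# Placeholder heuristic: fewer morpheme boundaries scores higher.
print(cascade(["mie-ru", "mi-e-ru"], [lambda a: -a.count("-")], random.Random(0)))
```

Scoring by "higher is better" lets minimisation criteria such as morpheme count be expressed by negation.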
      <Paragraph position="4"> Minimum verb morpheme content involves determining the morphemic content of the head verb of the relative clause for each verb stem it is compatible with, and selecting the verb stem(s) which are morphologically least complex. Morphological complexity is determined by simply counting the number of morphemes, auxiliary verbs and affixes in the verb composite. Given the verb composite 見える mieru, for example, we would generate two analyses: mie-ru &amp;quot;can see-PRES&amp;quot; and mi-e-ru &amp;quot;see-POT-PRES&amp;quot;, of which we would (correctly) select the first. In essence, this methodology picks up on more highly stem-lexicalised verb entries, and effectively blocks more compositional verb entries.</Paragraph>
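Minimum verb morpheme content reduces to counting the units of each candidate segmentation and keeping the minima. A sketch, assuming the morphological analyser returns each candidate as a hyphen-segmented string like those in the text; the function names are hypothetical.

```python
def morpheme_count(segmentation: str) -> int:
    """Number of morphemes, auxiliary verbs and affixes in a segmented verb."""
    return len(segmentation.split("-"))


def min_morpheme_content(candidates: dict[str, str]) -> list[str]:
    """candidates maps a verb stem to the segmentation it licenses; return the
    stem(s) whose segmentation is morphologically least complex."""
    fewest = min(morpheme_count(seg) for seg in candidates.values())
    return [stem for stem, seg in candidates.items()
            if morpheme_count(seg) == fewest]


# mieru: lexicalised stem mie-ru (2 units) vs compositional mi-e-ru (3 units)
print(min_morpheme_content({"mieru": "mie-ru", "miru": "mi-e-ru"}))  # ['mieru']
```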
      <Paragraph position="5"> With best case frame match, we analyse the degree of correspondence between the case frame listed for each dictionary entry and the actual case slot content of the input. In keeping with the shallow processing objective of this research, we simply calculate the number of case slots in the input which align with case slots in each case frame (based on case marker overlap), and divide this by the total number of case slots in the case frame and the input. We additionally add one to the numerator to give preference to case frames of lower valency (i.e. fewer case slots) in the case that there is no overlap with the input. This can be formalised as $\frac{|I \sqcap F| + 1}{|I| + |F|}$,</Paragraph>
      <Paragraph position="7"> where $I$ is the set of case slots in the input, $F$ the set of case slots in the current case frame, and $\sqcap$ the case slot overlap operator. Note that the ordering of the case slots plays no part in the calculation, in an attempt to capture the relative freedom of case slot order in Japanese.</Paragraph>
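The best case frame match score can be computed directly from case-marker counts. A sketch, assuming case slots are represented only by their case markers and that the overlap operator aligns each input slot with at most one frame slot (a multiset intersection); `case_frame_match` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction


def case_frame_match(input_markers: list[str], frame_markers: list[str]) -> Fraction:
    """Score = (aligned slots + 1) / (input slots + frame slots).

    Slots align by case marker, each slot at most once, and slot order is
    ignored. Raises ZeroDivisionError if both input and frame are empty."""
    inp, frm = Counter(input_markers), Counter(frame_markers)
    overlap = sum(min(count, frm[marker]) for marker, count in inp.items())
    return Fraction(overlap + 1, len(input_markers) + len(frame_markers))


print(case_frame_match(["ga", "o"], ["ga", "o"]))        # 3/4
print(case_frame_match(["ga", "o"], ["ga", "o", "ni"]))  # 3/5
# With no overlap, the +1 favours the lower-valency frame: 1/2 vs 1/3.
print(case_frame_match(["ni"], ["ga"]), case_frame_match(["ni"], ["ga", "o"]))
```

Exact fractions avoid spurious float differences when scores are compared for ties in the cascade.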
      <Paragraph position="8"> The final heuristic, which has high recall but lower precision, resolves any remaining ambiguity. It is based on the representational preference for the current verb to take different lexical forms. The representational preference of lexical form $f$ of verb entry $v$ (i.e. $f_v$) is defined as the likelihood of</Paragraph>
      <Paragraph position="10"> This is normalised over the representational preference for all source entries of $f$, producing the verb</Paragraph>
      <Paragraph position="12"> All frequencies are calculated from the EDR corpus (EDR, 1995), a 2-million-morpheme corpus of largely technical Japanese prose.</Paragraph>
      <Paragraph position="13"> In the case of a tie in representational preference, we select one of the tied analyses randomly.</Paragraph>
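Because the defining equations for representational preference are incomplete here, the following is only a sketch under stated assumptions. It assumes that the preference of form f for entry v is the relative frequency of f among corpus occurrences of v, that normalisation divides by the sum of this quantity over all entries realisable as f, and that ties are broken with a seeded random choice. The counts are invented and are not EDR statistics; all names are hypothetical.

```python
import random

# Hypothetical corpus counts: COUNTS[entry][form] = frequency of the entry
# written in that lexical form (invented numbers, not EDR statistics).
COUNTS: dict[str, dict[str, int]] = {
    "tomeru": {"止める": 80, "とめる": 20},
    "yameru": {"止める": 10, "やめる": 90},
}


def rep_pref(form: str, entry: str) -> float:
    """Assumed rp(f_v): share of the entry's occurrences written as `form`."""
    total = sum(COUNTS[entry].values())
    return COUNTS[entry].get(form, 0) / total


def normalised_rep_pref(form: str, entries: list[str]) -> dict[str, float]:
    """Normalise the assumed rp over all source entries of the form."""
    raw = {v: rep_pref(form, v) for v in entries}
    z = sum(raw.values())
    return {v: (p / z if z else 0.0) for v, p in raw.items()}


def choose_entry(form: str, entries: list[str], rng: random.Random) -> str:
    """Pick the entry with the highest normalised rp; break ties randomly."""
    scores = normalised_rep_pref(form, entries)
    best = max(scores.values())
    tied = [v for v, s in scores.items() if s == best]
    return rng.choice(tied)


print(choose_entry("止める", ["tomeru", "yameru"], random.Random(0)))  # tomeru
```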
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Clause cosubordination and disambiguation
</SectionTitle>
      <Paragraph position="0"> Japanese cosubordinated clauses (i.e. dependent but not embedded clauses, as indicated by the use of a conjunction such as nagara, te, tutu or si, or through continuative-type conjugation; Van Valin, 1984) offer an additional avenue for disambiguation:  &amp;quot;things which were invented and gained popularity last year&amp;quot; As is apparent in (6) and (7), a consistent RCC interpretation is maintained across cosubordinated clauses, e.g. in (6), kikai &amp;quot;machine&amp;quot; is the DIRECT OBJECT of both kōaN-si and seisaku-sita.² It is possible to put this observation to use when interpreting cosubordinated RCCs, by coordinating the feature vectors for the unit clauses to produce a unique, coherent interpretation for the overall RCC. We apply this in two ways: by OR'ing and AND'ing the feature vectors together.</Paragraph>
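Coordinating the unit-clause feature vectors can be sketched as element-wise OR and AND over binary vectors. The three feature names and their values are a hypothetical excerpt rather than the full 49-feature signature, loosely modelled on a head noun that is the DIRECT OBJECT of both conjoined verbs.

```python
def or_vectors(vectors: list[list[int]]) -> list[int]:
    """Element-wise OR: a feature is on if it is on in any unit clause."""
    return [int(any(bits)) for bits in zip(*vectors)]


def and_vectors(vectors: list[list[int]]) -> list[int]:
    """Element-wise AND: a feature is on only if it is on in every unit clause."""
    return [int(all(bits)) for bits in zip(*vectors)]


FEATURES = ["SUBJECT gap", "DIRECT OBJECT gap", "AGENTIVE head"]  # hypothetical
clause_1 = [1, 1, 0]  # first conjunct: SUBJECT and DIRECT OBJECT both gapable
clause_2 = [0, 1, 0]  # second conjunct: only DIRECT OBJECT gapable
print(dict(zip(FEATURES, or_vectors([clause_1, clause_2]))))
print(dict(zip(FEATURES, and_vectors([clause_1, clause_2]))))
```

The AND'ed vector retains only the DIRECT OBJECT gap, the interpretation shared by both clauses, while the OR'ed vector keeps every option either clause allows.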
    </Section>
  </Section>
</Paper>