<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0513">
  <Title>AN ANNOTATED CORPUS IN JAPANESE USING TESNIÈRE'S STRUCTURAL SYNTAX</Title>
  <Section position="3" start_page="0" end_page="109" type="metho">
    <SectionTitle>
1 WORDS
</SectionTitle>
    <Paragraph position="0"> We have taken the character (kana or kanji), which is the physical unit of a Japanese text, as the unit of measure of the length of a section of text. With the convention of starting at position 0, we locate any piece of text, and hence any word, by an interval notation. Note that there is no word separator (such as a blank space) in Japanese. In the following sentence 1, the word t~m is located by the interval [3_5] and the word ~.'C by [6_9]. This notation will be used in correspondences (Section 2.2).</Paragraph>
    <Paragraph position="1"> 11 ~ 12 ~ 13o Could I get a room upstairs C/.</Paragraph>
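The interval notation can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function names locate and find_interval and the sample string are assumptions introduced here. Positions are counted between characters, starting from 0, so an interval [a_b] covers the b - a characters lying between position a and position b.

```python
def locate(text, start, end):
    """Return the chunk of text covered by the interval [start_end]."""
    if not (end >= start >= 0 and len(text) >= end):
        raise ValueError(f"interval [{start}_{end}] is out of range")
    return text[start:end]


def find_interval(text, word):
    """Return the interval (start, end) of the first occurrence of word, or None."""
    start = text.find(word)
    if start == -1:
        return None
    return (start, start + len(word))


# Hypothetical sample string (an assumption, not a corpus sentence).
sentence = "別料金になります"
print(locate(sentence, 3, 4))               # the one-character case marker
print(find_interval(sentence, "なります"))   # interval of the verb
```

Because Japanese has no word separator, a word is identified here purely by its character interval, which is what the correspondences of Section 2.2 rely on.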
    <Section position="1" start_page="0" end_page="109" type="sub_section">
      <SectionTitle>
1.1 Species and Categories of Words
</SectionTitle>
      <Paragraph position="0"> The distinction between content words, which are associated with a concept, and function words, which express syntactic information, was not difficult to apply to Japanese.</Paragraph>
      <Paragraph position="1">  Some examples of content words include :~.~ (yoyaku, reservation), ~L~ (okureru, to be late), ~ (takai, expensive), ~ (tyokusetu, directly). Tesnière distinguishes between two categories of content words: processes and substances, which are, for explanatory purposes, usually exemplified by verbs and nouns, respectively, in Indo-European languages. This is also consistent with Japanese.</Paragraph>
      <Paragraph position="2"> These two categories are in turn divided into concrete and abstract categories, which oppose the concrete notions of processes and substances to their abstract attributes. This gives rise to the following categorisation for content words (see also (Starosta 88); Tesnière's notation is shown in capitals).</Paragraph>
      <Paragraph position="3">  It is to be noted that, in the case of Japanese, two categories of words are variable in relation to aspect and negation: abstract substances (A) and concrete processes (I), which are respectively (i-)adjectives and verbs in terms of Japanese grammars.</Paragraph>
      <Paragraph position="4"> Now, some classes of words which pose problems in Japanese grammar books written in English, such as the so-called na-adjectives (W~&amp;quot; (sizuka, quiet)) and the Sino-Japanese noun-verbs formed with the Japanese verb -J-70 (suru, to do), can easily be categorised as nouns (O). This is consistent with what is taught in Japanese schools (see Appendix B), their syntactical behaviour being perfectly described by transference (see Section 4).</Paragraph>
      <Paragraph position="5">  Grammatical tools, whose role is either to make explicit or to change the category of a content word, or to define relationships between words, are called function words. These words will appear in extenso in structural representations. In Japanese, many can be easily identified, such as 7~ (ga, t~, nominative case postparticle), 69 (no, J~{~'J, genitive case postparticle), 69&amp;quot;~ (node, ~l~'J, equivalent to a subordinating conjunction), 7)~ (ka, ~.~, sentence-final interrogative particle), 3&amp;quot;70 (suru, +)&amp;quot; &amp;quot;~B~, support verb for Sino-Japanese nouns), etc.</Paragraph>
      <Paragraph position="6"> Of course, some function words can also be content words in a different context. For instance, the verb &amp;quot;J-70 (suru) is either the support verb for Sino-Japanese nouns (a function word in that case) or the verb "to do" (a content word).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="109" end_page="112" type="metho">
    <SectionTitle>
2 CONNECTION
</SectionTitle>
    <Paragraph position="0"> Tesnière speaks of connection to describe the relations between words in a sentence in terms of their subordination relations. This concept includes predicate-argument or governor-modifier relations as well as predicate-circumstantial relations (Éléments, p. 14).</Paragraph>
    <Paragraph position="1"> The study of the sentence, which is the proper object of structural syntax, is essentially the study of its structure, i.e.</Paragraph>
    <Paragraph position="2"> the hierarchy of its connections.</Paragraph>
    <Section position="1" start_page="109" end_page="109" type="sub_section">
      <SectionTitle>
2.1 Tree Representation: Stemmas
</SectionTitle>
      <Paragraph position="0"> Tesnière was the first to propose, in 1934, the systematic use of graphical representations 2, which he called stemmas, for representing this hierarchy (Tesnière 34). These stemmas were more than simple trees. We will show, however, that the introduction of correspondences makes it possible to encode Tesnière's representations using just trees.</Paragraph>
      <Paragraph position="1"> Basic connections are those which link concrete notions with their abstract attributes (Figure 1).</Paragraph>
      <Paragraph position="2">  By replacing content words with their class (O, I, A, E), "virtual" stemmas (on the right) can be derived from the "real" ones (on the left).</Paragraph>
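The derivation of a virtual stemma from a real one can be sketched as follows. This is a hedged illustration only: the (word, dependents) encoding, the name virtual_stemma, the small lexicon and the sample stemma are assumptions made for this sketch, and the class assignments simply follow the usual reading of the notation (O for nouns, I for verbs, A for adjectives, E for adverbs).

```python
# Illustrative lexicon (an assumption for this sketch, using romanised forms).
WORD_CLASS = {
    "yoyaku": "O",      # reservation
    "okureru": "I",     # to be late
    "takai": "A",       # expensive
    "tyokusetu": "E",   # directly
}


def virtual_stemma(stemma):
    """Replace every content word of a (word, dependents) stemma by its class."""
    word, dependents = stemma
    # Words missing from the lexicon (e.g. function words) are kept unchanged.
    label = WORD_CLASS.get(word, word)
    return (label, [virtual_stemma(d) for d in dependents])


# Hypothetical real stemma: a verb governing a noun, which governs an adjective.
real = ("okureru", [("yoyaku", [("takai", [])])])
print(virtual_stemma(real))
```

Keeping unknown words unchanged mirrors the convention that function words appear in extenso in structural representations.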
    </Section>
    <Section position="2" start_page="109" end_page="111" type="sub_section">
      <SectionTitle>
2.2 Correspondences
</SectionTitle>
      <Paragraph position="0"> To explicitly indicate which word, or more specifically, especially in the case of Japanese, which chunk of text corresponds to which node in the stemma, we adopted the use of correspondences (Boitet and Zaharin 88).</Paragraph>
      <Paragraph position="1"> We note two kinds of correspondence: * words-to-node, and * sentence parts-to-complete subtree, or substring-to-subtree.</Paragraph>
      <Paragraph position="2"> Constraints Correspondences are noted by intervals, as introduced above, and are governed by three constraints (Lepage 94).</Paragraph>
      <Paragraph position="3"> * global correspondence: an entire tree corresponds to an entire sentence; * inclusion: a subtree which is part of another subtree T must correspond to a substring of the substring corresponding to T; * membership: a node in a subtree T must correspond to words that are members of the substring corresponding to T.</Paragraph>
      <Paragraph position="5"> 2 He acknowledged that two Russian linguists used trees in 1930 to explain some syntactic phenomena, but, unlike Tesnière, the use of trees was not pivotal in their explanations.</Paragraph>
      <Paragraph position="6">  In Figure 2, on each node of the stemma, two intervals stand for the words-node and the substring-subtree correspondences, in that order. The entire sentence 3 extends from 0 to 11, as indicated by the root. This root is a verb, denoted as I, and is located in positions 7 to 11: f.C/ 19 ~ 3&amp;quot;- (narimasu). Similarly, the node labelled t: (ni) corresponds as a word to the case marker ~:, which extends from 6 to 7 in the sentence. The entire subtree dominated by this node corresponds to the phrase ~lJ~: (beturyoukin ni), which extends from 3 to 7.</Paragraph>
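The three constraints can be checked mechanically on a stemma whose nodes carry the two intervals just described. The sketch below is only an illustration under stated assumptions: the Node class and the function names are hypothetical, every interval is assumed to be connex (a single start and end position), and the interval given to the noun node is inferred from the text rather than read from Figure 2. The dependent covering positions 0 to 3 is left out because the text does not describe it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    word: tuple      # (start, end) of the word carried by the node
    subtree: tuple   # (start, end) of the substring covered by its subtree
    children: list = field(default_factory=list)


def contains(outer, inner):
    """True when the interval inner lies inside the interval outer."""
    return inner[0] >= outer[0] and outer[1] >= inner[1]


def check_correspondences(root, sentence_length):
    """Return the list of violated constraints (empty when all three hold)."""
    errors = []
    # Global correspondence: the entire tree corresponds to the entire sentence.
    if root.subtree != (0, sentence_length):
        errors.append("global: " + root.label)

    def visit(node):
        # Membership: the word of a node belongs to the substring of its subtree.
        if not contains(node.subtree, node.word):
            errors.append("membership: " + node.label)
        for child in node.children:
            # Inclusion: a sub-subtree maps to a substring of its parent's substring.
            if not contains(node.subtree, child.subtree):
                errors.append("inclusion: " + child.label)
            visit(child)

    visit(root)
    return errors


# Intervals for the root and the case marker follow the description of Figure 2;
# the noun interval (3, 6) is an inference made for this sketch.
noun = Node("beturyoukin", (3, 6), (3, 6))
marker = Node("ni", (6, 7), (3, 7), [noun])
root = Node("narimasu", (7, 11), (0, 11), [marker])
print(check_correspondences(root, 11))
```

Checking each node only against its own subtree is enough for membership here, since inclusion already guarantees that every enclosing subtree covers a larger substring.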
      <Paragraph position="7">  tervals are possible. In Figure 3, the deverbative noun ~b~ (negai, request) from ~li') (negau, to ask for) takes an accusative argument extending from 0 to 4, ~3~1~&amp;quot; (o+namae wo, your name). 3 Refer to Table A in the Appendix for notations used in glosses.</Paragraph>
      <Paragraph position="8">  Because the honorific prefix ~3 + (o+) can only be applied to a noun, obtained by attaching the suffix + b~ (+i) (transference, see Section 4), the subtree dominated by the verbal root corresponds to a non-connex substring</Paragraph>
    </Section>
    <Section position="3" start_page="111" end_page="112" type="sub_section">
      <SectionTitle>
2.3 Predicate-Argument Structures
</SectionTitle>
      <Paragraph position="0"> Free-Order Subject A main feature of dependency structures, to which Tesnière's representations pertain, is that they do not provide any preferred position to the subject (see Fourquet's foreword to (Gréciano and Schumacher 96), and (Zemb 78), p. 393, for a discussion). This corresponds particularly well with our data because the free ordering of case-marked phrases (not words) is a property of Japanese, which makes dependency grammars more adequate for its description 4. For example, the two following propositions are equally valid, where location and subject have been exchanged. rokunin ga hitoheya ni ireru '6-people' NOM '1-room' LOC 'can-enter' a room that can accommodate 6 people</Paragraph>
      <Paragraph position="2"> a room that can accommodate 6 people Omission Moreover, in Japanese, the omission of any of the case-marked phrases is possible. One can perfectly imagine a situation where a traveler first announces that he is in a group of 6 people, and then merely utters the following sentence: --~ ~z X.~,~ hitoheya ni ireru 'one-room' LOC 'can-enter' a room that can accommodate 6 people This sentence has no subject, and yet it is unambiguously understood as a request for a room which can accommodate 6 people altogether. Ergative Constructions Moreover, the search for the "real subject", as opposed to the syntactical subject, is meaningless in dependency representations of ergative constructions. Such constructions exist in Japanese 5 with a range of adjectives, such as ~ L.~ ~ (hosii) (20 occurrences in our corpus), or verbal forms in -t~,~ (tai) (around 310 occurrences in the corpus), or the so-called "passive" or "medio-passive" verbs, such as gP.. 1o (mieru, cf. Fr. se voir).</Paragraph>
      <Paragraph position="3"> 4 (Mel'čuk 88) and (Starosta 88), among others, have already commented that constituency structures are English-oriented representations into which some linguists try desperately to cast other languages. An illustration is (Gunji 87). After a ten-page discussion, and despite an honest acknowledgment that there is absolutely no basis for this, he draws the conclusion that a preferred position for the subject, as a left sister of the verb, has to be postulated for Japanese, because... it is so in English.</Paragraph>
      <Paragraph position="4"> 5 However, the ergative case does not exist in Japanese, and it would be difficult to call Japanese an ergative language (see (Mel'čuk 88), p. 250-253, for definitions concerning ergativity).</Paragraph>
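The free ordering and the omission of case-marked phrases can be illustrated with a small sketch that maps a romanised proposition to an order-independent predicate-argument structure. It is only a toy under explicit assumptions: the function name and the marker table are hypothetical, the input is assumed to be space-separated (phrase, marker) pairs followed by the predicate, and the exchanged ordering used below is a reconstruction from the description in the text, not a quotation from the corpus.

```python
# Case labels follow the glosses used in the examples (an illustrative subset).
CASE_MARKERS = {"ga": "NOM", "wo": "ACC", "ni": "LOC"}


def predicate_arguments(romanised):
    """Return (predicate, {case: phrase}) for a romanised proposition."""
    tokens = romanised.split()
    predicate = tokens[-1]          # the predicate comes last
    body = tokens[:-1]              # alternating phrase / case marker tokens
    arguments = {}
    for phrase, marker in zip(body[0::2], body[1::2]):
        arguments[CASE_MARKERS[marker]] = phrase
    return predicate, arguments


first = predicate_arguments("rokunin ga hitoheya ni ireru")
second = predicate_arguments("hitoheya ni rokunin ga ireru")   # reconstructed order
omitted = predicate_arguments("hitoheya ni ireru")             # subject omitted
print(first == second)
print(omitted)
```

Since the arguments are stored by case rather than by position, exchanging the two phrases yields the same structure, and omitting the subject simply leaves the NOM entry absent.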
      <Paragraph position="5">  ject and the object of a French passé composé of a transitive verb do not both link to the past participle. He shows that some clues indicate that the subject links to the auxiliary, while the object should be linked with the past participle. A similar analysis seems particularly well suited to some Japanese constructions too, not because of gender-number agreement, but because of case semantics.</Paragraph>
      <Paragraph position="6"> For instance, in the following sentence, the subject, postal code, cannot be considered the subject of the verb, to write 6.</Paragraph>
      <Paragraph position="7"> yuubinbangou ga kaite aru 'postal code' NOM 'write' 'is' The postal code is written (e.g. on an envelope) However, changing the auxiliary ab~ (aru) into b~ 7o (iru) implies a change in the case of postal code.</Paragraph>
      <Paragraph position="8"> yuubinbangou wo kaite iru 'postal code' ACC 'write' 'is' The postal code is being written (e.g. by Lucien) - Somebody is writing the postal code. This convinced us to adopt Tesnière's analysis, where the subject is linked with the auxiliary (Figure 5).</Paragraph>
      <Paragraph position="9"> C/1~'~ (kaite) is a non-conclusive, pending form of the verb 8 &lt; (kaku), which is translated in English by "writing" or "written" according to the context.</Paragraph>
      <Paragraph position="11"> yuubinbangou ga kaite aru 'postal code' NOM 'write' 'is' The postal code is written</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="112" end_page="113" type="metho">
    <SectionTitle>
3 JUNCTION
</SectionTitle>
    <Paragraph position="0"> Junction covers the phenomena of coordination and factorisation.</Paragraph>
    <Paragraph position="1"> Junction words in Japanese include words such as ~ (to, and for nouns), ~ (ya, or for nouns), L (si, or for verbs), ~;t 2&amp;quot; (kedo, but). We propose to represent them with one node bearing a special label: the function word prefixed and suffixed by -. Accordingly, we can easily represent cap junctions as in Figure 6.</Paragraph>
    <Paragraph position="2">  On the other hand, in cup cases, the same dependent shares several governors. A tree can be "factored" by using a special node, V, bearing the same correspondences as its root. Figure 7 is a slightly modified corpus sentence. Because of junctions, a structure representing a sentence may be a forest. This differs significantly from constituency representations, but conforms with Tesnière's description (e.g. p. 649). Figure 7 is such an example.</Paragraph>
  </Section>
</Paper>