<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2404">
  <Title>Chunking Japanese Compound Functional Expressions by Machine Learning</Title>
  <Section position="3" start_page="25" end_page="28" type="intro">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> (337 expressions in total), and collects example sentences of those expressions. As a first step toward developing a tool for identifying Japanese compound functional expressions, we start with those 125 major functional expressions and their variants. In this paper, we regard each of those variants as a fixed expression, rather than as a semi-fixed or syntactically-flexible expression (Sag et al., 2002). We then focus on evaluating the effectiveness of straightforwardly applying a standard chunking technique to the task of identifying Japanese compound functional expressions.</Paragraph>
    <Paragraph position="1"> For each of those 125 major expressions, the differences between it and its variants are summarized as follows: i) insertion/deletion/alternation of certain particles, ii) alternation of synonymous words, iii) normal/honorific/conversational forms, and iv) base/adnominal/negative forms.</Paragraph>
    <Paragraph position="2"> As shown in Table 2, the 337 expressions are roughly classified by grammatical function into two types: the post-positional particle type and the auxiliary verb type. Functional expressions of the post-positional particle type are further classified into three subtypes: i) those that follow a predicate and modify a predicate, which mainly function as conjunctive particles and are used for constructing subordinate clauses; ii) those that follow a nominal and modify a predicate, which mainly function as case-marking particles; and iii) those that follow a nominal and modify a nominal, which mainly function as adnominal particles and are used for constructing adnominal clauses. For each of those types, Table 2 also shows the number of major expressions and the number of their variants listed in GHY, together with an example expression. Furthermore, Table 3 gives example sentences for those example expressions, along with descriptions of their usages.</Paragraph>
    <Section position="1" start_page="26" end_page="28" type="sub_section">
      <SectionTitle>
2.2 Issues on Identifying Compound Functional Expressions in a Text
</SectionTitle>
      <Paragraph position="0"> The task of identifying Japanese compound functional expressions roughly consists of two steps: detecting candidate compound functional expressions in a text, and judging the usage of each candidate. The class of Japanese compound functional expressions can be regarded as closed, with at most a few thousand members.</Paragraph>
      <Paragraph position="1"> Therefore, it is easy to enumerate all the compound functional expressions and their morpheme sequences. Candidates are then detected by matching the text against the morpheme sequences of the compound functional expressions under consideration.</Paragraph>
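Because the inventory is closed, candidate detection reduces to exact matching of known morpheme sequences. The Python sketch below illustrates this under stated assumptions: the text is already segmented into romanized morphemes (as if produced by a morphological analyzer), and `LEXICON` and `find_candidates` are hypothetical names introduced for illustration, not part of the authors' tool.

```python
# A minimal sketch of candidate detection by exact morpheme-sequence matching.
# Assumptions: the text is already segmented into romanized morphemes, and
# LEXICON / find_candidates are hypothetical names used only for illustration.

LEXICON: dict[str, tuple[str, ...]] = {
    "to-iu": ("to", "iu"),
    "to-iu-mono-no": ("to", "iu", "mono", "no"),
    "to-naru-to": ("to", "naru", "to"),
}


def find_candidates(
    morphemes: list[str], lexicon: dict[str, tuple[str, ...]]
) -> list[tuple[int, int, str]]:
    """Return (start, end, expression) for every match; `end` is exclusive."""
    candidates = []
    for start in range(len(morphemes)):
        for name, seq in lexicon.items():
            end = start + len(seq)
            # Compare the window beginning at `start` with the stored sequence.
            if tuple(morphemes[start:end]) == seq:
                candidates.append((start, end, name))
    return candidates


# Toy input: w1, w2, w3 stand in for arbitrary surrounding morphemes.
print(find_candidates(["w1", "w2", "to", "iu", "mono", "no", "w3"], LEXICON))
# [(2, 4, 'to-iu'), (2, 6, 'to-iu-mono-no')]
```

The two matches share their first two morphemes, so the detector reports overlapping candidates rather than choosing between them; deciding among them is left to the later usage judgment.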
      <Paragraph position="2"> Here, most of the 125 major functional expressions considered in this paper are compound expressions consisting of one or more content words as well as functional words. As the examples in Table 1 illustrate, they often have both a compositional content-word usage and a non-compositional functional usage. For example, in Table 3, the expression &quot;となると(to-naru-to)&quot; in sentence (2) has the meaning &quot;that (something) becomes [?]&quot;, which corresponds to a literal concatenation of the usages of its constituents, namely the post-positional particle &quot;と(to)&quot;, the verb &quot;なる(naru)&quot;, and the post-positional particle &quot;と(to)&quot;, and can therefore be regarded as a content-word usage. In sentence (1), on the other hand, the same expression &quot;となると(to-naru-to)&quot; has the non-compositional functional meaning &quot;if&quot;. Based on this distinction, we classify the usages of these expressions into two classes: functional and content. Functional usages include both non-compositional and compositional functional usages, although most functional usages of the 125 major expressions can be regarded as non-compositional. Content usages, in contrast, include only compositional content-word usages.</Paragraph>
      <Paragraph position="3"> In practice, more than one candidate expression may be detected at the same position in a text. For example, as shown in Table 4, both candidate compound functional expressions &quot;という(to-iu)&quot; and &quot;というものの(to-iu-mono-no)&quot; are detected in sentence (9). This is because the two-morpheme sequence &quot;と(to)&quot; &quot;いう(iu)&quot; constituting the candidate &quot;という(to-iu)&quot; is a subsequence (in fact, a prefix) of the four-morpheme sequence &quot;と(to)&quot; &quot;いう(iu)&quot; &quot;もの(mono)&quot; &quot;の(no)&quot; constituting the candidate &quot;というものの(to-iu-mono-no)&quot;.</Paragraph>
      <Paragraph position="5"> The same holds for sentence (10).</Paragraph>
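When one candidate's morpheme span lies inside another's, as with to-iu inside to-iu-mono-no, both must be kept and the choice deferred. One way to make this explicit, sketched below as an assumption rather than the authors' procedure, is to group candidates whose spans overlap; a group with more than one member is an ambiguous occurrence. Candidates are taken to be hypothetical (start, end, expression) triples with an exclusive end, and `group_overlapping` and `ambiguous_ratio` are illustrative names. The share of multi-candidate groups is one assumed way to operationalize an occurrence-level ambiguity rate.

```python
# A minimal sketch: group detected candidates whose morpheme spans overlap.
# Assumption: each candidate is a hypothetical (start, end, expression) triple
# with an exclusive end index; the function names are illustrative only.

Candidate = tuple[int, int, str]


def group_overlapping(candidates: list[Candidate]) -> list[list[Candidate]]:
    """Merge candidates into groups of transitively overlapping spans."""
    groups: list[list[Candidate]] = []
    group_end = -1
    for cand in sorted(candidates):
        start, end, _ = cand
        if groups and group_end > start:
            # Starts before the current group ends: same ambiguous occurrence.
            groups[-1].append(cand)
            group_end = max(group_end, end)
        else:
            # Starts at or after the end of the current group: open a new one.
            groups.append([cand])
            group_end = end
    return groups


def ambiguous_ratio(groups: list[list[Candidate]]) -> float:
    """Fraction of groups in which more than one candidate was detected."""
    if not groups:
        return 0.0
    return sum(1 for g in groups if len(g) > 1) / len(groups)


cands = [(2, 4, "to-iu"), (2, 6, "to-iu-mono-no"), (9, 11, "to-iu")]
groups = group_overlapping(cands)
print(len(groups), ambiguous_ratio(groups))  # 2 0.5
```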
      <Paragraph position="6"> However, as indicated in Table 4, sentence (9) is an example of the functional usage of the compound functional expression &quot;という(to-iu)&quot;, where the two-morpheme sequence &quot;と(to)&quot; and &quot;いう(iu)&quot; should be identified and chunked as a compound functional expression.</Paragraph>
      <Paragraph position="7"> Sentence (10), on the other hand, is an example of the functional usage of the compound functional expression &quot;というものの(to-iu-mono-no)&quot;, where the four-morpheme sequence &quot;と(to)&quot;, &quot;いう(iu)&quot;, &quot;もの(mono)&quot;, and &quot;の(no)&quot; should be identified and chunked as a compound functional expression. In fact, our preliminary corpus study found that more than one candidate expression can be detected in at least about 20% of the occurrences of Japanese compound functional expressions. This indicates that more than one candidate expression must be considered both when identifying a Japanese compound functional expression and when classifying the functional/content usage of a candidate expression. Based on this observation, we formalize the task of identifying Japanese compound functional expressions as a chunking problem rather than a classification problem.</Paragraph>
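Treating identification as chunking means each morpheme receives a label, so the choice between overlapping candidates becomes a choice between label sequences. The sketch below assumes an IOB-style scheme (B- for the first morpheme of a chunk, I- for the rest, O outside) with a usage suffix F (functional) or C (content); these label names and the function `spans_to_labels` are hypothetical and may differ from the label set actually used by the chunker.

```python
# A minimal sketch of turning annotated spans into per-morpheme chunk labels.
# Assumption: an IOB-style scheme (B-/I- prefixes, O outside) with a usage
# suffix F (functional) or C (content); the label names are hypothetical.


def spans_to_labels(n_morphemes: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """spans: (start, end, usage) with exclusive end and usage in {"F", "C"}."""
    labels = ["O"] * n_morphemes
    for start, end, usage in spans:
        labels[start] = "B-" + usage
        for i in range(start + 1, end):
            labels[i] = "I-" + usage
    return labels


# Toy positions mimicking sentence (9): only to-iu is chunked as functional.
print(spans_to_labels(7, [(2, 4, "F")]))
# ['O', 'O', 'B-F', 'I-F', 'O', 'O', 'O']
# Toy positions mimicking sentence (10): to-iu-mono-no is chunked instead.
print(spans_to_labels(7, [(2, 6, "F")]))
# ['O', 'O', 'B-F', 'I-F', 'I-F', 'I-F', 'O']
```

Both outputs are defined over the same morphemes, so a sequence labeler trained on such data picks one chunk boundary directly, whereas a per-candidate classifier would have to judge to-iu and to-iu-mono-no separately and then reconcile the two decisions.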
    </Section>
    <Section position="2" start_page="28" end_page="28" type="sub_section">
      <SectionTitle>
2.3 Developing an Example Database
</SectionTitle>
      <Paragraph position="0"> We developed an example database of Japanese compound functional expressions, which is used for training and testing a chunker of Japanese compound functional expressions (Tsuchiya et al., 2005). Example sentences are collected from the 1995 Mainichi newspaper text corpus (1,294,794 sentences, 47,355,330 bytes). For each of the 337 expressions, up to 50 sentences are collected and annotated with chunk labels according to the following procedure.</Paragraph>
      <Paragraph position="1"> 1. The expression is morphologically analyzed by ChaSen, and its morpheme sequence is obtained.</Paragraph>
      <Paragraph position="2"> 2. The corpus is morphologically analyzed by ChaSen, and up to 50 sentences that contain the morpheme sequence of the expression are collected.</Paragraph>
      <Paragraph position="3"> 3. For each sentence, every occurrence of the 337 expressions is annotated by an annotator with one of the two usages, functional or content.</Paragraph>
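Steps 1 and 2 amount to a filter over a segmented corpus followed by a cap on the number of sentences kept. The sketch below assumes sentences are already segmented into morphemes (for instance by ChaSen), that `variants` holds the expression's morpheme sequence plus any expanded conjugated forms, and that a seeded random sample decides which sentences to keep when there are more than 50; the text does not state how the 50 sentences were selected, and `contains_seq` and `collect_examples` are hypothetical names.

```python
# A minimal sketch of steps 1-2: collect up to `limit` corpus sentences that
# contain one of an expression's morpheme sequences.
# Assumptions: sentences are already segmented into morphemes, `variants` holds
# the expression's sequence plus any expanded conjugated forms, and the seeded
# random sample is an illustrative choice, not a documented selection rule.
import random


def contains_seq(morphemes: list[str], seq: tuple[str, ...]) -> bool:
    """True if `seq` occurs as a contiguous run inside `morphemes`."""
    n = len(seq)
    return any(
        tuple(morphemes[i:i + n]) == seq for i in range(len(morphemes) - n + 1)
    )


def collect_examples(
    corpus: list[list[str]],
    variants: list[tuple[str, ...]],
    limit: int = 50,
    seed: int = 0,
) -> list[list[str]]:
    """Return at most `limit` sentences matching any variant sequence."""
    hits = [s for s in corpus if any(contains_seq(s, v) for v in variants)]
    if len(hits) > limit:
        # More matches than needed: keep a reproducible random subset.
        hits = random.Random(seed).sample(hits, limit)
    return hits


corpus = [["w1", "to", "iu", "w2"], ["w3", "w4"], ["to", "iu", "mono", "no"]]
print(len(collect_examples(corpus, [("to", "iu")])))  # 2
```

Step 3, the functional/content annotation, is a human judgment and is not modeled here.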
      <Paragraph position="4"> Table 5 classifies the 337 expressions according to the number of sentences collected from the 1995 Mainichi newspaper text corpus. For more than half of the 337 expressions, more than 50 sentences are found, although about 10% of the 337 expressions do not appear in the corpus at all. Of the 187 expressions with more than 50 sentences, 52 show a balanced distribution of functional and content usages in the newspaper text corpus. These 52 expressions can be regarded as among the most difficult ones in the task of identifying compound functional expressions and classifying their functional/content usages. Thus, this paper focuses on these 52 expressions for training and testing the chunker of compound functional expressions. We extract 2,600 sentences (= 52 expressions × 50 sentences) from the example database and use them for training and testing the chunker; these 2,600 sentences contain 92,899 morphemes. We ignore the chunk labels of expressions other than the 52 expressions, which leaves 2,482 chunk labels for functional usages and 701 for content usages.</Paragraph>
      <Paragraph position="5"> In step 1, for expressions whose constituent is conjugated and whose conjugated form has the same usage as the original form, the morpheme sequence is expanded so that the expanded morpheme sequences include those with conjugated forms.</Paragraph>
      <Paragraph position="6"> Regarding the annotation in step 3, for the 184 most frequent expressions, the agreement rate between two human annotators is 0.93 on average and the Kappa value is 0.73, which allows tentative conclusions to be drawn (Carletta, 1996; Ng et al., 1999). For 65% of these 184 expressions, the Kappa value is above 0.8, indicating good reliability.</Paragraph>
    </Section>
  </Section>
</Paper>