<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2034">
  <Title>Using Phrasal Patterns to Identify Discourse Relations</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Discourse Relation Definitions
</SectionTitle>
    <Paragraph position="0"> Many definitions of discourse relations have been proposed, for example (Wolf 2005) and, for Japanese, (Ichikawa 1987). We largely followed Ichikawa's classes and categorized 167 cue phrases in the ChaSen dictionary (IPADIC, Ver.2.7.0), as shown in Table 1. Ambiguous cue phrases were categorized into multiple classes. There are 7 classes, but the OTHER class is ignored in the following experiment, as its frequency is very low.</Paragraph>
    <Paragraph position="1"> Table 1 (fragment; columns: class, example cue phrases, rate in the corpus): [...] by the way, incidentally, and now, meanwhile, well; EXAMPLE: for example, for instance, 1.5; OTHER: most of all, in general, 0.2.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="133" type="metho">
    <SectionTitle>
3 Identification using Lexical Information
</SectionTitle>
    <Paragraph position="0"> The system has two components: one identifies the discourse relation using lexical information, described in this section, and the other identifies it using phrasal patterns, described in the next section.</Paragraph>
    <Paragraph position="1"> A pair of words in two consecutive sentences can be a clue for identifying the discourse relation between those sentences. For example, the CONTRAST relation may hold between two sentences which contain antonyms, such as &quot;ideal&quot; and &quot;reality&quot; in Example 1. Also, the EXAMPLE relation may hold when the second sentence contains hyponyms of a word in the first sentence. For example, &quot;gift shop&quot;, &quot;department store&quot;, and &quot;supermarket&quot; are hyponyms of &quot;store&quot; in Example 2.</Paragraph>
    <Paragraph position="2"> Ex1) a. It is ideal that people all over the world accept independence and associate on an equal footing with each other.</Paragraph>
    <Paragraph position="3"> b. (However,) Reality is not that simple.</Paragraph>
    <Paragraph position="4"> Ex2) a. Every town has many stores.</Paragraph>
    <Paragraph position="5"> b. (For example,) Gift shops, department stores, and supermarkets are the main stores.</Paragraph>
    <Paragraph position="6"> In our experiment, we used a corpus from the Web (about 20G of text) and 38 years of newspapers. We extracted pairs of sentences in which an unambiguous discourse cue phrase appears at the beginning of the second sentence. We extracted about 1,300,000 sentence pairs from the Web and about 150,000 pairs from newspapers. 300 pairs (50 of each discourse relation) were set aside as a test corpus.</Paragraph>
    <Section position="1" start_page="133" end_page="133" type="sub_section">
      <SectionTitle>
3.1 Extracting Word Pairs
</SectionTitle>
      <Paragraph position="0"> Word pairs are extracted from two sentences, i.e. one word from each sentence. In order to reduce noise, the words are restricted to common nouns, verbal nouns, verbs, and adjectives. Also, the word pairs are restricted to particular kinds of POS combinations in order to reduce the impact of word pairs which are not expected to be useful in discourse relation identification. We confined the combinations to pairs involving the same part of speech, pairs of a verb and an adjective, and pairs of a verb and a verbal noun.</Paragraph>
      <Paragraph position="2"> All of the extracted word pairs are used in base form. In addition, each word is annotated with a positive or negative label. If a phrase segment includes a negative word such as &quot;not&quot;, the words in the same segment are annotated with a negative label. Otherwise, words are annotated with a positive label. We do not consider double negatives.</Paragraph>
      <Paragraph position="3"> In Example 1-b, &quot;simple&quot; is annotated with a negative label, as its segment includes &quot;not&quot;.</Paragraph>
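      <Paragraph> The extraction steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the POS tag names, the tuple layout of a segment, and the helper names allowed, label_tokens, and word_pairs are all hypothetical, and the input is assumed to be already segmented and reduced to base forms.

```python
from itertools import product

# Hypothetical POS tag names; the paper does not give a tag encoding.
CONTENT_POS = {"common_noun", "verbal_noun", "verb", "adjective"}
MIXED_POS = {
    frozenset({"verb", "adjective"}),
    frozenset({"verb", "verbal_noun"}),
}


def allowed(pos_a, pos_b):
    """Same part of speech, or one of the two permitted mixed combinations."""
    return pos_a == pos_b or frozenset({pos_a, pos_b}) in MIXED_POS


def label_tokens(segments, negatives=("not",)):
    """Turn phrase segments into (base_form, pos, polarity) triples.

    Each segment is a list of (base_form, pos) tuples. Content words in a
    segment that contains a negative word are labeled "neg", others "pos".
    Double negatives are not handled, as in the text.
    """
    labeled = []
    for segment in segments:
        has_neg = any(base in negatives for base, _ in segment)
        polarity = "neg" if has_neg else "pos"
        for base, pos in segment:
            if pos in CONTENT_POS:
                labeled.append((base, pos, polarity))
    return labeled


def word_pairs(first_segments, second_segments):
    """Pair one word from each sentence, keeping allowed POS combinations."""
    first = label_tokens(first_segments)
    second = label_tokens(second_segments)
    return [
        ((a, pol_a), (b, pol_b))
        for (a, pos_a, pol_a), (b, pos_b, pol_b) in product(first, second)
        if allowed(pos_a, pos_b)
    ]


# Example 1, heavily simplified and with hypothetical POS tags.
first = [[("ideal", "adjective")], [("people", "common_noun")], [("accept", "verb")]]
second = [[("reality", "common_noun")], [("simple", "adjective"), ("not", "adverb")]]
pairs = word_pairs(first, second)
```

With this toy input, the pair of ideal (positive) and simple (negative) is kept because both words are adjectives, while ideal and reality are not paired because adjective and common noun is not one of the permitted combinations under the assumed tags.</Paragraph>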
    </Section>
    <Section position="2" start_page="133" end_page="133" type="sub_section">
      <SectionTitle>
3.2 Score Calculation
</SectionTitle>
      <Paragraph position="0"> All possible word pairs are extracted from the sentence pairs, and the frequency of each pair is counted for each discourse relation. For a new (test) sentence pair, two types of score are calculated for each discourse relation based on all of the word pairs found in the two sentences. The scores are given by formulas (1) and (2). Here Freq(dr, wp) is the frequency of word pair (wp) in the discourse relation (dr). Score_1 is the fraction of the given discourse relation among all the word pairs in the sentences. Score_2 incorporates an adjustment based on the rate (Rate_DR) of the discourse relation in the corpus, i.e. the third column in Table 1. The score actually compares the ratio of a discourse relation in the particular word pairs against the ratio in the entire corpus. It helps low-frequency discourse relations get higher scores.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="133" end_page="134" type="metho">
    <SectionTitle>
4 Identification using Phrasal Patterns
</SectionTitle>
    <Paragraph position="0"> We can sometimes identify the discourse relation between two sentences from fragments of the two sentences. For example, the CONTRAST relation is likely to hold between the pair of fragments &quot;... should have done ....&quot; and &quot;... did ....&quot;, and the EXAMPLE relation is likely to hold between the pair of fragments &quot;There is...&quot; and &quot;Those are ... and so on.&quot;. Here &quot;...&quot; represents any sequence of words. The above examples indicate that the discourse relation between two sentences can be recognized using fragments of the sentences even if the content words involved in the word pairs provide no clues. Accumulating such fragments in Japanese, we observe that these fragments actually form a phrasal pattern. A phrase (bunsetsu) in Japanese is a basic component of sentences, and consists of one or more content words and zero or more function words. We specify that a phrasal pattern contain at least three subphrases, with at least one from each sentence.</Paragraph>
    <Paragraph position="1"> Each subphrase contains the function words of the phrase, and may also include accompanying content words. We describe the method to create patterns in three steps using an example sentence pair (Example 3) which actually has the CONTRAST relation. In the first step, noun modifiers using &quot;no&quot; (a typical particle for a noun modifier) are excised from the sentences, as they are generally not useful for identifying a discourse relation. For example, in the compound phrase &quot;kanojo-no (her) kokoro (mind)&quot; in Example 3, the first phrase (her), which just modifies a noun (mind), is excised. Also, all of the phrases which modify excised phrases, and all but the last phrase in a conjunctive clause, are excised.</Paragraph>
    <Paragraph position="2"> 2) Restricting phrasal patterns. In order to avoid meaningless phrases, we restrict the phrase participants to components matching the following regular expression pattern. Here, noun-x means all types of nouns except common nouns, i.e. verbal nouns, proper nouns, pronouns, etc.</Paragraph>
    <Paragraph position="4"> 3) Combining phrases and selecting words in a phrase. All possible combinations of phrases including at least one phrase from each sentence and at least three phrases in total are extracted from a pair of sentences in order to build up phrasal patterns. For each phrase which satisfies the regular expression in 2), the subphrases to be used in phrasal patterns are selected based on the following four criteria (A to D). In each criterion, a sample of the resulting pattern (using all the phrases in Example 3) is expressed in bold face. Note that it is quite difficult to translate those patterns into English, as many function words in Japanese are expressed by word position in English. We hope readers understand the procedure intuitively.</Paragraph>
    <Paragraph position="5"> A) Use all components in each phrase: kanojo-no kokoro-ni donna omoi-ga at-ta-ka-ha wakara-nai. sore-ha totemo yuuki-ga iru koto-dat-ta-ni-chigai-nai. B) Remove verbal nouns and proper nouns: kanojo-no kokoro-ni donna omoi-ga at-ta-ka-ha wakara-nai. sore-ha totemo yuuki-ga iru koto-dat-ta-ni-chigai-nai. C) In addition, remove verbs and adjectives: kanojo-no kokoro-ni donna omoi-ga at-ta-ka-ha wakara-nai. sore-ha totemo yuuki-ga iru koto-dat-ta-ni-chigai-nai. D) In addition, remove adverbs and remaining nouns: kanojo-no kokoro-ni donna omoi-ga at-ta-ka-ha wakara-nai. sore-ha totemo yuuki-ga iru koto-dat-ta-ni-chigai-nai.</Paragraph>
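    <Paragraph> The combination constraint in step 3) can be sketched as follows. This is only an illustration of the counting constraint (at least one phrase from each sentence, at least three in total); it does not apply the excision of step 1), the regular expression of step 2), or the word selection criteria A to D. The function name phrase_combinations and the representation of phrases as plain strings are assumptions made for the example.

```python
from itertools import combinations


def phrase_combinations(first_phrases, second_phrases, min_total=3):
    """Yield (from_first, from_second) tuples of phrases.

    Each result takes at least one phrase from each sentence and at least
    min_total phrases overall. Phrase order within a sentence is preserved
    because itertools.combinations keeps the input order.
    """
    for i in range(1, len(first_phrases) + 1):
        for j in range(1, len(second_phrases) + 1):
            if i + j >= min_total:
                for a in combinations(first_phrases, i):
                    for b in combinations(second_phrases, j):
                        yield a, b


# Toy input: two phrases per sentence (placeholders, not real bunsetsu).
combos = list(phrase_combinations(["a1", "a2"], ["b1", "b2"]))
```

For two phrases in each sentence, this yields five combinations: the four that take three phrases and the one that takes all four. The number of candidates grows quickly with sentence length, which is consistent with a single sentence pair producing several hundred patterns.</Paragraph>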
    <Section position="1" start_page="134" end_page="134" type="sub_section">
      <SectionTitle>
4.1 Score Calculation
</SectionTitle>
      <Paragraph position="0"> By taking combinations of 3 or more subphrases produced as described above, 348 distinct patterns can be created for the sentences in Example 3; all of them are counted with frequency 1 for the CONTRAST relation. As in the score calculation using lexical information, we count the frequency of patterns for each discourse relation over the entire corpus. Patterns appearing more than 1000 times are not used, as they were found not to be useful for distinguishing discourse relations.</Paragraph>
      <Paragraph position="1"> The scores are calculated replacing Freq(dr, wp) in formulas (1) and (2) by Freq(dr, pp). Here, pp is a phrasal pattern and Freq(dr, pp) is the number of times discourse relation dr connects sentences for which phrasal pattern pp is matched.</Paragraph>
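      <Paragraph> Formulas (1) and (2) are not reproduced in this text, so the following sketch rests on one reading of their verbal description: the first score is taken to be the share of a discourse relation among all frequency counts of the matched features, and the second is taken to be that share divided by the relation's rate in the corpus. Both forms, the function names count_features and scores, and the max_total parameter (standing in for the cutoff of 1000 occurrences) are assumptions, not the authors' definitions. A feature may be either a word pair or a phrasal pattern.

```python
from collections import Counter, defaultdict


def count_features(training_pairs):
    """Count Freq(dr, feature) from (dr, features) training examples."""
    freq = defaultdict(Counter)  # feature -> Counter of dr -> frequency
    for dr, features in training_pairs:
        for feature in features:
            freq[feature][dr] += 1
    return freq


def scores(freq, features, rate, max_total=None):
    """Return {dr: (score_a, score_b)} for the features of one test pair.

    score_a: share of dr among all counts of the matched features.
    score_b: score_a divided by rate[dr] (assumed form of the adjustment).
    Features whose total count exceeds max_total are skipped when it is set.
    """
    per_dr = Counter()
    for feature in features:
        counts = freq.get(feature)
        if not counts:
            continue  # feature never seen in training
        if max_total is not None and sum(counts.values()) > max_total:
            continue  # too frequent to be discriminative
        per_dr.update(counts)
    total = sum(per_dr.values())
    if total == 0:
        return {}
    return {dr: (n / total, n / total / rate[dr]) for dr, n in per_dr.items()}


# Toy data: pattern names and rates are invented for illustration.
training = [("CONTRAST", ["p1", "p2"]), ("CONTRAST", ["p1"]), ("EXAMPLE", ["p1"])]
freq = count_features(training)
rate = {"CONTRAST": 0.5, "EXAMPLE": 0.25}
result = scores(freq, ["p1", "p2", "unseen"], rate)
```

Under these assumptions, dividing by the rate raises the score of a relation that is rare in the corpus relative to one that is common, which matches the stated purpose of the adjustment.</Paragraph>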
      <Paragraph position="2"> These scores will be called Score_3 and Score_4, respectively.</Paragraph>
    </Section>
  </Section>
</Paper>