<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1031">
  <Title>Sydney, July 2006. c(c)2006 Association for Computational Linguistics A Feedback-Augmented Method for Detecting Errors in the Writing of Learners of English</Title>
  <Section position="4" start_page="241" end_page="242" type="metho">
    <SectionTitle>
2 Method for detecting the target errors
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="241" end_page="241" type="sub_section">
      <SectionTitle>
2.1 Generating training data
</SectionTitle>
      <Paragraph position="0"> First, instances of the target noun that head their noun phrase (NP) are collected from a corpus together with their surrounding words. This can be done simply with an existing chunker or parser.</Paragraph>
      <Paragraph position="1"> Then, the collected instances are tagged with mass or count by the following tagging rules. For example, the underlined chicken: ... are a lot of chickens in the roost ...</Paragraph>
      <Paragraph position="2"> is tagged as ... are a lot of chickens/count in the roost ... because it is in plural form.</Paragraph>
      <Paragraph position="3"> We have made tagging rules based on linguistic knowledge (Huddleston and Pullum, 2002). Figure 1 and Table 1 represent the tagging rules. Figure 1 shows the framework of the tagging rules. Each node in Figure 1 represents a question applied to the instance in question. For example, the root node reads "Is the instance in question plural?". Each leaf represents a result of the classification. For example, if the answer is yes at the root node, the instance in question is tagged with count. Otherwise, the question at the lower node is applied, and so on. The tagging rules do not classify instances as mass or count in some cases. These unclassified instances are tagged with the symbol "?". Unfortunately, they cannot readily be included in training data. For simplicity of implementation, they are excluded from training data (see footnote 1). Note that the tagging rules can be used only for generating training data. They cannot be used to distinguish mass and count nouns in the writing of learners of English for the purpose of detecting the target errors, since they are based on the articles and the distinction between singular and plural. Footnote 1: According to experiments we have conducted, approximately 30% of instances are tagged with "?" on average. It is highly possible that performance of the proposed method will improve if these instances are included in the training data.</Paragraph>
      <Paragraph position="4"> Finally, the tagged instances are stored in a file with their surrounding words. Each line of the file consists of one of the tagged instances and its surrounding words, as in the chicken example above.</Paragraph>
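The tagging procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `tag_instance`, the concrete word sets, and in particular the outcome attached to each question (plural: count; Table 1(a): count; Table 1(b): mass; Table 1(c): "?"; otherwise: mass) are assumptions inferred from the surviving text of Figure 1 and Table 1.

```python
# Hypothetical sketch of the tagging rules; rule order and outcomes are assumptions.
COUNT_MODIFIERS = {"a", "an", "another", "one", "each"}      # Table 1(a)
MASS_MODIFIERS = {"much", "less", "enough", "sufficient"}    # Table 1(b)
# Table 1(c) lists word classes; the members below are illustrative, not exhaustive.
UNDECIDED_MODIFIERS = {"the", "this", "that", "my", "your", "his", "her",
                       "its", "our", "their", "which", "what", "some", "any"}


def tag_instance(is_plural, modifiers):
    """Return 'count', 'mass', or '?' for one instance of the target noun."""
    words = {m.lower() for m in modifiers}
    if is_plural:
        return "count"
    if not words.isdisjoint(COUNT_MODIFIERS):
        return "count"
    if not words.isdisjoint(MASS_MODIFIERS):
        return "mass"
    has_genitive = any(w.endswith("'s") for w in words)
    if has_genitive or not words.isdisjoint(UNDECIDED_MODIFIERS):
        return "?"  # unclassified; excluded from the training data
    return "mass"  # assumed outcome for a bare singular noun


print(tag_instance(True, ["a", "lot", "of"]))   # plural form, as in the chickens example
print(tag_instance(False, ["the"]))             # singular with the definite article
```

Instances for which the function returns "?" would simply be skipped when the training file is written.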
    </Section>
    <Section position="2" start_page="241" end_page="242" type="sub_section">
      <SectionTitle>
2.2 Learning Decision Lists
</SectionTitle>
      <Paragraph position="0"> In the proposed method, decision lists are used for distinguishing mass and count nouns. One reason for using decision lists is that they have been shown to be effective for the word sense disambiguation task, and the mass-count distinction is closely related to word sense, as we will see in this section. Another reason is that the rules for distinguishing mass and count nouns are observable in decision lists, which helps in understanding and improving the proposed method.</Paragraph>
      <Paragraph position="1"> A decision list consists of a set of rules. Each rule matches the following template: If a condition is true, then a decision. (1) To define the template in the proposed method, let us look at the following two examples:  1. I read the paper.</Paragraph>
      <Paragraph position="2"> 2. The paper is made of hemp pulp.</Paragraph>
      <Paragraph position="3">  The underlined papers in both sentences cannot simply be classified as mass or count by the tagging rules presented in Section 2.1 because both are singular and modified by the definite article. Nevertheless, we can tell from the contexts that the former is a count noun and the latter is a mass noun. This suggests that the mass-count distinction is often determined by words surrounding the target noun. In example 1, we can tell from read that the paper refers to something that can be read, such as a newspaper or a scientific paper, and therefore it is a count noun. Likewise, in example 2, we can tell from made and pulp that the paper refers to a certain substance, and therefore it is a mass noun.</Paragraph>
      <Paragraph position="4"> Taking this observation into account, we define the template based on words surrounding the target noun. To formalize the template, we will use a random variable $MC$ that takes either $mass$ or $count$ to denote that the target noun is a mass noun or a count noun, respectively. We will also use $w$ and $C$ to denote a word and a certain context around the target noun, respectively. We define</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="242" end_page="244" type="metho">
    <SectionTitle>
Figure 1: Framework of the tagging rules
</SectionTitle>
    <Paragraph position="0"> [Figure 1: decision tree whose nodes ask, from the root down: plural?; modified by one of the words in Table 1(a)?; modified by one of the words in Table 1(b)?; modified by one of the words in Table 1(c)? The surviving leaf labels are ? and MASS.]
Table 1: Words referred to by the tagging rules.
(a) the indefinite article, another, one, each
(b) much, less, enough, sufficient
(c) the definite article, demonstrative adjectives, possessive adjectives, interrogative adjectives, quantifiers, 's genitives
three types of $C$: $np$, $-k$, and $+k$, which denote the contexts consisting of the noun phrase that the target noun heads, $k$ words to the left of the noun phrase, and $k$ words to its right, respectively. Then the template is formalized by: If word $w$ appears in context $C$ of the target noun, then it is distinguished as $MC$. Hereafter, to keep the notation simple, it will be abbreviated to</Paragraph>
    <Paragraph position="2"> Now rules that match the template can be obtained from the training data. All we need to do is to collect words in $C$ from the training data.</Paragraph>
    <Paragraph position="3"> Here, the words in Table 1 are excluded. Also, function words (except prepositions), cardinal and quasi-cardinal numerals, and the target noun are excluded. All words are reduced to their morphological stem and converted entirely to lower case when collected. For example, the following tagged instance: She ate fried chicken/mass for dinner.</Paragraph>
    <Paragraph position="4"> would give a set of rules that match the template:</Paragraph>
    <Paragraph position="6"> for the target noun chicken when $k$ = [...].</Paragraph>
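The collection of rules from one tagged instance can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `extract_rules`, the tiny `EXCLUDED_WORDS` set, and the window size `k=3` in the usage lines are hypothetical, and stemming is omitted (words are only lower-cased), so the output contains surface forms such as "ate" and "fried" instead of stems.

```python
# Hypothetical sketch: collect (word, context) rules from one tagged instance.
# EXCLUDED_WORDS stands in for the Table 1 words and the non-preposition
# function words; it is illustrative only.
EXCLUDED_WORDS = {"she", "he", "it", "the", "a", "an", "and", "is", "are",
                  "much", "one", "each", "another"}


def extract_rules(left, np_words, right, target, label, k):
    """Return a set of ((word, context), label) rules.

    left / right: words to the left / right of the noun phrase.
    np_words: words of the noun phrase headed by the target noun.
    """
    contexts = {
        "np": np_words,
        "-k": left[-k:] if k > 0 else [],   # k words to the left of the NP
        "+k": right[:k],                     # k words to the right of the NP
    }
    rules = set()
    for context, words in contexts.items():
        for word in words:
            w = word.lower()
            if w == target or w in EXCLUDED_WORDS or w.isdigit():
                continue  # target noun, excluded words, and numerals are skipped
            rules.add(((w, context), label))
    return rules


# "She ate fried chicken/mass for dinner."
rules = extract_rules(["She", "ate"], ["fried", "chicken"], ["for", "dinner"],
                      target="chicken", label="mass", k=3)
for rule in sorted(rules):
    print(rule)
```

The preposition "for" is kept, in line with the statement that function words other than prepositions are excluded.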
    <Paragraph position="7"> In addition, a default rule is defined. It is based on the target noun itself and is used when no other applicable rule is found in the decision list for the target noun. It is defined by</Paragraph>
    <Paragraph position="9"> where $t$ and $MC_{major}$ denote the target noun and the majority of $MC$ in the training data, respectively. Equation (3) reads "If the target noun appears, then it is distinguished by the majority".</Paragraph>
    <Paragraph position="10"> The log-likelihood ratio (Yarowsky, 1995) decides in which order rules are applied to the target noun in novel context. It is defined by (see footnote 2)</Paragraph>
    <Paragraph position="12"> where $p(MC \mid w_C)$ is the probability that the target noun is used as $MC$ when $w$ appears in the context $C$. Footnote 2: For the default rule, the log-likelihood ratio is defined by replacing $w_C$ and $MC$ with $t$ and $MC_{major}$, respectively.</Paragraph>
    <Paragraph position="13"> It is important to exercise some care in estimating $p(MC \mid w_C)$. In principle, we could simply count the number of times that $w$ appears in the context $C$ of the target noun used as $MC$ in the training data. However, this estimate can be unreliable when $w$ does not appear often in the context. To solve this problem, $p(MC \mid w_C)$ is estimated with a smoothing parameter by</Paragraph>
    <Paragraph position="15"> where $f(w_C)$ and $f(w_C, MC)$ are the numbers of occurrences of $w$ appearing in $C$ and of those in $C$ of the target noun used as $MC$, respectively. The constant $m$ is the number of possible classes, that is, $m = 2$ ($mass$ or $count$) in our case, and is introduced to satisfy</Paragraph>
    <Paragraph position="17"> The smoothing parameter is set to 1.</Paragraph>
    <Paragraph position="18"> Rules in a decision list are sorted in descending order by the log-likelihood ratio. They are tested on the target noun in novel context in this order. Rules sorted below the default rule are discarded because they are never used, as we will see in Section 2.3.</Paragraph>
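The learning step can be sketched as follows. The equations themselves are missing from this text, so the sketch assumes standard additive smoothing, p(MC | w_C) = (f(w_C, MC) + alpha) / (f(w_C) + m * alpha), and a log-likelihood ratio of the form log(p(MC | w_C) / p(other class | w_C)); both forms fit the surrounding description but are assumptions. The names `smoothed_prob` and `learn_decision_list` are hypothetical, and the training data must be non-empty.

```python
import math
from collections import Counter


def smoothed_prob(f_w_mc, f_w, alpha=1.0, m=2):
    """Additively smoothed estimate of p(MC | w_C) (assumed form)."""
    return (f_w_mc + alpha) / (f_w + m * alpha)


def learn_decision_list(instances, alpha=1.0):
    """instances: non-empty list of (features, label), label 'mass' or 'count'.

    Returns (rules, default_label); rules are (score, feature, label) tuples
    sorted by descending log-likelihood ratio, with rules scoring below the
    default rule discarded.
    """
    feature_counts, joint_counts, label_counts = Counter(), Counter(), Counter()
    for features, label in instances:
        label_counts[label] += 1
        for feat in set(features):
            feature_counts[feat] += 1
            joint_counts[(feat, label)] += 1

    rules = []
    for feat, f_w in feature_counts.items():
        p_mass = smoothed_prob(joint_counts[(feat, "mass")], f_w, alpha)
        label = "mass" if p_mass >= 0.5 else "count"
        p = max(p_mass, 1.0 - p_mass)
        rules.append((math.log(p / (1.0 - p)), feat, label))

    # Default rule: the target noun itself predicts the majority class.
    total = sum(label_counts.values())
    major = max(label_counts, key=label_counts.get)
    p_major = smoothed_prob(label_counts[major], total, alpha)
    default_score = math.log(p_major / (1.0 - p_major))

    rules.sort(key=lambda r: r[0], reverse=True)
    kept = [r for r in rules if r[0] >= default_score]
    return kept, major


data = [({("fry", "np")}, "mass"), ({("fry", "np")}, "mass"), ({("farm", "+k")}, "count")]
decision_list, default_label = learn_decision_list(data)
for score, feature, label in decision_list:
    print(round(score, 3), feature, label)
```

Because smoothing keeps every probability strictly between 0 and 1, the logarithm is always defined. Rules that tie with the default rule are kept here, which is a design choice of the sketch.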
    <Paragraph position="19"> Table 2 shows part of a decision list for the target noun chicken that was learned from a subset of the BNC (British National Corpus) (Burnard, 1995). Note that the rules are divided into two columns for the purpose of illustration in Table 2; in practice, they are merged into one.</Paragraph>
    <Paragraph position="20">  On the one hand, we associate the words in the left half with food or cooking. On the other hand, we associate those in the right half with animals or birds. From this observation, we can say that chicken in the sense of an animal or a bird is a count noun but a mass noun when referring to food</Paragraph>
    <Paragraph position="21"> or cooking, which agrees with the knowledge presented in previous work (Ostler and Atkins, 1991).</Paragraph>
    <Section position="1" start_page="243" end_page="243" type="sub_section">
      <SectionTitle>
2.3 Distinguishing mass and count nouns
</SectionTitle>
      <Paragraph position="0"> To distinguish the target noun in novel context, each rule in the decision list is tested on it in the sorted order until the first applicable one is found.</Paragraph>
      <Paragraph position="1"> It is distinguished according to the first applicable one. Ties are broken by the rules below.</Paragraph>
      <Paragraph position="2"> It should be noted that rules sorted below the default rule are never used because the default rule is always applicable to the target noun. This is the reason why rules sorted below the default rule are discarded as mentioned in Section 2.2.</Paragraph>
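Applying a decision list can be sketched as below. The function name `classify`, the tuple layout of the rules, and the example rules and scores are hypothetical; the only behavior taken from the text is that rules are tried in sorted order, the first applicable one decides, and the default rule always applies.

```python
def classify(decision_list, default_label, features):
    """Return the label of the first rule whose feature occurs in the context.

    decision_list: (score, feature, label) tuples sorted by descending score.
    features: set of (word, context) pairs observed around the target noun.
    """
    for _score, feature, label in decision_list:
        if feature in features:
            return label
    return default_label  # the default rule is always applicable


# Illustrative rules and scores only; not taken from Table 2.
example_list = [
    (1.10, ("fry", "np"), "mass"),
    (0.69, ("farm", "+k"), "count"),
]
print(classify(example_list, "mass", {("farm", "+k"), ("visit", "-k")}))
print(classify(example_list, "mass", {("see", "-k")}))
```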
    </Section>
    <Section position="2" start_page="243" end_page="244" type="sub_section">
      <SectionTitle>
2.4 Detecting the target errors
</SectionTitle>
      <Paragraph position="0"> The target errors are detected by the following three steps. Rules in each step are examined on each target noun in the target text.</Paragraph>
      <Paragraph position="1"> In the first step, any mass noun in plural form is detected as an error (see footnote 5). If an error is detected in this step, the rest of the steps are not applied.</Paragraph>
      <Paragraph position="2"> In the second step, errors are detected by the rules described in Table 3. One symbol in Table 3 denotes that the combination of the corresponding row and column is erroneous. For example, the fifth row denotes that singular and plural count nouns modified by much are erroneous. The other symbol denotes that no error can be detected by the table. If one of the rules in Table 3 is applied to the target noun, the third step is not applied.</Paragraph>
      <Paragraph position="3"> In the third step, errors are detected by the rules described in Table 4. The two symbols have the same meanings as in Table 3.</Paragraph>
      <Paragraph position="4"> In addition, an indefinite article that modifies something other than the head noun is judged to be erroneous (e.g., *an expensive).</Paragraph>
      <Paragraph position="6"> [Table fragment: the row "cardinal numbers exc. one" is marked as erroneous in both columns.] Likewise, a definite article that modifies something other than the head noun or an adjective is judged to be erroneous (e.g., *the them). Also, we have made exceptions to the rules. The following combinations are excluded from the detection in the second and third steps: head nouns modified by interrogative adjectives (e.g., what), possessive adjectives (e.g., my), 's genitives, "some", "any", or "no". Footnote 5: Mass nouns can be used in the plural in some cases. However, such uses are rare, especially in the writing of learners of English.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="244" end_page="245" type="metho">
    <SectionTitle>
3 Feedback-augmented method
</SectionTitle>
    <Paragraph position="0"> As mentioned in Section 1, the proposed method takes the feedback corpus (see footnote 6) as feedback to improve its performance. In principle, decision lists could be learned from a corpus consisting of a general corpus and the feedback corpus. However, since the size of the feedback corpus is normally far smaller than that of general corpora, so is the effect of the feedback corpus on $p(MC \mid w_C)$. This means that the feedback corpus hardly has any effect on the performance. Instead, $p(MC \mid w_C)$ can be estimated by interpolating the probabilities estimated from the feedback corpus and the general corpus according to the confidences of their estimates. It is desirable that the interpolated probability approaches the probability estimated from the feedback corpus as its confidence increases; the more confident its estimate is, the more effect it has on the interpolated probability. Here, the confidence $c$ of a ratio $p$ is measured by the reciprocal of the variance of the ratio (Tanaka, 1977). Variance is calculated by</Paragraph>
    <Paragraph position="2"> where $n$ denotes the number of samples used for calculating the ratio. Therefore, the confidence of the estimate of the conditional probability used in the proposed method is measured by</Paragraph>
    <Paragraph position="4"> Footnote 6 (beginning lost): ... errors are corrected as mentioned in Section 1.</Paragraph>
    <Paragraph position="5"> To formalize the interpolated probability, we will use the symbols $p_{fb}$, $p_{g}$, $c_{fb}$, and $c_{g}$ to denote the conditional probabilities estimated from the feedback corpus and the general corpus, and their confidences, respectively. Then, the interpolated probability $p_{i}$ is estimated by (see footnote 7)</Paragraph>
    <Paragraph position="7"> In Equation (8), the effect of $p_{fb}$ on $p_{i}$ becomes large as its confidence increases. It should also be noted that when its confidence exceeds that of $p_{g}$, the general corpus is no longer used in the interpolated probability.</Paragraph>
    <Paragraph position="8"> A problem that arises in Equation (8) is that $p_{fb}$ hardly has any effect on $p_{i}$ when the general corpus is much larger than the feedback corpus, even if $p_{fb}$ is estimated with sufficient confidence. For example, $p_{fb}$ estimated from 100 samples, which is a relatively large number for estimating a probability, hardly has any effect on $p_{i}$ when $p_{g}$ is estimated from 10000 samples; roughly, $p_{fb}$ has $1/100$ of the effect of $p_{g}$ on $p_{i}$.</Paragraph>
    <Paragraph position="9"> One way to prevent this is to limit the effect of $c_{g}$ to some extent. This can be realized by taking the log of $c_{g}$ in Equation (8). That is, the interpolated probability is estimated by</Paragraph>
    <Paragraph position="11"> It is arguable what base of the log should be used.</Paragraph>
    <Paragraph position="12"> In this paper, it is set to 2 so that the effect of $p_{g}$ on the interpolated probability becomes large when the confidence of the estimate of the conditional probability estimated from the feedback corpus is small (that is, when there is little data in the feedback corpus for the estimate).</Paragraph>
    <Paragraph position="13"> In summary, Equation (9) interpolates between the conditional probabilities estimated from the feedback corpus and the general corpus in the feedback-augmented method. The interpolated probability is then used to calculate the log-likelihood ratio. Doing so, the proposed method takes the feedback corpus as feedback to improve its performance.</Paragraph>
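The interpolation can be sketched as follows. Equations (6) to (9) are missing from this text, so the forms used here are assumptions chosen to agree with the surrounding prose: variance p(1 - p) / n for a ratio estimated from n samples, confidence as its reciprocal, a weight of c_fb / c_g on the feedback estimate in Equation (8), and the same weight with log2(c_g) in place of c_g in Equation (9). The function names and the `damp` flag are hypothetical.

```python
import math


def confidence(p, n):
    """Reciprocal of the assumed variance p(1 - p) / n; requires 0 < p < 1."""
    return n / (p * (1.0 - p))


def interpolate(p_fb, c_fb, p_g, c_g, damp=True):
    """Interpolate feedback-corpus and general-corpus estimates (assumed form).

    damp=False mimics Equation (8); damp=True mimics Equation (9), where the
    general-corpus confidence is replaced by its base-2 logarithm.
    """
    denom = math.log2(c_g) if damp else c_g
    # Once the feedback confidence exceeds the denominator, only p_fb is used.
    weight = min(1.0, c_fb / denom)
    return weight * p_fb + (1.0 - weight) * p_g


c_fb = confidence(0.5, 100)      # feedback corpus: 100 samples
c_g = confidence(0.5, 10000)     # general corpus: 10000 samples
print(c_fb / c_g)                                        # weight under Equation (8)
print(interpolate(0.9, c_fb, 0.5, c_g, damp=False))      # dominated by the general corpus
print(interpolate(0.9, c_fb, 0.5, c_g, damp=True))       # feedback estimate takes over
```

With 100 feedback samples against 10000 general samples, the undamped weight is 1/100, which matches the ratio quoted in the text. Smoothed probabilities are never exactly 0 or 1, so the division in `confidence` is safe for the estimates used by the method.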
    <Paragraph position="14"> Footnote 7: In general, the interpolated probability needs to be normalized to satisfy $\sum p_{i} = 1$. In our case, however, it is always satisfied without normalization since ...</Paragraph>
  </Section>
</Paper>