Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 241–248,
Sydney, July 2006. c©2006 Association for Computational Linguistics
A Feedback-Augmented Method for Detecting Errors in the Writing of
Learners of English
Ryo Nagata
Hyogo University of Teacher Education
6731494, Japan
rnagata@hyogo-u.ac.jp
Atsuo Kawai
Mie University
5148507, Japan
kawai@ai.info.mie-u.ac.jp
Koichiro Morihiro
Hyogo University of Teacher Education
6731494, Japan
mori@hyogo-u.ac.jp
Naoki Isu
Mie University
5148507, Japan
isu@ai.info.mie-u.ac.jp
Abstract
This paper proposes a method for detect-
ing errors in article usage and singular plu-
ral usage based on the mass count distinc-
tion. First, it learns decision lists from
training data generated automatically to
distinguish mass and count nouns. Then,
in order to improve its performance, it is
augmented by feedback that is obtained
from the writing of learners. Finally, it de-
tects errors by applying rules to the mass
count distinction. Experiments show that
it achieves a recall of 0.71 and a preci-
sion of 0.72 and outperforms other meth-
ods used for comparison when augmented
by feedback.
1 Introduction
Although several researchers (Kawai et al., 1984;
McCoy et al., 1996; Schneider and McCoy, 1998;
Tschichold et al., 1997) have shown that rule-
based methods are effective to detecting gram-
matical errors in the writing of learners of En-
glish, it has been pointed out that it is hard to
write rules for detecting errors concerning the ar-
ticles and singular plural usage. To be precise, it
is hard to write rules for distinguishing mass and
count nouns which are particularly important in
detecting these errors (Kawai et al., 1984). The
major reason for this is that whether a noun is a
mass noun or a count noun greatly depends on its
meaning or its surrounding context (refer to Al-
lan (1980) and Bond (2005) for details of the mass
count distinction).
The above errors are very common among
Japanese learners of English (Kawai et al., 1984;
Izumi et al., 2003). This is perhaps because the
Japanese language does not have a mass count dis-
tinction system similar to that of English. Thus, it
is favorable for error detection systems aiming at
Japanese learners to be capable of detecting these
errors. In other words, such systems need to some-
how distinguish mass and count nouns.
This paper proposes a method for distinguishing
mass and count nouns in context to complement
the conventional rules for detecting grammatical
errors. In this method,  rst, training data, which
consist of instances of mass and count nouns, are
automatically generated from a corpus. Then,
decision lists for distinguishing mass and count
nouns are learned from the training data. Finally,
the decision lists are used with the conventional
rules to detect the target errors.
The proposed method requires a corpus to learn
decision lists for distinguishing mass and count
nouns. General corpora such as newspaper ar-
ticles can be used for the purpose. However,
a drawback to it is that there are differences in
character between general corpora and the writ-
ing of non-native learners of English (Granger,
1998; Chodorow and Leacock, 2000). For in-
stance, Chodorow and Leacock (2000) point out
that the word concentrate is usually used as a noun
in a general corpus whereas it is a verb 91% of
the time in essays written by non-native learners
of English. Consequently, the differences affect
the performance of the proposed method.
In order to reduce the drawback, the proposed
method is augmented by feedback; it takes as feed-
back learners’ essays whose errors are corrected
by a teacher of English (hereafter, referred to as
the feedback corpus). In essence, the feedback
corpus could be added to a general corpus to gen-
erate training data. Or, ideally training data could
be generated only from the feedback corpus just as
241
from a general corpus. However, this causes a se-
rious problem in practice since the size of the feed-
back corpus is normally far smaller than that of a
general corpus. To make it practical, this paper
discusses the problem and explores its solution.
The rest of this paper is structured as follows.
Section 2 describes the method for detecting the
target errors based on the mass count distinction.
Section 3 explains how the method is augmented
by feedback. Section 4 discusses experiments con-
ducted to evaluate the proposed method.
2 Method for detecting the target errors
2.1 Generating training data
First, instances of the target noun that head their
noun phrase (NP) are collected from a corpus with
their surrounding words. This can be simply done
by an existing chunker or parser.
Then, the collected instances are tagged with
mass or count by the following tagging rules. For
example, the underlined chicken:
... are a lot of chickens in the roost ...
is tagged as
... are a lot of chickens/count in the roost ...
because it is in plural form.
We have made tagging rules based on linguistic
knowledge (Huddleston and Pullum, 2002). Fig-
ure 1 and Table 1 represent the tagging rules. Fig-
ure 1 shows the framework of the tagging rules.
Each node in Figure 1 represents a question ap-
plied to the instance in question. For example, the
root node reads  Is the instance in question plu-
ral? . Each leaf represents a result of the classi-
 cation. For example, if the answer is yes at the
root node, the instance in question is tagged with
count. Otherwise, the question at the lower node
is applied and so on. The tagging rules do not
classify instances as mass or count in some cases.
These unclassi ed instances are tagged with the
symbol  ? . Unfortunately, they cannot readily be
included in training data. For simplicity of imple-
mentation, they are excluded from training data1.
Note that the tagging rules can be used only for
generating training data. They cannot be used to
distinguish mass and count nouns in the writing
of learners of English for the purpose of detecting
1According to experiments we have conducted, approxi-
mately 30% of instances are tagged with  ? on average. It is
highly possible that performance of the proposed method will
improve if these instances are included in the training data.
the target errors since they are based on the articles
and the distinction between singular and plural.
Finally, the tagged instances are stored in a  le
with their surrounding words. Each line of it con-
sists of one of the tagged instances and its sur-
rounding words as in the above chicken example.
2.2 Learning Decision Lists
In the proposed method, decision lists are used for
distinguishing mass and count nouns. One of the
reasons for the use of decision lists is that they
have been shown to be effective to the word sense
disambiguation task and the mass count distinc-
tion is highly related to word sense as we will see
in this section. Another reason is that rules for dis-
tinguishing mass and count nouns are observable
in decision lists, which helps understand and im-
prove the proposed method.
A decision list consists of a set of rules. Each
rule matches the template as follows:
If a condition is true, then a decisiona0 (1)
To de ne the template in the proposed method,
let us have a look at the following two examples:
1. I read the paper.
2. The paper is made of hemp pulp.
The underlined papers in both sentences cannot
simply be classi ed as mass or count by the tag-
ging rules presented in Section 2.1 because both
are singular and modi ed by the de nite article.
Nevertheless, we can tell that the former is a count
noun and the latter is a mass noun from the con-
texts. This suggests that the mass count distinc-
tion is often determined by words surrounding the
target noun. In example 1, we can tell that the pa-
per refers to something that can be read such as
a newspaper or a scienti c paper from read, and
therefore it is a count noun. Likewise, in exam-
ple 2, we can tell that the paper refers to a certain
substance from made and pulp, and therefore it is
a mass noun.
Taking this observation into account, we de ne
the template based on words surrounding the tar-
get noun. To formalize the template, we will use
a random variable a1a3a2 that takes either a4a6a5a8a7a9a7 or
a10a12a11a9a13a15a14a17a16 to denote that the target noun is a mass noun
or a count noun, respectively. We will also use
a18 and
a2 to denote a word and a certain context
around the target noun, respectively. We de ne
242
a19
a19
a19
a19
a20
a20
a20
a20
a20
a20
a20
a20
a20
a20
a20
a20
a21
a21
a21
a21
a21
a21
a21
a21
a21
a21
a21
a21
yes
yes
yes
yes
no
no
no
no
a19
a21
a21
a21
a21
a21
a21
a20
a20
a20
yes no
COUNT
modi ed by a little?
?
COUNT
MASS
? MASS
plural?
modi ed by one of the words
in Table 1(a)?
modi ed by one of the words
in Table 1(b)?
modi ed by one of the words
in Table 1(c)?
Figure 1: Framework of the tagging rules
Table 1: Words used in the tagging rules
(a) (b) (c)
the inde nite article much the de nite article
another less demonstrative adjectives
one enough possessive adjectives
each suf cient interrogative adjectives
  quanti ers
  ’s genitives
three types of a2 : a14a8a22 , a23a25a24 , and a26a27a24 that denote the
contexts consisting of the noun phrase that the tar-
get noun heads, a24 words to the left of the noun
phrase, and a24 words to its right, respectively. Then
the template is formalized by:
If word a18 appears in context a2 of the target noun,
then it is distinguished as a1a3a2 a0
Hereafter, to keep the notation simple, it will be
abbreviated to
a18a29a28a31a30
a1a3a2
a0 (2)
Now rules that match the template can be ob-
tained from the training data. All we need to do
is to collect words in a2 from the training data.
Here, the words in Table 1 are excluded. Also,
function words (except prepositions), cardinal and
quasi-cardinal numerals, and the target noun are
excluded. All words are reduced to their mor-
phological stem and converted entirely to lower
case when collected. For example, the following
tagged instance:
She ate fried chicken/mass for dinner.
would give a set of rules that match the template:
a32
a5
a16a34a33a15a35a29a30
a4a6a5a8a7a9a7
a36a38a37a40a39a42a41a44a43
a30
a4a6a5a8a7a9a7
a36
a11
a37a46a45
a35 a30
a4a6a5a8a7a46a7
a47a49a48
a14a17a14a50a32
a37a46a45
a35 a30
a4a6a5a8a7a9a7
for the target noun chicken when a24a52a51a54a53 .
In addition, a default rule is de ned. It is based
on the target noun itself and used when no other
applicable rules are found in the decision list for
the target noun. It is de ned by
a16a55a30
a1a56a2
major (3)
where a16 and a1a3a2 major denote the target noun and
the majority of a1a56a2 in the training data, respec-
tively. Equation (3) reads  If the target noun ap-
pears, then it is distinguished by the majority .
The log-likelihood ratio (Yarowsky, 1995) de-
cides in which order rules are applied to the target
noun in novel context. It is de ned by2
a57
a11a46a58
a22a60a59
a1a56a2a62a61
a18 a28a60a63
a22a60a59
a1a56a2a62a61
a18 a28a60a63
(4)
where a1a56a2 is the exclusive event of a1a3a2 and
a22a64a59
a1a3a2a65a61
a18 a28a55a63 is the probability that the target noun
is used as a1a56a2 when a18 appears in the context a2 .
It is important to exercise some care in estimat-
ing a22a64a59 a1a56a2a62a61a18a29a28 a63 . In principle, we could simply
2For the default rule, the log-likelihood ratio is de ned by
replacing a66a50a67 and a68a70a69 with a71 and a68a70a69 major, respectively.
243
count the number of times that a18 appears in the
context a2 of the target noun used as a1a3a2 in the
training data. However, this estimate can be unre-
liable, when a18 does not appear often in the con-
text. To solve this problem, using a smoothing pa-
rameter a72 (Yarowsky, 1996), a22a60a59 a1a56a2a62a61a18 a28a55a63 is esti-
mated by3
a22a60a59
a1a56a2a62a61
a18 a28a60a63
a51
a36
a59a73a18a29a28a75a74
a1a3a2
a63
a26a76a72
a36
a59a73a18 a28a60a63
a26a77a4a70a72
(5)
where a36 a59a73a18 a28a55a63 and a36 a59a73a18 a28 a74 a1a3a2 a63 are occurrences of
a18 appearing in
a2 and those in a2 of the target noun
used as a1a56a2 , respectively. The constant a4 is the
number of possible classes, that is, a4a78a51a3a79 (a4a6a5a80a7a9a7
or a10a12a11a9a13a15a14a17a16 ) in our case, and introduced to satisfy
a22a64a59
a1a3a2a65a61
a18 a28a55a63
a26
a22a64a59
a1a3a2a65a61
a18 a28a81a63
a51a3a82 . In this paper, a72 is
set to 1.
Rules in a decision list are sorted in descending
order by the log-likelihood ratio. They are tested
on the target noun in novel context in this order.
Rules sorted below the default rule are discarded4
because they are never used as we will see in Sec-
tion 2.3.
Table 2 shows part of a decision list for the tar-
get noun chicken that was learned from a subset
of the BNC (British National Corpus) (Burnard,
1995). Note that the rules are divided into two
columns for the purpose of illustration in Table 2;
in practice, they are merged into one.
Table 2: Rules in a decision list
Mass Count
a18 a28 LLR a18 a28 LLR
a22
a48
a32a83a10a84a32 a33a15a35 1.49 a10a12a11a9a13a15a14a17a16 a33a15a35 1.49
a36 a48
a7a46a85
a33a15a35 1.28 a22a38a32a86a10
a24
a45
a35 1.32
a47a49a48
a7a9a85
a33a15a35 1.23 a22
a48
a58
a41a12a43 1.23
a7a46a24
a48
a14
a45
a35 1.23
a37
a13a15a14 a33a15a35 1.23
a7
a32
a37a9a87
a32
a45
a35 1.18 a32a12a58a88a58
a41a44a43 1.18
target noun: chicken, a24a52a51a89a53
LLR (Log-Likelihood Ratio)
On one hand, we associate the words in the left
half with food or cooking. On the other hand,
we associate those in the right half with animals
or birds. From this observation, we can say that
chicken in the sense of an animal or a bird is a
count noun but a mass noun when referring to food
3The probability for the default rule is estimated just as
the log-likelihood ratio for the default rule above.
4It depends on the target noun how many rules are dis-
carded.
or cooking, which agrees with the knowledge pre-
sented in previous work (Ostler and Atkins, 1991).
2.3 Distinguishing mass and count nouns
To distinguish the target noun in novel context,
each rule in the decision list is tested on it in the
sorted order until the  rst applicable one is found.
It is distinguished according to the  rst applicable
one. Ties are broken by the rules below.
It should be noted that rules sorted below the
default rule are never used because the default rule
is always applicable to the target noun. This is the
reason why rules sorted below the default rule are
discarded as mentioned in Section 2.2.
2.4 Detecting the target errors
The target errors are detected by the following
three steps. Rules in each step are examined on
each target noun in the target text.
In the  rst step, any mass noun in plural form is
detected as an error5. If an error is detected in this
step, the rest of the steps are not applied.
In the second step, errors are detected by the
rules described in Table 3. The symbol  a90  in Ta-
ble 3 denotes that the combination of the corre-
sponding row and column is erroneous. For exam-
ple, the  fth row denotes that singular and plural
count nouns modi ed by much are erroneous. The
symbol    denotes that no error can be detected
by the table. If one of the rules in Table 3 is applied
to the target noun, the third step is not applied.
In the third step, errors are detected by the rules
described in Table 4. The symbols  a90  and    
are the same as in Table 3.
In addition, the inde nite article that modi es
other than the head noun is judged to be erroneous
Table 3: Detection rules (i)
Count Mass
Pattern Sing. Pl. Sing.
a91 another, each, one
a92  a90 a90
a91 all, enough, suf cient
a92 a90   
a91 much
a92 a90 a90  
a91 that, this
a92  a90  
a91 few, many, several
a92 a90  a90
a91 these, those
a92 a90  a90
a91 various, numerous
a92 a90  a90
cardinal numbers exc. one a90  a90
5Mass nouns can be used in plural in some cases. How-
ever, they are rare especially in the writing of learners of En-
glish.
244
Table 4: Detection rules (ii)
Singular Plural
a/an the a93 a/an the a93
Mass a90   a90 a90 a90
Count   a90 a90   
(e.g., *an expensive). Likewise, the de nite article
that modi es other than the head noun or adjective
is judged to be erroneous (e.g., *the them). Also,
we have made exceptions to the rules. The follow-
ing combinations are excluded from the detection
in the second and third steps: head nouns modi ed
by interrogative adjectives (e.g., what), possessive
adjectives (e.g., my), ’s genitives,  some ,  any ,
or  no .
3 Feedback-augmented method
As mentioned in Section 1, the proposed method
takes the feedback corpus6 as feedback to improve
its performance. In essence, decision lists could be
learned from a corpus consisting of a general cor-
pus and the feedback corpus. However, since the
size of the feedback corpus is normally far smaller
than that of general corpora, so is the effect of the
feedback corpus on a22a64a59 a1a3a2a65a61a18a94a28 a63 . This means that
the feedback corpus hardly has effect on the per-
formance.
Instead, a22a64a59 a1a3a2a65a61a18 a28a55a63 can be estimated by in-
terpolating the probabilities estimated from the
feedback corpus and the general corpus accord-
ing to con dences of their estimates. It is favor-
able that the interpolated probability approaches
to the probability estimated from the feedback cor-
pus as its con dence increases; the more con dent
its estimate is, the more effect it has on the inter-
polated probability. Here, con dence a10 of ratio a22
is measured by the reciprocal of variance of the
ratio (Tanaka, 1977). Variance is calculated by
a22a64a59
a82a95a23
a22 a63
a14
(6)
where a14 denotes the number of samples used for
calculating the ratio. Therefore, con dence of the
estimate of the conditional probability used in the
proposed method is measured by
a10
a51
a36
a59a73a18a29a28 a63
a22a64a59
a1a56a2a62a61
a18 a28a55a63 a59
a82a95a23
a22a64a59
a1a3a2a65a61
a18 a28a81a63a96a63
a0 (7)
6The feedback corpus refers to learners’ essays whose er-
rors are corrected as mentioned in Section 1.
To formalize the interpolated probability, we
will use the symbols a22a17a97a83a98 , a22a100a99 , a10a84a97a83a98 , and a10a12a99 to de-
note the conditional probabilities estimated from
the feedback corpus and the general corpus, and
their con dences, respectively. Then, the interpo-
lated probability a22a38a101 is estimated by7
a22a15a101
a51
a102
a22a15a99
a26a3a103a105a104a107a106
a103a109a108
a59a110a22a38a97a84a98
a23
a22a15a99 a63 a74 a10a84a97a84a98a95a111a112a10a113a99
a22a38a97a83a98a113a74 a10a84a97a84a98a95a114a112a10 a99
a0 (8)
In Equation (8), the effect of a22a115a97a84a98 on a22a15a101 becomes
large as its con dence increases. It should also be
noted that when its con dence exceeds that of a22 a99 ,
the general corpus is no longer used in the inter-
polated probability.
A problem that arises in Equation (8) is thata22a50a97a84a98
hardly has effect ona22a38a101 when a much larger general
corpus is used than the feedback corpus even ifa22a116a97a84a98
is estimated with a suf cient con dence. For ex-
ample,a22a38a97a83a98 estimated from 100 samples, which are
a relatively large number for estimating a proba-
bility, hardly has effect on a22a117a101 whena22 a99 is estimated
from 10000 samples; roughly, a22a115a97a83a98 has a a82a86a118a80a82a84a119a42a119 ef-
fect of a22 a99 on a22a15a101 .
One way to prevent this is to limit the effect of
a10 a99 to some extent. It can be realized by taking the
log of a10a44a99 in Equation (8). That is, the interpolated
probability is estimated by
a22a15a101
a51
a102
a22 a99
a26a120a103a105a104a96a106
a121a123a122a96a124
a103 a108
a59a110a22a38a97a84a98
a23
a22 a99a46a63 a74a125a10a84a97a84a98a126a111a52a127a129a128a42a130a25a10 a99
a22a38a97a83a98a113a74a125a10a84a97a84a98a126a114a52a127a129a128a42a130a25a10a113a99
a0 (9)
It is arguable what base of the log should be used.
In this paper, it is set to 2 so that the effect ofa22 a99 on
the interpolated probability becomes large when
the con dence of the estimate of the conditional
probability estimated from the feedback corpus is
small (that is, when there is little data in the feed-
back corpus for the estimate)8.
In summary, Equation (9) interpolates between
the conditional probabilities estimated from the
feedback corpus and the general corpus in the
feedback-augmented method. The interpolated
probability is then used to calculate the log-
likelihood ratio. Doing so, the proposed method
takes the feedback corpus as feedback to improve
its performance.
7In general, the interpolated probability needs to be nor-
malized to satisfy a131a133a132a42a134a115a135a137a136 . In our case, however, it is al-
ways satis ed without normalization since a132
a104a96a106a139a138
a68a70a69a141a140a66 a67a15a142a126a143
a132
a104a96a106a139a138
a68a144a69a145a140a66 a67a146a142 a135a133a136 and a132
a108 a138
a68a70a69a141a140a66 a67a15a142a126a143 a132
a108 a138
a68a144a69a141a140a66 a67a15a142 a135a133a136
are satis ed.
8We tested several bases in the experiments and found
there were little difference in performance between them.
245
4 Experiments
4.1 Experimental Conditions
A set of essays9 written by Japanese learners of
English was used as the target essays in the exper-
iments. It consisted of 47 essays (3180 words) on
the topic traveling. A native speaker of English
who was a professional rewriter of English recog-
nized 105 target errors in it.
The written part of the British National Corpus
(BNC) (Burnard, 1995) was used to learn deci-
sion lists. Sentences the OAK system10, which
was used to extract NPs from the corpus, failed
to analyze were excluded. After these operations,
the size of the corpus approximately amounted to
80 million words. Hereafter, the corpus will be
referred to as the BNC.
As another corpus, the English concept explica-
tion in the EDR English-Japanese Bilingual dic-
tionary and the EDR corpus (1993) were used; it
will be referred to as the EDR corpus, hereafter.
Its size amounted to about 3 million words.
Performance of the proposed method was eval-
uated by recall and precision. Recall is de ned by
No. of target errors detected correctly
No. of target errors in the target essays a0 (10)
Precision is de ned by
No. of target errors detected correctly
No. of detected errors a0 (11)
4.2 Experimental Procedures
First, decision lists for each target noun in the tar-
get essays were learned from the BNC11. To ex-
tract noun phrases and their head nouns, the OAK
system was used. An optimal value for a24 (window
size of context) was estimated as follows. For 25
nouns shown in (Huddleston and Pullum, 2002) as
examples of nouns used as both mass and count
nouns, accuracy on the BNC was calculated us-
ing ten-fold cross validation. As a result of set-
ting small (a24a77a51a147a53 ), medium (a24a77a51a78a82a84a119 ), and large
(a24a77a51a149a148a40a119 ) window sizes, it turned out that a24a150a51a151a53
maximized the average accuracy. Following this
result, a24a65a51a89a53 was selected in the experiments.
Second, the target nouns were distinguished
whether they were mass or count by the learned
9http://www.eng.ritsumei.ac.jp/lcorpus/.
10OAK System Homepage: http://nlp.cs.nyu.edu/oak/.
11If no instance of the target noun is found in the gen-
eral corpora (and also in the feedback corpus in case of the
feedback-augmented method), the target noun is ignored in
the error detection procedure.
decision lists, and then the target errors were de-
tected by applying the detection rules to the mass
count distinction. As a preprocessing, spelling er-
rors were corrected using a spell checker. The re-
sults of the detection were compared to those done
by the native-speaker of English. From the com-
parison, recall and precision were calculated.
Then, the feedback-augmented method was
evaluated on the same target essays. Each target
essay in turn was left out, and all the remaining
target essays were used as a feedback corpus. The
target errors in the left-out essay were detected us-
ing the feedback-augmented method. The results
of all 47 detections were integrated into one to cal-
culate overall performance. This way of feedback
can be regarded as that one uses revised essays
previously written in a class to detect errors in es-
says on the same topic written in other classes.
Finally, the above two methods were compared
with their seven variants shown in Table 5.  DL 
in Table 5 refers to the nine decision list based
methods (the above two methods and their seven
variants). The words in brackets denote the cor-
pora used to learn decision lists; the symbol  +FB 
means that the feedback corpus was simply added
to the general corpus. The subscripts a36a17a152a42a153 and
a36a17a152a44a154 indicate that the feedback was done by using
Equation (8) and Equation (9), respectively.
In addition to the seven variants, two kinds of
earlier method were used for comparison. One
was one (Kawai et al., 1984) of the rule-based
methods. It judges singular head nouns with no
determiner to be erroneous since missing articles
are most common in the writing of Japanese learn-
ers of English. In the experiments, this was imple-
mented by treating all nouns as count nouns and
applying the same detection rules as in the pro-
posed method to the countability.
The other was a web-based method (Lapata and
Keller, 2005)12 for generating articles. It retrieves
web counts for queries consisting of two words
preceding the NP that the target noun head, one
of the articles (a91 a/an, the, a93a17a92 ), and the core NP
to generate articles. All queries are performed as
exact matches using quotation marks and submit-
ted to the Google search engine in lower case. For
example, in the case of  *She is good student. , it
retrieves web counts for  she is a good student ,
12There are other statistical methods that can be used for
comparison including Lee (2004) and Minnen (2000). Lapata
and Keller (2005) report that the web-based method is the
best performing article generation method.
246
 she is the good student , and  she is good stu-
dent . Then, it generates the article that maxi-
mizes the web counts. We extended it to make
it capable of detecting our target errors. First, the
singular/plural distinction was taken into account
in the queries (e.g.,  she is a good students ,  she
is the good students , and  she is good students 
in addition to the above three queries). The one(s)
that maximized the web counts was judged to be
correct; the rest were judged to be erroneous. Sec-
ond, if determiners other than the articles modify
head nouns, only the distinction between singu-
lar and plural was taken into account (e.g.,  he
has some book vs  he has some books ). In the
case of  much/many , the target noun in singular
form modi ed by  much and that in plural form
modi ed by  many were compared (e.g.,  he has
much furniture vs  he has many furnitures). Fi-
nally, some rules were used to detect literal errors.
For example, plural head nouns modi ed by  this 
were judged to be erroneous.
4.3 Experimental Results and Discussion
Table 5 shows the experimental results.  Rule-
based and  Web-based in Table 5 refer to the
rule-based method and the web-based method, re-
spectively. The other symbols are as already ex-
plained in Section 4.2.
As we can see from Table 5, all the decision
list based methods outperform the earlier methods.
The rule-based method treated all nouns as count
nouns, and thus it did not work well at all on mass
nouns. This caused a lot of false-positives and
false-negatives. The web-based method suffered
a lot from other errors than the target errors since
Table 5: Experimental results
Method Recall Precision
DL (BNC) 0.66 0.65
DL (BNC+FB) 0.66 0.65
DLa97a84a98 a153 (BNC) 0.66 0.65
DLa97a84a98 a154 (BNC) 0.69 0.70
DL (EDR) 0.70 0.68
DL (EDR+FB) 0.71 0.69
DLa97a84a98 a153 (EDR) 0.71 0.70
DLa97a84a98 a154 (EDR) 0.71 0.72
DL (FB) 0.43 0.76
Rule-based 0.59 0.39
Web-based 0.49 0.53
it implicitly assumed that there were no errors ex-
cept the target errors. Contrary to this assumption,
not only did the target essays contain the target er-
rors but also other errors since they were written
by Japanese learners of English. This indicate that
the queries often contained the other errors when
web counts were retrieved. These errors made the
web counts useless, and thus it did not perform
well. By contrast, the decision list based meth-
ods did because they distinguished mass and count
nouns by one of the words around the target noun
that was most likely to be effective according to
the log-likelihood ratio13; the best performing de-
cision list based method (DLa97a84a98 a154 (EDR)) is sig-
ni cantly superior to the best performing14 non-
decision list based method (Web-based) in both re-
call and precision at the 99% con dence level.
Table 5 also shows that the feedback-augmented
methods bene t from feedback. Only an exception
is  DLa97a84a98 a153 (BNC) . The reason is that the size of
BNC is far larger than that of the feedback cor-
pus and thus it did not affect the performance.
This also explains that simply adding the feed-
back corpus to the general corpus achieved little
or no improvement as  DL (EDR+FB) and  DL
(BNC+FB) show. Unlike these, both  DLa97a84a98 a154
(BNC) and  DLa97a84a98 a154 (EDR) bene t from feed-
back since the effect of the general corpus is lim-
ited to some extent by the log function in Equa-
tion (9). Because of this, both bene t from feed-
back despite the differences in size between the
feedback corpus and the general corpus.
Although the experimental results have shown
that the feedback-augmented method is effective
to detecting the target errors in the writing of
Japanese learners of English, even the best per-
forming method (DLa97a84a98 a154 (EDR)) made 30 false-
negatives and 29 false-positives. About 70% of
the false-negatives were errors that required other
sources of information than the mass count dis-
tinction to be detected. For example, extra def-
inite articles (e.g., *the traveling) cannot be de-
tected even if the correct mass count distinction is
given. Thus, only a little improvement is expected
in recall however much feedback corpus data be-
come available. On the other hand, most of the
13Indeed, words around the target noun were effective. The
default rules were used about 60% and 30% of the time in
 DL (EDR) and  DL (BNC) , respectively; when only the
default rules were used,  DL (EDR) ( DL (BNC) ) achieved
0.66 (0.56) in recall and 0.58 (0.53) in precision.
14 Best performing here means best performing in terms
of a155 -measure.
247
false-positives were due to the decision lists them-
selves. Considering this, it is highly possible that
precision will improve as the size of the feedback
corpus increases.
5 Conclusions
This paper has proposed a feedback-augmented
method for distinguishing mass and count nouns
to complement the conventional rules for detect-
ing grammatical errors. The experiments have
shown that the proposed method detected 71% of
the target errors in the writing of Japanese learn-
ers of English with a precision of 72% when it
was augmented by feedback. From the results,
we conclude that the feedback-augmented method
is effective to detecting errors concerning the ar-
ticles and singular plural usage in the writing of
Japanese learners of English.
Although it is not taken into account in this pa-
per, the feedback corpus contains further useful in-
formation. For example, we can obtain training
data consisting of instances of errors by compar-
ing the feedback corpus with its original corpus.
Also, comparing it with the results of detection,
we can know performance of each rule used in
the detection, which make it possible to increase
or decrease their log-likelihood ratios according to
their performance. We will investigate how to ex-
ploit these sources of information in future work.
Acknowledgments
The authors would like to thank Sekine Satoshi
who has developed the OAK System. The authors
also would like to thank three anonymous review-
ers for their useful comments on this paper.
References
K. Allan. 1980. Nouns and countability. J. Linguistic
Society of America, 56(3):541 567.
F. Bond. 2005. Translating the Untranslatable. CSLI
publications, Stanford.
L. Burnard. 1995. Users Reference Guide for the
British National Corpus. version 1.0. Oxford Uni-
versity Computing Services, Oxford.
M. Chodorow and C. Leacock. 2000. An unsupervised
method for detecting grammatical errors. In Proc. of
1st Meeting of the North America Chapter of ACL,
pages 140 147.
Japan electronic dictionary research institute ltd. 1993.
EDR electronic dictionary speci cations guide.
Japan electronic dictionary research institute ltd,
Tokyo.
S. Granger. 1998. Prefabricated patterns in advanced
EFL writing: collocations and formulae. In A. P.
Cowie, editor, Phraseology: theory, analysis, and
applications, pages 145 160. Clarendon Press.
R. Huddleston and G.K. Pullum. 2002. The Cam-
bridge Grammar of the English Language. Cam-
bridge University Press, Cambridge.
E. Izumi, K. Uchimoto, T. Saiga, T. Supnithi, and
H. Isahara. 2003. Automatic error detection in the
Japanese learners’ English spoken data. In Proc. of
41st Annual Meeting of ACL, pages 145 148.
A. Kawai, K. Sugihara, and N. Sugie. 1984. ASPEC-I:
An error detection system for English composition.
IPSJ Journal (in Japanese), 25(6):1072 1079.
M. Lapata and F. Keller. 2005. Web-based models for
natural language processing. ACM Transactions on
Speech and Language Processing, 2(1):1 31.
J. Lee. 2004. Automatic article restoration. In Proc. of
the Human Language Technology Conference of the
North American Chapter of ACL, pages 31 36.
K.F. McCoy, C.A. Pennington, and L.Z. Suri. 1996.
English error correction: A syntactic user model
based on principled  mal-rule scoring. In Proc.
of 5th International Conference on User Modeling,
pages 69 66.
G. Minnen, F. Bond, and A. Copestake. 2000.
Memory-based learning for article generation. In
Proc. of CoNLL-2000 and LLL-2000 workshop,
pages 43 48.
N. Ostler and B.T.S Atkins. 1991. Predictable mean-
ing shift: Some linguistic properties of lexical impli-
cation rules. In Proc. of 1st SIGLEX Workshop on
Lexical Semantics and Knowledge Representation,
pages 87 100.
D. Schneider and K.F. McCoy. 1998. Recognizing
syntactic errors in the writing of second language
learners. In Proc. of 17th International Conference
on Computational Linguistics, pages 1198 1205.
Y. Tanaka. 1977. Psychological methods (in
Japanese). University of Tokyo Press.
C. Tschichold, F. Bodmer, E. Cornu, F. Grosjean,
L. Grosjean, N. Ka156ubler, N. La157ewy, and C. Tschumi.
1997. Developing a new grammar checker for En-
glish as a second language. In Proc. of the From Re-
search to Commercial Applications Workshop, pages
7 12.
D. Yarowsky. 1995. Unsupervised word sense disam-
biguation rivaling supervised methods. In Proc. of
33rd Annual Meeting of ACL, pages 189 196.
D. Yarowsky. 1996. Homograph Disambiguation in
Speech Synthesis. Springer-Verlag.
248
