Corpus-Based Linguistic Indicators for Aspectual Classification 
Eric V. Siegel 
Department of Computer Science 
Columbia University 
New York, NY 10027 
Abstract 
Fourteen indicators that measure the frequency 
of lexico-syntactic phenomena linguistically re- 
lated to aspectual class are applied to aspec- 
tual classification. This group of indicators is 
shown to improve classification performance for 
two aspectual distinctions, stativity and com- 
pletedness (i.e., telicity), over unrestricted sets 
of verbs from two corpora. Several of these in- 
dicators have not previously been discovered to 
correlate with aspect. 
1 Introduction 
Aspectual classification maps clauses to a small 
set of primitive categories in order to reason 
about time. For example, events such as, 
"You called your father," are distinguished from 
states such as, "You resemble your father." 
These two high-level categories correspond to 
primitive distinctions in many domains, e.g., the 
distinction between procedure and diagnosis in 
the medical domain. 
Aspectual classification further distinguishes 
events according to completedness (i.e., telicity), 
which determines whether an event reaches a 
culmination point in time at which a new state 
is introduced. For example, "I made a fire" 
is culminated, since a new state is introduced 
- something is made, whereas, "I gazed at the 
sunset" is non-culminated. 
Aspectual classification is necessary for inter- 
preting temporal modifiers and assessing tem- 
poral entailments (Vendler, 1967; Dowty, 1979; 
Moens and Steedman, 1988; Dorr, 1992), and 
is therefore a necessary component for applica- 
tions that perform certain natural language in- 
terpretation, natural language generation, sum- 
marization, information retrieval, and machine 
translation tasks. 
Aspect introduces a large-scale, domain- 
dependent lexical classification problem. Al- 
though an aspectual lexicon of verbs would suf- 
fice to classify many clauses by their main verb 
only, a verb's primary class is often domain- 
dependent (Siegel, 1998b). Therefore, it is nec- 
essary to produce a specialized lexicon for each 
domain. 
Most approaches to automatically catego- 
rizing words measure co-occurrences between 
open-class lexical items (Schfitze, 1992; Hatzi- 
vassiloglou and McKeown, 1993; Pereira et 
al., 1993). This approach is limited since co- 
occurrences between open-class lexical items is 
sparse, and is not specialized for particular se- 
mantic distinctions such as aspect. 
In this paper, we describe an expandable 
framework to classify verbs with linguistically- 
specialized numerical indicators. Each linguis- 
tic indicator measures the frequency of a lexico- 
syntactic marker, e.g. the perfect tense. These 
markers are linguistically related to aspect, so 
the indicators are specialized for aspectual clas- 
sification in particular. We perform an evalua- 
tion of fourteen linguistic indicators over unre- 
stricted sets of verbs from two corpora. When 
used in combination, this group of indicators is 
shown to improve classification performance for 
two aspectual distinctions: stativity and com- 
pletedness. Moreover, our analysis reveals a 
predictive value for several indicators that have 
not previously been discovered to correlate with 
aspect in the linguistics literature. 
The following section further describes as- 
pect, and introduces linguistic insights that are 
exploited by linguistic indicators. The next sec- 
tion describes the set of linguistic indicators 
evaluated in this paper. Then, our experimen- 
tal method and results are given, followed by a 
discussion and conclusions. 
112 
Table 1: Aspectual classes. This table comes from 
Moens and Steedman (Moens and Steedman, 1988). 
Culm 
EVENTS STATES 
punctual extended 
CULM CULM 
PROCESS 
recognize build a house 
Non- POINT PROCESS 
Culm hiccup run, swim 
understand 
2 Aspect in Natural Language 
Table 1 summarizes the three aspectual distinc- 
tions, which compose five aspectual categories. 
In addition to the two distinctions described 
in the previous section, atomicity distinguishes 
events according to whether they have a time 
duration (punctual versus extended). Therefore, 
four classes of events are derived: culmination, 
culminated process, process, and point. These 
aspectual distinctions are defined formally by 
Dowty (1979). 
Several researchers have developed models 
that incorporate aspectual class to assess tem- 
poral constraints between clauses (Passonneau, 
1988; Dorr, 1992). For example, stativity must 
be identified to detect temporal constraints be- 
tween clauses connected with when, e.g., in in- 
terpreting (1), 
(1) She had good strength when objectively 
tested. 
the following temporal relationship holds: 
have I I 
I test I 
However, in interpreting (2), 
(2) Phototherapy was discontinued when the 
bilirubin came down to 13. 
the temporal relationship is different: 
COme I I 
discontinue I I 
These aspectual distinctions are motivated by 
a series of entailment constraints. In particu- 
lar, certain lexico-syntactic features of a clause, 
such as temporal adjuncts and tense, are con- 
strained by and contribute to the aspectual class 
of the clause (Vendler, 1967; Dowty, 1979). 
Tables 2 illustrates an array of linguistic con- 
Table 2: Several aspectual markers and associated 
constraints on aspectual class, primarily from Kla- 
vans' summary (1994). 
If a clause can occur: then it is: 
with a temporal adverb Event 
(e.g., then) 
in progressive Extended 
Event 
with a duration in-PP Culm Event 
(e.g., in an hour) 
in the perfect tense Culm Event 
or State 
straints. Each entry in this table describes an 
aspectual marker and the constraints on the as- 
pectual category of any clause that appears with 
that marker. For example, a clause must be 
an extended event to appear in the progressive 
tense, e.g., 
(3) He was prospering in India. (extended), 
which contrasts with, 
(4) *You were noticing it. (punctual). 
and, 
(5) *She was seeming sad. (state). 
As a second example, an event must be cul- 
minated to appear in the perfect tense, for ex- 
ample, 
(6) She had made an attempt. (culm.), 
which contrasts with, 
(7) *He has cowered down. (non-culm.) 
3 Linguistic Indicators 
The best way to exploit aspectual markers is not 
obvious, since, while the presence of a marker in 
a particular clause indicates a constraint on the 
aspectual class of the clause, the absence thereof 
does not place any constraint. Therefore, as 
with most statistical methods for natural lan- 
guage, the linguistic constraints associated with 
markers are best exploited by a system that 
measures co-occurrence frequencies. For exam- 
ple, a verb that appears more frequently in the 
progressive is more likely to describe an event. 
Klavans and Chodorow (1992) pioneered the ap- 
plication of statistical corpus analysis to aspec- 
tuai classification by ranking verbs according to 
the frequencies with which they occur with cer- 
tain aspectual markers. 
Table 3 lists the linguistic indicators evalu- 
ated for aspectual classification. Each indica- 
113 
Ling Indicator Example Clause 
frequency 
~tnot" or "never" 
temporal adverb 
no subject 
past/pres partic 
duration in-PP 
perfect 
present tense 
progressive 
manner adverb 
evaluation adverb 
past tense 
duration for-PP 
continuous adverb 
(not applicable) 
She can not explain why. 
I saw to it then. 
He was admitted. 
... blood pressure going up. 
She built it in an hour. 
They have landed. 
I am happy. 
I am behaving myself. 
She studied diligently. 
They performed horribly. 
I was happy. 
I sang for ten minutes. 
She will live indefinitely. 
Table 3: Fourteen linguistic indicators evaluated for 
aspectual classification. 
tor has a unique value for each verb. The first 
indicator, frequency, is simply the frequency 
with which each verb occurs over the entire 
corpus. The remaining 13 indicators measure 
how frequently each verb occurs in a clause 
with the named linguistic marker. For exam- 
ple, the next three indicators listed measure the 
frequency with which verbs 1) are modified by 
not or never, 2) are modified by a temporal ad- 
verb such as then or frequently, and 3) have no 
deep subject (e.g., passive phrases such as, "She 
was admitted to the hospital"). Further details 
regarding these indicators and their linguistic 
motivation is given by Siegel (1998b). 
There are several reasons to expect superior 
classification performance when employing mul- 
tiple linguistic indicators in combination rather 
than using them individually. While individ- 
ual indicators have predictive value, they are 
predictively incomplete. This incompleteness 
has been illustrated empirically by showing that 
some indicators help for only a subset of verbs 
(Siegel, 1998b). Such incompleteness is due in 
• part to sparsity and noise of data when com- 
puting indicator values over a corpus with lim- 
ited size and some parsing errors. However, this 
incompleteness is also a consequence of the lin- 
guistic characteristics of various indicators. For 
example: 
• Aspectual coercion such as iteration com- 
promises indicator measurements (Moens 
and Steedman, 1988). For example, a 
punctual event appears with the progres- 
sive in, "She was sneezing for a week." 
(point --, process --. culminated process) 
In this example, for a week can only modify 
an extended event, requiring the first coer- 
cion. In addition, this for-PP also makes 
an event culminated, causing the second 
transformation. 
• Some aspectual markers such as the 
pseudo-cleft and manner adverbs test for 
intentional events, and therefore are not 
compatible with all events, e.g., "*I died 
diligently." 
• The progressive indicator's predictiveness 
for stativity is compromised by the fact 
that many location verbs can appear with 
the progressive, even in their stative sense, 
e.g. "The book was lying on the shelf." 
(Dowty, 1979) 
• Several indicators measure phenomena 
that are not linguistically constrained by 
any aspectuM category, e.g., the present 
tense, frequency and not/never indicators. 
4 Method and Results 
In this section, we evaluate the set of 
fourteen linguistic indicators for two aspec- 
tual distinctions: stativity and completed- 
ness. Evaluation is over corpora of med- 
ical reports and novels, respectively. This 
data is summarized in Table 4 (available at 
www. CS. columbia, edu/~evs/YerbData). 
First, linguistic indicators are each evalu- 
ated individually. A training set is used to se- 
lect indicator value thresholds for classification. 
Then, we report the classification performance 
achieved by combining multiple indicators. In 
this case, the training set is used to optimize a 
model for combining indicators. In both cases, 
evaluation is performed over a separate test set 
of clauses. 
The combination of indicators is performed 
by four standard supervised learning algo- 
rithms: decision tree induction (Quinlan, 1986), 
CART (Friedman, 1977), log-linear regression 
(Santner and Duffy, 1989) and genetic program- 
ming (GP) (Cramer, 1985; Koza, 1992). 
A pilot study showed no further improve- 
ment in accuracy or recall tradeoff by additional 
learning algorithms: Naive Bayes (Duda and 
114 
stativity completedness 
corpus: 3,224 med reports 10 novels 
size: 1,159,891 846,913 
parsed 
clauses: 97,973 
training: 739 (634 events) 
testing: 739 (619 events) 
verbs in 
test set: 222 204 
clauses 
excluded: be and have stative 
75,289 
307 (196 culm) 
308 (195 culm) 
Table 4: Two classification problems on different 
data sets. 
Hart, 1973), Ripper (Cohen, 1995), ID3 (Quin- 
lan, 1986), C4.5 (Quinlan, 1993), and met- 
alearning to combine learning methods (Chan 
and Stolfo, 1993). 
4.1 Stativity 
Our experiments are performed across a cor- 
pus of 3,224 medical discharge summaries. A 
medical discharge summary describes the symp- 
toms, history, diagnosis, treatment and outcome 
of a patient's visit to the hospital. These re- 
ports were parsed with the English Slot Gram- 
mar (ESG) (McCord, 1990), resulting in 97,973 
clauses that were parsed fully with no self- 
diagnostic errors (ESG produced error messages 
on 12,877 of this corpus' 51,079 complex sen- 
tences). 
Be and have, the two most popular verbs, cov- 
ering 31.9% of the clauses in this corpus, are 
handled separately from all other verbs. Clauses 
with be as their main verb, comprising 23.9% of 
the corpus, always denote a state. Clauses with 
have as their main verb, composing 8.0% of the 
corpus, are highly ambiguous, and have been 
addressed separately by considering the direct 
object of such clauses (Siegel, 1998a). 
4.1.1 Manual Marking 
1,851 clauses from the parsed corpus were man- 
ually marked according to stativity. As a lin- 
guistic test for marking, each clause was tested 
for readability with "What happened was... ,1 
A comparison between human markers for this 
test performed over a different corpus is re- 
ported below in Section 4.2.1. Of these, 373 
1 Manual labeling followed a strict set of linguistically- 
motivated guidelines, e.g., negations were ignored 
(Siegel, 199Sb). 
Linguistic Stative Event T-test 
Indicator Mean Mean P-value 
frequency 932.89 667.57 0.0000 
"not" or "never" 4.44% 1.56% 0.0000 
temporal adverb 1.00% 2.70% 0.0000 
no subject 36.05% 57.56% 0.0000 
past/pres pattie 20.98% 15.37% 0.0005 
duration in-PP 0.16% 0.60% 0.0018 
perfect 2.27% 3.44% 0.0054 
present tense 11.19% 8.94% 0.0901 
progressive 1.79% 2.69% 0.0903 
manner adverb 0.00% 0.03% 0.1681 
evaluation adverb 0.69% 1.19% 0.1766 
past tense 62.85% 65.69% 0.2314 
duration for-PP 0.59% 0.61% 0.8402 
continuous adverb 0.04% 0.03% 0.8438 
Table 5: Indicators discriminate between states and 
events. 
clauses were rejected because of parsing prob- 
lems. This left 1,478 clauses, divided equally 
into training and testing sets. 
83.8% of clauses with main verbs other than 
be and have are events, which thus provides a 
baseline method of 83.8% for comparison. Since 
our approach examines only the main verb of a 
clause, classification accuracy over the test cases 
has a maximum of 97.4% due to the presence of 
verbs with multiple classes. 
4.1.2 Individual Indicators 
The values of the indicators listed in Table 5 
were computed, for each verb, across the 97,973 
parsed clauses from our corpus of medical dis- 
charge summaries. 
The second and third columns of Table 5 show 
the average value for each indicator over stative 
and event clauses, as measured over the training 
examples. For example, 4.44% of stative clauses 
are modified by either not or never, but only 
1.56% of event clauses were so modified. 
The fourth column shows the results of T- 
tests that compare indicator values over stative 
training cases to those over event cases for each 
indicator. As shown, the differences in stative 
and event means are statistically significant (p 
< .01) for the first seven indicators. 
Each indicator was tested individually for 
classification accuracy by establishing a classifi- 
cation threshold over the training data, and val- 
idating performance over the testing data using 
the same threshold. Only the frequency indi- 
cator succeeded in significantly improving clas- 
115 
States Events 
acc recall prec recall prec 
dt 93.9% 74.2% 86.4% 97.7% 95.1% 
GP 91.2% 47.4% 97.3% 99.7% 90.7% 
llr 86.7% 34.2% 68.3% 96.9% 88.4% 
bl 83.8% 0.0% 100.0% 100.0% 83.8% 
b12 94.5% 69.2% 95.4% 99.4% 94.3% 
Table 6: Comparison of three learning methods 
and two performance baselines, distinguishing states 
from events. 
sification accuracy by itself, achieving an accu- 
racy of 88.0%. This improvement in accuracy 
was achieved simply by discriminating the pop- 
ular verb show as a state~ but classifying all 
other verbs as events. Although many domains 
may primarily use show as an event, its appear- 
ances in medical discharge summaries, such as, 
"His lumbar puncture showed evidence of white 
cells," primarily utilize show to denote a state. 
4.1.3 Indicators in Combination 
Three machine learning methods successfully 
combined indicator values, improving classifi- 
cation accuracy over the baseline measure. As 
shown in Table 6, the decision tree attained the 
highest accuracy, 93.9%. Binomial tests showed 
this to be a significant improvement over the 
88.0% accuracy achieved by the frequency indi- 
cator alone, as well as over the other two learn- 
ing methods. No further improvement in classi- 
fication performance was achieved by CART. 
The increase in the number of stative clauses 
correctly classified, i.e. stative recall, illustrates 
an even greater improvement over the base- 
line. As shown in Table 6, the three learn- 
ing methods achieved stative recalls of 74.2%, 
47.4% and 34.2%, as compared to the 0.0% sta- 
tive recall achieved by the baseline, while only 
a small loss in recall over event clauses was suf- 
fered. The baseline does not classify any stative 
clauses correctly because it classifies all clauses 
as events. 
Classification performance is equally compet- 
itive without the frequency indicator, although 
this indicator appears to dominate over oth- 
ers. When decision tree induction was employed 
to combine only the 13 indicators other than 
frequency, the resulting decision tree achieved 
92.4% accuracy and 77.5% stative recall. 
4.2 Completedness 
In medical discharge summaries, non- 
culminated event clauses are rare. Therefore, 
our experiments for classification according to 
completedness are performed across a corpus 
of ten novels comprising 846,913 words. These 
novels were parsed with ESG, resulting in 
75,289 fully-parsed clauses (22,505 of 59,816 
sentences produced errors). 
4.2.1 Manual Marking 
884 clauses from the parsed corpus were man- 
ually marked according to completedness. Of 
these, 109 were rejected because of parsing 
problems, and 160 rejected because they de- 
scribed states. The remaining 615 clauses were 
divided into training and test sets such that the 
distribution of classes was equal. The baseline 
method in this case achieves 63.3% accuracy. 
The linguistic test was selected for this task 
by Passonneau (1988): If a clause in the past 
progressive necessarily entails the past tense 
reading, the clause describes a non-culminated 
event. For example, We were talking just like 
men (non-culm.) entails that We talked just 
like men, but The woman was building a house 
(culm.) does not necessarily entail that The 
woman built a house. Cross-checking between 
linguists shows high agreement. In particular, 
in a pilot study manually annotating 89 clauses 
from this corpus according to stativity, two lin- 
guists agreed 81 times. Of 57 clauses agreed 
to be events, 46 had agreement with respect to 
completedness. 
The verb say (point), which occurs nine times 
in the test set, was initially marked incorrectly 
as culminated, since points are non-extended 
and therefore cannot be placed in the progres- 
sive. After some initial experimentation, we cor- 
rected the class of each occurrence of say in the 
data. 
4.2.2 Individual Indicators 
Table 7 is analogous to Table 5 for complete- 
ness. The differences in culminated and non- 
culminated means are statistically significant (p 
< .05) for the first six indicators. However, for 
completedness, no indicator was shown to sig- 
nificantly improve classification accuracy over 
the baseline. 
116 
Linguistic Culm Non-Culm T-test 
Indicator Mean Mean P-value 
perfect 7.87% 2.88% 0.0000 
temporal adverb 5.60% 3.41% 0.0000 
manner adverb 0.19% 0.61% 0.0008 
progressive 3.02% 5.03% 0.0031 
past/pres partic 14.03% 17.98% 0.0080 
no subject 30.77% 26.55% 0.0241 
duration in-PP 0.27% 0.06% 0.0626 
present tense 17.18% 14.29% 0.0757 
duration for-PP 0.34% 0.49% 0.1756 
continuous adverb 0.10% 0.49% 0.2563 
frequency 345.86 286.55 0.5652 
"not" or "never" 3.41% 3.15% 0.6164 
evaluation adverb 0.46% 0.39% 0.7063 
past tense 53.62% 54.36% 0.7132 
Table 7: Indicators discriminate between culmi- 
nated and non-culminated events. 
acc 
Culminated Non-Culm 
recall prec recall prec 
CART 74.0% 86.2% 76.0% 53.1% 69.0% 
llr 70.5% 83.1% 73.6% 48.7% 62.5% 
lit2 67.2% 81.5% 71.0% 42.5% 57.1% 
GP 68.6% 77.3% 74.2% 53.6% 57.8% 
dt 68.5% 86.2% 70.6% 38.1% 61.4% 
bl 63.3% 100.0% 63.3% 0.0% 100.0% 
b12 70.8% 94.9% 69.8% 29.2% 76.7% 
Table 8: Comparison of four learning methods 
and two performance baselines, distinguishing cul- 
minated from non-culminated events. 
4.2.3 Indicators in Combination 
As shown in Table 8, the highest accuracy, 
74.0%, was attained by CART. A binomial test 
shows this is a significant improvement over the 
63.3% baseline. 
The increase in non-culminated recall illus- 
trates a greater improvement over the baseline. 
As shown in Table 8, non-culminated recalls of 
up to 53.6% were achieved by the learning meth- 
ods, compared to 0.0%, achieved by the base- 
line. 
Additionally, a non-culminated F-measure of 
61.9 was achieved by GP, when optimizing for 
F-Measure, improving over 53.7 attained by the 
optimal uninformed baseline. F-measure com- 
putes a tradeoff between recall and precision 
(Van Rijsbergen, 1979). In this work, we weigh 
recall and precision equally, in which case, recall*precision 
F - measure = (recall+precision)f2 
Automatic methods highly prioritized the 
perfect indicator. The induced decision tree uses 
the perfect indicator as its first discriminator, 
log-linear regression ranked the perfect indica- 
tor as fourth out of fourteen, function trees cre- 
ated by GP include the perfect indicator as one 
of five indicators used together to increase clas- 
sification performance, and the perfect indicator 
tied as most highly correlated with completed- 
ness (cf. Table 7). 
5 Discussion 
Since certain verbs are aspectually ambiguous, 
and, in this work, clauses are classified by their 
main verb only, a second baseline approach 
would be to simply memorize the majority as- 
pect of each verb in the training set, and classify 
verbs in the test set accordingly. In this case, 
test verbs that did not appear in the training set 
would be classified according to majority class. 
However, classifying verbs and clauses accord- 
ing to numerical indicators has several impor- 
tant advantages over this baseline: 
• Handles rare or unlabeled verbs. The 
results we have shown serve to estimate 
classification performance over "unseen" 
verbs that were not included in the super- 
vised training sample. Once the system 
has been trained to distinguish by indi- 
cator values, it can automatically classify 
any verb that appears in unlabeled cor- 
pora, since measuring linguistic indicators 
for a verb is fully automatic. This also ap- 
plies to verbs that are underrepresented in 
the training set. For example, one node 
of the resulting decision tree trained to 
distinguish according to stativity identifies 
19 stative test cases without misclassifying 
any of 27 event test cases with verbs that 
occur only one time each in the training 
set. 
• Success when training doesn't include 
test verbs. To test this, all test verbs 
were eliminated from the training set, and 
log-linear regression was trained over this 
smaller set to distinguish according to com- 
pletedness. The result is shown in Table 8 
("llr2"). Accuracy remained higher than 
the baseline "br' (bl2 not applicable), and 
the recall tradeoff is felicitous. 
. Improved performance. Memorizing 
majority aspect does not achieve as high 
an accuracy as the linguistic indicators for 
117 
completedness, nor does it achieve as wide 
a recall tradeff for both stativity and com- 
pletedness. These results are indicated as 
the second baselines ("bl2") in tables 6 and 
8, respectively. 
• Scalar values assigned to each verb al- 
low the tradeoff between recall and preci- 
sion to be selected for particular applica- 
tions by selecting the classification thresh- 
old. For example, in a separate study, op- 
timizing for F-measure resulted in a more 
dramatic tradeoff in recall values as com- 
pared to those attained when optimizing 
for accuracy (Siegel, 1998b). Moreover, 
such scalar values can provide input to sys- 
tems that perform reasoning on fuzzy or 
uncertainty knowledge. 
• This framework is expandable since 
additional indicators can be introduced 
by measuring the frequencies of additional 
aspectual markers. Furthermore, indica- 
tors measured over multiple clausal con- 
stituents, e.g., main verb-object pairs, al- 
leviate verb ambiguity and sparsity and 
improve classification performance (Siegel, 
1998b). 
6 Conclusions 
We have developed a full-scale system for aspec- 
tual classification with multiple linguistic indi- 
cators. Once trained, this system can automati- 
cally classify all verbs appearing in a corpus, in- 
cluding "unseen" verbs that were not included 
in the supervised training sample. This frame- 
work is expandable, since additional lexico- 
syntactic markers may also correlate with as- 
pectual class. Future work will extend this ap- 
proach to other semantic distinctions in natural 
language. 
Linguistic indicators successfully exploit lin- 
guistic insights to provide a much-needed 
method for aspectual classification. When com- 
bined with a decision tree to classify according 
to stativity, the indicators achieve an accuracy 
of 93.9% and stative recall of 74.2%. When com- 
bined with CART to classify according to com- 
pletedness, indicators achieved 74.0% accuracy 
and 53.1% non-culminated recall. 
A favorable tradeoff in recall presents an ad- 
vantage for applications that weigh the identi- 
fication of non-dominant classes more heavily 
(Cardie and Howe, 1997). For example, cor- 
rectly identifying occurrences of for that denote 
event durations relies on positively identifying 
non-culminated events. A system that summa- 
rizes the duration of events which incorrectly 
classifies "She ran (for a minute)" as culmi- 
nated will not detect that "for a minute" de- 
scribes the duration of the run event. This is be- 
cause durative for-PPs that modify culminated 
events denote the duration of the ensuing state, 
e.g., I leJt the room for a minute. (Vendler, 
1967) 
Our analysis has revealed several insights re- 
garding individual indicators. For example, 
both duration in-PP and manner adverb are 
particularly valuable for multiple aspectual dis- 
tinctions - they were ranked in the top two po- 
sitions by log-linear modeling for both stativity 
and completedness. 
We have discovered several new linguistic in- 
dicators that are not traditionally linked to as- 
pectual class. In particular, verb frequency with 
no deep subject was positively correlated with 
both stativity and completedness. Moreover, 
four other indicators are newly linked to stativ- 
ity: (1) Verb frequency, (2) occurrences modi- 
fied by "not" or "never", (3) occurrences in the 
past or present participle, and (4) occurrences 
in the perfect tense. Additionally, another three 
were newly linked to completedness: (1) occur- 
rences modified by a manner adverb, (2) occur- 
rences in the past or present participle, and (3) 
occurrences in the progressive. 
These new correlations can be understood in 
pragmatic terms. For example, since points 
(non-culminated, punctual events, e.g., hiccup) 
are rare, punctual events are likely to be cul- 
minated. Therefore, an indicator that discrim- 
inates events according to extendedness, e.g., 
the progressive, past/present participle, and du- 
ration for-PP, is likely to also discriminate be- 
tween culminated and non-culminated events. 
As a second example, the not/never indica- 
tor correlates with stativity in medical reports 
because diagnoses (i.e., states) are often ruled 
out in medical discharge summaries, e.g., "The 
patient was not hypertensive," but procedures 
(i.e., events) that were not done are not usu- 
ally mentioned, e.g., '~.An examination was not 
performed." 
118 
Acknowledgements 
Kathleen R. McKeown was extremely helpful regard- 
ing the formulation of this work and Judith L. Kla- 
vans regarding linguistic techniques, and they, along 
with Min-Yen Kan and Dragomir R. Radev provided 
useful feedback on an earlier draft of this paper. 
This research was supported in part by the 
Columbia University Center for Advanced Technol- 
ogy in High Performance Computing and Commu- 
nications in Healthcare (funded by the New York 
State Science and Technology Foundation), the Of- 
fice of Naval Research under contract N00014-95-1- 
0745 and by the National Science Foundation under 
contract GER-90-24069. 

References 
C. Cardie and N. Howe. 1997. Improving mi- 
nority class prediction using case-specific feature 
weights. In D. Fisher, editor, Proceedings of the 
Fourteenth International Conference on Machine 
Learning. Morgan Kaufmann. 
P.K. Chan and S.J. Stolfo. 1993. Toward multistrat- 
egy parallel and distributed learning in sequence 
analysis. In Proceedings of the First International 
Conference on Intelligent Systems for Molecular 
Biology. 
W. Cohen. 1995. Fast effective rule induction. In 
Proc. 12th Intl. Conf. Machine Learning, pages 
115-123. 
N. Cramer. 1985. A representation for the adap- 
tive generation of simple sequential programs. In 
J. Grefenstette, editor, Proceedings of the \[First\] 
International Conference on Genetic Algorithms. 
Lawrence Erlbaum. 
B.& Dorr. 1992. A two-level knowledge represen- 
tation for machine translation: lexical seman- 
tics and tense/aspect. In James Pustejovsky 
and Sabine Bergler, editors, Lexieal Semantics 
and Knowledge Representation. Springer Verlag, 
Berlin. 
D.R. Dowty. 1979. Word Meaning and Montague 
Grammar. D. Reidel, Dordrecht, W. Germany. 
R. O. Duda and P.E. Hart. 1973. Pattern Classifi- 
cation and Scene Analysis. Wiley, New York. 
J.H. Friedman. 1977. A recursive partitioning deci- 
sion rule for non-parametric classification. IEEE 
Transactions on Computers. 
V. Hatzivassiloglou and K. McKeown. 1993. To- 
wards the automatic identification of adjectival 
scales: clustering adjectives according to mean- 
ing. In Proceedings of the 31st Annual Meeting of 
the ACL, Columbus, Ohio, June. Association for 
Computational Linguistics. 
J.L. Klavans and M. Chodorow. 1992. Degrees of 
stativity: the lexical representation of verb as- 
pect. In Proceedings of the 14th International 
Conference on Computation Linguistics. 
J.L. Klavans. 1994. Linguistic tests over large cor- 
pora: aspectual classes in the lexicon. Technical 
report, Columbia University Dept. of Computer 
Science. unpublished manuscript. 
J.R. Koza. 1992. Genetic Programming: On the 
programming of computers by means of natural 
selection. MIT Press, Cambridge, MA. 
M.C. McCord. 1990. SLOT GRAMMAR. In 
R. Studer, editor, International Symposium on 
Natural Language and Logic. Springer Verlag. 
M. Moens and M. Steedman. 1988. Temporal ontol- 
ogy and temporal reference. Computational Lin- 
guistics, 14(2). 
R.J. Passonneau. 1988. A computational model of 
the semantics of tense and aspect. Computational 
Linguistics, 14(2). 
F. Pereira, N. Tishby, and L. Lee. 1993. Distribu- 
tional clustering of english words. In Proceedings 
of the 31st Conference of the ACL, Columbus, 
Ohio. Association for Computational Linguistics. 
J.R. Quinlan. 1986. Induction of decision trees. Ma- 
chine Learning, 1(1):81-106. 
J.R. Quinlan. 1993. C~.5: Programs for Machine 
Learning. Morgan Kaufmann, San Mateo, CA. 
T.J. Santner and D.E. Duffy. 1989. The Statistical 
Analysis of Discrete Data. Springer-Verlag, New 
York. 
H. Schfitze. 1992. Dimensions of meaning. In Pro- 
ceedings of Supereomputing. 
E.V. Siegel and K.R. McKeown. 1996. Gathering 
statistics to aspectually classify sentences with a 
genetic algorithm. In K. Oflazer and H. Somers, 
editors, Proceedings of the Second International 
Conference on New Methods in Language Process- 
ing, Ankara, Turkey, Sept. Bilkent University. 
E.V. Siegel. 1997. Learning methods for combining 
linguistic indicators to classify verbs. In Proceed- 
ings of the Second Conference on Empirical Meth- 
ods in Natural Language Processing, Providence, 
RI, August. 
E.V. Siegel. 1998a. Disambiguating verbs with the 
wordnet category of the direct object. In Proced- 
ings of the Usage of WordNet in Natural Language 
Processing Systems Workshop, Montreal, Canada. 
E.V. Siegel. 1998b. Linguistic Indicators for Lan- 
guage Understanding: Using machine learning 
methods to combine corpus-based indicators for 
aspectual classification of clauses. Ph.D. thesis, 
Columbia University. 
C.J. Van Rijsbergen. 1979. Information Retrieval. 
Butterwoths, London. 
Z. Vendler. 1967. Verbs and times. In Linguistics in 
Philosophy. Cornell University Press, Ithaca, NY. 
