<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0318"> <Title>Learning Methods for Combining Linguistic Indicators to Classify Verbs</Title> <Section position="4" start_page="156" end_page="158" type="metho"> <SectionTitle> 3 Approach </SectionTitle> <Paragraph position="0"> Our goal is to exploit linguistic constraints such as those listed in Table 1 by counting their frequencies in a corpus. For example, it is likely that event verbs will occur more frequently in the progressive than stative verbs, since the progressive is constrained to occur with event verbs. Therefore, the frequency with which a verb occurs in the progressive indicates whether it is an event or stative verb.</Paragraph> <Paragraph position="1"> We have evaluated 14 such linguistic indicators over clauses selected uniformly from a text corpus.</Paragraph> <Paragraph position="2"> In this way, we are measuring classification performance over an unrestricted set of verbs. First, the ability of each indicator to individually distinguish between stative and event verbs is evaluated. Then, in order to increase classification performance, machine learning techniques are employed to combine multiple indicators.</Paragraph> <Paragraph position="3"> In this section, we first describe the set of linguistic indicators used to discriminate events and states. Then, we show how machine learning is used to combine multiple indicators to improve classification performance. Three learning methods are compared for this task. 
Finally, we describe the corpus and evaluation set used for these experiments.</Paragraph> <Section position="1" start_page="156" end_page="157" type="sub_section"> <SectionTitle> 3.1 Linguistic Indicators </SectionTitle> <Paragraph position="0"> The first column of Table 2 lists the 14 linguistic indicators evaluated in this paper for classifying verbs.</Paragraph> <Paragraph position="1"> The second and third columns show the average value for each indicator over stative and event verbs, respectively, as computed over a corpus of parsed clauses, described below in Section 3.3. These values, as well as the third column, are further detailed in Section 4.</Paragraph> <Paragraph position="2"> Each verb has a unique value for each indicator.</Paragraph> <Paragraph position="3"> The first indicator, frequency, is simply the frequency with which each verb occurs. As shown in Table 2, stative verbs occur more frequently than event verbs in our corpus.</Paragraph> <Paragraph position="4"> The remaining 13 indicators measure how frequently each verb occurs in a clause with the linguistic marker indicated. This list includes the four markers listed in Table 1, as well as nine additional markers that have not previously been linked to stativity. For example, the next three indicators listed in Table 2 measure the frequency with which verbs 1) are modified by not or never, 2) are modified by a temporal adverb such as then or frequently, and 3) have no deep subject (passivized phrases often have no deep subject, e.g., &quot;She was admitted to the hospital&quot;). As shown, stative verbs are modified by not or never more frequently than event verbs, but event verbs are modified by temporal adverbs more frequently than stative verbs. 
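The per-verb indicator computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the clause records and field names are hypothetical stand-ins for ESG parser output, and only three of the 14 indicators are shown.

```python
from collections import defaultdict

# Toy stand-ins for parsed clauses; in the paper these come from ESG
# parses of the corpus (Section 3.3). Field names are hypothetical.
clauses = [
    {"verb": "show",  "progressive": False, "not_never": True,  "temporal_adverb": False},
    {"verb": "show",  "progressive": False, "not_never": False, "temporal_adverb": False},
    {"verb": "admit", "progressive": True,  "not_never": False, "temporal_adverb": True},
]

MARKERS = ["progressive", "not_never", "temporal_adverb"]

def indicator_values(clauses):
    """For each verb, compute its frequency and, for each linguistic
    marker, the proportion of the verb's clauses carrying that marker."""
    counts = defaultdict(int)
    marker_counts = defaultdict(lambda: defaultdict(int))
    for clause in clauses:
        verb = clause["verb"]
        counts[verb] += 1
        for m in MARKERS:
            if clause[m]:
                marker_counts[verb][m] += 1
    return {
        verb: {"frequency": n,
               **{m: marker_counts[verb][m] / n for m in MARKERS}}
        for verb, n in counts.items()
    }

values = indicator_values(clauses)
print(values["show"])   # frequency 2; not/never proportion 0.5
```

Each verb thus receives one value per indicator, which is the representation the combination methods of Section 3.2 operate on.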
For further detail regarding the set of 14 indicators, see Siegel (1997).</Paragraph> <Paragraph position="5"> An individual indicator can be used to classify verbs by simply establishing a threshold; if a verb's indicator value is below the threshold, it is assigned one class, otherwise it is assigned the alternative class. For example, in Table 3, which shows the predominant class and four indicator values corresponding to each of four verbs, a threshold of 1.00% would allow events to be distinguished from states based on the values of the not/never indicator. The next subsection describes how all 14 indicators can be used together to classify verbs.</Paragraph> </Section> <Section position="2" start_page="157" end_page="158" type="sub_section"> <SectionTitle> 3.2 Combining Indicators with Learning </SectionTitle> <Paragraph position="0"> Given a verb and its 14 indicator values, our goal is to use all 14 values in combination to classify the verb as a state or an event. Once a function for combining indicator values has been established, previously unobserved verbs can be automatically classified according to their indicator values. This section describes three machine learning methods employed to this end.</Paragraph> <Paragraph position="1"> Log-linear regression. As suggested by Klavans and Chodorow (1992), a weighted sum of multiple indicators that results in one &quot;overall&quot; indicator may provide an increase in classification performance. This method embodies the intuition that each indicator correlates with the probability that a verb describes an event or state, but that each indicator has its own unique scale, and so must be weighted accordingly. 
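A weighted sum of indicators compared against a threshold can be sketched as follows; the indicator values, weights, and threshold are illustrative placeholders, not values estimated in the paper.

```python
def weighted_score(indicators, weights):
    """Combine per-verb indicator values into one overall indicator
    via a weighted sum; weights compensate for each indicator's scale."""
    return sum(weights[name] * value for name, value in indicators.items())

def classify(indicators, weights, threshold):
    """Label a verb an event if its overall score exceeds the threshold,
    otherwise a state (the polarity here is an arbitrary choice)."""
    return "event" if weighted_score(indicators, weights) > threshold else "state"

# Hypothetical indicator values for one verb (proportions of its clauses),
# with hand-picked weights for illustration only.
indicators = {"frequency": 0.002, "not_never": 0.004, "temporal_adverb": 0.031}
weights = {"frequency": -10.0, "not_never": -50.0, "temporal_adverb": 40.0}

print(classify(indicators, weights, threshold=0.0))   # → event
```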
One way to determine these weights is log-linear regression (Santner and Duffy, 1989), a popular technique for binary classification.</Paragraph> <Paragraph position="2"> This technique, which is more extensive than a simple weighted sum, applies an inverse logit function, and employs the iterative reweighted least squares algorithm (Baker and Nelder, 1989).</Paragraph> <Paragraph position="3"> Genetic programming. An alternative that avoids the limitations of a linear combination is to generate a non-linear function tree that combines multiple indicators. A popular method for generating such function trees is a genetic algorithm (Holland, 1975; Goldberg, 1989). The use of genetic algorithms to generate function trees (Cramer, 1985; Koza, 1992) is frequently called genetic programming. The function trees are generated from a set of 17 primitives: the binary functions ADD, MULTIPLY and DIVIDE, and 14 terminals corresponding to the 14 indicators listed in Table 2. This set of primitives was established empirically; conditional functions, subtraction, and random constants failed to change performance significantly. The polarities of several indicators were reversed according to the polarities of the weights established by log-linear regression.</Paragraph> <Paragraph position="4"> Because the genetic algorithm is stochastic, each run may produce a different function tree. Runs of the genetic algorithm have a population size of 500, and end after 50,000 new individuals have been evaluated. A threshold must be selected for both linear and function tree combinations of indicators, so that overall outputs can be discriminated such that classification performance is maximized. For both methods, this threshold is established over the training set and frozen for evaluation over the test set.</Paragraph> <Paragraph position="5"> Decision trees. Another method capable of modeling non-linear relationships between indicators is a decision tree. 
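As a concrete toy illustration, recursive partitioning over indicator values can be sketched as follows. This is not the Splus procedure the experiments used: it is a minimal greedy splitter over hypothetical two-indicator data, shown only to make the mechanism explicit.

```python
# A minimal recursive-partitioning sketch: greedily split on the
# feature/threshold pair minimizing weighted Gini impurity, stopping
# when a node is pure. (Illustrative only; the paper used Splus.)
def _impurity(labels):
    p = labels.count("event") / len(labels)
    return p * (1 - p) * len(labels)          # weighted Gini impurity

def build_tree(X, y):
    if len(set(y)) == 1:                      # pure node becomes a leaf
        return {"label": y[0]}
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i in range(len(X)) if i not in left]
            if not left or not right:
                continue
            score = sum(_impurity([y[i] for i in part]) for part in (left, right))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left":  build_tree([X[i] for i in left],  [y[i] for i in left]),
            "right": build_tree([X[i] for i in right], [y[i] for i in right])}

def predict(tree, row):
    while "label" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]

# Toy rows: [not/never %, temporal-adverb %] indicator values per verb.
X = [[2.0, 0.5], [1.5, 0.2], [0.3, 4.1], [0.1, 3.6]]
y = ["state", "state", "event", "event"]
tree = build_tree(X, y)
print(predict(tree, [0.2, 5.0]))   # → event
```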
Each internal node of a decision tree is a choice point, dividing an individual indicator into ranges of possible values. Each leaf node is labeled with a classification (state or event). Given the set of indicator values corresponding to a verb, that verb's class is established by deterministically traversing the tree from the root to a leaf. The most popular method of decision tree induction, employed here, is recursive partitioning (Quinlan, 1986; Breiman et al., 1984), which expands the tree from top to bottom. The Splus statistical package was used for the induction process, with parameters set to their default values.</Paragraph> <Paragraph position="6"> Previous efforts in corpus-based natural language processing have incorporated machine learning methods to coordinate multiple linguistic indicators, e.g., to classify adjectives according to markedness (Hatzivassiloglou and McKeown, 1995), to perform accent restoration (Yarowsky, 1994), for disambiguation problems (Yarowsky, 1994; Luk, 1995), and for the automatic identification of semantically related groups of words (Pereira, Tishby, and Lee, 1993; Hatzivassiloglou and McKeown, 1993). For more detail on the machine learning experiments described here, see Siegel (1997).</Paragraph> </Section> <Section position="3" start_page="158" end_page="158" type="sub_section"> <SectionTitle> 3.3 A Parsed Corpus </SectionTitle> <Paragraph position="0"> The automatic identification of individual constituents within a clause is necessary to compute the values of the linguistic indicators in Table 2. The English Slot Grammar (ESG) (McCord, 1990) has previously been used on corpora to accumulate aspectual data (Klavans and Chodorow, 1992). 
ESG is particularly attractive for this task since its output describes a clause's deep roles, detecting, for example, the deep subject and object of a passivized phrase.</Paragraph> <Paragraph position="1"> Our experiments are performed across a 1,159,891-word corpus of medical discharge summaries from which 97,973 clauses were parsed fully by ESG, with no self-diagnostic errors (ESG produced error messages on some of this corpus' complex sentences).</Paragraph> <Paragraph position="2"> The values of each indicator in Table 2 are computed, for each verb, across these 97,973 clauses.</Paragraph> <Paragraph position="3"> In this paper, we evaluate our approach over verbs other than be and have, the two most frequent verbs in this corpus. Table 4 shows the distribution of clauses with be, have, and remaining verbs as their main verb. Clauses with be as their main verb always denote states. Have is highly ambiguous, so the aspectual classification of clauses headed by have must incorporate additional constituents. For example, &quot;The patient had Medicaid&quot; denotes a state, while &quot;The patient had an enema&quot; denotes an event. In separate work, we have shown that the semantic category of the direct object of have informs classification according to stativity (Siegel, 1997). Since the remaining problem is to increase the classification accuracy over the 68.1% of clauses that have main verbs other than be and have, all results are measured only across that portion of the corpus. As shown in Table 4, 83.8% of clauses with verbs other than be and have are events.</Paragraph> <Paragraph position="4"> A portion of the parsed clauses must be manually classified to provide supervised training data for the three learning methods mentioned above, and to provide a separate set of test data with which to evaluate the classification performance of our system. 
To this end, we manually marked 1,851 clauses selected uniformly from the set of parsed clauses not headed by be or have. As a linguistic test to mark according to stativity, each clause was tested for readability with &quot;What happened was...&quot; Of these, 373 were rejected because of parsing problems (verb or direct object incorrectly identified). This left 1,478 parsed clauses, which were divided equally into 739 training and 739 testing cases.</Paragraph> <Paragraph position="5"> Some verbs can denote both states and events, depending on other constituents of the clause. For example, show denotes a state in &quot;His lumbar puncture showed evidence of white cells,&quot; but denotes an event in &quot;He showed me the photographs.&quot; However, in this corpus, most verbs other than have are highly dominated by one sense. Of the 739 clauses included in the training set, 235 verbs occurred. Only 11 of these verbs were observed as both states and events.</Paragraph> <Paragraph position="6"> Among these, there was a strong tendency towards one sense. For example, show appears primarily as a state. Only five verbs (say, state, supplement, describe, and lie) were not dominated by one class over 80% of the time. Further, each of these was observed fewer than six times apiece, which makes the estimation of sense dominance inaccurate.</Paragraph> <Paragraph position="7"> The limited presence of verbal ambiguity in the test set does, however, place an upper bound of 97.4% on classification accuracy, since linguistic indicators are computed over the main verb only.</Paragraph> </Section> </Section> </Paper>