<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2110"> <Title>Automatic Identification of English Verb Particle Constructions using Linguistic Features</Title> <Section position="3" start_page="65" end_page="66" type="metho"> <SectionTitle> 2 Linguistic Features </SectionTitle> <Paragraph position="0"> When verbs co-occur with particles to form VPCs, their meaning can be significantly different from the semantics of the head verb in isolation. According to Baldwin et al. (2003), divergences in VPC and head verb semantics are often reflected in differing selectional preferences, as manifested in patterns of noun co-occurrence. In one example cited in the paper, the cosine similarity between cut and cut out, based on word co-occurrence vectors, was found to be greater than that between cut and cut off, mirroring the intuitive compositionality of these VPCs.</Paragraph> <Paragraph position="1"> (1) and (2) illustrate the difference in the selectional preferences of the verb put in isolation as compared with the VPC put on.3 (1) put = place EX: Put the book on the table.</Paragraph> <Paragraph position="2"> ARGS: bookOBJ = book, publication, object ANALYSIS: verb-PP (2) put on = wear EX: Put on the sweater .</Paragraph> <Paragraph position="3"> ARGS: sweaterOBJ = garment, clothing ANALYSIS: verb particle construction While put on is generally used in the context of wearing something, it usually occurs with clothingtype nouns such as sweater and coat, whereas the simplex put has less sharply defined selectional restrictions and can occur with any noun. In terms of the word senses of the head nouns of the object NPs, the VPC put on will tend to co-occur with objects which have the semantics of clothes or garment. On the other hand, the simplex verb put in isolation tends to be used with objects with the semantics of object and prepositional phrases containing NPs with the semantics of place.</Paragraph> <Paragraph position="4"> Also, as observed above, the valence of a VPC can differ from that of the head verb. (3) and (4) illustrate two different senses of take off with intransitive and transitive syntax, respectively. Note that take cannot occur as a simplex intransitive verb.</Paragraph> <Paragraph position="5"> (3) take off = lift off EX: The airplane takes off.</Paragraph> <Paragraph position="6"> ARGS: airplaneSUBJ = airplane, aeroplane ANALYSIS: verb particle construction (4) take off = remove EX: They take off the cape .</Paragraph> <Paragraph position="7"> ARGS: theySUBJ = person, individual capeOBJ = garment, clothing ANALYSIS: verb particle construction Note that in (3), take off = lift off co-occurs with a subject of the class airplane, aeroplane. In (4), on the other hand, take off = remove and the corresponding object noun is of class garment or clothing. From the above, we can see that head nouns in the subject and object argument positions can be used to distinguish VPCs from simplex verbs with prepositional phrases (i.e. verb-PPs).</Paragraph> </Section> <Section position="4" start_page="66" end_page="67" type="metho"> <SectionTitle> 3 Approach </SectionTitle> <Paragraph position="0"> Our goal is to distinguish VPCs from verb-PPs in corpus data, i.e. to take individual inputs such as Kim handed the paper in today and tag each as either a VPC or a verb-PP. 
<Section position="4" start_page="66" end_page="67" type="metho"> <SectionTitle> 3 Approach </SectionTitle> <Paragraph position="0"> Our goal is to distinguish VPCs from verb-PPs in corpus data, i.e. to take individual inputs such as Kim handed the paper in today and tag each as either a VPC or a verb-PP. Our basic approach is to parse each sentence with RASP (Briscoe and Carroll, 2002) to obtain a first-gloss estimate of the VPC and verb-PP token instances, and also to identify the head nouns of the arguments of each VPC and simplex verb. For the head noun of each subject and object, as identified by RASP, we use WordNet 2.1 (Fellbaum, 1998) to obtain the word sense. Finally, we build a supervised classifier using TiMBL 5.1 (Daelemans et al., 2004).</Paragraph> <Section position="1" start_page="66" end_page="66" type="sub_section"> <SectionTitle> 3.1 Method </SectionTitle> <Paragraph position="0"> Compared to the method proposed by Baldwin (2005), our approach (a) tackles the task of VPC identification rather than VPC extraction, and (b) uses both syntactic and semantic features, employing the WordNet 2.1 senses of the subject and/or object(s) of the verb. In the sentence He put the coat on the table, e.g., to distinguish the VPC put on from the verb put occurring with the prepositional phrase on the table, we identify the senses of the head nouns of the subject and object(s) of the verb put (i.e. he and coat, respectively).</Paragraph> <Paragraph position="1"> First, we parse all sentences in the given corpus using RASP, and identify verbs and prepositions in the RASP output. This is a simple process of checking the POS tags in the most probable parse, and, for both particles (tagged RP) and transitive prepositions (tagged II), reading off the governing verb from the dependency tuple output (see Section 3.2 for details). We also retrieve the head nouns of the subject and object(s) of each head verb directly from the dependency tuples. Using WordNet 2.1, we then obtain the word sense of the head nouns.</Paragraph> <Paragraph position="2"> Each VPC or verb-PP is represented with the corresponding information as given below:</Paragraph> <Paragraph position="3"> (type, v, p, wsSUBJ, wsDOBJ, wsIOBJ)</Paragraph> <Paragraph position="4"> where type denotes either a VPC or verb-PP, v is the head verb, p is the preposition, and ws* is the word sense of the subject, direct object or indirect object.</Paragraph> <Paragraph position="5"> Once all the data was gathered, we separated it into training and test data. We then used TiMBL 5.1 to learn a classifier from the training data, which was then run and evaluated over the test data. See Section 5 for full details of the results. Figure 1 depicts the complete process used to distinguish VPCs from verb-PPs.</Paragraph> </Section>
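As a rough sketch of the representation just described, the following builds one (type, v, p, wsSUBJ, wsDOBJ, wsIOBJ) instance. NLTK's WordNet interface is used here as a stand-in for WordNet 2.1 (an assumption for illustration; the paper's pipeline reads arguments from RASP dependency tuples), and the helper names are invented.

```python
from nltk.corpus import wordnet as wn  # requires the WordNet data: nltk.download("wordnet")

def first_sense(noun):
    """Name of the first (default-ranked) WordNet sense of a noun, or None."""
    synsets = wn.synsets(noun, pos=wn.NOUN)
    return synsets[0].name() if synsets else None

def make_instance(inst_type, verb, prep, subj=None, dobj=None, iobj=None):
    """Build one (type, v, p, wsSUBJ, wsDOBJ, wsIOBJ) feature tuple."""
    def sense(noun):
        return first_sense(noun) if noun else None
    return (inst_type, verb, prep, sense(subj), sense(dobj), sense(iobj))

# E.g. "They take off the cape", analysed as a VPC with subject and object:
print(make_instance("vpc", "take", "off", subj="person", dobj="cape"))
```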
<Section position="2" start_page="66" end_page="67" type="sub_section"> <SectionTitle> 3.2 On the use of RASP, WordNet and TiMBL </SectionTitle> <Paragraph position="0"> RASP is used to identify the syntactic structure of each sentence, including the head nouns of arguments, and to make a first-gloss determination of whether a given preposition is incorporated in a VPC or a verb-PP. The RASP output contains dependency tuples derived from the most probable parse, each of which includes a label identifying the nature of the dependency (e.g. SUBJ, DOBJ), the head word of the modifying constituent, and the head of the modified constituent. In addition, each word is tagged with a POS tag, from which it is possible to determine the valence of any prepositions. McCarthy et al. (2003) evaluate the precision of RASP at identifying VPCs to be 87.6% and the recall to be 49.4%. However, the paper does not evaluate the parser's ability to distinguish sentences containing VPCs from sentences containing verb-PPs.</Paragraph> <Paragraph position="1"> To better understand the baseline performance of RASP, we counted the number of false-positive examples tagged with RP and false-negative examples tagged with II, relative to gold-standard data. See Section 5 for details.</Paragraph> <Paragraph position="2"> We use WordNet to obtain the first-sense word sense of the head nouns of subject and object phrases, according to the default word sense ranking provided within WordNet. McCarthy et al. (2004) found that 54% of word tokens are used with their first (or default) sense. With the performance of current word sense disambiguation (WSD) systems hovering around 60-70%, a simple first-sense WSD system leaves room for improvement, but is sufficient for our immediate purposes in this paper.</Paragraph> <Paragraph position="3"> To evaluate our approach, we built a supervised classifier using the TiMBL 5.1 memory-based learner and training data extracted from the Brown and WSJ corpora.</Paragraph> </Section> </Section> <Section position="5" start_page="67" end_page="68" type="metho"> <SectionTitle> 4 Data Collection </SectionTitle> <Paragraph position="0"> We evaluated our method by running RASP over the Brown Corpus and the Wall Street Journal, as contained in the Penn Treebank (Marcus et al., 1993).</Paragraph> <Section position="1" start_page="67" end_page="67" type="sub_section"> <SectionTitle> 4.1 Data Classification </SectionTitle> <Paragraph position="0"> The data we consider is sentences containing prepositions tagged as either RP or II. Based on the output of RASP, we divide the data into four groups. Group A contains the verb-preposition token instances tagged exclusively as VPCs (i.e. the preposition is never tagged as II in combination with the given head verb). Group B contains the verb-preposition token instances identified as VPCs by RASP where there were also instances of that same combination identified as verb-PPs. Group C contains the verb-preposition token instances identified as verb-PPs by RASP where there were also instances of that same combination identified as VPCs. Finally, group D contains the verb-preposition combinations which were tagged exclusively as verb-PPs by RASP.</Paragraph> <Paragraph position="1"> We focus particularly on disambiguating verb-preposition token instances falling into groups B and C, where RASP has identified an ambiguity for that particular combination. We do not further classify token instances in group D, on the grounds that (a) for high-frequency verb-preposition combinations, RASP was unable to find a single instance warranting a VPC analysis, suggesting it had high confidence in its ability to correctly identify instances of this lexical type, and (b) for low-frequency verb-preposition combinations, where the confidence that there is definitively no VPC usage is low, the token sample is too small to disambiguate effectively and the overall impact would be negligible even if we tried. We do, however, return to consider the data in group D in computing the precision and recall of RASP.</Paragraph> <Paragraph position="2"> Naturally, the output of the RASP parser is not error-free, i.e. VPCs may be parsed as verb-PPs and vice versa. In particular, other than the reported results of McCarthy et al. (2003), targeting VPCs vs. all other analyses, we had no a priori sense of RASP's ability to distinguish VPCs and verb-PPs. Therefore, we manually checked the false-positive and false-negative rates in all four groups and obtained the performance of the parser with respect to VPCs. The verb-PPs in groups A and B are false-positives, while the VPCs in groups C and D are false-negatives (we consider the VPCs to be positive examples).</Paragraph>
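The four-way grouping above reduces to a simple partition over RASP's tags for each verb-preposition type. Below is a minimal sketch assuming token instances arrive as (verb, preposition, tag) triples, with RP marking a particle and II a transitive preposition; the input format is an assumption for illustration.

```python
from collections import defaultdict

def partition(tokens):
    """Split (verb, preposition, tag) token instances into groups A-D.

    A: type only ever tagged RP (VPC)     B: RP token of an ambiguous type
    C: II token of an ambiguous type      D: type only ever tagged II (verb-PP)
    """
    tags_seen = defaultdict(set)
    for verb, prep, tag in tokens:
        tags_seen[(verb, prep)].add(tag)
    groups = {"A": [], "B": [], "C": [], "D": []}
    for tok in tokens:
        verb, prep, tag = tok
        seen = tags_seen[(verb, prep)]
        if seen == {"RP"}:
            groups["A"].append(tok)
        elif seen == {"II"}:
            groups["D"].append(tok)
        else:  # the verb-preposition type received both tags in the corpus
            groups["B" if tag == "RP" else "C"].append(tok)
    return groups

# E.g. "hand in" tagged both ways is ambiguous: its RP tokens land in group B
# and its II tokens in group C, while "take off" here falls into group A.
print(partition([("hand", "in", "RP"), ("hand", "in", "II"), ("take", "off", "RP")]))
```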
<Paragraph position="3"> To calculate the number of incorrect examples, two human annotators independently checked each verb-preposition instance. Table 1 details the rate of false-positive and false-negative examples in each data group, as well as the inter-annotator agreement (calculated over the entire group).</Paragraph> </Section> <Section position="2" start_page="67" end_page="68" type="sub_section"> <SectionTitle> 4.2 Collection </SectionTitle> <Paragraph position="0"> We combined the 6,535 (putative) VPCs and 995 (putative) verb-PPs from groups A, B and C, as identified by RASP over the corpus data. Table 2 shows the number of VPCs in groups A and B and the number of verb-PPs in group C. The first number is the number of examples occurring at least once, and the second the number of examples occurring five or more times.</Paragraph> <Paragraph position="1"> From the sentences containing VPCs and verb-PPs, we retrieved a total of 8,165 nouns which occurred as the head noun of a subject or object of a VPC in group A or B, including pronouns (e.g. I, he, she), proper nouns (e.g. CITI, Canada, Ford) and demonstrative pronouns (e.g. one, some, this). We similarly retrieved 1,343 nouns for verb-PPs in group C. Table 3 shows the distribution of different noun types in these two sets.</Paragraph> <Paragraph position="2"> We found that about 10% of the nouns are pronouns (personal or demonstrative), proper nouns or WH words. For pronouns, we manually resolved the antecedent and took this as the head noun. When which is used as a relative pronoun, we identified whether it was coindexed with an argument position of a VPC or verb-PP, and if so, manually identified the antecedent, as illustrated in (5). (5) EX: Tom likes the books which he sold off.</Paragraph> <Paragraph position="3"> ARGS: heSUBJ = person whichOBJ (= the books) = book, publication ANALYSIS: verb particle construction</Paragraph> <Paragraph position="4"> With what, on the other hand, we were generally not able to identify an antecedent, in which case the argument position was left without a word sense (we come back to this in Section 6). (6) Tom didn't look up what to do.</Paragraph> <Paragraph position="5"> What went on? We also replaced all proper nouns with corresponding common noun hypernyms based on manual disambiguation, as the coverage of proper nouns in WordNet is (intentionally) poor. Examples of such proper nouns and their common noun hypernyms are CITI (bank), Canada (country) and Ford (company).</Paragraph> <Paragraph position="6"> When we retrieved the first word sense of nouns from WordNet, we selected the first sense and the associated hypernyms (up to) three levels up the WordNet hierarchy. This is intended as a crude form of smoothing for closely-related word senses which occur in the same basic region of the WordNet hierarchy. As an illustration of this process, in Figure 2, apple and orange are used as edible fruit, fruit or food, and the semantic overlap is picked up by the fact that edible fruit is a hypernym of both apple and orange. On the other hand, food is the fourth hypernym of orange, so it is ignored by our method. However, because we use the four senses (the first sense plus three hypernym levels), the common senses of nouns are still extracted properly. This approach works reasonably well for retrieving common word senses of nouns which are in the immediate vicinity of each other in the WordNet hierarchy, as was the case with apple and orange.</Paragraph>
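A minimal sketch of this first-sense-plus-hypernym expansion, again using NLTK's WordNet interface as a stand-in (its sense inventory may differ slightly from WordNet 2.1):

```python
from nltk.corpus import wordnet as wn

def expanded_senses(noun, levels=3):
    """First WordNet sense of a noun plus up to `levels` hypernyms above it."""
    synsets = wn.synsets(noun, pos=wn.NOUN)
    if not synsets:
        return []
    senses = [synsets[0]]
    for _ in range(levels):
        hypernyms = senses[-1].hypernyms()
        if not hypernyms:
            break
        senses.append(hypernyms[0])  # follow the first hypernym path only
    return [s.name() for s in senses]

# apple and orange should share a hypernym sense (e.g. edible_fruit) within
# three levels, which is how the overlap illustrated in Figure 2 is picked up.
print(expanded_senses("apple"))
print(expanded_senses("orange"))
```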
<Paragraph position="7"> In terms of feature representation, we generate an individual instance for each noun sense generated by the above method, and in the case that we have multiple arguments for a given VPC or verb-PP (e.g. both a subject and a direct object), we generate an individual instance for the cross product of all sense combinations between the arguments.</Paragraph> <Paragraph position="8"> We use 80% of the data for training and 20% for testing. The total number of training instances was counted both before and after performing hypernym expansion.</Paragraph> </Section> </Section> <Section position="6" start_page="68" end_page="68" type="metho"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> We selected the 20% of the data used for testing from different combinations of the four groups and over the two frequency thresholds, leading to a total of 8 test data sets. The first data set contains examples from group B only, the second set is from groups B and A, the third set is from groups B and C, and the fourth set is from groups B, A and C. Additionally, each data set is divided into: (1) f ≥ 1, i.e. verb-preposition combinations occurring at least once, and (2) f ≥ 5, i.e. verb-preposition combinations occurring at least five times (hereafter labelled f≥1 and f≥5, respectively). In the group C data, there are 217 verb-PPs with f≥5, which is slightly more than 20% of the data, so we use verb-PPs with f≥1 in the experiments instead of verb-PPs with f≥5. The first and second data sets do not contain negative examples, while the third and fourth data sets contain both positive and negative examples. As a result, the precision over the first two data sets is 1.0.</Paragraph> <Paragraph position="1"> Table 5 shows the precision, recall and F-score of our method over each data set, relative to the identification of VPCs only; A, B and C are the groups, and f# is the frequency threshold of the examples.</Paragraph> <Paragraph position="2"> Table 6 compares the performance of VPC identification and verb-PP identification.</Paragraph> <Paragraph position="3"> Table 7 shows the results using four word senses (i.e. with hypernym expansion) versus only one word sense (i.e. the first sense only).</Paragraph> </Section> <Section position="7" start_page="68" end_page="71" type="metho"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> The performance of RASP as shown in Tables 5 and 6 is based on human judgement. Note that we only consider the ability of the parser to distinguish sentences with prepositions as either VPCs or verb-PPs (i.e. we judge the parse to be correct if the preposition is classified correctly, irrespective of whether there are other errors in the output).</Paragraph> <Paragraph position="1"> Also, we ignore the ambiguity between particles and adverbs, which is the principal reason for our evaluation being much higher than that reported by McCarthy et al. (2003). In Table 5, the precision (P) and recall (R) for VPCs are computed as follows:</Paragraph> <Paragraph position="2"> P = (VPCs correctly identified) / (all instances identified as VPCs) R = (VPCs correctly identified) / (all gold-standard VPCs)</Paragraph> <Paragraph position="3"> The performance of RASP in Table 6 shows how well it distinguishes between VPCs and verb-PPs for ambiguous verb-preposition combinations. Since Table 6 compares the performance of our method over VPCs and verb-PPs, the performance of RASP on the examples misrecognized as each other is the appropriate point of reference. Note that the baseline RASP accuracy, based on assigning the majority class to instances in each of groups A, B and C, is 83.04%.</Paragraph>
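For concreteness, the following helper computes precision, recall and F-score figures of the kind reported in Tables 5 and 6, treating VPCs as the positive class; the counts themselves would come from the manually checked gold-standard data, and the example numbers below are illustrative.

```python
def precision_recall_f(tp, fp, fn):
    """P, R and F-score from true-positive, false-positive and
    false-negative counts, with VPCs treated as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# A data set with no negative examples has fp == 0, hence precision 1.0,
# as with the first two test data sets above:
print(precision_recall_f(tp=100, fp=0, fn=8))
```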
<Paragraph position="4"> In Table 5, the performance over the high-frequency data identified from groups B, A and C is the highest (F-score = .974). In general, we would expect the data set containing high-frequency examples and both positive and negative instances to give us the best performance at VPC identification, in terms of precision (P), recall (R) and F-score (F). We achieved a slightly better result than the 95.8%-97.5% performance reported by Li et al. (2003). However, considering that Li et al. (2003) needed considerable time and human labour to generate hand-coded rules, our method has advantages in terms of both raw performance and labour efficiency.</Paragraph> <Paragraph position="5"> Combining the results in Table 5 and Table 6, we see that our method performs better at VPC identification than at verb-PP identification. Since we do not take into account the data from group D with our method, the performance of verb-PP identification is low compared to that of RASP, which in turn leads to a decrease in the overall performance. Since we ignored the data from group D, containing unambiguous verb-PPs, the number of positive training instances for verb-PP identification was relatively small. As for the different numbers of word senses in Table 7, we conclude that the more word senses, the better the performance, particularly for higher-frequency data items.</Paragraph> <Paragraph position="6"> In order to get a clearer sense of the impact of selectional preferences on the results, we investigated the relative performance over VPCs of varying semantic compositionality, based on 117 VPCs (f≥1) attested in the data set of McCarthy et al. (2003). According to our hypothesis from above, we would expect VPCs with low compositionality to have markedly different selectional preferences to the corresponding simplex verb, and VPCs with high compositionality to have similar selectional preferences to the simplex verb. In terms of the performance of our method, therefore, we would expect the degree of compositionality to be inversely proportional to system performance. We test this hypothesis in Figure 3, where we calculate the error rate reduction (in F-score) for the proposed method relative to the majority-class baseline, at various degrees of compositionality. McCarthy et al. (2003) provide compositionality judgements from three human judges, which we average and bin into 11 categories (with 0 = non-compositional and 10 = fully compositional). In Figure 3, we plot both the error rate reduction in each bin (both the raw numbers and a smoothed curve) and the number of attested VPC types found in each bin. From the graph, we see our hypothesis borne out, with perfect performance over non-compositional VPCs and near-baseline performance over fully compositional VPCs. Combining this result with the overall results from above, we conclude that our method is highly successful at distinguishing non-compositional VPCs from verb-PPs, and further that there is a direct correlation between the degree of compositionality and the similarity of the selectional preferences of VPCs and their verb counterparts.</Paragraph>
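A sketch of the binning and error-rate-reduction calculations behind Figure 3. The assumption that each judge scores on the same 0-10 scale as the bins is ours, and the example numbers are illustrative only.

```python
def error_rate_reduction(f_system, f_baseline):
    """Proportion of the baseline's F-score error removed by the system."""
    return (f_system - f_baseline) / (1.0 - f_baseline)

def compositionality_bin(judgements):
    """Average the judges' scores and round into one of 11 bins
    (0 = non-compositional, ..., 10 = fully compositional)."""
    return round(sum(judgements) / len(judgements))

# Illustrative numbers only: three judges' scores for one VPC, and a
# system F-score compared against the 83.04% majority-class baseline.
print(compositionality_bin([6, 7, 7]))      # -> 7
print(error_rate_reduction(0.974, 0.8304))  # ~0.85 of the baseline error removed
```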
<Paragraph position="7"> Several factors appear to have influenced performance. Some data instances are missing head nouns which would assist us in determining the semantics of the verb-preposition combination. Particular examples of this are imperative and abbreviated sentences: (7) a. Come in. b. (How is your cold?) Broiled out.</Paragraph> <Paragraph position="8"> Another confounding factor is the lack of word sense data, particularly in WH questions: (8) a. What do I hand in? b. You can add up anything.</Paragraph> </Section> </Paper>