<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2064">
  <Title>Interpreting Semantic Relations in Noun Compounds via Verb Semantics</Title>
  <Section position="5" start_page="491" end_page="491" type="metho">
    <SectionTitle>
2 Motivation
</SectionTitle>
    <Paragraph position="0"> The semantic relation in NCs is the relation between the head noun (denoted &amp;quot;H&amp;quot;) and the modifier(s) (denoted &amp;quot;M&amp;quot;). We can find this semantic relation expressed in certain sentential constructions involving the head noun and modifier.</Paragraph>
    <Paragraph position="1">  (1) family car CASE: family owns the car.</Paragraph>
    <Paragraph position="2"> FORM: H own M RELATION: POSSESSOR (2) student protest  CASE: protest is performed by student. FORM: M is performed by H</Paragraph>
    <Paragraph position="0"> In the examples above, the semantic relation (e.g. POSSESSOR) provides an interpretation of how the head noun and modifiers relate to each other, andtheseedverb(e.g.own)providesaparaphrase evidencing that relation. For example, in the case of family car, the family is the POSSESSOR of the car, and in student protest, student(s) aretheAGENToftheprotest. Notethatvoiceisimportant in aligning sentential contexts with semantic relations. For instance, family car can be represented as car is owned by family (passive) and student protest as student performs protest (active).</Paragraph>
    <Paragraph position="1"> The exact nature of the sentential context varies with different synonyms of the seed verbs.</Paragraph>
    <Paragraph position="0"> The verb own in the POSSESSOR relation has the synonyms have, possess and belong to. In the context of have and possess, the form of relation would be same as the form with verb, own.</Paragraph>
    <Paragraph position="1"> However, in the context of belong to, family car  would mean that the car belongs to family. Thus, even when the voice of the verb is the same (voice=active), the grammatical role of the head noun and modifier can change.</Paragraph>
    <Paragraph position="2"> In our approach we map the actual verbs in sentences containing the head noun and modifiers to seed verbs corresponding to the relation forms. The set of seed verbs contains verbs representative of each semantic relation form. We chose two sets of seed verbs of size 57 and 84, to examine how the coverage of actual verbs by seed verbs affects the performance of our method. Initially, we manually chose a set of 60 seed verbs. We then added synonyms from Moby's thesaurus for some of the 60 verbs. Finally, we filtered verbs from the two expanded sets, since these verbs occur very frequently in the corpus (as this might skew the results). The set of seed verbs {have, own, possess, belong to} are in the set of 57 seed verbs, and {acquire, grab, occupy} are added to the set of 84 seed verbs; all correspond to the POSSESSOR relation.</Paragraph>
    <Paragraph position="3"> For each relation, we generate a set of constructional templates associating a subset of seed verbs with appropriate grammatical relations for the head noun and modifier. Examples for POS-</Paragraph>
    <Paragraph position="5"> whereV is the set of seed verbs, M is the modifier and H is the head noun.</Paragraph>
    <Paragraph position="6"> Two relations which do not map readily onto seed verbs are TIME (e.g. winter semester) and EQUATIVE (e.g. composer arranger). Here, we rely on an independent set of contextual evidence, as outlined in Section 6.1.</Paragraph>
    <Paragraph position="7"> Through matching actual verbs attested in corpus data onto seed verbs, we can match sentences withrelations(seeSection6.2). Usingthismethod we can identify the matching relation forms of semanticrelationstodecidethesemanticrelationfor null NCs.</Paragraph>
  </Section>
  <Section position="8" start_page="492" end_page="492" type="metho">
    <SectionTitle>
3 Semantic Relations in Compound
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="492" end_page="492" type="sub_section">
      <SectionTitle>
Nouns
</SectionTitle>
      <Paragraph position="0"> While there has been wide recognition of the need for a system of semantic relations with which to classify NCs, there is still active debate as to what the composition of that set should be, or indeed  whether it is reasonable to expect that all NCs shouldbeinterpretablewithafixedsetofsemantic relations.</Paragraph>
      <Paragraph position="1"> Based on the pioneering work on Levi (1979) and Finin (1980), there have been efforts in computational linguistics to arrive at largely task-specific sets of semantic relations, driven by the annotation of a representative sample of NCs from a given corpus type (Vanderwende, 1994; Barker and Szpakowicz, 1998; Rosario and Marti, 2001; Moldovan et al., 2004). In this paper, we use the set of 20 semantic relations defined by Barker and Szpakowicz (1998), rather than defining a new set of relations. The main reasons we chose this set are: (a) that it clearly distinguishes between the head noun and modifiers, and (b) there is clear documentation of each relation, which is vital for NC annotation effort. The one change we make to the original set of 20 semantic relations is to excludethePROPERTYrelationsinceitistoogeneral null and a more general form of several other relations including MATERIAL (e.g. apple pie).</Paragraph>
  </Section>
  <Section position="9" start_page="492" end_page="493" type="metho">
    <SectionTitle>
4 Method
</SectionTitle>
    <Paragraph position="0"> Figure 1 outlines the system architecture of our approach. We used three corpora: the Brown corpus (as contained in the Penn Treebank), the Wall Street Journal corpus (also taken from the Penn treebank), and the written component of the British National Corpus (BNC). We first parsed each of these corpora using RASP (Briscoe and Carroll, 2002), and identified for each verb token the voice, head nouns of the subject and object, and also, for each PP attached to that verb, the head preposition and head noun of the  NP (hereafter, PPN). Next, for our test NCs, we identified all verbs for which the modifier and head noun co-occur as subject, object, or PPN.</Paragraph>
    <Paragraph position="1"> We then mapped these verbs to seed verbs using WordNet::Similarity and Moby's Thesaurus(seeSection5fordetails). Finally, weidentifiedthecorrespondingrelationforeachseedverb null and selected the best-fitting semantic relation using a classifier. To evaluate our approach, we built a classifier usingTiMBL(Daelemans et al., 2004).</Paragraph>
  </Section>
  <Section position="10" start_page="493" end_page="493" type="metho">
    <SectionTitle>
5 Resources
</SectionTitle>
    <Paragraph position="0"> In this section, we outline the tools and resources employed in our method.</Paragraph>
    <Paragraph position="1"> As our parser, we used RASP, generating a dependency representation for the most probable parse for each sentence. Note that RASP also lemmatises all words in a POS-sensitive manner.</Paragraph>
    <Paragraph position="2"> To map actual verbs onto seed verbs, we experimented with two resources: WordNet::Similarity and Moby's thesaurus. WordNet::Similarity2 is an open source software package that allows the user to measure the semantic similarity or relatedness between two words (Patwardhan et al., 2003). Of the many methods implemented in WordNet::Similarity, we report on results for one path-based method (WUP, Wu and Palmer (1994)), one content-information based method (JCN, Jiang and Conrath (1998)) and two semantic relatedness methods (LESK, Banerjee and Pedersen (2003), and VECTOR, (Patwardhan, 2003)). We also used a random similarity-generating method as a baseline (RANDOM).</Paragraph>
    <Paragraph position="3"> The second semantic resource we use for verb-mapping method is Moby's thesaurus. Moby's thesaurus is based on Roget's thesaurus, and contains 30K root words, and 2.5M synonyms and related words. Since the direct synonyms of seed verbs have limited coverage over the set of sentences used in our experiment, we also experimentedwithusingsecond-levelindirectsynonyms null of seed verbs.</Paragraph>
    <Paragraph position="4"> In order to deal with the TIME relation, we used CoreLex (Buitelaar, 1998). CoreLex is based on a unified approach to systematic polysemy and the semantic underspecification of nouns, and derives from WordNet 1.5. It contains 45 basic CoreLex types, systematic polysemous classes and 39,937 nouns with tags.</Paragraph>
    <Paragraph position="5">  As mentioned earlier, we built our supervised classifier using TiMBL.</Paragraph>
  </Section>
  <Section position="11" start_page="493" end_page="494" type="metho">
    <SectionTitle>
6 Data Collection
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="493" end_page="494" type="sub_section">
      <SectionTitle>
6.1 Data Processing
</SectionTitle>
      <Paragraph position="0"> To test our method, we extracted 2,166 NC types from the Wall Street Journal (WSJ) component of the Penn Treebank. We additionally extracted sentences containing the head noun and modifier in  pre-definedconstructionalcontextsfromtheamalgam of: (1) the Brown Corpus subset contained in the Penn Treebank, (2) the WSJ portion of the Penn Treebank, and (3) the British National Corpus (BNC). Note that while these pre-defined constructional contexts are based on the contexts in which our seed verbs are predicted to correlate with a given semantic relation, we instances of all verbs occurring in those contexts. For example, based on the construction in Equation 5, we extract all instances of S(Vi,M SUBJj ,H OBJj ) for all verbs Vi and all instances of NCj = (Mj,Hj) in our dataset.</Paragraph>
      <Paragraph position="1"> Two annotators tagged the 2,166 NC types independently at 52.3% inter-annotator agreement, and then met to discus all contentious annotations and arrive at a mutually-acceptable gold-standard annotation for each NC. The Brown, WSJ and BNC data was pre-parsed with RASP, and sentences were extracted which contained the head noun and modifier of one of our 2,166 NCs in subjectorobjectposition,oras(head)nounwithinthe null NP of an PP. After extracting these sentences, we counted the frequencies of the different modifier-head noun pairs, and filtered out: (a) all constructional contexts not involving a verb contained in WordNet 2.0, and (b) all NCs for which the modifier and head noun did not co-occur in at least five sentential contexts. This left us with a total of 453 NCs for training and testing. The combined total numberofsententialcontextsforour453NCswas 7,714, containing 1,165 distinct main verbs.</Paragraph>
      <Paragraph position="2"> We next randomly split the NC data into 80% training data and 20% test data. The final number of test NCs is 88; the final number of training NCs varies depending on the verb-mapping method.</Paragraph>
      <Paragraph position="3"> As noted in Section 2, the relations TIME and EQUATIVE are not associated with seed verbs. For TIME, rather than using contextual evidence, we simply flag the possibility of a TIME if the modifier is found to occur in the TIME class of CoreLex. In the case of TIME, we consider coordinated occur- null rences of the modifier and head noun (e.g. coach and player for player coach) as evidence for the relation.3 We thus separately collate statistics from coordinated NPs for each NC, and from this compute a weight for each NC based on mutual information:</Paragraph>
      <Paragraph position="5"> where Mi and Hi are the modifier and head of NCi, respectively, and freq(coord(Mi,Hi)) is the frequency of occurrence of Mi and Hi in coordinated NPs.</Paragraph>
      <Paragraph position="6"> Finally, we calculate a normalised weight for each seed verb by determining the proportion of head verbs each seed verb occurs with.</Paragraph>
    </Section>
    <Section position="2" start_page="494" end_page="494" type="sub_section">
      <SectionTitle>
6.2 Verb Mapping
</SectionTitle>
      <Paragraph position="0"> The sentential contexts gathered from corpus data contain a wide range of verbs, not just the seed verbs. To map the verbs onto seed verbs, and hence estimate which semantic relation(s) each is a predictor of, we experimented with two different methods. First we used the WordNet::Similarity package to calculate the similarity between a given verb and each of the seed verbs, experimenting with the 5 methods mentioned in Section 5. Second, we used Moby's thesaurus to extract both direct synonyms (D-SYNONYM) and a combination of direct and second-level indirect synonyms of verbs (I-SYNONYM), and from this, calculate the closestmatching seed verb(s) for a given verb.</Paragraph>
      <Paragraph position="1"> Figure 2 depicts the procedure for mapping verbs in constructional contexts onto the seed verbs. Verbs found in the various contexts in the 3Note the order of the modifier and head in coordinated NPs is considered to be irrelevant, i.e. player and coach and coach and player are equally evidence for an EQUATIVE interpretation for player coach (and coach player).</Paragraph>
      <Paragraph position="2">  corpus (on the left side of the figure) map onto one or more seed verbs, which in turn map onto one or more semantic relations.4 We replace all nonseed verbs in the corpus data with the seed verb(s) they map onto, potentially increasing the number of corpus instances.</Paragraph>
      <Paragraph position="3"> Since direct (i.e. level 1) synonyms from Moby's thesaurus are not sufficient to map all verbs onto seed verbs, we also include second-level (i.e. level 2) synonyms, expanding from direct synonyms. Table 1 shows the coverage of sentences for test NCs, in which D indicates direct synonyms and I indicates indirect synonyms.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML