<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3504">
  <Title>Increasing the coverage of a domain independent dialogue lexicon with VERBNET</Title>
  <Section position="5" start_page="25" end_page="25" type="metho">
    <SectionTitle>
2 The TRIPS Lexicon
</SectionTitle>
    <Paragraph position="0"> The TRIPS lexicon (Dzikovska, 2004) is the target of the mapping procedure we describe in Section 4. It includes the syntactic and semantic information necessary to build semantic representations usable in dialogue systems. The TRIPS parser is equipped with a fairly detailed grammar, but a major restriction on coverage in new domains is often the lack of lexical information. The lexicon used in our evaluation comprised approximately 700 verb lemmas with 1010 senses (out of approximately 2500 total word senses, covering both open- and closed-class words). The lexicon is designed for incremental growth: since the lexical representation is domain-independent, words added for one domain can be re-used in new domains.</Paragraph>
    <Paragraph position="1"> A graphical representation of the information stored in the TRIPS lexicon and used in parsing is shown in Figure 1. The lexicon is a list of canonical word entries, each of which consists of a set of sense definitions; each sense definition comprises an LF type and a syntax-semantics template.</Paragraph>
    <Paragraph position="2"> Semantic classes (LF types) in the TRIPS lexicon are organised in a domain-independent ontology (the LF ontology). The LF ontology was originally based on a simplified version of FRAMENET (Baker et al., 1998; Dzikovska et al., 2004), with each LF type describing a particular situation, object or event and its participants. By contrast, deep grammars such as the English Resource Grammar (Copestake and Flickinger, 2000) build deep semantic representations which account for scoping and temporal structure, but their lexicons do not provide information related to word senses and role labels, in part due to the additional difficulty involved in building a wide-coverage lexicon with the necessary lexical semantic information.</Paragraph>
    <Paragraph position="3"> [Figure 1: ...definition for mapping between syntactic and semantic roles (example sentence: The tourists admired the paintings).]</Paragraph>
    <Paragraph position="4"> Syntax-Semantics Templates (or templates) capture the linking between the syntax and semantics (LF type and semantic roles) of a word. The semantic properties of an argument are described by means of the semantic role assigned to it and its selectional restrictions.[2] The TRIPS grammar contains a set of independently described lexical rules, such as the passive or dative shift rules, which are designed to create non-canonical lexical entries automatically while preserving the linking properties defined in the canonical entry.</Paragraph>
    <Paragraph position="5"> In this context, adding an entry to the lexicon requires determining both the list of LF types and the list of templates for canonical contexts, that is, the list of mappings between a logical frame and a canonical subcategorization frame.</Paragraph>
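The entry structure described in this section (word entries, sense definitions, LF types and templates) can be pictured as nested records. The following is a minimal Python sketch for illustration only, not the TRIPS implementation: the class names `LexEntry`, `SenseDef` and `Template`, their fields, and the example LF type, template and role names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Syntax-semantics template: links syntactic functions to semantic roles."""
    name: str
    linking: dict[str, str]  # syntactic function -> semantic role


@dataclass
class SenseDef:
    """One sense: an LF type plus a template for canonical contexts."""
    lf_type: str
    template: Template


@dataclass
class LexEntry:
    """Canonical word entry: a lemma and its sense definitions."""
    lemma: str
    senses: list[SenseDef] = field(default_factory=list)

    def add_sense(self, lf_type: str, template: Template) -> None:
        # Only the canonical template is stored; passive or dative-shift
        # variants would be derived by separate lexical rules.
        self.senses.append(SenseDef(lf_type, template))

    def lf_types(self) -> list[str]:
        return [s.lf_type for s in self.senses]


# Hypothetical names throughout: not actual TRIPS LF types or templates.
subj_obj = Template("hypothetical-agent-theme-templ", {"SUBJ": "AGENT", "DOBJ": "THEME"})
entry = LexEntry("admire")
entry.add_sense("LF::HYPOTHETICAL-TYPE", subj_obj)
print(entry.lf_types())
```

Adding a word then amounts to supplying both pieces for each sense, which is the information the mapping procedure has to recover.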
  </Section>
  <Section position="6" start_page="25" end_page="26" type="metho">
    <SectionTitle>
3 VERBNET
</SectionTitle>
    <Paragraph position="0"> VERBNET (Kipper et al., 2000) provides a concrete implementation of the descriptive work carried out by Levin (1993), which has been extended to cover prepositional constructions and corpus-based subcategorization frames (Kipper et al., 2004; Kipper et al., 2006).</Paragraph>
    <Paragraph position="1"> VERBNET is a hierarchical verb lexicon in which verbs are organised in classes. The fundamental assumption underlying the classification is that the members of a given class share a similar syntactic behaviour, that is, they pattern in the same set of alternations, and are further assumed to share common semantic properties.[3] VERBNET classes are organised in an inheritance hierarchy. Each class includes a set of members (verbs), a set of (subcategorization) frames and a set of semantic descriptions. Frames describe the linking between syntax and semantics for that class: each frame argument consists of a syntactic category augmented with syntactic features, and a corresponding thematic role. Each class also specifies a set of additional selectional restriction features. Finally, VERBNET includes for each class a semantic description stated in terms of event semantics, which we ignore in this paper.</Paragraph>
    <Paragraph position="2"> [2] The selectional restrictions are domain-independent and specified using features derived from EuroWordNet (Vossen, 1997; Dzikovska et al., to appear).</Paragraph>
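The VERBNET class organisation described above can be sketched in the same way. This Python sketch assumes, for illustration, that a subclass exposes its own frames together with those of its ancestors. The class names `FrameArg` and `VNClass`, the field names, and the example classes and frames are hypothetical and are not taken from the VERBNET distribution.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrameArg:
    category: str                 # syntactic category, e.g. "NP" or "PREP"
    role: Optional[str] = None    # thematic role (None for a bare PREP slot)
    features: dict[str, str] = field(default_factory=dict)


@dataclass
class VNClass:
    name: str
    members: list[str]
    frames: list[list[FrameArg]]
    restrictions: dict[str, str] = field(default_factory=dict)  # selectional restrictions
    parent: Optional["VNClass"] = None

    def all_frames(self) -> list[list[FrameArg]]:
        """Own frames plus those inherited from ancestors (assumed behaviour)."""
        inherited = self.parent.all_frames() if self.parent is not None else []
        return inherited + self.frames


# Hypothetical classes and frames, for illustration only.
parent = VNClass("example-1", ["verb_a"],
                 [[FrameArg("NP", "Agent"), FrameArg("NP", "Theme")]])
child = VNClass("example-1-1", ["verb_b"],
                [[FrameArg("NP", "Agent"), FrameArg("PREP"), FrameArg("NP", "Theme")]],
                parent=parent)
print(len(child.all_frames()))
```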
  </Section>
  <Section position="7" start_page="26" end_page="28" type="metho">
    <SectionTitle>
4 Methodology
</SectionTitle>
    <Paragraph position="0"> The methodology used in the mapping process consists of two steps. First, we translate the source, VERBNET, into an intermediate representation suited to parsing purposes. Second, this intermediate representation is translated into a specific target, here the TRIPS lexicon. At this stage of our work, the translation from VERBNET to the intermediate representation mainly concerns normalising the syntactic information coded in VERBNET to make it easier to handle for parsing purposes, while the translation from the intermediate representation to the TRIPS lexicon focuses on translating semantic information. This architecture is best understood as a cross-compilation scheme: we expect to reuse the intermediate representation both for producing outputs for different parsers and for accepting inputs from other lexical databases such as FRAMENET.</Paragraph>
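The two-step, cross-compilation architecture can be sketched as a pair of registries. Front ends map a source lexicon into the intermediate representation, and back ends map the intermediate representation to a target. The Python below is a toy illustration built on that assumption. The registry names, the `compile_lexicon` function and the one-record-per-sense intermediate format are hypothetical simplifications of the much richer representation described in Section 4.1.

```python
from typing import Any, Callable

# Toy intermediate format: one record per (lemma, sense) pair.
Intermediate = list[dict[str, str]]

FRONT_ENDS: dict[str, Callable[[Any], Intermediate]] = {}
BACK_ENDS: dict[str, Callable[[Intermediate], Any]] = {}


def register(table: dict, name: str):
    """Decorator that stores a translator in a registry under `name`."""
    def wrap(fn):
        table[name] = fn
        return fn
    return wrap


@register(FRONT_ENDS, "verbnet")
def verbnet_to_intermediate(classes: dict[str, list[str]]) -> Intermediate:
    # Toy front end: the real step also normalises syntactic frames.
    return [{"lemma": m, "sense": c} for c, members in classes.items() for m in members]


@register(BACK_ENDS, "trips")
def intermediate_to_trips(words: Intermediate) -> dict[str, list[str]]:
    # Toy back end: the real step translates classes and roles into TRIPS terms.
    out: dict[str, list[str]] = {}
    for w in words:
        out.setdefault(w["lemma"], []).append(w["sense"])
    return out


def compile_lexicon(source: Any, source_name: str, target_name: str) -> Any:
    """Source lexicon -> intermediate representation -> target lexicon."""
    intermediate = FRONT_ENDS[source_name](source)
    return BACK_ENDS[target_name](intermediate)


print(compile_lexicon({"hold-15.1-1": ["handle"]}, "verbnet", "trips"))
```

The benefit is the usual one for cross-compilers. With N source lexicons and M target parsers, one needs N + M translators rather than N times M.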
    <Section position="1" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
4.1 The intermediate representation
</SectionTitle>
      <Paragraph position="0"> The intermediate representation is a lexical representation scheme mainly tailored for parsing: in this context, a lexicon consists of a set of words, each of which has a lemma, a syntactic category and a list of sense definitions. Each sense definition has a name and a frame. The name of a sense definition is the name of the VERBNET class it derives from. The frame of a sense definition is a list of arguments, each of which consists of a syntactic category, a syntactic function, a thematic role and possibly a set of prepositions and syntactic feature structures.</Paragraph>
      <Paragraph position="1"> [3] [...] hypothesis (Kipper, 2005).</Paragraph>
      <Paragraph position="2"> The intermediate representation uses the following data categories. Syntactic categories, thematic roles and features are those used in VERBNET. We further add the syntactic functions described by Carroll et al. (1998). In addition, two categories left implicit in VERBNET by the use of feature structures are made explicit here: prepositional phrases (PP) and sentential arguments (S). Finally, each argument described in a sense definition frame is marked with respect to its coreness status.</Paragraph>
      <Paragraph position="3"> The coreness status aims to provide the lexicon with an operational account of common discrepancies between syntactic and semantic descriptions. It takes one of three values, core, non-core or non-sem, and reflects the status of the argument with respect to the syntax-semantics interface.</Paragraph>
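Here is a minimal Python rendering of the intermediate representation described in this subsection so far, for illustration. The class and field names (`Word`, `SenseDefinition`, `Argument`, `Coreness`) are hypothetical choices of ours. The example encodes the syntactic-only subject of It is raining under a placeholder class name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Coreness(Enum):
    CORE = "core"
    NON_CORE = "non-core"   # contributes to the semantics, not subcategorized
    NON_SEM = "non-sem"     # syntactic-only, no semantic contribution


@dataclass
class Argument:
    category: str               # VERBNET category, plus the explicit PP and S
    function: Optional[str]     # syntactic function (Carroll et al., 1998)
    role: Optional[str]         # VERBNET thematic role
    coreness: Coreness = Coreness.CORE
    prepositions: list[str] = field(default_factory=list)
    features: dict[str, str] = field(default_factory=dict)


@dataclass
class SenseDefinition:
    name: str                   # name of the VERBNET class it derives from
    frame: list[Argument]


@dataclass
class Word:
    lemma: str
    category: str
    senses: list[SenseDefinition] = field(default_factory=list)


# "It is raining": the subject is syntactic-only (class name is a placeholder).
rain = Word("rain", "V", [SenseDefinition(
    "weather-class", [Argument("NP", "NCSUBJ", None, Coreness.NON_SEM)])])
print(rain.senses[0].frame[0].coreness.value)
```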
      <Paragraph position="4"> Indeed, there is a methodological pitfall concerning the mapping between thematic roles and syntactic arguments: semantic arguments are not defined by the same criteria as syntactic arguments. The main criterion for describing semantic arguments is their participation in the event, situation or object described by the frame, whereas the criterion for describing syntactic arguments is the obligatoriness or the specificity of the argument with respect to the verb. The following example illustrates such conflicts:
(1) a. It is raining
    b. I am walking to the store
The It in example (1a) plays no role in the semantic representation, but is obligatory in syntax since it fills the subject position. The locative PP in example (1b) is traditionally treated in syntax not as an argument but as a modifier, hence it does not fill a complement position. Such phrases are, however, classified in VERBNET as part of the frames. Accordingly, we distinguish three kinds of arguments: non-sem arguments, as in (1a), are syntactic-only arguments with no semantic contribution; non-core arguments, as in (1b), contribute to the semantics but are not subcategorized; core arguments both contribute to the semantics and are subcategorized.</Paragraph>
    </Section>
    <Section position="2" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
4.2 From VERBNET to the intermediate representation
</SectionTitle>
      <Paragraph position="0"> Given VERBNET as described in Section 3 and the intermediate representation described above, the translation process mainly requires (1) turning the class-based representation of VERBNET into a list-of-words representation, (2) marking arguments for coreness, (3) merging some arguments and (4) annotating arguments with syntactic functions.</Paragraph>
      <Paragraph position="1"> The first step is quite straightforward: every member m of every VERBNET class C is associated with every frame of C, each pairing yielding a new sense definition for m in the intermediate representation. In the second step, each argument receives a coreness mark. Arguments marked as non-core are adverbs and prepositional phrases introduced by a large class of prepositions (e.g. spatial prepositions). Arguments marked as non-sem are those realised by an impersonal it, typically with verbs of the weather class. All other arguments listed in VERBNET frames are marked as core.</Paragraph>
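The first two translation steps can be sketched as follows. This is a hedged Python illustration, not the authors' code. Frames are plain dictionaries, and `NON_CORE_PREPS` is a hypothetical stand-in for the large class of (e.g. spatial) prepositions. For simplicity, prepositional arguments are assumed to already carry their preposition, although in the pipeline described here PP merging only happens in the third step.

```python
from copy import deepcopy

# Hypothetical stand-in for the large class of (e.g. spatial) prepositions.
NON_CORE_PREPS = {"in", "on", "at", "to", "from", "into", "through", "toward"}


def mark_coreness(arg: dict) -> str:
    """Step 2: classify one frame argument as core, non-core or non-sem."""
    if arg.get("impersonal_it"):
        return "non-sem"
    if arg["category"] == "ADV":
        return "non-core"
    if arg["category"] == "PP" and arg.get("prep") in NON_CORE_PREPS:
        return "non-core"
    return "core"


def expand_classes(vn_classes: dict[str, dict]) -> dict[str, list[dict]]:
    """Step 1: each member m of class C gets one sense definition per frame of C."""
    lexicon: dict[str, list[dict]] = {}
    for class_name, cls in vn_classes.items():
        for member in cls["members"]:
            for frame in cls["frames"]:
                args = deepcopy(frame)  # senses must not share argument objects
                for arg in args:
                    arg["coreness"] = mark_coreness(arg)
                lexicon.setdefault(member, []).append({"name": class_name, "frame": args})
    return lexicon


# Hypothetical classes, loosely modelled on examples (1a) and (1b).
EXAMPLE_CLASSES = {
    "motion-example": {
        "members": ["walk", "run"],
        "frames": [[{"category": "NP", "role": "Agent"},
                    {"category": "PP", "role": "Destination", "prep": "to"}]],
    },
    "weather-example": {
        "members": ["rain"],
        "frames": [[{"category": "NP", "role": None, "impersonal_it": True}]],
    },
}
lexicon = expand_classes(EXAMPLE_CLASSES)
print([a["coreness"] for a in lexicon["walk"][0]["frame"]])
```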
      <Paragraph position="2"> In the third step, syntactic arguments are merged to better correspond to phrase-based syntax.[4] For example, the VERBNET encoding of subcategorization frames splits prepositional arguments into two slots: one for the preposition and one for the noun phrase.</Paragraph>
      <Paragraph position="3"> We merged the two arguments into a PP, also merging their syntactic and semantic features. Other merges at this stage concern possessive arguments such as John's brother, which are described with three argument slots in VERBNET frames; we merged them into a single NP.</Paragraph>
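The PP merge of the third step can be sketched as a single left-to-right pass. This Python sketch makes several encoding assumptions for illustration: frames are lists of dictionaries, a preposition slot has category PREP and an optional `value` holding the preposition, and the thematic role sits on the following NP. The `merge_pp` name is also ours. The possessive merge is not shown.

```python
def merge_pp(frame: list[dict]) -> list[dict]:
    """Step 3: merge each PREP slot with the NP slot that follows it into one PP."""
    merged: list[dict] = []
    pending = None  # PREP slot waiting for its NP
    for arg in frame:
        if arg["category"] == "PREP":
            if pending is not None:
                merged.append(pending)  # two PREPs in a row: keep the first as is
            pending = arg
        elif pending is not None and arg["category"] == "NP":
            merged.append({
                "category": "PP",
                "role": arg.get("role"),          # role comes from the NP slot
                "prep": pending.get("value"),
                # union of both slots' features (NP values win on conflicts)
                "features": {**pending.get("features", {}), **arg.get("features", {})},
            })
            pending = None
        else:
            if pending is not None:
                merged.append(pending)  # PREP not followed by an NP: keep as is
                pending = None
            merged.append(arg)
    if pending is not None:
        merged.append(pending)
    return merged


# Hypothetical frame for the pattern NP V PREP NP (the verb is not a slot).
frame = [{"category": "NP", "role": "Agent"},
         {"category": "PREP", "value": "to"},
         {"category": "NP", "role": "Destination", "features": {"hypothetical": "+"}}]
print(merge_pp(frame))
```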
      <Paragraph position="4"> The last step in the translation is the inference of syntactic functions. Syntactic functions can reasonably be inferred from positional arguments and syntactic categories by (a) considering the following oblicity order over the set of syntactic functions used in the intermediate representation:[5]
(2) NCSUBJ &lt; DOBJ &lt; OBJ2 &lt; {IOBJ, XCOMP, CCOMP}
and (b) treating the problem as a transduction problem over two tapes: the tape of syntactic categories and the tape of syntactic functions. On this basis, we designed a transducer that implements a category-to-function mapping. It implements the above oblicity order together with an additional mapping constraint: noun phrases can only map to NCSUBJ or DOBJ, prepositional phrases only to OBJ2 or IOBJ, infinitival clauses only to XCOMP, and finite clauses only to CCOMP.</Paragraph>
      <Paragraph position="5"> [4] We also relabel some categories for convenience without affecting the process. For instance, VERBNET labels both clausal arguments and noun phrases with the category NP; the difference is marked with syntactic features. We take advantage of these features to relabel clausal arguments with the category S. [5] [...] ordered with respect to each other. These functions are the subset of the functions described by Carroll et al. (1998) relevant for handling VERBNET data.</Paragraph>
      <Paragraph position="6"> We further added refinements to account for frames that do not encode their arguments in the canonical oblicity order, namely for dealing with dative shift, which VERBNET encodes with two different frames, and for dealing with impersonal contexts; the resulting transducer is shown in Figure 2. All states except 0 are final. The transduction operates only on core and non-sem arguments; non-core arguments are systematically associated with an adjunct function. This transducer handles the majority of VERBNET frames, finding a functional assignment for more than 99% of the instances.</Paragraph>
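The category-to-function mapping can be approximated by a greedy left-to-right pass that respects the oblicity order in (2) and the category constraints above. This Python sketch is not the transducer of Figure 2. It omits the dative-shift and impersonal refinements, uses the hypothetical labels `S-INF`, `S-FIN` and `ADJUNCT`, and reports failure (None) when no function fits.

```python
from typing import Optional

# Oblicity ranks from (2); IOBJ, XCOMP and CCOMP are unordered, so they share a rank.
RANK = {"NCSUBJ": 0, "DOBJ": 1, "OBJ2": 2, "IOBJ": 3, "XCOMP": 3, "CCOMP": 3}

# Category-to-function constraints; S-INF and S-FIN are hypothetical labels
# for infinitival and finite clauses.
ALLOWED = {
    "NP": ["NCSUBJ", "DOBJ"],
    "PP": ["OBJ2", "IOBJ"],
    "S-INF": ["XCOMP"],
    "S-FIN": ["CCOMP"],
}


def assign_functions(args: list[tuple[str, str]]) -> Optional[list[str]]:
    """Greedy left-to-right assignment over (category, coreness) pairs.

    Each core or non-sem argument gets the least oblique unused function
    allowed for its category that does not precede the previous assignment
    in the oblicity order. Returns None when no function fits.
    """
    result: list[str] = []
    used: set[str] = set()
    last_rank = 0
    for category, coreness in args:
        if coreness == "non-core":
            result.append("ADJUNCT")  # non-core arguments bypass the transduction
            continue
        choice = next((f for f in ALLOWED.get(category, [])
                       if f not in used and RANK[f] >= last_rank), None)
        if choice is None:
            return None
        used.add(choice)
        last_rank = RANK[choice]
        result.append(choice)
    return result


print(assign_functions([("NP", "core"), ("PP", "non-core")]))  # example (1b)
print(assign_functions([("NP", "non-sem")]))                    # example (1a)
print(assign_functions([("NP", "core"), ("NP", "core"), ("NP", "core")]))
```

A shifted dative frame with three NPs fails in this sketch. That is exactly the kind of case the refinements described above were added to handle.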
    </Section>
    <Section position="3" start_page="27" end_page="28" type="sub_section">
      <SectionTitle>
4.3 From Intermediate representation to TRIPS
</SectionTitle>
      <Paragraph position="0"> Recall that a TRIPS lexical entry consists of an LF type with a set of semantic roles and a template representing the mappings from syntactic functions to semantic roles. Converting from our intermediate representation to the TRIPS format involves two steps: [...]ments, and generate the appropriate mapping in the TRIPS format.</Paragraph>
      <Paragraph position="1"> We investigated two strategies to align semantic classes (VERBNET classes and TRIPS LF types). Both use a class intersection algorithm as a basis for decision: two semantic classes are considered a match if they are associated with the same lexical items. The intersection algorithm takes advantage of the fact that both VERBNET and TRIPS contain lexical sets. For VERBNET, a lexical set is a class name and the set of its members; for TRIPS, it is an LF type and the set of words associated with it in the lexicon. Our intersection algorithm computes the intersection between every VERBNET lexical set and every TRIPS lexical set. The sets which intersect are then considered candidate mappings from a VERBNET class to a TRIPS class.</Paragraph>
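The class intersection step amounts to a nested loop over the two collections of lexical sets. In the Python sketch below, the function name and the example sets are hypothetical. The TRIPS words are borrowed from the hold-15.1-1 example in Section 4.5, but the full memberships shown are assumed. The example also exhibits the one-word intersection problem discussed next.

```python
def candidate_mappings(vn_sets: dict[str, set[str]],
                       trips_sets: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Intersect every VERBNET lexical set with every TRIPS lexical set and
    keep the non-empty intersections as candidate class mappings."""
    candidates: dict[tuple[str, str], set[str]] = {}
    for vn_class, vn_members in vn_sets.items():
        for lf_type, trips_words in trips_sets.items():
            shared = vn_members.intersection(trips_words)
            if shared:
                candidates[(vn_class, lf_type)] = shared
    return candidates


# Hypothetical lexical sets; lf::hypothetical-retain is invented.
vn_sets = {"hold-15.1-1": {"clutch", "grip", "clasp", "hold", "wield", "grasp", "handle"}}
trips_sets = {
    "lf::body-manipulation": {"clutch", "grip", "clasp", "hold", "wield", "grasp"},
    "lf::hypothetical-retain": {"hold", "keep"},
}
for pair, shared in candidate_mappings(vn_sets, trips_sets).items():
    print(pair, len(shared))
```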
      <Paragraph position="2"> However, this technique produces many one-word class intersections, which lead to spurious entries. We considered two ways of improving precision: first, by requiring a sufficiently large intersection; second, by using syntactic structure as a filter. We discuss them in turn.</Paragraph>
    </Section>
    <Section position="4" start_page="28" end_page="28" type="sub_section">
      <SectionTitle>
4.4 Direct Mapping Between Semantic Representations
</SectionTitle>
      <Paragraph position="0"> The first technique which we tried for mapping between TRIPS and VERBNET semantic representations is to map the classes directly. We consider all candidate mappings between the TRIPS and VERBNET classes, and take the match with the largest intersection. We then align the semantic roles between the two classes and produce all possible syntax-semantics mappings specified by VERBNET.</Paragraph>
      <Paragraph position="1"> This technique has the advantage of providing the most complete set of syntactic frames and syntax-semantics mappings that can be retrieved from VERBNET. However, since VERBNET lists many possible subcategorization frames for every word, guessing the class incorrectly is very expensive, resulting in many spurious senses. We use a class intersection threshold to improve reliability.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="28" end_page="29" type="metho">
    <SectionTitle>
[Table 1: sample of the VERBNET to TRIPS role mapping; columns: VERBNET ROLE, TRIPS ROLES]
</SectionTitle>
    <Paragraph position="0"> At present, we count an LF type match as successfully guessed if the intersection in lexical entries is above the threshold (we determined 3 words to be the best value by finding an optimal balance between precision and recall over a small gold-standard mapping set). Since the classes contain closely related items, a larger intersection means a more reliable mapping. If the VERBNET class is not successfully mapped to an LF type, then no TRIPS lexical entry is generated.</Paragraph>
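Direct mapping with a threshold can be sketched as picking, for each VERBNET class, the candidate LF type with the largest intersection. The Python below rests on two assumptions of ours: an intersection of at least three words counts as passing the threshold, and ties keep the first candidate seen. The function name is also ours.

```python
def direct_mapping(candidates: dict[tuple[str, str], set[str]],
                   threshold: int = 3) -> dict[str, str]:
    """Map each VERBNET class to the LF type with the largest intersection,
    discarding classes whose best intersection is below `threshold` words."""
    best: dict[str, tuple[int, str]] = {}
    for (vn_class, lf_type), shared in candidates.items():
        size = len(shared)
        # Assumption: "above the threshold" read as size >= threshold; ties keep the first.
        if size >= threshold and size > best.get(vn_class, (0, ""))[0]:
            best[vn_class] = (size, lf_type)
    return {vn_class: lf_type for vn_class, (_, lf_type) in best.items()}


# Hypothetical candidates: vn-b only reaches two shared words, so it gets no entry.
candidates = {
    ("vn-a", "lf-x"): {"w1", "w2", "w3"},
    ("vn-a", "lf-y"): {"w1"},
    ("vn-b", "lf-x"): {"w4", "w5"},
}
print(direct_mapping(candidates))
```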
    <Paragraph position="1"> Once the correspondence between the LF type and the VERBNET class has been established, semantic arguments have to be aligned between the two classes. We established a role mapping table (a sample is shown in Table 1), which is an extended version of the mapping from Swift (2005). The role mapping is one-to-many (each VERBNET role maps to between 1 and 8 TRIPS roles); however, since the appropriate LF type has been identified prior to argument mapping, we usually obtain a unique mapping based on the roles defined by the LF type.[6] Once the classes and semantic roles have been aligned, mapping syntactic functions between the intermediate representation and TRIPS syntax is straightforward: functional and category mappings are one-to-one and raise no specific problems. Syntactic features are also translated into the TRIPS representation.</Paragraph>
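Role alignment with a one-to-many table can be sketched as follows. The candidate TRIPS roles are filtered down to those the chosen LF type defines, and the sketch falls back to the first remaining value when more than one survives, as in footnote 6. The table entries and role names below are hypothetical, since Table 1 is not reproduced here. Treating the table order as the intersection order is also our assumption.

```python
from typing import Optional

# Hypothetical excerpt of a one-to-many role table (not the actual Table 1).
ROLE_TABLE = {
    "Agent": ["AGENT", "CAUSE"],
    "Theme": ["THEME", "AFFECTED"],
}


def map_role(vn_role: str, role_table: dict[str, list[str]],
             lf_roles: list[str]) -> Optional[str]:
    """Keep only TRIPS roles defined by the chosen LF type; if several remain,
    default to the first; return None if none remain."""
    remaining = [r for r in role_table.get(vn_role, []) if r in lf_roles]
    return remaining[0] if remaining else None


print(map_role("Theme", ROLE_TABLE, ["AGENT", "AFFECTED"]))
```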
    <Paragraph position="2"> To illustrate the results of the automatic mapping process, two of the sense definitions generated for the verb relish are shown in Figure 3. The TRIPS entries contain references to the class description in the TRIPS LF ontology (the line introduced by LF-PARENT) and to a template (the line introduced by TEMPL) generated on the fly by our syntactic conversion algorithm. The first sense definition and template in Figure 3 represent the same information shown graphically in Figure 1. Each argument in a template is assigned a syntactic function, a feature structure describing its syntactic properties, and a mapping to a semantic role defined in the LF type definition (not depicted here).</Paragraph>
    <Paragraph position="3"> [6] In rare cases where more than one correspondence is possible, we use the first value in the intersection as the default.</Paragraph>
    <Section position="1" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
4.5 Filtering with syntactic structure
</SectionTitle>
      <Paragraph position="0"> The approach described in the previous section provides a fairly complete set of subcategorization frames for each word, provided that the class correspondence has been established successfully. However, it misses classes with small intersections and classes for which some but not all members match (see Section 5 for discussion). To address these issues, we tried another approach that automatically generates all possible class matches between TRIPS and VERBNET, again using class member intersection, but using a TRIPS syntactic template as an additional filter on the class match. For each potential match, a human evaluator is presented with the following:
{confidence score {verbs in TRIPS-VN class intersection}/ LF-type TRIPS-template =&gt; VN-class: {VN class members}}
The confidence score is based on the number of verbs in the intersection, weighted by the number of verbs remaining in the respective TRIPS and VERBNET classes. The template used for filtering is taken from all templates that occur with the TRIPS words in this intersection (one match per template is generated for inspection). For example:
93.271% {clutch,grip,clasp,hold,wield,grasp}/ lf::body-manipulation agent-theme-xp-templ =&gt; hold-15.1-1: {handle}
This gives the evaluator additional syntactic information for judging class intersections. The evaluator can reject entire class matches, or just individual verbs from the VERBNET class that do not quite fit an otherwise good match. We only used the templates already in TRIPS (those corresponding to each of the word senses in the intersection) to avoid overwhelming the evaluator with a large number of possibly spurious template matches resulting from an incorrect class match.
This technique allows us to pick up class matches based on a single-member intersection, such as:
7.814% {swallow}/ lf::consume agent-theme-xp-templ =&gt; gobble-39.3-2: {gulp,guzzle,quaff,swig}
However, the entries obtained are not guaranteed to cover all frames in VERBNET: if a given alternation is not already covered in TRIPS, it is not derived from VERBNET with this method.</Paragraph>
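The evaluator-facing output can be sketched as a formatting function plus one candidate per TRIPS template used by the intersecting words. The Python below takes the confidence score as an input, since its exact weighting formula is not given. The function names, the template table and the full VERBNET member lists are hypothetical. Showing only VERBNET members outside the intersection is our reading of the two examples above.

```python
def format_candidate(score: float, shared: list[str], lf_type: str, template: str,
                     vn_class: str, vn_members: list[str]) -> str:
    """Render one class match in the format shown to the human evaluator."""
    # Assumption: only VERBNET members outside the intersection are listed.
    new_members = [m for m in vn_members if m not in shared]
    return (f"{score:.3f}% {{{','.join(shared)}}}/ {lf_type} {template} "
            f"=> {vn_class}: {{{','.join(new_members)}}}")


def candidates_for_inspection(score: float, shared: list[str], lf_type: str,
                              vn_class: str, vn_members: list[str],
                              trips_templates: dict[str, list[str]]) -> list[str]:
    """One candidate per TRIPS template used by the words in the intersection."""
    templates = sorted({t for w in shared for t in trips_templates.get(w, [])})
    return [format_candidate(score, shared, lf_type, t, vn_class, vn_members)
            for t in templates]


# Hypothetical full member lists, consistent with the examples above.
hold_members = ["clutch", "grip", "clasp", "hold", "wield", "grasp", "handle"]
shared = ["clutch", "grip", "clasp", "hold", "wield", "grasp"]
print(format_candidate(93.271, shared, "lf::body-manipulation",
                       "agent-theme-xp-templ", "hold-15.1-1", hold_members))
```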
    </Section>
  </Section>
</Paper>