Empirical Acquisition of Differentiating Relations from Definitions

2 Differentia Extraction

The approach to differentia extraction is entirely automated. It starts with the Link Grammar Parser (Sleator and Temperley, 1993), a dependency parser, which is used to determine the syntactic lexical relations that occur in the sentence. Dictionary definitions are often given in the form of sentence fragments with the headword omitted. For example, the definition for the beverage sense of 'wine' is "fermented juice (of grapes especially)." Therefore, prior to running the definition analysis, the definitions are converted into complete sentences, using simple templates for each part of speech.

After parsing, a series of postprocessing steps is performed prior to the extraction of the lexical relations. For the Link Parser, this mainly involves conversion of the binary dependencies into relational tuples and the realignment of the tuples around function words. The Link Parser outputs syntactic dependencies among words, punctuation, and sentence-boundary markers. The parser uses quite specialized syntactic relations, so these are converted into general ones prior to the extraction of the relational tuples. For example, the relation A, which is used for pre-noun adjectives, is converted into 'modifies'. Figure 1 illustrates the syntactic relations that would be extracted, along with the original parser output.

Figure 1. Sample analysis for the definition of 'wine'.

    Definition sentence:
        Wine is fermented juice (of grapes especially).

    Link Grammar parse:
        </////, Wd, 1. n:wine>
        </////, Xp, 10. .>
        <1. n:wine, Ss, 2. v:is>
        <10. ., RW, 11. /////>
        <2. v:is, Ost, 4. n:juice>
        <3. v:fermented, A, 4. n:juice>
        <4. n:juice, MXs, 6. of>
        <5. (, Xd, 6. of>
        <6. of, Jp, 7. n:grapes>
        <6. of, Xc, 9. )>

    Extracted relations:
        <1. n:wine, 2. v:is, 4. n:juice>
        <3. v:fermented, modifies-3-4, 4. n:juice>
        <4. n:juice, 6. of, 7. n:grapes>

The syntactic relationships are first converted into relational tuples using the format <source-word, relation-word, target-word>. This conversion is performed by following the dependencies involving the content words, ignoring cases involving non-word elements (e.g., punctuation). For example, the first tuple extracted from the parse would be <n:wine, v:is, n:juice>. Implicit relationships, such as adjectival modification, are treated specially by converting the syntactic relationships directly into a relational tuple involving a special relation-indicating word (e.g., 'modifies'). The relational tuples extracted from the parse form the basis for the lexical relations derived from the definition. Structural ambiguity resolution is not addressed here, so the first parse returned is used.
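To make the postprocessing concrete, the following is a minimal sketch of the dependency-to-tuple conversion, assuming the parse is available as (head, link-type, dependent) triples. The function names, the small preposition list, and the mediation test are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch (not the paper's code): convert Link Grammar dependencies
    # into <source-word, relation-word, target-word> tuples.

    NON_WORDS = {"/////", ".", ",", "(", ")"}   # boundary markers, punctuation
    GENERALIZED = {"A": "modifies"}             # specialized link -> general relation

    def mediates(token):
        # Verbs and (illustratively) a few prepositions act as relation words
        # around which the tuples are realigned.
        return token.startswith("v:") or token in {"of", "with", "to", "in", "by"}

    def extract_tuples(deps):
        """deps: list of (head, link_type, dependent) triples from the parser."""
        tuples = []
        for head, link, dep in deps:
            if head in NON_WORDS or dep in NON_WORDS:
                continue                        # ignore non-word elements
            if link in GENERALIZED:             # implicit relation, e.g. adjectival
                tuples.append((head, GENERALIZED[link], dep))
            elif mediates(dep):                 # realign around the mediating word:
                for head2, _link2, dep2 in deps:      # <wine, Ss, is> + <is, Ost, juice>
                    if head2 == dep and dep2 not in NON_WORDS:   # -> <wine, is, juice>
                        tuples.append((head, dep, dep2))
        return tuples

    deps = [("n:wine", "Ss", "v:is"), ("v:is", "Ost", "n:juice"),
            ("v:fermented", "A", "n:juice"), ("n:juice", "MXs", "of"),
            ("of", "Jp", "n:grapes")]
    print(extract_tuples(deps))
    # [('n:wine', 'v:is', 'n:juice'), ('v:fermented', 'modifies', 'n:juice'),
    #  ('n:juice', 'of', 'n:grapes')]

Run on the dependencies of Figure 1 (with positional indices dropped), this reproduces the three extracted relations shown there.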
The remaining optional step assigns weights to the extracted relations. When using the relations in applications, it is desirable to have a measure of how relevant the relations are to the associated concepts. One such measure is the degree to which a relation applies to the concept being described as opposed to sibling concepts. To account for this, cue validities are used, borrowing from cognitive psychology (Smith and Medin, 1981). Cue validities can be interpreted as probabilities indicating the degree to which features apply to a given concept versus similar concepts (i.e., P(C|F)).

Cue validities are estimated by calculating the percentage of times that the feature is associated with the concept versus its total associations over the contrasting concepts. This requires a means of determining the set of contrasting concepts for a given concept. The simplest way of doing this would be to select the set of sibling concepts (e.g., synsets sharing a common parent in WordNet). However, due to the idiosyncratic way concepts are specialized in knowledge bases, this likely would not include concepts intuitively considered as contrasting.

To alleviate this problem, the most-informative ancestor is used instead of the parent. This is determined by selecting the ancestor that best balances frequency of occurrence in a tagged corpus with specificity, as sketched below. The idea is similar to Resnik's (1995) notion of the most-informative subsumer for a pair of concepts. In his approach, estimated frequencies for synsets are percolated up the hierarchy, so the frequency always increases as one proceeds up the hierarchy; therefore, the first common ancestor for a pair is the most-informative subsumer (i.e., has the most information content). Here, attested frequencies from SemCor (Miller et al., 1994) are used, so all ancestors are considered. Specificity is accounted for by applying a scaling factor to the frequencies that decreases as one proceeds up the hierarchy. Thus, 'informative' is used in an intuitive rather than a technical sense.

More details on the extraction process and the subsequent disambiguation can be found in (O'Hara, forthcoming).
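The following is a rough sketch of this weighting scheme. The count table, the ancestors() helper, and the fixed scaling constant are illustrative assumptions rather than the paper's actual implementation.

    # Rough sketch of cue-validity weighting; the data structures are assumed.

    def cue_validity(feature, concept, contrast_set, counts):
        """Estimate P(concept | feature): the feature's associations with this
        concept as a proportion of its associations with all contrasting
        concepts. counts maps (feature, concept) pairs to frequencies."""
        total = sum(counts.get((feature, c), 0) for c in contrast_set)
        return counts.get((feature, concept), 0) / total if total else 0.0

    def most_informative_ancestor(synset, freq, ancestors, scale=0.5):
        """Select the ancestor balancing corpus frequency (e.g., attested
        SemCor counts in freq) against specificity: frequencies are scaled
        down the further up the hierarchy the ancestor lies."""
        best, best_score = None, float("-inf")
        factor = 1.0
        for ancestor in ancestors(synset):   # nearest ancestor first
            factor *= scale                  # penalize increasing generality
            score = factor * freq.get(ancestor, 0)
            if score > best_score:
                best, best_score = ancestor, score
        return best

The contrast set for a concept would then be taken as the descendants of its most-informative ancestor, rather than just its immediate siblings.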
3 Differentia Disambiguation

After the differentia properties have been extracted from a definition, the words for the relation source and target terms are disambiguated in order to reduce vagueness in the relationships. In addition, the relation types are converted from surface-level relations (e.g., object) or relation-indicating words (e.g., prepositions) into the underlying semantic relation.

Since WordNet serves as the knowledge base being targeted, term disambiguation involves selecting the most appropriate synset for both the source and target terms. The WordNet definitions have recently been sense-tagged as part of the Extended WordNet (Novischi, 2002), so these annotations are incorporated. For other dictionaries, traditional word-sense disambiguation algorithms would be required.

With the emphasis on corpus analysis in computational linguistics, there has been a shift away from relying on explicitly coded knowledge towards the use of knowledge inferred from naturally occurring text, in particular text that has been annotated by humans to indicate phenomena of interest. The Penn Treebank version II (Marcus et al., 1994) provided the first large-scale set of case role annotations for general-purpose text. These are very general roles akin to Fillmore's (1968) case roles. The Berkeley FrameNet project (Fillmore et al., 2001) provides the most recent large-scale annotation of semantic roles. These are at a much finer granularity than those in the Treebank, so they should prove quite useful for applications learning detailed semantics from corpora. O'Hara and Wiebe (2003) explain how both inventories can be used for preposition disambiguation.

The goal of relation disambiguation is to determine the underlying semantic role indicated by particular words in a phrase or by word order. For relations indicated directly by function words, the disambiguation can be seen as a special case of word-sense disambiguation (WSD). For example, refining the relationship <'dog', 'with', 'ears'> into <'dog', has-part, 'ears'> is equivalent to disambiguating the preposition 'with', given that the senses are the different relations it can indicate. For relations that are indicated implicitly (e.g., adjectival modification), other classification techniques would be required, reflecting the more syntactic nature of the task.

A straightforward approach for preposition disambiguation would be to use standard WSD features, such as the parts of speech of surrounding words and, more importantly, collocations (e.g., lexical associations). Although this can be highly accurate, it tends to overfit the data and to generalize poorly. The latter is of particular concern here, as the training data is taken from a different genre (newspaper text rather than dictionary definitions). To overcome these problems, a class-based approach is used for the collocations, with WordNet high-level synsets as the source of the word classes. Figure 2 lists the local-context features used for the classifier; a sketch of the class-based feature extraction is given at the end of this section.

Figure 2. Local-context features.

    POS:      part of speech of the target word
    POS-i:    part of speech of the ith word to the left
    POS+i:    part of speech of the ith word to the right
    Word:     target wordform as is
    Word-i:   ith word to the left
    Word+i:   ith word to the right

For the application to differentia disambiguation, the classifiers learned over the Treebank and FrameNet need to be combined. This can be done readily in a cascaded fashion, with the classifier for the most specific relation inventory (i.e., FrameNet) being used first and the other classifiers being applied in turn whenever the classification is inconclusive (a sketch is given below). This has the advantage that new resources can be integrated into the combined relation classifier with minimal effort. However, the resulting role inventory will likely be heterogeneous and might be prone to inconsistent classifications. In addition, the role inventory could change whenever new annotation resources are incorporated, making the differentia disambiguation system less predictable.

Alternatively, the annotations can be converted into a common inventory, and a separate relation classifier induced over the resulting data. This has the advantage that the target relation-type inventory remains stable whenever new sources of relation annotations are introduced. The drawback, however, is that annotations from new resources must first be mapped into the common inventory before incorporation. The latter approach is employed here. The common inventory incorporates some of the general relation types defined by Gildea and Jurafsky (2002) for their experiments in classifying semantic relations.
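The class-based collocation approach can be illustrated with the following hypothetical sketch using NLTK's WordNet interface; the first-sense choice and the fixed hypernym depth are simplifying assumptions, and none of this is the paper's actual code.

    # Hypothetical sketch: back off contextual words to high-level WordNet
    # hypernyms (word classes) so the classifier generalizes across genres.
    from nltk.corpus import wordnet as wn

    def word_class(word, depth=3):
        """Map a word to a high-level hypernym synset name (its 'class'),
        falling back to the word itself when WordNet has no entry."""
        synsets = wn.synsets(word)
        if not synsets:
            return word
        path = max(synsets[0].hypernym_paths(), key=len)  # root ... synset
        return path[min(depth, len(path) - 1)].name()

    def features(words, tags, i):
        """Local-context features (cf. Figure 2) plus class-based
        collocations for the target word at position i."""
        feats = {"word": words[i], "pos": tags[i]}
        for offset in (-2, -1, 1, 2):
            j = i + offset
            if 0 <= j < len(words):
                feats["word%+d" % offset] = words[j]
                feats["pos%+d" % offset] = tags[j]
                feats["class%+d" % offset] = word_class(words[j])
        return feats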
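For completeness, the cascaded combination strategy considered (though not ultimately adopted) above might look like the following minimal sketch; the (label, confidence) classifier interface and the threshold are assumptions for illustration.

    # Minimal sketch of the cascade: try the classifier for the most specific
    # inventory (FrameNet) first, then fall back (e.g., to a Treebank-trained
    # classifier) whenever the classification is inconclusive.

    def classify_relation(instance, classifiers, threshold=0.5):
        """classifiers: ordered most-specific first; each maps an instance
        to a (label, confidence) pair."""
        for classify in classifiers:
            label, confidence = classify(instance)
            if confidence >= threshold:
                return label
        return None   # inconclusive under every inventory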