<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1017">
  <Title>Unknown word sense detection as outlier detection</Title>
  <Section position="9" start_page="133" end_page="134" type="concl">
    <SectionTitle>
8 Conclusion and outlook
</SectionTitle>
    <Paragraph position="0"> We have defined and addressed the problem of unknown word sense detection: identifying corpus occurrences that are not covered by a given sense inventory, using a training set of sense-annotated data as a basis. We have modeled this problem as an instance of outlier detection, using the simple nearest-neighbor-based approach of Tax and Duin to measure how closely a new occurrence resembles the training data. Combined with a method that alleviates data sparseness by sharing training data across lemmas, the approach achieves results good enough for practical use: with items represented as vectors of context words (including lemma, POS, and NE information), the system reaches 0.77 precision and 0.82 recall in an evaluation on FrameNet 1.2. The training-set extension method, which proved crucial to our approach, relies solely on grouping annotated data by semantic similarity. It is therefore applicable to any resource that groups words into semantic classes, such as WordNet.</Paragraph>
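The nearest-neighbor outlier test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: a new occurrence z is scored by the ratio of its distance to its nearest training item NN(z) over the distance from NN(z) to its own nearest training neighbor, so a ratio well above 1 means z lies in a sparser region than the training data around it. Euclidean distance, the decision threshold of 1.0, the helper names (`euclidean`, `nn_outlier_score`, `is_unknown_sense`), and the toy two-dimensional vectors (standing in for context-word vectors) are all hypothetical choices made for the sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_outlier_score(z, train):
    """Ratio d(z, NN(z)) / d(NN(z), NN(NN(z))) over a list of training vectors.

    Requires at least two training items. Ties are broken by list order.
    """
    # Index of the training item nearest to the new occurrence z.
    i = min(range(len(train)), key=lambda k: euclidean(z, train[k]))
    # Index of that item's own nearest neighbor among the other training items.
    j = min((k for k in range(len(train)) if k != i),
            key=lambda k: euclidean(train[i], train[k]))
    num = euclidean(z, train[i])
    den = euclidean(train[i], train[j])
    if den == 0:  # duplicate training vectors: avoid division by zero
        return math.inf if num > 0 else 0.0
    return num / den


def is_unknown_sense(z, train, threshold=1.0):
    # Hypothetical decision rule: flag z as an outlier (a candidate unknown
    # sense) when it is farther from the training data than the local spacing.
    return nn_outlier_score(z, train) > threshold


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(is_unknown_sense((0.2, 0.0), train))  # False
print(is_unknown_sense((5.0, 5.0), train))  # True
```

Here (0.2, 0.0) scores 0.2 and is accepted, while (5.0, 5.0) is about 6.4 times farther from its nearest training item than that item is from its own neighbor, so it is flagged. In practice the threshold would be tuned on held-out data rather than fixed at 1.0.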
    <Paragraph position="1"> For this first study on unknown sense detection, we have chosen a maximally simple outlier detection method; many extensions are possible. One obvious possibility is the extension of Tax and Duin's method to more than one nearest training neighbor for a more accurate estimate of local density. Furthermore, more sophisticated feature vectors can be employed to generalize over context words, and other outlier detection approaches (Markou and Singh, 2003a; Markou and Singh, 2003b; Marsland, 2003) can be tested on this task.</Paragraph>
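The extension of Tax and Duin's method to more than one nearest neighbor could take several forms. One hypothetical version, sketched below in Python and not taken from this work, replaces each single distance in the 1-NN ratio with a mean over k neighbors: the mean distance from z to its k nearest training items is divided by the average of those items' own mean k-NN distances (each item excluding itself). This is close in spirit to density-ratio scores such as the local outlier factor. Euclidean distance, the default k=2, and all function names are assumptions; with k=1 the score reduces to the single-neighbor ratio.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(point, train, k, exclude=None):
    """Indices of the k training items closest to point, optionally skipping
    one index (so that a training item is not its own neighbor)."""
    idx = [i for i in range(len(train)) if i != exclude]
    return sorted(idx, key=lambda i: euclidean(point, train[i]))[:k]


def mean_knn_dist(point, train, k, exclude=None):
    """Mean distance from point to its k nearest training items."""
    nbrs = knn(point, train, k, exclude)
    return sum(euclidean(point, train[i]) for i in nbrs) / len(nbrs)


def knn_outlier_score(z, train, k=2):
    """Hypothetical k-NN generalization of the 1-NN ratio (k=1 recovers it)."""
    nbrs = knn(z, train, k)
    num = mean_knn_dist(z, train, k)
    # Local spacing around z's neighbors: their own mean k-NN distances.
    den = sum(mean_knn_dist(train[i], train, k, exclude=i) for i in nbrs) / len(nbrs)
    if den == 0:  # the neighbors' nearest items are all exact duplicates
        return math.inf if num > 0 else 0.0
    return num / den


sq = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(round(knn_outlier_score((0.5, 0.5), sq, k=2), 3))  # 0.707
print(round(knn_outlier_score((10.0, 10.0), sq, k=2), 2))  # 13.09
```

Averaging over several neighbors makes the density estimate less sensitive to a single unusually close or distant training item, at the cost of extra distance computations per scored occurrence.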
    <Paragraph position="2"> Our immediate goal is to use unknown sense detection in combination with WSD, to filter out items that the WSD system cannot handle due to missing senses. Once items have been identified as unknown, they are available for further processing: if possible, one would like to assign at least some sense information even to these items. Possibilities include associating items with similar existing senses (Widdows, 2003; Curran, 2005; Burchardt et al., 2005) or clustering them into approximate senses.</Paragraph>
  </Section>
</Paper>