File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/93/p93-1023_intro.xml

Size: 4,344 bytes

Last Modified: 2025-10-06 14:05:28

<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1023">
  <Title>TOWARDS THE AUTOMATIC IDENTIFICATION OF ADJECTIVAL SCALES: CLUSTERING ADJECTIVES ACCORDING TO MEANING Vasileios Hatzivassiloglou</Title>
  <Section position="3" start_page="0" end_page="172" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> As natural language processing systems become more oriented towards solving real-world problems like machine translation or spoken language understanding in a limited domain, their need for access to vast amounts of knowledge increases. While a model of the general rules of the language at various levels (morphological, syntactic, etc.) can be hand-encoded, knowledge which pertains to each specific word is harder to encode manually, if only because of the size of the lexicon. Most systems currently rely on human linguists or lexicographers who compile lexicon entries by hand. This approach requires significant amounts of time and effort for expanding the system's lexicon. Furthermore, if the compiled information depends in any way on the domain of the application, the acquisition of lexical knowledge must be repeated whenever the system is transported to another domain. For systems which need access to large lexicons, some form of at least partial automation of the lexical knowledge acquisition phase is needed.</Paragraph>
    <Paragraph position="1"> One type of lexical knowledge which is useful for many natural language (NL) tasks is the semantic relatedness between words of the same or different syntactic categories. Semantic relatedness subsumes hyponymy, synonymy, and antonymyincompatibility. Special forms of relatedness are represented in the lexical entries of the WordNet lexical database (Miller et al., 1990). Paradigmatic semantic relations in WordNet have been used for diverse NL problems, including disambiguation of syntactic structure (Resnik, 1993) and semi-automatic construction of a large-scale ontology for machine translation (Knight, 1993).</Paragraph>
    <Paragraph position="2"> In this paper, we focus on a particular case of semantic relatedness: relatedness between adjectives which describe the same property. We describe a technique for automatically grouping adjectives according to their meaning based on a given text corpus, so that all adjectives placed in one group describe different values of the same property. Our method is based on statistical techniques, augmented with linguistic information derived from the corpus, and is completely domain independent. It demonstrates how high-level semantic knowledge can be computed from large amounts of low-level knowledge (essentially plain text, part-of-speech rules, and optionally syntactic relations).</Paragraph>
    <Paragraph position="3"> The problem of identifying semantically related words has received considerable attention, both in computational linguistics (e.g. in connection with thesaurus or dictionary construction (Sparck-Jones, 1986)) and in psychology (Osgood et al., 1957).</Paragraph>
    <Paragraph position="4"> However, only recently has work been done on the automatic computation of such relationships from text, quantifying similarity between words and clustering them ( (Brown et aL, 1992), (Pereira et al., 1993)). In comparison, our work emphasizes the use of shallow linguistic knowledge in addition to a statistical model and is original in the use of negative knowledge to constrain the search space. Furthermore, we use a flexible architecture which will allow us to easily incorporate additional knowledge sources for computing similarity.</Paragraph>
    <Paragraph position="5">  While our current system does not distinguish between scalar and non-scalar adjectives, it is a first step in the automatic identification of adjectival scales, since the scales can be subsequently ordered and the non-scalar adjectives filtered on the basis of independent tests, done in part automatically and in part by hand in a post-editing phase. The result is a semi-automated system for the compilation of adjectival scales.</Paragraph>
    <Paragraph position="6"> In the following sections, we first provide background on scales, then describe our algorithm in detail, present the results obtained, and finally provide a formal evaluation of the results.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML