<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1027">
  <Title>A Quantitative Evaluation of Linguistic Tests for the Automatic Prediction of Semantic Markedness</Title>
  <Section position="3" start_page="197" end_page="197" type="intro">
    <SectionTitle>
2 Motivation
</SectionTitle>
    <Paragraph position="0"> The goal of our work is twofold. First, we are interested in providing hard, quantitative evidence on the performance of markedness tests already proposed in the linguistics literature. Such tests are based on intuitive observations and/or particular theories of semantics, but their accuracy has not been measured on actual data. The results of our analysis can be used to substantiate theories that are compatible with the empirical evidence, and thus offer insight into the complex linguistic phenomenon of antonymy.</Paragraph>
    <Paragraph position="1"> The second purpose of our work is practical. The semantically unmarked term is almost always the positive term of the opposition (Boucher and Osgood, 1969); e.g., high is positive, while low is negative. Therefore, an automatic method for determining markedness values can also be used to determine the polarity of antonyms. The work reported in this paper helps clarify which types of data and tests are useful for such a method and which are not.</Paragraph>
    <Paragraph position="2"> The need for an automatic corpus-based method for the identification of markedness becomes apparent when we consider the high number of adjectives in unrestricted text and the domain-dependence of markedness values. In the MRC Psycholinguistic Database (Coltheart, 1981), a large machine-readable annotated word list, 25,547 of the 150,837 entries (16.94%) are classified as adjectives, not including past participles; if we consider only the grammatical categories in which each word is regularly used, the percentage of adjectives rises to 22.97%. For comparison, nouns (the largest class) account for 51.28% and 57.47% of the words under the two criteria, respectively.</Paragraph>
    <Paragraph position="3"> In addition, while adjectives tend to have prevalent markedness and polarity values in the language at large, these values are frequently reversed in specific domains or contexts. For example, healthy is in most contexts the unmarked member of the opposition healthy:sick; but in a hospital setting, sickness rather than health is expected, so sick becomes the unmarked term. The methods we describe are based on the form of the words and their overall statistical properties, and thus cannot predict specific occurrences of markedness reversals. But they can predict the prevalent markedness value for each adjective in a given domain, something that is impractical to do by hand separately for each domain.</Paragraph>
    <Paragraph position="4"> We have built a large system for the automatic, domain-dependent classification of adjectives according to semantic criteria. The first phase of our system (Hatzivassiloglou and McKeown, 1993) separates adjectives into groups of semantically related ones. We extract markedness values according to the methods described in this paper and use them in subsequent phases of the system that further analyze these groups and determine their scalar structure.</Paragraph>
    <Paragraph position="5"> An automatic method for extracting polarity information would also be useful for the augmentation of lexico-semantic databases such as WordNet (Miller et al., 1990), particularly when the method accounts for the specificities of the domain sublanguage; an increasing number of NLP systems rely on such databases (e.g., Resnik, 1993; Knight and Luk, 1994). Finally, knowledge of polarity can be combined with corpus-based collocation extraction methods (Smadja, 1993) to automatically produce entries for the lexical functions used in Meaning-Text Theory (Mel'čuk and Pertsov, 1987) for text generation. For example, knowing that hearty is a positive term enables the assignment of the collocation hearty eater to the lexical function entry MAGN(eater) = hearty.¹</Paragraph>
  </Section>
</Paper>