<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-3008">
  <Title>Towards Robust Animacy Classification Using Morphosyntactic Distributional Features</Title>
  <Section position="3" start_page="0" end_page="47" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Animacy is an inherent property of the referents of nouns. It has been claimed to figure as an influencing factor in a range of grammatical phenomena in various languages, and it is correlated with central linguistic concepts such as agentivity and discourse salience. Knowledge about the animacy of a noun is therefore relevant for several kinds of NLP problems, ranging from coreference resolution to parsing and generation.</Paragraph>
    <Paragraph position="1"> In recent years a range of linguistic studies have examined the influence of argument animacy in grammatical phenomena such as differential object marking (Aissen, 2003), the passive construction (Dingare, 2001), the dative alternation (Bresnan et al., 2005), etc. A variety of languages are sensitive to the dimension of animacy in the expression and interpretation of core syntactic arguments (Lee, 2002; Ovrelid, 2004). A key generalisation or tendency observed there is that prominent grammatical features tend to attract other prominent features;1 subjects, for instance, will tend to be animate and agentive, whereas objects prototypically are inanimate and themes/patients.</Paragraph>
    <Paragraph position="2"> Exceptions to this generalisation express a more marked structure, a property which has consequences, for instance, for the distributional properties of the structure in question.</Paragraph>
    <Paragraph position="3"> Even though knowledge about the animacy of a noun clearly has some interesting implications, little work has been done within the field of lexical acquisition to acquire such knowledge automatically. Orăsan and Evans (2001) make use of hyponym relations taken from the WordNet resource (Fellbaum, 1998) in order to classify animate referents. However, such a method is clearly restricted to languages for which large-scale lexical resources, such as WordNet, are available. Merlo and Stevenson (2001) present a method for verb classification which relies only on distributional statistics taken from corpora in order to train a decision tree classifier to distinguish between three groups of intransitive verbs.</Paragraph>
    <Paragraph position="4"> 1 The notion of prominence has been linked to several properties, such as being the most likely topic, the agent, the most available referent, etc.</Paragraph>
    <Paragraph position="5"> This paper presents experiments in the automatic classification of the animacy of unseen Norwegian common nouns, inspired by the method for verb classification presented in Merlo and Stevenson (2001). The learning task is to classify a given common noun as either animate or inanimate. Based on correlations between animacy and other linguistic dimensions, a set of morphosyntactic features is presented and shown to differentiate common nouns along the binary dimension of animacy with promising results. The method relies on aggregated relative frequencies for common noun lemmas and might therefore be expected to suffer seriously from data sparseness. Experiments attempting to empirically locate a frequency threshold for the classification method are therefore presented. It turns out that a subset of the chosen morphosyntactic approximators of animacy shows a resilience to data sparseness which can be exploited in classification. By backing off to this smaller set of features, we show that we can maintain the same classification accuracy for lower-frequency nouns as well. The rest of the paper is structured as follows.</Paragraph>
    <Paragraph position="6"> Section 2 identifies and motivates the set of chosen features for the classification task and describes how these features are approximated through feature extraction from an automatically annotated corpus of Norwegian. In section 3, a group of experiments testing the viability of the method and the chosen features is presented. Section 4 goes on to investigate the effect of sparse data on classification performance and presents experiments which address possible remedies for the sparse data problem. Section 5 sums up the main findings of the previous sections and outlines a few suggestions for further research.</Paragraph>
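    <Paragraph position="7"> As an illustration of the feature representation outlined in this introduction, the following Python fragment computes relative frequencies per noun lemma and backs off to a smaller feature set below a frequency threshold. It is a minimal sketch only: the feature names, the backoff subset, the threshold value and the toy counts are all assumptions made for illustration and are not taken from the paper, and the classifier itself is omitted.

```python
from collections import Counter

# Hypothetical feature names; the introduction does not list the actual set.
ALL_FEATURES = ("subject", "object", "genitive", "passive_agent")
# Hypothetical subset assumed to be more resilient to sparse data.
BACKOFF_FEATURES = ("subject", "object")
# Hypothetical frequency threshold; the paper locates one empirically.
FREQ_THRESHOLD = 100


def relative_frequencies(feature_counts, lemma_total, features):
    """Return count(feature) / count(lemma) for each requested feature."""
    if lemma_total == 0:
        raise ValueError("lemma must occur at least once")
    return {f: feature_counts.get(f, 0) / lemma_total for f in features}


def feature_vector(feature_counts, lemma_total, threshold=FREQ_THRESHOLD):
    """Use the full feature set for frequent lemmas, else the backoff subset."""
    features = ALL_FEATURES if lemma_total >= threshold else BACKOFF_FEATURES
    return relative_frequencies(feature_counts, lemma_total, features)


# Toy counts, invented for illustration only.
frequent = feature_vector(Counter(subject=60, object=20, genitive=10), 200)
rare = feature_vector(Counter(subject=3, object=1, genitive=1), 10)
print(frequent)
print(rare)
```

    In this sketch the frequent toy lemma receives a vector over all four assumed features, while the rare one is represented by the two backoff features only. Either vector could then be passed to a classifier such as a decision tree.</Paragraph>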
  </Section>
</Paper>