<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-3008">
  <Title>Towards Robust Animacy Classification Using Morphosyntactic Distributional Features</Title>
  <Section position="7" start_page="52" end_page="53" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> The above experiments have shown that the classification of animacy for Norwegian common nouns is achievable using distributional data from a morphosyntactically annotated corpus. The chosen morphosyntactic features of animacy have proven to differentiate well between the two classes. As we have seen, the transitive subject, direct object and morphological genitive provide stable features for animacy even when the data is sparse(r). Four groups of experiments have been reported above which indicate that a reasonable remedy for sparse data in animacy classification consists of backing off to a smaller feature set in classification. These experiments indicate that a classifier trained on highly frequent nouns (experiment 1) and backed off to the most frequent features (experiment 3) sufficiently captures generalizations which pertain to nouns with absolute frequencies down to approximately fifty occurrences and enables an unchanged performance approaching 90% accuracy.</Paragraph>
    <Paragraph position="1"> Even so, there are certainly still possibilities for improvement. As is well-known, singleton occurrences of nouns abound and the above classification method is based on data for lemmas, rather than individual instances or tokens. One possibility to be explored is token-based classification of animacy, possibly in combination with a lemma-based approach like the one outlined above.</Paragraph>
    <Paragraph position="2"> Such an approach might also include a finer subdivision of the nouns. We have chosen to classify along a binary dimension; however, it might be argued that this is an artificial dichotomy. Zaenen et al. (2004) describe an encoding scheme for the manual encoding of animacy information in part of the English Switchboard corpus.</Paragraph>
    <Paragraph position="3"> They make a three-way distinction between human, other animates, and inanimates, where the 'other animates' category describes a rather heterogeneous group of entities: organisations, animals, intelligent machines and vehicles. However, what these seem to have in common is that they may all be construed linguistically as animate beings, even though they, in the real world, are not. Interestingly, the two misclassified inanimate nouns in experiment 1 were bil 'car' and fly 'airplane', both vehicles. A token-based approach to classification might better capture the context-dependent and dual nature of these types of nouns. Automatic acquisition of animacy in itself is not necessarily the primary goal. By testing the use of acquired animacy information in various NLP applications such as parsing, generation or coreference resolution, we might obtain an extrinsic evaluation measure for the usefulness of animacy information. Since very frequent nouns are usually well described in other lexical resources, it is important that a method for animacy classification is fairly robust to data sparseness. This paper suggests that a method based on seven morphosyntactic features, in combination with feature back-off, can contribute towards such a classification.</Paragraph>
  </Section>
</Paper>