File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/00/j00-4004_relat.xml
Size: 4,321 bytes
Last Modified: 2025-10-06 14:15:34
<?xml version="1.0" standalone="yes"?> <Paper uid="J00-4004"> <Title>Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights</Title> <Section position="9" start_page="622" end_page="623" type="relat"> <SectionTitle> 7. Related Work </SectionTitle> <Paragraph position="0"> The aspectual classification of a clause has thus far been primarily approached from a knowledge-based perspective. For example, Pustejovsky's generative lexicon describes semantic interactions between clausal constituents that affect aspectual class (Pustejovsky 1991). Additionally, Resnik (1996) demonstrates the influence of implicit direct objects on aspectual classification.</Paragraph> <Paragraph position="1"> The application of automatic corpus-based techniques to aspectual classification is in its infancy. Klavans and Chodorow (1992) pioneered the application of statistical corpus analysis to aspectual classification by placing verbs on a scale according to the frequency with which they occur with certain aspectual markers from Table 2. This way, verbs are automatically ranked according to their &quot;degree of stativity.&quot; Machine learning has become instrumental in the development of robust natural language understanding systems in general (Cardie and Mooney 1999). For example, decision tree induction has been applied to word sense disambiguation (Black 1988), determiner prediction (Knight et al. 1995), coordination parsing (Resnik 1993), syntactic parsing (Magerman 1993), and disambiguating clue phrases (Siegel 1994; Siegel and McKeown 1994; Litman 1994). An overview of psycholinguistic issues behind learning for natural language problems in particular is given by Powers and Turk (1989). 
Models resulting from machine induction have been manually inspected to discover linguistic insights for disambiguating clue words (Siegel and McKeown 1994).</Paragraph> <Paragraph position="2"> However, machine learning techniques have not previously been applied to aspectual disambiguation.</Paragraph> <Paragraph position="3"> Previous efforts have applied machine induction methods to coordinate corpus-based linguistic indicators in particular, for example, to classify adjectives according to markedness (Hatzivassiloglou and McKeown 1995), to perform accent restoration (Yarowsky 1994), for sense disambiguation problems (Luk 1995), and for the automatic identification of semantically related groups of words (Pereira, Tishby, and Lee 1993; Hatzivassiloglou and McKeown 1993; Schütze 1992).</Paragraph> <Paragraph position="4"> 8. Future Work Parallel bilingual corpora are potential sources of supervised examples for training and testing aspectual classification systems. For example, since many languages have explicit markings corresponding to completedness (as described in Section 2.6), the category of a clause can be determined by its translation.</Paragraph> <Paragraph position="5"> Additional machine learning methods should be evaluated for combining linguistic indicators. For example, neural networks are especially suited for combining numerical inputs, and Naive Bayes models are especially suited for additive concepts. Also, iteratively refining the model (e.g., for logistic regression) may be an important way to eliminate indicators that do not help for a particular classification problem and to eliminate redundancy between indicators that correlate highly with one another.</Paragraph> <Paragraph position="6"> Machine learning techniques may be able to automatically determine how best to measure linguistic indicators, if trained over a large supervised sample. 
For example, previous work has measured indicators by applying a symbolic expression induced by GP to a subset of clauses in a corpus (Siegel and McKeown 1996). This way, interactions between markers in a clause can be automatically measured. In principle, machine learning techniques could further generalize these methods by automatically inducing an algorithm that scans a corpus dynamically, depending on what it sees as it processes clauses. This could automatically select relevant markers as well as relevant portions of the corpus for a particular input clause.</Paragraph> <!-- Computational Linguistics Volume 26, Number 4 --> </Section></Paper>