<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0106">
  <Title>DO WE NEED LINGUISTICS WHEN WE HAVE STATISTICS? A COMPARATIVE ANALYSIS OF THE CONTRIBUTIONS OF LINGUISTIC CUES TO A STATISTICAL WORD GROUPING SYSTEM</Title>
  <Section position="3" start_page="0" end_page="43" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The idea of integrating statistical and knowledge-based approaches for natural language problems has recently been gaining ground in the computational linguistics community, as it is expected that a combined approach will offer significantly better performance than either methodology alone. This paper supplements this intuitive belief with actual evaluation data, obtained when several linguistics-based modules were integrated in a statistical system.</Paragraph>
    <Paragraph position="1"> We used a system we previously developed for the separation of adjectives into semantic groups [Hatzivassiloglou and McKeown, 1993] as the basis for our comparative analysis. We identified several different types of shallow linguistic knowledge that can be efficiently introduced into our system. We evaluated the system with and without each such feature, obtaining an estimate of each feature's positive or negative contribution to the overall performance. By matching cases where all system parameters are the same except for one feature, we assess the statistical significance of the differences found. In addition, a statistical model of the system's performance in terms of the active features for each run offers a view of the contributions of features from a different angle, contrasting the significance of linguistic features (or other modeled system parameters) against each other. Our analysis of the experimental results showed that many forms of linguistic knowledge make a significant positive contribution to the performance of the system. We attribute to the combined effect of the linguistic knowledge modules the ability of our system to perform fine-tuned classification of adjectives into semantic classes.</Paragraph>
    <Paragraph position="2"> Other statistical systems that address word classification problems do not emphasize the use of linguistic knowledge and do not deal with a specific word class [Brown et al., 1992], or do not exploit as much linguistic knowledge as we do [Pereira et al., 1993]. As a result, a coarser classification is usually produced. In contrast, by limiting the system's input to adjectives, we can take advantage of specific syntactic relationships and additional filtering procedures that apply only to particular word classes. These sources of linguistic knowledge provide in turn the extra edge for discriminating among the adjectives at the semantic level.</Paragraph>
    <Paragraph position="3"> Our&amp;quot; adjective grouping system can be used for applications such as natural lansuage generation (where knowledge of the semanuc groups and of the ordering of the elements within them allows the precise lexiealization of semantic concepts \[Elhadad, 1991\]) and computational lexicography (by automatically eompifing domain-dependent lists of synonyms and antonyms). The produced groups can also help correct erroneous usage of multiple qualifiers that are superfluc~ts or contradict each other, a phenomenon that has been observed in medical reports 1. But in addition to the immediate applications of word classification, many other sfatistical NLP applications can be cast in a similar framework. Therefore, the positive effects of linguistic knowledge on our system indicate that the incorpo/'ation of linguistic knowledge will probably result in similar b~efits for other applications as well.</Paragraph>
    <Paragraph position="4"> In what follows, we briefly review our adjective grouping system, and then present the linguistic features we explored and the alternatives for each of them. In Section 5 we give the results of our evaluation on different combinations of features and we analyze their significance. We also present these results in a predictor-response framework, and we conclude by discussing the applicability of our results to other NLP problems.</Paragraph>
  </Section>
</Paper>