<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0106">
  <Title>DO WE NEED LINGUISTICS WHEN WE HAVE STATISTICS? A COMPARATIVE ANALYSIS OF THE CONTRIBUTIONS OF LINGUISTIC CUES TO A STATISTICAL WORD GROUPING SYSTEM</Title>
  <Section position="9" start_page="51" end_page="51" type="concl">
    <SectionTitle>
7. CONCLUSIONS AND FUTURE WORK
</SectionTitle>
    <Paragraph position="0"> We have shown that all linguistic features considered in this study made a positive contribution to the performance of the system. Except for spell-checking, all these contributions were both statistically significant and large enough to make a difference in practical situations. Furthermore, the results can be expected to generalize to a wide variety of corpus-based systems for different applications. The cost of incorporating the linguistics-based modules in the system is not prohibitive. The effort needed to implement all the linguistic modules was about 5 person-months, in contrast with 7 person-months needed to develop the basic statistical system. Furthermore, the run-time overhead caused by the linguistic modules is not significant.</Paragraph>
    <Paragraph position="1"> Each module takes from 1 to 7 minutes on a Sun SparcStation 10 to process a million entries (words or pairs), and all except the negative knowledge module need to process a corpus only once, reusing the same information for different adjective sets.</Paragraph>
    <Paragraph position="2"> This should be compared to the approximately 15 minutes needed by the statistical component for grouping about 40 adjectives.</Paragraph>
    <Paragraph position="3"> In the future, we plan to extend the results discussed in this paper by an analysis of the dependence of the effects of each parameter on the values of the other parameters. We are currently stratifying the experimental data obtained to study trends in the magnitude of parameter effects as other parameters vary in a controlled manner, and we will examine the interactions with corpus size and specificity of clustered adjectives. We are also interested in providing similar quantitative results for other applications, to corroborate our belief in the generality of the importance of easily obtainable linguistic knowledge for statistical systems.</Paragraph>
  </Section>
</Paper>