File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/95/e95-1008_concl.xml
Size: 1,805 bytes
Last Modified: 2025-10-06 13:57:22
<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1008"> <Title>Collocation Map for Overcoming Data Sparseness</Title> <Section position="5" start_page="55" end_page="58" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> Corpus-based natural language processing has become a central subject that is rapidly gaining attention from the research community. The major virtue of statistical approaches lies in evaluating linguistic events and determining their relative importance in order to resolve ambiguities. In many cases, however, the evaluation of these events (mostly cooccurrences) has been unreliable because of the lack of data.</Paragraph> <Paragraph position="1"> Data sparseness refers to the shortage of data for estimating probabilistic parameters. As a result, too many events go unobserved, and even when an event has been observed, its occurrences are too few for the estimate to be reliable. In contrast with existing methods that are based on strong assumptions, the method using Collocation map promises a logical approximation, since it is built on a thorough formal argument of Bayesian probability theory. The powerful feature of the framework is its ability to make use of the conditional independence among word units and to make associations about unseen cooccurrences based on observed ones. This naturally yields the properties required to deal with data sparseness. Our experiments confirm that Collocation map makes predictive approximations and avoids overestimating infrequent occurrences.</Paragraph> <Paragraph position="2"> One critical drawback of Collocation map is its time complexity, but it can still be useful for applications of limited scope.</Paragraph> <Paragraph position="4"/> </Section> </Paper>