File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/03/w03-1106_relat.xml
Size: 1,322 bytes
Last Modified: 2025-10-06 14:15:40
<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1106">
  <Title>Text Classification in Asian Languages without Word Segmentation</Title>
  <Section position="8" start_page="0" end_page="0" type="relat">
    <SectionTitle>5.5 Related Work</SectionTitle>
    <Paragraph position="0">The use of n-gram models has also been extensively investigated in information retrieval. However, unlike previous research (Cavnar and Trenkle, 1994; Damashek, 1995), where researchers have used n-grams as features for a traditional feature selection process and then deployed classifiers based on calculating feature-vector similarities, we consider all n-grams as features and determine their importance implicitly by assessing their contribution to perplexity. In this way, we avoid an error-prone feature selection step.</Paragraph>
    <Paragraph position="1">Language modeling for text classification is a relatively new area. In principle, any language model can be used to perform text categorization. However, n-gram models are extremely simple and have been found to be effective in many applications. Teahan and Harper (2001) used a PPM (prediction by partial matching) model for text categorization, in which they seek the model that obtains the best compression on a new document.</Paragraph>
  </Section>
</Paper>
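
The perplexity-based classification described in the section above can be made concrete with a small example. The following is a minimal illustrative sketch, not the authors' implementation: it trains one character-bigram language model with add-one smoothing per class (the paper's actual models are n-gram language models of higher order with different smoothing, which is an assumption simplified here) and assigns a new document to the class whose model yields the lowest per-character perplexity. The class labels and training strings are hypothetical.

    # Minimal sketch (not the authors' implementation): per-class character-bigram
    # language models with add-one smoothing; a document is assigned to the class
    # whose model gives the lowest per-character perplexity. Class labels and
    # training texts below are hypothetical.
    import math
    from collections import Counter

    def train_bigram_model(texts):
        # Count character bigrams and their left contexts over the training texts.
        bigrams, contexts, vocab = Counter(), Counter(), set()
        for text in texts:
            padded = "^" + text          # "^" marks the start of a document
            vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                bigrams[(prev, cur)] += 1
                contexts[prev] += 1
        return bigrams, contexts, vocab

    def perplexity(model, text):
        # Per-character perplexity of `text` under an add-one-smoothed bigram model;
        # the extra +1 in the denominator reserves mass for unseen characters.
        bigrams, contexts, vocab = model
        padded = "^" + text
        log_prob = 0.0
        for prev, cur in zip(padded, padded[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab) + 1)
            log_prob += math.log2(p)
        return 2.0 ** (-log_prob / (len(padded) - 1))

    def classify(models, document):
        # Choose the class whose language model assigns the document the lowest perplexity.
        return min(models, key=lambda label: perplexity(models[label], document))

    if __name__ == "__main__":
        training = {  # toy training data, purely illustrative
            "sports": ["the team won the final match", "players scored two goals"],
            "finance": ["the bank raised interest rates", "stock prices fell sharply"],
        }
        models = {label: train_bigram_model(texts) for label, texts in training.items()}
        print(classify(models, "interest rates and stock markets"))

Choosing the class with the lowest perplexity is equivalent to choosing the model under which the document has the shortest code length, which is the compression view taken by the PPM-based approach of Teahan and Harper (2001) cited above.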