<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1080">
  <Title>Self-Organizing D2-gram Model for Automatic Word Spacing</Title>
  <Section position="4" start_page="633" end_page="633" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> Much previous work has explored automatic word spacing. Although most of these studies report high accuracy, they fall into two methodological categories: the analytic approach and the statistical approach. The analytic approach relies on the results of morphological analysis.</Paragraph>
    <Paragraph position="1"> Kang used basic morphological analysis techniques (Kang, 2000), and Kim et al. identified word boundaries using the morphemic information of postpositions and endings (Kim et al., 1998).</Paragraph>
    <Paragraph position="2"> The main drawbacks of this approach are that (i) the analytic step is very complex and (ii) constructing and maintaining the analytic knowledge is expensive.</Paragraph>
    <Paragraph position="3"> On the other hand, the statistical approach estimates from corpora the probability that a space occurs between two syllables. Because this approach obtains the necessary information automatically, it requires neither linguistic knowledge of syllable composition nor the cost of constructing and maintaining such knowledge. In addition, because it does not rely on a morphological analyzer, it produces robust results even for unknown words.</Paragraph>
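A minimal sketch of the statistical idea, in Python, assuming a correctly spaced training corpus and treating each precomposed Hangul character as one syllable. The helper names train_space_probs and insert_spaces and the 0.5 decision threshold are hypothetical illustrations, not the method of any cited work. The first helper estimates, for every adjacent syllable pair, the relative frequency with which a space separates the two syllables; the second reinserts a space wherever that estimate reaches the threshold.

```python
from collections import Counter


def train_space_probs(corpus_lines):
    """Estimate P(space | left syllable, right syllable) from spaced text.

    Hypothetical helper: each line is assumed to be correctly spaced, and
    every non-whitespace character is treated as one syllable.
    """
    spaced = Counter()  # adjacent pairs that were separated by a space
    total = Counter()   # all adjacent pairs, separated or not
    for line in corpus_lines:
        words = line.split()
        chars = "".join(words)
        # Index of the last syllable of every word except the final one:
        # a space follows exactly these positions.
        boundaries = set()
        pos = 0
        for word in words[:-1]:
            pos += len(word)
            boundaries.add(pos - 1)
        for i in range(len(chars) - 1):
            pair = (chars[i], chars[i + 1])
            total[pair] += 1
            if i in boundaries:
                spaced[pair] += 1
    # Relative frequency of a space between each observed syllable pair.
    return {pair: spaced[pair] / total[pair] for pair in total}


def insert_spaces(text, probs, threshold=0.5):
    """Re-space text, inserting a space wherever P(space) >= threshold."""
    chars = "".join(text.split())  # drop any existing whitespace
    if not chars:
        return ""
    out = [chars[0]]
    for left, right in zip(chars, chars[1:]):
        # Pairs never seen in training default to 0.0 (no space).
        if probs.get((left, right), 0.0) >= threshold:
            out.append(" ")
        out.append(right)
    return "".join(out)


corpus = ["나는 학교에 간다", "너는 학교에 간다"]
probs = train_space_probs(corpus)
print(insert_spaces("나는학교에간다", probs))  # 나는 학교에 간다
```

Syllable pairs never observed in training default to "no space" in this sketch; handling them better would require some form of smoothing, which is deliberately left out here.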
    <Paragraph position="4"> Many previous corpus-based studies rely on bigram information. According to Kang (2004), the number of syllables used in modern Korean is</Paragraph>
  </Section>
</Paper>