File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/99/w99-0701_abstr.xml

Size: 986 bytes

Last Modified: 2025-10-06 13:49:57

<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0701">
  <Title>Unsupervised Learning of Word Boundary with Description Length Gain</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents an unsupervised approach to lexical acquisition with the goodness measure description length gain (DLG) formulated following classic information theory within the minimum description length (MDL) paradigm. The learning algorithm seeks an optimal segmentation of an utterance that maximises the description length gain from the individual segments. The resultant segments show a nice correspondence to lexical items (in particular, words) in a natural language like English. Learning experiments on large-scale corpora (e.g., the Brown corpus) have shown the effectiveness of both the learning algorithm and the goodness measure that guides that learning.</Paragraph>
  </Section>
</Paper>