<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0603">
  <Title>Unsupervised Discovery of Morphemes</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> In the experiments, the online method with the MDL cost function and recursive splitting appeared most successful, especially for Finnish, whereas for English the compared methods performed roughly equally. This is likely due in part to the model structure of the presented methods, which is especially suitable for languages such as Finnish.</Paragraph>
    <Paragraph position="1"> However, there is still room for considerable improvement in the model structure, especially regarding the representation of contextual dependencies.</Paragraph>
    <Paragraph position="2"> Of the two examined model optimization methods, the Recursive MDL method performed consistently somewhat better. Whether this is due to the cost function or the splitting strategy cannot be determined from these experiments. In the future, we intend to extend the Sequential ML method to utilize an MDL-like cost function.</Paragraph>
    <Paragraph position="3"> Table 4: Some English and Finnish word segmentations produced by the three methods. The Finnish words are eläinlääkäri (veterinarian, lit. animal doctor), eläinmuseo (zoological museum, lit. animal museum), eläinpuisto (zoological park, lit. animal park), and eläintarha (zoo, lit. animal garden). The suffixes -lle, -n, -on, and -sta are linguistically correct. (Note that in the Sequential ML method the rejection criteria mentioned are not applied on the last round of Viterbi segmentation. This is why two one letter morphs appear in a sequence in the segmentation eläin + tarh + a + n.)
      Recursive MDL            Sequential ML            Linguistica
      affect                   affect                   affect
      affect + ing             affect + ing             affect + ing
      affect + ing + ly        affect + ing + ly        affect + ing + ly
      affect + ion             affecti + on             affect + ion
      affect + ion + ate       affecti + on + at + e    affect + ion + ate
      affect + ion + s         affecti + on + s         affect + ion + s
      affect + s               affect + s               affect + s
      eläin + lääkäri          eläin + lääkäri          eläinlääkäri
      eläin + lääkäri + lle    eläin + lääkäri + lle    eläinlääkäri + lle
      eläin + museo + n        eläin + museo + n        eläinmuseo + n
      eläin + museo + on       eläin + museo + on       eläinmuseo + on
      eläin + puisto + n       eläin + puisto + n       eläinpuisto + n
      eläin + puisto + sta     eläin + puisto + sta     eläinpuisto + sta
      eläin + tar + han        eläin + tarh + a + n     eläintarh + an
    </Paragraph>
  </Section>
</Paper>