<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1024">
  <Title>Exploring Asymmetric Clustering for Statistical Language Modeling</Title>
  <Section position="7" start_page="4" end_page="4" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> There are three main contributions of this paper.</Paragraph>
    <Paragraph position="1"> First, after presenting a formal definition of the ACM, we described in detail a methodology for constructing the ACM effectively. We showed empirically that both the asymmetric clustering and the parameter optimization (i.e., optimal numbers of clusters) have positive impacts on the performance of the resulting ACM. This finding partially demonstrates the effectiveness of our research focus: techniques for using clusters (i.e., the ACM) rather than techniques for finding clusters (i.e., clustering algorithms). Second, we explored the actual representation of the ACM and evaluated it on a realistic application - Japanese Kana-Kanji conversion. Results show that the ACMs achieve an approximately 6-10% CER reduction in comparison with word trigram models, even when the ACMs are slightly smaller. Third, we analyzed the reasons underlying the superiority of the ACM. For instance, our analysis suggests that the benefit of the ACM derives partly from its better structure and its better smoothing.</Paragraph>
    <Paragraph position="2"> All cluster models discussed in this paper are based on hard clustering, meaning that each word belongs to exactly one cluster. One area we have not explored is the use of soft clustering, where a word w can be assigned to multiple clusters W, each with a probability P(W|w) [Pereira et al., 1993]. Saul and Pereira [1997] demonstrated the utility of soft clustering and concluded that any method that assigns each word to a single cluster would lose information. It is an interesting question whether our techniques for hard clustering can be extended to soft clustering. On the other hand, soft clustering models tend to be larger than hard clustering models because a given word can belong to multiple clusters, and thus a single training instance of a word w can lead to multiple fractional counts instead of just one.</Paragraph>
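The hard-versus-soft counting contrast described above can be sketched as follows. This is a minimal illustration, not code from the paper; the example word, cluster names, and assignment probabilities are invented for the sketch:

```python
from collections import defaultdict

# Hypothetical soft assignments P(W|w): the word "bank" belongs to two
# clusters with some probability each (values invented for illustration).
soft_assign = {"bank": {"FINANCE": 0.7, "GEOGRAPHY": 0.3}}

# Hard clustering keeps only the single most probable cluster per word.
hard_assign = {w: max(ps, key=ps.get) for w, ps in soft_assign.items()}

def count_hard(corpus, assign):
    """Each training instance contributes one full count to one cluster."""
    counts = defaultdict(float)
    for w in corpus:
        counts[assign[w]] += 1.0
    return dict(counts)

def count_soft(corpus, assign):
    """Each training instance contributes a fractional count P(W|w)
    to every cluster the word belongs to, so one instance yields
    multiple counts and the model stores more parameters."""
    counts = defaultdict(float)
    for w in corpus:
        for cluster, p in assign[w].items():
            counts[cluster] += p
    return dict(counts)

corpus = ["bank", "bank"]
hard_counts = count_hard(corpus, hard_assign)  # one count per instance
soft_counts = count_soft(corpus, soft_assign)  # fractional counts per cluster
```

With two occurrences of "bank", hard counting produces a single cluster entry, while soft counting spreads the same two instances over both clusters in proportion to P(W|w), which is why soft-clustering models tend to grow larger.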
  </Section>
</Paper>