<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1014">
  <Title>Word Triggers and the EM Algorithm</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper, we study the use of so-called word trigger pairs (for short: word triggers) (Bahl et al., 1984, Lau and Rosenfeld, 1993, Tillmann and Ney, 1996) to improve an existing language model, which is typically a trigram model in combination with a cache component (Ney and Essen, 1994).</Paragraph>
    <Paragraph position="1"> We use a reference model p(wlh), i.e. the conditional probability of observing the word w for a given history h. For a trigram model, this history h includes the two predecessor words of the word under consideration, but in general it can be the whole sequence of the last M predecessor words.</Paragraph>
    <Paragraph position="2"> The criterion for measuring the quality of a language model p(Wlh ) is the so-called log-likelihood criterion (Ney and Essen, 1994), which for a corpus Wl, ..., wn, ...wN is defined by:</Paragraph>
    <Paragraph position="4"> According to this definition, the log-likelihood criterion measures for each position n how well the language model can predict the next word given the knowledge about the preceeding words and computes an average over all word positions n. In the context of language modeling, the log-likelihood criterion F is converted to perplexity PP, defined by</Paragraph>
    <Paragraph position="6"> For applications where the topic-dependence of the language model is important, e.g. text dictation, the history h may reach back several sentences so that the history length M covers several hundred words, say, M = 400 as it is for the cache model.</Paragraph>
    <Paragraph position="7"> To illustrate what is meant by word triggers, we give a few examples:  Thus word trigger pairs can be viewed as long-distance word bigrams. In this view, we are faced the problem of finding suitable word trigger pairs. This will be achieved by analysing a large text corpus (i.e. several millions of running words) and learning those trigger pairs that are able to improve the base-line language model. A related approach to capturing long-distance dependencies is based on stochastic variants of link grammars (Pietra and Pietra, 1994). In several papers (Bahl et al., 1984, Lau and Rosenfeld, 1993, Tillmann and Ney, 1996), selection criteria for single word trigger pairs were studied. In this paper, this work is extended as follows: * Single-Trigger Model: We consider the definition of a single word trigger pair. There are two models we consider, namely a backing-off model and a linear interpolation model. For the case of the backing-off model, there is a closed-form solution for estimating the trigger parameter by maximum likelihood. For the linear interpolation model, there is no explicit solution Tillmann ~ Ney 117 Word Triggers and EM Christoph Tillmann and Hermann Ney (1997) Word Triggers and the EM Algorithm. In T.M. Ellison (ed.) CoNLL97: Computational Natural Language Learning, ACL pp 117-124. Q 1997 Association for Computational Linguistics anymore, but this model is better suited for the extension towards a large number of simultaneous trigger pairs.</Paragraph>
    <Paragraph position="8"> Multi-Trigger Model: In practice, we have to take into account the interaction of many trigger pairs. Here, we introduce a model for this purpose. To really use the word triggers for a language model, they must be combined with an existing language model. This is achieved by using linear interpolation between the existing language model and a model for the multi-trigger effects. The parameters of the resulting model, namely the trigger parameters and one interpolation parameter, are trained by the EM algorithm.</Paragraph>
    <Paragraph position="9"> * We present experimental results on the Wall Street Journal corpus. Both the single-trigger approach and the multi-trigger approach are used to improve the perplexity of a baseline language model. We give examples of selected trigger pairs with and without using the EM algorithm. null</Paragraph>
  </Section>
class="xml-element"></Paper>