<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1601">
  <Title>Statistical Generation: Three Methods Compared and Evaluated</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Traditionally, NLG systems have been built as deterministic decision-makers, that is to say, they generate one string of words for any given input, in a sequence of decisions that increasingly specify the word string. In practice, this has meant carefully handcrafting generators to make decisions locally, at each step in the generation process. Such generators tend to be specialised and domain-specific, are hard to adapt to new domains, or even subdomains, and have no way of dealing with incomplete or incorrect input. Wide-coverage tools only exist for surface realisation, and tend to require highly specific and often idiosyncratic inputs. The rest of NLP has reached a stage where state-of-the-art tools are expected to be generic: wide-coverage, reusable and robust. NLG is lagging behind the field on all three counts.</Paragraph>
    <Paragraph position="1"> The last decade has seen a new generation of NLG methodologies that are characterised by a separation between the definition of the generation space (all possible generation processes from inputs to outputs) on the one hand, and control over the decisions that lead to a (set of) output realisation(s), on the other.</Paragraph>
    <Paragraph position="2"> In making this separation, generate-and-select NLG takes several crucial steps towards genericity: reusing systems becomes easier if the selection mechanism can be adjusted or replaced separately, without changing the definition of the generation space; coverage can be increased more easily if every expansion of the generation space does not have to be accompanied by handcrafted rules controlling the resulting increase in nondeterminism; and certain types of selection methods can provide robustness, for example through probabilistic choice.</Paragraph>
    <Paragraph position="3"> Statistical generation has aroused by far the most interest among these methods, and it has mostly meant n-gram selection: a packed representation of all alternative (partial) realisations is produced, and an n-gram language model is applied to select the most likely realisation. N-gram methods have several desirable properties: they offer a fully automatic method for building and adapting control mechanisms for generate-and-select NLG from raw corpora (reusability); they base selections on statistical models (robustness); and they can potentially be used for deep as well as surface generation. null However, n-gram models are expensive to apply: in order to select the most likely realisation according to an n-gram model, all alternative realisations have to be generated and the probability of each realisation according to the model has to be calculated. This can get very expensive (even if packed representations of the set of alternatives are used), especially when the system accepts incompletely specified input, because the number of alternatives can be vast. In Halogen, Langkilde [2000] deals with trillions of alternatives, and the generator used in the experiments reported in this paper has up to 1040 alternative realisations (see Section 4.3 for empirical evidence of the relative inefficiency of n-gram generation). null Furthermore, n-gram models have a built-in bias towards shorter strings. This is because they calculate the likelihood of a string of words as the joint probability of the words, or, more precisely, as the product of the probabilities of each word given the n[?]1 preceding words. The likelihood of any string will therefore generally be lower than that of any of its substrings (see Section 4.3 for empirical evidence of this bias). This is wholly inappropriate for NLG where equally good realisations can vary greatly in length (see Section 5 for discussion of normalisation for length in statistical modelling). null The research reported in this paper is part of an ongoing research project1 the purpose of which is to investigate issues in generic NLG. The experiments (Section 4.3) were carried out to evaluate and compare different methods for exploiting the frequencies of word sequences and word sequence cooccurrences in raw text corpora to build models for NL generation, and different ways of using such models during generation. One of the methods uses a standard 2-gram model for selection among all realisations (with a new selection algorithm, see Section 4.3). The other two use a treebank-trained model of the generation space (Section 3). The basic idea behind treebank-training of generators is simple: determine for the strings and substrings in the corpus the different ways in which the generator could have generated them, i.e.</Paragraph>
    <Paragraph position="4"> the different sequences of decisions that lead to them, then collect frequency counts for individual decisions, and determine a probability distribution over decisions on this basis. In the experiments, treebank-trained generation models are combined with two different ways of using them during generation, one locally and one globally optimal (Section 3.1).</Paragraph>
    <Paragraph position="5"> All three methods are evaluated on a corpus of weather forecasts (Section 4.1).</Paragraph>
  </Section>
</Paper>