<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1628">
  <Title>Information and Communication Technologies</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Narrowing the Search Space: A Description of the Statistical Sentence Generation Problem
</SectionTitle>
    <Paragraph position="0"> of the Statistical Sentence Generation Problem In this work, sentence generation is essentially a search for the most probable sequence of words, given some source text. However, this constitutes an enormous space which requires ef cient searching. Whilst reducing a vocabulary to a suitable subset narrows this space somewhat, we can use statistical models, representing properties of language, to prune the search space of word sequences further to those strings that re ect real language usage. For example, n-gram models limit the word sequences examined to those that seem grammatically correct, at least for small windows of text.</Paragraph>
    <Paragraph position="1"> However, n-grams alone often result in sentences that, whilst near-grammatical, are often just gibberish. When combined with a (word) content selection model, we narrow the search space even further to those sentences that appear to make sense. Accordingly, approaches such as Witbrock and Mittal [1999] and Wan et al. [2003] have investigated models that improve the choice of words in the sentence. Witbrock and Mittal's content model chooses words that make good headlines, whilst that of Wan et al. attempts to ensure that, given a short document like a news article, only words from sentences of the same subtopic are combined to form a new sentences. In this paper, we narrow the search space to those sequences that conserve dependency structures from within the input text.</Paragraph>
    <Paragraph position="2"> Our algorithm extension essentially passes along the long-distance context of dependency head information of the preceding word sequence, in order to in uence the choice of the next word appended to the sentence. This dependency structure is constructed statistically by an O(n) algorithm, which is folded into the Viterbi algorithm. Thus, the extension is in an O(n4) algorithm. The use of dependency relations further constrains the search space. Competing paths through the search space are ranked taking into account the proposed dependency structures of the partially generated word sequences. Sentences with probable dependency structures are ranked higher. To model the probability of a dependency relation, we use the statistical dependency models inspired by those described in Collins [1996].</Paragraph>
  </Section>
class="xml-element"></Paper>