<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2311">
  <Title>The Importance of Discourse Context for Statistical Natural Language Generation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Where counting forms fails
</SectionTitle>
    <Paragraph position="0"> This section provides evidence from English and Finnish that word order affects meaning and acceptability. For each phenomenon we show how a statistical generation technique based only on the probability of forms in a corpus will fail to capture this distinction in meaning.</Paragraph>
    <Paragraph position="1"> Speakers can use a particular form to indicate their assumptions about the status of entities, properties, and events in the discourse model. For example, references to entities may appear as full NPs, pronouns, or be missing entirely, depending on whether speakers regard them as new or old to the hearer or the discourse or as particularly salient (Gundel et al., 1993; Prince, 1992). Not just the lexical form of referential expressions, but also their position or role within the clause may vary depending on the information status of their referents (Birner and Ward, 1998). An example of this in English is ditransitive verbs, which have two variants, the to-dative (I gave the book to the manager) and the double-object (I gave the manager the book). Without a context, both forms are equally acceptable, and even in context native speakers may be unable to consciously decide which is more appropriate. However, the use of the forms is highly systematic and almost entirely predictable from the relative information status and the relative size of the object NPs (Snyder, 2003). In general, older, lighter NPs precede newer, heavier NPs.</Paragraph>
    <Paragraph position="2"> Generating the appropriate ditransitive form based only on the relative frequencies of the two variants is impossible, as can be seen in the behavior of the ditransitive give in a corpus of naturally occurring written and spoken English (Snyder, 2003). Of the 552 tokens of give where the indirect and direct objects are full NPs, 152 (27.5%) are the to-dative and 400 (72.5%) are the double object construction. Given this ratio, only the double object construction would be generated. If the distribution of relative information status and heaviness of direct and indirect objects is the same in the domain of generation as in the source corpus, then on average, the construction chosen as a surface realization will be inappropriate roughly 3 times out of 10.</Paragraph>
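The arithmetic behind this baseline can be sketched in a few lines of Python. The counts are those from Snyder (2003) as cited above; the function names are illustrative only:

```python
# Counts of the two ditransitive variants of "give" with full-NP objects
# (Snyder, 2003): 552 tokens total.
COUNTS = {"double_object": 400, "to_dative": 152}

def majority_baseline(counts):
    """Pick the single most frequent form, as a P(f)-only generator would."""
    return max(counts, key=counts.get)

def baseline_error_rate(counts):
    """Fraction of tokens whose actual form differs from the majority choice."""
    total = sum(counts.values())
    chosen = majority_baseline(counts)
    return (total - counts[chosen]) / total

print(majority_baseline(COUNTS))              # double_object
print(round(baseline_error_rate(COUNTS), 3))  # 0.275
```

Whatever the true distribution of forms, a generator that maximizes P(f) alone collapses to this single majority choice.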
    <Paragraph position="3"> Compared to English, the evidence for the importance of word order from a free word order language like Finnish is even more striking. When word order is used to encode the information status and discourse function of NP referents, native speakers will judge the use of the wrong form infelicitous and odd, and a text incorporating several wrong forms in succession rapidly becomes incoherent (cf. Kruijff-Korbayová et al. (2002) on Czech, Russian, and Bulgarian).</Paragraph>
    <Paragraph position="4"> Although Finnish is regarded as canonically subject-verb-object (SVO), all six permutations of these three elements are possible, and corpus studies reveal that SVO order only occurs in 56% of sentences (Hakulinen and Karlsson, 1980). Different word order variants in Finnish realize different pragmatic structurings of the conveyed information. For example, Finnish has no definite or indefinite article, and the SVO/OVS variation is used to encode the distinction between already-mentioned entities and new entities (e.g. Chesterman (1991)). OVS order typically marks the object as given, and the subject as new. SVO order is more flexible. It can be used when the subject is given, and the object is new, and also when both are old or both are new. In orders with more than one preverbal argument (SOV, OSV), as well as verb-initial orders (VOS, VSO), the initial constituent is interpreted as being contrastive (Vilkuna (1995); and others).</Paragraph>
    <Paragraph position="5"> Because different orders have different discourse properties, use of an inappropriate order can lead to severe misunderstandings, including difficulty in interpreting NPs. For example, if a speaker uses canonical SVO order in a context where the subject is discourse-new information but the object has already been mentioned, the hearer will tend to have difficulty interpreting the NPs because OVS--not SVO--is the order that usually marks the object as discourse-old and subject as discourse-new. Psycholinguistic evidence from sentence processing experiments shows that humans are very sensitive to the given-new information carried by word order (Kaiser, 2003).</Paragraph>
    <Paragraph position="6"> Hence, word order is an important factor in the quality of the linguistic output of an NLG system.</Paragraph>
    <Paragraph position="7"> Attempts to choose the appropriate word order in Finnish will encounter the same problem found with English ditransitives. Table 1 illustrates the frequency of the different word orders in a 10,000 sentence corpus used by Hakulinen and Karlsson (1980). The most frequent order is SV(X), where X is any non-subject, non-verbal constituent, and so this order should always be the one selected by a statistical algorithm. Based on the counts then, assuming that the proportion of discourse contexts is roughly similar within a domain, in only 56% of contexts will the choice of SV(X) order actually match the discourse conditions in which it is used.</Paragraph>
    <Paragraph position="8">  The point here is not that statistical approaches to NLG are entirely flawed. Attempting to generate natural language by mimicking a corpus of naturally-occurring language may be the most practical strategy for designing robust, scalable NLG systems. However, human language is not just a system for concatenating words (or assembling trees) to create grammatical outputs. Speakers do not put constituents in a certain order simply because the words they are using to express the constituents have been frequently put in that order in the past. Constituents (and thereby words) appear in particular orders because those orders can reliably indicate the content speakers wish to communicate. Because of the lucky coincidence that statistical NLG has been primarily based on English, where the effects of word order variation are subtle, the problems with selecting a form f based only on a calculation of P(f) are not obvious. It might seem as if the most frequent tree can express a given proposition adequately. However, given the English word order phenomenon shown above, a model based on P(f) is problematic. Moreover, in languages like Finnish, even the generation of simple transitive clauses may result in output which is confusing for human users.</Paragraph>
    <Paragraph position="9"> NLG must take into account not just grammaticality but contextual appropriateness, and so statistical algorithms need to be provided with an augmented representation from which to learn--not just strings or trees, but pairings of linguistic forms, contexts, and meanings. The probability we need to maximize for NLG is the probability that f is used given a meaning to be expressed and the context in which f will be used, P(f|meaning,context).</Paragraph>
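As a sketch, maximizing P(f|meaning,context) amounts to taking relative-frequency counts within each (meaning, context) cell of an annotated corpus rather than over the corpus as a whole. The data structure and the context labels below are our own illustration, with cell counts chosen to match the give figures reported above:

```python
from collections import defaultdict

# counts[(meaning, context)][form] -> frequency in an annotated corpus.
# Cell figures follow the give counts cited above (Snyder, 2003).
counts = defaultdict(lambda: defaultdict(int))
counts[("GIVE", "io_hearer_new")]["to_dative"] = 60
counts[("GIVE", "io_hearer_old")]["double_object"] = 400
counts[("GIVE", "io_hearer_old")]["to_dative"] = 92

def choose_form(meaning, context):
    """argmax_f P(f | meaning, context), estimated by relative frequency."""
    cell = counts[(meaning, context)]
    if not cell:
        raise KeyError("no data for this meaning/context pair")
    return max(cell, key=cell.get)

print(choose_form("GIVE", "io_hearer_new"))  # to_dative
print(choose_form("GIVE", "io_hearer_old"))  # double_object
```

Unlike the P(f) baseline, the chosen form now varies with the context cell, which is exactly the behavior the corpus data exhibit.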
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 An alternative approach
</SectionTitle>
    <Paragraph position="0"> This section describes a very simple example of how a probability like P(f|meaning,context) could be utilized as part of a surface realization algorithm for English ditransitives, in particular for the verb give. This example is only a small subset of the larger problem of surface realization, but it illustrates well the improvement in performance of using P(f|meaning,context) vs. P(f), when evaluated against actual corpus data.</Paragraph>
    <Paragraph position="1"> First, the corpus from which the probabilities are being taken must be annotated with the additional meaning information conditioning the use of the form. For ditransitives, this is the information status of the indirect object NP, in particular whether it is hearer-new. Hearer status can be quickly and reliably annotated and has been widely used in corpus-based pragmatic studies (Birner and Ward, 1998). It could be applied as an additional markup of a corpus to be used as input to a statistical generation algorithm, like the Penn Treebank, such that each NP indirect object of a ditransitive verb would be given an additional tag marking its hearer status. Here we use the corpus counts presented in Snyder (2003) for the verb give as our training data. Table 2 shows the frequency of the properties of hearer-newness and relative heaviness of indirect objects (IOs) and direct objects (DOs) with respect to the two ditransitive alternations.</Paragraph>
    <Paragraph position="2">  To demonstrate the performance of an approach which counts only form, we use the equation P(f) to determine the choice of double-object vs. to-dative. The relative probabilities of each order in the Snyder (2003) corpus are .725 and .275 for double object and to-dative, respectively. As such, this method will always select the double object form, yielding an error rate of 27.5% on the training data, as shown in the row labeled P(f) of Table 3.</Paragraph>
    <Paragraph position="3"> An algorithm which incorporates more information than just raw frequencies will proceed as follows: if the IO is hearer-new, generate a to-dative, because the probability in the corpus of finding a to-dative given that the indirect object is hearer-new is 1 (all 60 of the 552 tokens with a hearer-new IO are to-datives).</Paragraph>
    <Paragraph position="4"> In all other cases (i.e. all other information statuses of IO and DO), the probability of finding a to-dative is now 92/492, or 18.7%, so generate a double object. This method results in 92 incorrect forms (all cases where a double object is generated instead of a to-dative), an error rate of 16.7% on the training data.</Paragraph>
    <Paragraph position="5"> If the generation algorithm is further augmented to take into account information about the relative heaviness of the direct and indirect object NPs--possible in a system where NPs are generated separately from sentences as a whole--the error rate can be reduced even further. This algorithm is as follows: if the IO is hearer-new, the form chosen is a to-dative. If the IO is not hearer-new, the IO and DO are compared with respect to number of syllables. If the IO is longer, generate a to-dative; if the DO is longer, generate a double object. As before, the first rule applies to the 60 tokens where the IO is hearer-new. Of the remaining 492 tokens, 474 have IOs and DOs of different heaviness. In 357 of the 388 double objects, the DO is heavier, and in 79 of the 86 to-datives, the IO is heavier. This leaves 38 of the 474 tokens not covered by the heaviness rule, along with 18 tokens where the IO and DO are equal. For these 56 cases, we generate the more probable overall form, the double object. In total, then, this augmented generation rule will yield 139 to-datives (60 cases where the IO is hearer-new and 79 cases where the IO is heavier). With this algorithm, only 13 actual to-datives will be generated wrongly as double objects when compared to their actual form in the corpus, an error rate of only 2.4%.</Paragraph>
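The error rates of the three strategies follow directly from the aggregate counts just given. The sketch below replays Snyder's (2003) token counts rather than individual corpus tokens, so the per-rule error tallies are taken from the text rather than recomputed from raw data:

```python
# Token counts reconstructed from Snyder (2003) as reported above.
TOTAL = 552
HEARER_NEW_TO_DATIVES = 60                           # all hearer-new IOs are to-datives
REMAINING_TO_DATIVES = 152 - HEARER_NEW_TO_DATIVES   # 92 to-datives among the other 492

# Strategy 1: always emit the overall most frequent form (double object),
# so every actual to-dative in the corpus is mispredicted.
errors_pf = 152
# Strategy 2: to-dative iff the IO is hearer-new, else double object.
errors_hearer = REMAINING_TO_DATIVES
# Strategy 3: add the heaviness rule; per the text, 13 actual to-datives
# still come out as double objects.
errors_heaviness = 13

for name, e in [("P(f)", errors_pf),
                ("hearer-status", errors_hearer),
                ("hearer-status + heaviness", errors_heaviness)]:
    print(f"{name}: {e}/{TOTAL} = {e / TOTAL:.1%}")
```

Run as written, this reproduces the error rates of 27.5%, 16.7%, and 2.4% discussed above.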
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Table 3 (columns: DO-IO, IO-DO, Error)
</SectionTitle>
    <Paragraph position="0"> This example shows that, when generating a surface realization of the predicate GIVE, simply including the hearer status of the recipient as a condition on the choice of form yields the order that matches the &quot;gold standard&quot; of human behavior about 83% of the time, vs. only 72.5% for an approach based on counts of trees including give alone. By including additional information about the relative size of the NPs, the surface realization will match the gold standard over 97% of the time, a highly human-like output.</Paragraph>
  </Section>
</Paper>