<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2311">
<Title>The Importance of Discourse Context for Statistical Natural Language Generation</Title>
<Section position="6" start_page="0" end_page="0" type="concl">
<SectionTitle> 5 Implementation &amp; implications for NLG </SectionTitle>
<Paragraph position="0"> The approach argued for above is one in which discourse context and meaning must be taken into account when selecting a construction for NLG purposes. Admittedly, the error-rate demonstration presented here is not derived from an actual system. However, functioning NLG systems have been implemented in which exactly this kind of information conditions the algorithm for choosing main-clause word order (Stone et al., 2001; Kruijff-Korbayová et al., 2002). Additionally, an approach like Bangalore and Rambow's could easily be extended by annotating their corpus for the hearer-status of NPs. The necessary information might also be extracted automatically from a corpus like the Prague Dependency Treebank, which includes discourse-level information relevant to word order. For phenomena that have not been studied as closely as English ditransitives, machine learning could be used to find correlations between context and form in corpora, and these correlations could then be incorporated into statistical NLG algorithms.</Paragraph>
<Paragraph position="1"> The primary implication of our argument is that counting words and trees is not enough for statistical NLG. Meaning, both semantic and pragmatic, is a crucial component of natural language generation. Despite the desire to lessen the need for labeled data in statistical NLP, such data remain crucial. Efforts to create multi-level corpora that overlay semantic annotation on top of syntactic annotation, such as the Propbank (Kingsbury and Palmer, 2002), should be expanded to include annotation of pragmatic and discourse information, and the resulting resources should be used in the development of statistical NLG methods. We cannot generate forms while ignoring their meaning and expect to get meaningful output. In other words, if the input to an NLG system lacks distinctions that play a crucial role in human language comprehension, the system will not be able to overcome this deficit and generate high-quality output.</Paragraph>
<Paragraph position="2"> In addition, in the effort to push the boundaries of statistical techniques, limiting the scope of research to English may give falsely promising results. If one of the primary benefits of statistical techniques is robust portability to other languages, results based on experiments with a small subset of human languages must be accompanied by a typologically informed examination of the assumptions underlying those experiments.</Paragraph>
</Section>
</Paper>
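As a minimal sketch of the kind of conditioning described in the first paragraph above, the following Python fragment estimates, from a corpus annotated for the hearer-status of recipient NPs, how often each ditransitive realization occurs with each hearer-status value, and then selects the most frequent realization for a given status. The corpus entries, the labels "hearer-old" / "hearer-new", and the function names are invented for illustration; they are not the paper's data or any existing system's interface.

# Illustrative sketch only: choosing a ditransitive realization conditioned on
# a discourse feature (hearer-status of the recipient NP). All data are toy values.
from collections import Counter, defaultdict

# Hypothetical "annotated corpus": (recipient hearer-status, observed form).
annotated_corpus = [
    ("hearer-old", "double-object"),
    ("hearer-old", "double-object"),
    ("hearer-old", "prepositional-dative"),
    ("hearer-new", "prepositional-dative"),
    ("hearer-new", "prepositional-dative"),
    ("hearer-new", "double-object"),
]

# Count how often each form co-occurs with each hearer-status value.
counts = defaultdict(Counter)
for status, form in annotated_corpus:
    counts[status][form] += 1

def choose_form(recipient_status):
    """Return the form most frequent for this hearer-status; if the status is
    unseen, fall back to the overall majority form."""
    if recipient_status in counts:
        return counts[recipient_status].most_common(1)[0][0]
    overall = Counter()
    for form_counts in counts.values():
        overall.update(form_counts)
    return overall.most_common(1)[0][0]

if __name__ == "__main__":
    print(choose_form("hearer-old"))   # -> double-object
    print(choose_form("hearer-new"))   # -> prepositional-dative

With richer annotation, the single hearer-status feature could be replaced by a feature vector and the relative-frequency table by any standard classifier; the point of the sketch is only that the choice of construction is conditioned on discourse information rather than on word and tree counts alone.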