<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1016">
  <Title>Statistical Acquisition of Content Selection Rules for Natural Language Generation</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusions and Further Work
</SectionTitle>
    <Paragraph position="0"> We have presented a novel method for learning Content Selection rules, a task that is difficult to perform manually and must be repeated for each new domain. The experiments presented here use a resource of text and associated knowledge that we have produced from the Web. The size of the corpus and the methodology we followed in its construction make it a major resource for learning in generation. Our methodology shows that data currently available on the Internet, for various domains, is readily usable for this purpose. Using our corpora, we have experimented with three methods (exact matching, statistical selection and rule induction) to infer rules from indirect observations of the data.</Paragraph>
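The statistical-selection idea sketched above can be illustrated in a few lines. This is not the paper's implementation; it is a minimal sketch under the assumption that each knowledge-base instance is flattened into attribute paths and paired with its text, and that verbatim appearance of a value in the text is the (exact-match) signal of verbalization. All names and the toy data are hypothetical.

```python
from collections import Counter

def select_paths(instances, threshold=0.5):
    """Class-based statistical selection (sketch): keep a data path as
    content-selectable if its value is verbalized in the paired text
    often enough across training instances."""
    verbalized = Counter()
    seen = Counter()
    for kb_paths, text in instances:
        for path, value in kb_paths.items():
            seen[path] += 1
            if str(value).lower() in text.lower():  # exact-match signal
                verbalized[path] += 1
    return {p for p in seen if verbalized[p] / seen[p] >= threshold}

# toy data: (knowledge-base paths, associated biography text)
train = [
    ({"person.occupation": "actor", "person.shoe-size": "9"},
     "He worked as an actor for decades."),
    ({"person.occupation": "singer", "person.shoe-size": "10"},
     "A singer known worldwide."),
]
print(select_paths(train))  # -> {'person.occupation'}
```

The threshold is the tunable quantity: lowering it trades precision for the high recall that content selection favors.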
    <Paragraph position="1"> Given the importance of content selection for the acceptance of generated text by the final user, it is clear that leaving out required information is an error that should be avoided. Thus, in evaluation, high recall is preferable to high precision. In that respect, our class-based statistically selected rules perform well. They achieve 94% recall in the best case, while filtering out half of the data in the input knowledge base. This significant reduction in data makes developing further content selection rules more feasible. It will aid the NLG practitioner in the Content Selection task by reducing the set of data that must be examined manually (e.g., discussed with domain experts).</Paragraph>
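The recall-oriented evaluation described above can be made concrete with a small helper. This is a sketch with invented toy numbers, not the paper's reported figures: it computes recall against the paths a human would include, plus the fraction of the knowledge base the selector filtered out.

```python
def evaluate(selected, required, total_paths):
    """Recall-first evaluation (sketch): missing a required path is the
    costly error, so recall matters more than precision; `reduction`
    measures how much manual examination the selector saves."""
    tp = len(selected & required)
    recall = tp / len(required)
    precision = tp / len(selected) if selected else 0.0
    reduction = 1.0 - len(selected) / total_paths
    return recall, precision, reduction

# hypothetical numbers: 20 paths in the KB, 6 actually required
selected = set(range(10))       # selector keeps half the KB
required = {0, 1, 2, 3, 4, 5}   # every required path was kept
print(evaluate(selected, required, 20))  # -> (1.0, 0.6, 0.5)
```

In this toy case recall is perfect at the cost of modest precision, which mirrors the trade-off the paragraph argues for.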
    <Paragraph position="2"> We find the results for ripper disappointing and think more experimentation is needed before discounting this approach. It seems to us that ripper may be overwhelmed by too many features; alternatively, this may be the best possible result without incorporating domain knowledge explicitly. We would like to investigate the impact of additional sources of knowledge. These alternatives are discussed below.</Paragraph>
    <Paragraph position="3"> In order to improve the rule induction results, we may use spreading activation starting from the particular frame being considered for content selection, including the semantic information in the local context of the frame. For example, to content select a given frame, only frames in its immediate semantic neighborhood would be considered (e.g., a path such as ⟨relative, ...⟩ would be completely disregarded). Another improvement may come from more intertwining between the exact match and statistical selector techniques. Even if some data path appears to be copied verbatim most of the time, we can still run our statistical selector for it and use held-out data to decide which performs better.</Paragraph>
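The spreading-activation proposal amounts to bounding each frame's context to its nearby neighbors in the frame graph. The following is a minimal sketch, not the paper's design: the frame graph, hop limit, and all names are assumptions for illustration.

```python
def local_context(graph, start, max_hops=2):
    """Spreading activation (sketch): collect the frames reachable from
    `start` within max_hops links; anything farther is disregarded when
    building features for rule induction."""
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        frontier = {nbr for f in frontier for nbr in graph.get(f, ())} - seen
        seen |= frontier
    return seen

# hypothetical frame graph for a biography knowledge base
frames = {
    "person": ["relative", "education"],
    "relative": ["relative-of-relative"],
    "relative-of-relative": ["distant"],
    "education": [],
}
print(local_context(frames, "person", max_hops=1))
# distant paths such as relative-of-relative fall outside the context
```

Restricting induction to this bounded neighborhood directly addresses the too-many-features concern raised for ripper in the previous paragraph.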
    <Paragraph position="4"> Finally, we are interested in adding a domain paraphrasing dictionary to enrich the exact matching step. Such a dictionary could be obtained by running the semantic input through the lexical chooser of our biography generator, PROGENIE, currently under construction.</Paragraph>
  </Section>
</Paper>