File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/w00-1440_abstr.xml

Size: 2,893 bytes

Last Modified: 2025-10-06 13:41:59

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1440">
  <Title>Appendix II: Discussion Panel on Evaluation Research in Generation Moderator: Inderjeet Mani</Title>
  <Section position="1" start_page="0" end_page="273" type="abstr">
    <SectionTitle>
Moderator: Inderjeet Mani
</SectionTitle>
    <Paragraph position="0"> Evaluation is critical in offering feedback on progress to both developers and potential consumers of NLG technology. However, evaluation has thus far not been as well-established in NLG as it has become in NLU. This panel will discuss evaluation methods and resources. It is aimed at building a better understanding of NLG evaluation methods, and hopefully arriving at steps to facilitate future evaluations.</Paragraph>
    <Paragraph position="1"> Applicable evaluation methods can be derived from work in NLG as well as Text Summarization and Machine Translation. The evaluation methods include intrinsic methods, which test the generation system in itself, and extrinsic methods, which test the generation system in relation to some other task.</Paragraph>
    <Paragraph position="2"> Intrinsic methods can include assessing coverage of different varieties of generation input, assessing the quality of the generated output, and comparing generated output against reference output at some level (e.g., by using subjective grading, comparison against templates, or comparing human correctness in answering questions based on each type of output). Of course, a fundamental problem in evaluating NLG is that there may be many acceptable outputs.</Paragraph>
    <Paragraph position="3"> Extrinsic methods can include measuring efficiency in executing generated instructions (e.g., how easy was it to install the component by following the generated manual?), assessing the relevance of generated output to some information need or goal (e.g., are the generated business letters effective?), assessing its impact on a system in which it is embedded (e.g., how much does the generation help the question answering system?), measuring the amount of effort required to post-edit the output (e.g., how much do the generated briefings need to be fixed up?), etc.</Paragraph>
    <Paragraph position="4"> As the generation technology becomes more mature, it is useful to assess end-user acceptability of generated output, extensibility and portability, throughput, cost-benefit measures, etc. It is also interesting for evaluations to address both features important to the overall task and features unique to NL generation.</Paragraph>
    <Paragraph position="5"> Participants will address the following issues:
1. What evaluation methods are applicable to NLG?
2. What are the pros and cons of NLG evaluations you have carried out?
3. Can we construct corpora to help evaluate NLG systems?
4. What steps can we collectively take to improve the role of evaluation in NLG?</Paragraph>
  </Section>
</Paper>