File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/n06-1049_abstr.xml

Size: 1,379 bytes

Last Modified: 2025-10-06 13:44:54

<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1049">
  <Title>Will Pyramids Built of Nuggets Topple Over?</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> The present methodology for evaluating complex questions at TREC analyzes answers in terms of facts called &quot;nuggets&quot;. The official F-score metric represents the harmonic mean between recall and precision at the nugget level. There is an implicit assumption that some facts are more important than others, which is implemented in a binary split between &quot;vital&quot; and &quot;okay&quot; nuggets. This distinction holds important implications for the TREC scoring model--essentially, systems only receive credit for retrieving vital nuggets--and is a source of evaluation instability. The upshot is that for many questions in the TREC testsets, the median score across all submitted runs is zero. In this work, we introduce a scoring model based on judgments from multiple assessors that captures a more refined notion of nugget importance. We demonstrate on TREC 2003, 2004, and 2005 data that our &quot;nugget pyramids&quot; address many shortcomings of the present methodology, while introducing only minimal additional overhead on the evaluation flow.</Paragraph>
  </Section>
</Paper>