<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0802"> <Title>A Two-stage Model for Content Determination</Title> <Section position="8" start_page="0" end_page="0" type="evalu"> <SectionTitle> 8 Evaluation </SectionTitle> <Paragraph position="0"> We are currently building a testbed system called SUMTIME-MOUSAM, which will enable us to test the hypotheses we have presented in this paper and other hypotheses suggested by our KA activities. SUMTIME-MOUSAM is a framework system that consists of: * &quot;Infrastructure&quot; software for accessing data files, regression testing of new software versions, etc.</Paragraph> <Paragraph position="1"> * An ontology, which defines a conceptual level of representation of texts.</Paragraph> <Paragraph position="2"> * A corpus of human-written texts with their corresponding conceptual representations defined using the above ontology.</Paragraph> <Paragraph position="3"> * Scoring software which compares the output of a module (either at a conceptual or text level) against the human corpus.</Paragraph> <Paragraph position="4"> Because we are primarily interested in content issues, it is important to evaluate our system at a content level as well as at a text level. To support this, we are developing conceptual representations of the texts we will be generating, which can also be extracted from human texts by manual analysis.</Paragraph> <Paragraph position="5"> SUMTIME-MOUSAM is currently best developed in the area of producing wind texts. In this area, we have developed a conceptual representation and manual annotation guide (with good inter-annotator agreement, generally kappa values of 0.9 or higher); built an initial software system to automatically produce such texts based on a threshold model without an overview; and begun analysing the differences between its output and the human texts. 
We are currently working on extending SUMTIME-MOUSAM to other parts of weather forecasts, such as statements describing clouds and precipitation, and plan in the future to extend it to the gas-turbine domain.</Paragraph> <Paragraph position="6"> With regard to testing hypotheses specifically about two-stage content determination (the subject of this paper), our plan is as follows: 1. Compare the output of the non-overview-based software to human summary texts, and identify cases where an overview seems to be used.</Paragraph> <Paragraph position="7"> 2. Ask human experts to build an overview (using a GUI), modify our software to use this overview when generating texts, and see whether this results in texts that are more similar to the human texts.</Paragraph> <Paragraph position="8"> 3. Attempt to automatically generate the overview from the data, and again compare the resultant texts to human texts.</Paragraph> <Paragraph position="9"> At some point towards the end of SUMTIME, we also hope to conduct user task evaluations. For example, we may show gas-turbine engineers our summary texts and see whether this helps them detect problems in the gas turbine.</Paragraph> </Section> </Paper>