<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0301">
<Title>marization, and generation of natural language</Title>
<Section position="5" start_page="4" end_page="6" type="evalu">
<SectionTitle> 4 Evaluation </SectionTitle>
<Paragraph position="0"> To evaluate a C++ implementation of the clause-like unit and discourse-marker identification algorithm, I randomly selected three texts, each belonging to a different genre: an expository text of 5036 words from Scientific American; a magazine article of 1588 words from Time; and a narration of 583 words from the Brown Corpus. No fragment of any of the three texts was used during the corpus analysis. Three independent judges, graduate students in computational linguistics, broke the texts into elementary units. The judges were given no instructions about the criteria that they were to apply in order to determine the clause-like unit boundaries; rather, they were supposed to rely on their intuition and their preferred definition of a clause. The locations in the texts that were labelled as clause-like unit boundaries by at least two of the three judges were considered to be &quot;valid elementary unit boundaries&quot;. I used the valid elementary unit boundaries assigned by the judges as indicators of discourse usages of cue phrases, and I manually determined the cue phrases that signalled a discourse relation. For example, if an and was used in a sentence and the judges agreed that a textual unit boundary existed just before the and, I assigned that and a discourse usage. Otherwise, I assigned it a sentential usage. Hence, although the corpus analysis was carried out by only one person, the validation of the actions and of the algorithm depicted in figure 1 was carried out against unseen texts, which were manually labelled by multiple subjects.</Paragraph>
<Paragraph position="1"> Once the &quot;gold-standard&quot; textual unit boundaries and discourse markers were manually identified, I applied the algorithm in figure 1 to the same texts.</Paragraph>
<Paragraph position="2"> The algorithm found 80.8% of the discourse markers with a precision of 89.5% (see Marcu (1997b) for details), a result that outperforms Hirschberg and Litman's (1993) algorithm and its subsequent improvements (Litman, 1996; Siegel and McKeown, 1994).</Paragraph>
<Paragraph position="3"> The algorithm correctly identified 81.3% of the clause-like unit boundaries, with a precision of 90.3%. I am not aware of any surface-form algorithms that achieve similar results. Still, the clause-like unit and discourse-marker identification algorithm has its limitations. These limitations are primarily due to the fact that the algorithm relies entirely on cue phrases and orthographic features that can be detected by shallow methods. For example, such methods are unable to correctly classify the sentential usage of but in example (12); as a consequence, the algorithm incorrectly inserts a textual unit boundary before it.</Paragraph>
<Paragraph position="4"> (12) [The U.S. has] [but a slight chance to win a medal in Atlanta,] [because the championship eastern European weight-lifting programs have endured in the newly independent countries that survived the fracturing of the Soviet bloc.]</Paragraph>
</Section>
</Paper>
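Editor's sketch: the evaluation above rests on two steps, building a majority-vote gold standard from three judges' annotations and scoring the algorithm's output by recall and precision against it. The Python fragment below is a minimal illustration of those two steps under stated assumptions; it is not the paper's C++ implementation, and the boundary positions, function names (gold_boundaries, recall_precision), and toy data are hypothetical.

# Minimal sketch (not the paper's C++ system): majority-vote gold standard
# plus recall/precision scoring of predicted boundaries against it.
from collections import Counter

def gold_boundaries(judge_annotations, min_agreement=2):
    # Keep positions marked as boundaries by at least `min_agreement` judges.
    counts = Counter(pos for judge in judge_annotations for pos in judge)
    return {pos for pos, n in counts.items() if n >= min_agreement}

def recall_precision(predicted, gold):
    # Recall = gold boundaries found / all gold boundaries;
    # precision = correct predictions / all predictions.
    correct = len(predicted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision

# Toy example: three judges annotate boundary positions in one text.
judges = [{3, 9, 14, 21}, {3, 14, 21, 27}, {3, 9, 21}]
gold = gold_boundaries(judges)            # {3, 9, 14, 21}
predicted = {3, 9, 21, 30}                # output of a hypothetical identifier
print(recall_precision(predicted, gold))  # (0.75, 0.75)

The same scoring applied to the real system's output yields the figures reported in the section (80.8% recall / 89.5% precision for discourse markers; 81.3% recall / 90.3% precision for clause-like unit boundaries).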