<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2034">
<Title>Using Phrasal Patterns to Identify Discourse Relations</Title>
<Section position="6" start_page="134" end_page="135" type="evalu">
<SectionTitle> 5 Evaluation </SectionTitle>
<Paragraph position="0"> The system identifies one of six discourse relations, described in Table 1, for a test sentence pair. Using the 300 sentence pairs set aside earlier (50 of each discourse relation type), we ran two experiments for comparison: one using only lexical information, the other using phrasal patterns as well. In the experiment using only lexical information, the system selects the relation that maximizes the lexical score; in the experiment using phrasal patterns as well, if one relation maximizes both the lexical and the phrasal-pattern scores, the system chooses that relation. Table 2 shows the results. For all discourse relations, the results using phrasal patterns are better or the same. When we take into account the frequency of the discourse relations, i.e. 43% for ELABORATION, 32% for CONTRAST, etc., the weighted accuracy is 53% using only lexical information, which is comparable to the 49.7% reported for a similar experiment by Marcu and Echihabi (2002). Using phrasal patterns, the accuracy improves by 12 points to 65%. Note that the baseline accuracy (always selecting the most frequent relation) is 43%, so the improvement is significant.</Paragraph>
<Paragraph position="1"> Since they are more frequent in the corpus, ELABORATION and CONTRAST are more likely to be selected by the lexical score alone; even so, the system sometimes identifies the other relations.</Paragraph>
<Paragraph position="2"> The system makes many mistakes, but people may also be unable to identify a discourse relation from just the two sentences when the cue phrase is deleted. We asked three human subjects (two of them not authors of this paper) to do the same task. Their total (unweighted) accuracies are 63%, 54%, and 48%, which are about the same as or even lower than the system's performance. Note that the subjects were allowed to annotate more than one relation (in fact, they did so for 3% to 30% of the data). If the correct relation is included among a subject's N choices, then 1/N is credited to the accuracy count. We also measured inter-annotator agreement; the average agreement is 69%. In addition, we measured the system's performance on the data where all three subjects identified the correct relation, where two of them did, and so on (Table 3). The system's accuracy correlates with the number of subjects who answered correctly. In short, these results and analyses indicate that the system performs about as well as a human when only the two sentences can be read.</Paragraph>
</Section>
</Paper>
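The frequency-weighted accuracy and the 1/N partial-credit scoring described in the evaluation are simple computations; the Python sketch below illustrates both under stated assumptions. Only the ELABORATION (43%) and CONTRAST (32%) frequencies come from the text; the remaining frequencies, all per-relation accuracies, and the names RELATION_3 through RELATION_6 are hypothetical placeholders, since Tables 1 and 2 are not reproduced in this excerpt.

# Illustrative sketch (not from the paper): frequency-weighted accuracy and
# the 1/N partial-credit scheme used for the human subjects.

relation_freq = {          # corpus frequency of each discourse relation
    "ELABORATION": 0.43,   # from the text
    "CONTRAST":    0.32,   # from the text
    "RELATION_3":  0.10,   # placeholder
    "RELATION_4":  0.07,   # placeholder
    "RELATION_5":  0.05,   # placeholder
    "RELATION_6":  0.03,   # placeholder
}

per_relation_acc = {       # hypothetical per-relation accuracies (stand-ins for Table 2)
    "ELABORATION": 0.80,
    "CONTRAST":    0.60,
    "RELATION_3":  0.40,
    "RELATION_4":  0.40,
    "RELATION_5":  0.30,
    "RELATION_6":  0.30,
}

# Weighted accuracy: each relation's accuracy weighted by how often the relation occurs.
weighted_acc = sum(relation_freq[r] * per_relation_acc[r] for r in relation_freq)
print(f"weighted accuracy = {weighted_acc:.1%}")

# 1/N credit: if a subject marks N candidate relations and the correct one is
# among them, the item contributes 1/N to the accuracy count.
def credited_accuracy(annotations, gold):
    """annotations: one set of chosen relations per item; gold: the correct relation per item."""
    credit = sum(1.0 / len(chosen) if g in chosen else 0.0
                 for chosen, g in zip(annotations, gold))
    return credit / len(gold)

# Toy usage: three items, the second annotated with two candidate relations.
annos = [{"ELABORATION"}, {"CONTRAST", "RELATION_3"}, {"RELATION_4"}]
gold  = ["ELABORATION", "CONTRAST", "RELATION_3"]
print(f"credited accuracy = {credited_accuracy(annos, gold):.1%}")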