<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1028">
  <Title>Finding Errors Automatically in Semantically Tagged Dialogues</Title>
  <Section position="8" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5. ANALYSIS
</SectionTitle>
    <Paragraph position="0"> The automatic method flagged 40 items as errors that the judges determined were not errors (17 occur, 23 detect). These 40 false errors can be classified as follows:
A. 10 were due to bugs in the algorithm or source data.
B. 19 were false errors that can be eliminated with non-trivial changes to the semantic tagset and/or the algorithm.
C. 3 were false errors that could not be eliminated without the ability to make inferences about world knowledge.
D. 8 were due to mistakes made by the semantic annotator.
One example of the 19 false errors in category B arises when the first user utterance in a dialogue is a bare location: it is unclear whether the user intends it as a departure or an arrival location. Our semantic tagset currently has no tags for ambiguous situations such as these. Adding underspecified tags to the tagset (and updating the automatic algorithm accordingly) would solve this problem. Another example is a situation where the system was legitimately asking for clarification about a slot fill, but the algorithm flagged it as prompting for keys that had already been filled. This could be fixed by adding a CLARIFY element to the type dimension (currently PROMPT, FILL, and OFFER). We believe that these changes would not compromise the generality of our semantic tagset. However, since the point of our approach is to derive errors without much additional annotation, additions to the semantic tagset should be made only when there is substantial justification.</Paragraph>
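The clarification example above can be illustrated with a minimal sketch of a redundant-prompt check. It is not the paper's algorithm: the turn representation (`Turn` with `speaker`, `kind`, and `keys`), the function `flag_redundant_prompts`, and the key name `depart_city` are all hypothetical. The sketch only shows why a separate CLARIFY type would let such a check skip legitimate clarification requests that would otherwise be tagged PROMPT and flagged.

```python
from dataclasses import dataclass, field


# Hypothetical representation of one semantically tagged turn; the field
# names are illustrative and do not reproduce the paper's tag format.
@dataclass
class Turn:
    speaker: str                                  # "system" or "user"
    kind: str                                     # PROMPT, FILL, OFFER, or the proposed CLARIFY
    keys: set[str] = field(default_factory=set)   # slot keys the turn refers to


def flag_redundant_prompts(turns: list[Turn]) -> list[int]:
    """Return indices of system PROMPT turns that ask for already-filled keys."""
    filled: set[str] = set()
    flagged: list[int] = []
    for i, turn in enumerate(turns):
        if turn.speaker == "user" and turn.kind == "FILL":
            # Record every key the user has supplied so far.
            filled.update(turn.keys)
        elif turn.speaker == "system" and turn.kind == "PROMPT":
            # Prompting again for a filled key is treated as a candidate error.
            if turn.keys.intersection(filled):
                flagged.append(i)
        # A system CLARIFY turn matches neither branch, so it is never flagged.
    return flagged


dialogue = [
    Turn("system", "PROMPT", {"depart_city"}),
    Turn("user", "FILL", {"depart_city"}),
    Turn("system", "PROMPT", {"depart_city"}),   # re-asks for a filled key: flagged
    Turn("system", "CLARIFY", {"depart_city"}),  # legitimate clarification: not flagged
]
print(flag_redundant_prompts(dialogue))  # [2]
```

Without a CLARIFY type, the fourth turn would have to be tagged PROMPT and would be flagged exactly like the third, which is the kind of false error described in category B.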
    <Paragraph position="1"> There were also 21 errors (15 occur, 6 detect) that the automatic method did not detect but that the judges determined were real errors. These 21 errors can be categorized as follows:
A. 2 were due to bugs in the algorithm.
B. 8 were situations where the algorithm correctly flagged the detect point of an error but missed the associated occur point.
C. 6 were situations that could be fixed by modifications to the semantic tagset.
D. 1 was an error that could be fixed either by a revision to the semantic tagset or by a revision to the algorithm.
E. 2 were situations where the system ignored a user fill and the automatic algorithm interpreted this as no confirmation (not an error). Human judgement is required to detect these errors.
F. 2 were due to mistakes made by the semantic annotator.</Paragraph>
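As a quick consistency check, the two breakdowns in this section can be tallied against their stated totals and occur/detect splits. The counts below are transcribed from the text; the dictionary names and the share computation are illustrative only.

```python
# Category counts transcribed from the two breakdowns in this section.
false_errors = {"A": 10, "B": 19, "C": 3, "D": 8}                  # flagged, judged not errors
missed_errors = {"A": 2, "B": 8, "C": 6, "D": 1, "E": 2, "F": 2}   # real errors, not flagged

# Each breakdown should match its stated total and its occur/detect split.
assert sum(false_errors.values()) == 40 == 17 + 23
assert sum(missed_errors.values()) == 21 == 15 + 6

# Share of each category within its breakdown.
for name, cats in (("false errors", false_errors), ("missed errors", missed_errors)):
    total = sum(cats.values())
    shares = ", ".join(f"{k}: {v / total:.1%}" for k, v in cats.items())
    print(f"{name} ({total}): {shares}")
```

Category B accounts for nearly half of the false errors (19 of 40), which is why the tagset and algorithm changes discussed above would have the largest single effect on precision.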
  </Section>
</Paper>