<?xml version="1.0" standalone="yes"?> <Paper uid="P03-2002"> <Title>An Ontology-based Semantic Tagger for IE system</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Corpus </SectionTitle> <Paragraph position="0"> The corpus is a collection of 95 manually transcribed telephone conversations (about 39,000 words). Most are informative dialogs in which two speakers (a caller C and an operator O) discuss the conditions and circumstances of a SAR mission. Each conversation is (1) an incident report, such as a report of a missing person or an overdue boat, (2) a SAR mission plan, such as a request for a SAR airplane or coast guard ships for a mission, (3) a debriefing, in which the results of the SAR mission are communicated, or a combination of the three kinds. Figure 1 is an excerpt from one such conversation. We can notice many disfluencies 1-O: Hi, it's Mr. Joe Blue.</Paragraph> </Section> </Paper>