File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/03/n03-2001_evalu.xml
Size: 1,887 bytes
Last Modified: 2025-10-06 13:58:59
<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2001"> <Title>Ronan.Reilly@may.ie</Title> <Section position="4" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Performance </SectionTitle> <Paragraph position="0"> We used documents from a number of different domains for our experiments, including letters from the MacGreevy archive (Schreibman, 1998, 2000), a database of employee records, Shakespearean plays (Bosak, 1998), poems from an early American encoding project, and scientific journal articles (Openly Informatics, Inc., 1999-200). Figure 2 shows part of a scene from &quot;A Midsummer Night's Dream&quot; as an example of XML markup automatically produced by our system. The underlined text was not marked up by our system.</Paragraph> <Paragraph position="1"> We also evaluated our system on some of the document sets. For evaluation, we considered the elements representing the content of the document; judging these requires a human expert. We have used three performance measures in evaluating the automatic mined by the system (i.e. text nodes for these markup elements are not present in the marked-up document produced by the system). The elements of 10 valid XML marked-up letters from the MacGreevy archive were used to learn C5 rules and text segmentation heuristics. By applying these rules and heuristics, 55 elements of five unmarked letters from the MacGreevy archive were automatically marked up by the system with 96% accuracy (we use the term &quot;accuracy&quot; here to mean the proportion of marked-up elements correctly determined by the system). Similarly, elements of 5 valid XML marked-up Shakespeare plays were used as training examples, and 13,882 elements of four Shakespearean plays were automatically marked up by the system. In this case the accuracy rate was 92%.</Paragraph> </Section> </Paper>