<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0113">
  <Title>Development of a Partially Bracketed Corpus with Part-of-Speech Information Only</Title>
  <Section position="7" start_page="169" end_page="171" type="evalu">
    <SectionTitle>
5. Experimental Results
</SectionTitle>
    <Paragraph position="0"> The LOB Corpus, a million-word collection of present-day British English texts, is adopted as the source of training data. The Susanne Corpus is adopted as the source of testing data for evaluating the performance of our probabilistic chunker. It contains one tenth of the Brown Corpus but carries more syntactic and semantic information than the Brown Corpus.</Paragraph>
    <Paragraph position="1"> To evaluate the performance, we adopt a criterion [2]: the content of each chunk should be dominated by one non-terminal node in the Susanne parse field. The performance evaluation model compares the chunked result C with the corresponding syntactic structure T. According to this criterion, the experimental results for Definitions 3 and 4 are shown in Table 7.</Paragraph>
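    <Paragraph> The comparison of C with T can be sketched as follows. This is only one possible reading of the criterion, not the authors' implementation: a chunk is counted as correct when its token span coincides with the span of some non-terminal node. The tree encoding as (label, children) tuples, the function names, and the sample tags are all assumptions made for illustration.

```python
def constituent_spans(tree, start=0, spans=None):
    """Collect the (start, end) token span of every non-terminal node.

    A tree is a (label, children) tuple; a leaf is a plain tag string.
    This encoding is hypothetical and is not the Susanne parse-field format.
    Returns (end_position, spans).
    """
    if spans is None:
        spans = set()
    if isinstance(tree, str):
        # A leaf covers exactly one token.
        return start + 1, spans
    _label, children = tree
    pos = start
    for child in children:
        pos, _ = constituent_spans(child, pos, spans)
    spans.add((start, pos))
    return pos, spans


def chunk_accuracy(chunks, tree):
    """Fraction of chunks whose span equals the span of some non-terminal.

    Assumed reading of the criterion: exact span match.
    """
    _, spans = constituent_spans(tree)
    pos = 0
    correct = 0
    for chunk in chunks:
        span = (pos, pos + len(chunk))
        if span in spans:
            correct += 1
        pos += len(chunk)
    return correct / len(chunks) if chunks else 0.0


# Toy example with illustrative tags.
tree = ("S", [("N", ["AT", "NN"]), ("V", ["VBD"]), ("N", ["AT", "JJ", "NN"])])
chunks = [["AT", "NN"], ["VBD"], ["AT"], ["JJ", "NN"]]
print(chunk_accuracy(chunks, tree))
```

In the toy example the first two chunks match the spans of N and V, while the last two split the final N node, so two of the four chunks are counted as correct under this reading.</Paragraph>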
    <Section position="1" start_page="169" end_page="171" type="sub_section">
      <SectionTitle>
File
</SectionTitle>
      <Paragraph position="0"> The experimental results demonstrate that Definition 4 (three parts of speech) is more powerful than Definition 3 (two parts of speech). We define the chunk length as the number of tags in a chunk. The distribution of chunk length is listed in Tables 8 and 9.</Paragraph>
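      <Paragraph> A chunk-length distribution of this kind can be computed as in the minimal sketch below. It assumes each chunk is represented as a list of tags; the function name and the sample tags are illustrative and do not come from the paper.

```python
from collections import Counter


def chunk_length_distribution(chunks):
    """Map each chunk length (number of tags) to (count, percentage of all chunks)."""
    counts = Counter(len(chunk) for chunk in chunks)
    total = sum(counts.values())
    return {
        length: (n, 100.0 * n / total)
        for length, n in sorted(counts.items())
    }


# Toy input: four chunks, two of length one and two of length two.
sample = [["AT", "NN"], ["VBD"], ["AT"], ["JJ", "NN"]]
for length, (count, percent) in chunk_length_distribution(sample).items():
    print(length, count, percent)
```

The percentages are taken over the number of chunks, not the number of tags; a tag-weighted distribution would need a different denominator.</Paragraph>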
      <Paragraph position="1"> One-tag chunks account for about 50%. We further analyze which grammatical components constitute the one-tag chunks and find that most of them contain punctuation marks, nouns, or verbs. This is because a proper name can form a bare subject or object. A verb appears in the third-person singular, past tense, or base form. These three cases make up about 62% of one-tag chunks.</Paragraph>
      <Paragraph position="2"> By analyzing the incorrectly chunked results, we find that many errors result from conjunctions. In addition, some tags cannot appear at the end of a chunk. Therefore, a heuristic rule is applied to improve the performance. The tags that cannot appear at the end of chunks are listed as follows:</Paragraph>
    </Section>
  </Section>
</Paper>