<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0908">
  <Title>C/n G o C/ Q W</Title>
  <Section position="7" start_page="52" end_page="160" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="52" end_page="160" type="sub_section">
      <SectionTitle>
5.1 French
</SectionTitle>
      <Paragraph position="0"> The corpus gathered for French contains 500 sentences with an average length of 16.9 words. It covers a number of different text types (news, letters, online discussions, legal, literature, etc.; see Pinkham 96 for details) (footnote 4). The text is used 'as is' from the Web, with only a few spelling corrections. Coverage on this corpus approximately a year ago was 54%; today it is 75% (footnote 5). Development work for French up until now has been biased toward sentences under 20 words. On the basis of the data collected from the experiment below, we can deduce that effort spent on sentences in the 20+ word range would produce the quickest overall improvement in the future. Figure 3 shows coverage across different sentence-length intervals for French. The coverage (i.e. the percentage of parses that are non-FITTED) is shown for each category on top of the columns.</Paragraph>
      <Paragraph position="1"> [Figure 3: ... to sentence length for French (showing percentage of coverage). Legend: number of sentences, non-FITTED parses, FITTED parses. Horizontal axis: sentence length in words, in intervals 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, &gt;56.] ... the grammatical sentences for French are on average 7 words long, and artificially simple in terms of lexical and grammatical complexity. On the TSNLP data, coverage of the French system is 96%.</Paragraph>
      <Paragraph position="2"> Footnote 5: We estimate that coverage at the very beginning of French development approximately 18 months ago would have been 25% (on the basis of tests done with other text).</Paragraph>
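The per-interval coverage described in this subsection can be illustrated with a minimal sketch. It is not the authors' tooling: the function names, the (n_words, is_fitted) record format, and the "56+" label for the open-ended last interval are all assumptions made for illustration. Coverage for an interval is taken, as in the text, to be the percentage of parses in that interval that are non-FITTED.

```python
from collections import Counter

BIN_WIDTH = 5          # interval width in words, as in the figures
N_REGULAR_BINS = 11    # 1-5 up to 51-55; everything longer shares one bin


def length_bin(n_words):
    """Return the interval label for a sentence of n_words words.

    Assumes n_words is at least 1. The "56+" label is a hypothetical
    name for the open-ended last interval.
    """
    index = min((n_words - 1) // BIN_WIDTH, N_REGULAR_BINS)
    if index == N_REGULAR_BINS:
        return "56+"
    low = index * BIN_WIDTH + 1
    return f"{low}-{low + BIN_WIDTH - 1}"


def coverage_by_length(records):
    """Compute percentage coverage per sentence-length interval.

    records is an iterable of (n_words, is_fitted) pairs, a hypothetical
    format in which is_fitted is True when the parse is FITTED.
    Only intervals that contain at least one sentence are returned.
    """
    totals = Counter()
    covered = Counter()
    for n_words, is_fitted in records:
        label = length_bin(n_words)
        totals[label] += 1
        if not is_fitted:
            covered[label] += 1
    return {label: 100.0 * covered[label] / totals[label] for label in totals}
```

Calling coverage_by_length on a list of such pairs yields one percentage per populated interval, which corresponds to the figure printed on top of each column in the charts.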
    </Section>
    <Section position="2" start_page="160" end_page="160" type="sub_section">
      <SectionTitle>
5.2 Spanish
</SectionTitle>
      <Paragraph position="0"> The Spanish benchmark file contains 503 sentences from textbooks, magazines, news articles, a children's book and literary writing (novel). To control for regional variation, both Latin American and Castilian Spanish are represented (the sources are from Spain, Chile, Argentina, and Mexico). The average sentence length is 19.1 words. Current coverage on the benchmark file is 75.15%.</Paragraph>
      <Paragraph position="1"> Because grammar development for Spanish began when only a small prototype dictionary of about 2000 words was available, no coverage data were collected during the earlier stages of grammar work.</Paragraph>
      <Paragraph position="2">  to sentence length for Spanish (showing percentage of coverage)</Paragraph>
    </Section>
    <Section position="3" start_page="160" end_page="160" type="sub_section">
      <SectionTitle>
5.3 German
</SectionTitle>
      <Paragraph position="0"> The German benchmark corpus currently consists of 424 sentences with an average length of 15.3 words per sentence. The sentences are extracted from news articles, novels, children's books, travel guides, technical writing and interviews.</Paragraph>
      <Paragraph position="1"> Figure 5 below illustrates the progress of coverage over time from the first steps in grammar work in October 1996 until February 1997. At that point, the coverage had reached over 56.13%. Note that the increase in coverage over time resembles the pattern reported for French in section 5.1. In November 1996, the size of the benchmark corpus was increased from 229 to 424 sentences. This addition of new sentences from new sources had very little impact on the statistics.</Paragraph>
      <Paragraph position="2"> Figure 6 shows statistics on the make-up of the corpus and coverage across different sentence-length categories in intervals of 5 words.</Paragraph>
    </Section>
  </Section>
</Paper>