<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1096">
  <Title>AN IBM-PC ENVIRONMENT FOR CHINESE CORPUS ANALYSIS</Title>
  <Section position="7" start_page="586" end_page="586" type="evalu">
    <SectionTitle>
VI. EVALUATION PROGRAMS
</SectionTitle>
    <Paragraph position="0"> Two programs were written to measure the performance of word segmentation and word identification. For segmentation, segperf.exe examines two copies of the same text that were segmented by different methods. The program reports the amount of segmentation error, the number of clauses, the number of clauses that are segmented correctly, and the amount of over- or under-segmentation. The files containing the segmented texts are specified by the -a and -m options. By setting the -d (diagnostic) option to 1, the user can inspect parallel clauses to examine individual differences in segmentation.</Paragraph>
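The kind of clause-by-clause comparison described above can be sketched as follows. This is a minimal illustration in Python, not the source of segperf.exe. The function names are hypothetical, and the definitions of over- and under-segmentation (a word boundary present in only one of the two segmentations) are assumptions, since the text does not define them. The names auto_clauses and manual_clauses are likewise only a guess at what the -a and -m options stand for. Each clause is a list of words, and Latin letters stand in for Chinese characters.

```python
def boundaries(words):
    """Return the character offsets at which a word ends inside the clause."""
    offsets = set()
    position = 0
    for word in words[:-1]:  # the end of the last word is the clause end, not a split
        position += len(word)
        offsets.add(position)
    return offsets


def compare_segmentations(auto_clauses, manual_clauses):
    """Compare two segmentations of the same text, clause by clause.

    Assumption: 'over' counts boundaries found only in the first
    segmentation, 'under' counts boundaries found only in the second.
    """
    if len(auto_clauses) != len(manual_clauses):
        raise ValueError("the two texts have a different number of clauses")
    stats = {"clauses": 0, "correct": 0, "over": 0, "under": 0}
    for auto, manual in zip(auto_clauses, manual_clauses):
        if "".join(auto) != "".join(manual):
            raise ValueError("parallel clauses do not contain the same characters")
        auto_cuts = boundaries(auto)
        manual_cuts = boundaries(manual)
        stats["clauses"] += 1
        if auto_cuts == manual_cuts:
            stats["correct"] += 1
        stats["over"] += len(auto_cuts - manual_cuts)
        stats["under"] += len(manual_cuts - auto_cuts)
    return stats


auto_clauses = [["ab", "cd"], ["e", "f", "g"]]
manual_clauses = [["ab", "cd"], ["ef", "g"]]
print(compare_segmentations(auto_clauses, manual_clauses))
```

In this example the first clause is segmented identically in both texts, while the second clause contains one extra boundary in the first segmentation, so it is counted as one instance of over-segmentation.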
    <Paragraph position="1"> For word identification, wcomp.exe compares two sets of different word lists and determines the amount of word overlap. The program shows the distribution of word overlap for words of different lengths. This is important since long words tend to be compound nouns that are not in a general dictionary.</Paragraph>
    <Paragraph position="2"> Using the -i and -j options, the program saves words that overlap and words that do not overlap, respectively.</Paragraph>
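The word-list comparison described above can be approximated with simple set operations. The following is a hedged sketch in Python, not the implementation of wcomp.exe. The function name overlap_by_length and its return values are hypothetical, and word length is assumed to be the number of characters in a word. Latin letters stand in for Chinese characters.

```python
from collections import Counter


def overlap_by_length(list_a, list_b):
    """Split two word lists into shared and unshared words.

    Returns the overlapping words, the non-overlapping words, and
    a count of each group broken down by word length.
    """
    set_a = set(list_a)
    set_b = set(list_b)
    shared = set_a.intersection(set_b)            # words in both lists
    unshared = set_a.symmetric_difference(set_b)  # words in exactly one list
    shared_by_length = Counter(len(word) for word in shared)
    unshared_by_length = Counter(len(word) for word in unshared)
    return shared, unshared, shared_by_length, unshared_by_length


list_a = ["ab", "cd", "efgh"]
list_b = ["ab", "efgh", "ij", "klm"]
shared, unshared, shared_by_length, unshared_by_length = overlap_by_length(list_a, list_b)
for length in sorted(set(shared_by_length) | set(unshared_by_length)):
    print(length, shared_by_length[length], unshared_by_length[length])
```

The two returned sets correspond to the words that overlap and the words that do not, which a caller could write to separate files. Breaking the counts down by length makes it easy to see whether the disagreement is concentrated in long words.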
  </Section>
</Paper>