<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1705">
  <Title>A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction</Title>
  <Section position="8" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6.3 Results
</SectionTitle>
    <Paragraph position="0"> The bottom-up merging algorithm was applied with each of the four priority measures listed in Table 4. The resulting performance is shown in Table 5.</Paragraph>
    <Paragraph position="1"> Comparing co-occurrence and MI in Table 5, we found that the co-occurrence measure outperforms MI on both precision and recall. A possible reason is that, when extracting unknown words from a size-limited text, the recurrence of an unknown word is a more important cue than the morphological association between its morphemes. Different unknown words in a document sometimes share the same morpheme, and if MI is used as the priority measure, the morphemes of these unknown words receive low MI values. Even though such words have higher frequency, they are easily sacrificed when they compete with adjacent unknown word candidates. This explanation is also supported by the performance of VMI and t-score, which give more weight to co-occurrence in their formulas and both perform better than MI.</Paragraph>
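    <Paragraph> The argument above can be illustrated with a minimal Python sketch. It is not the system described in this paper: all counts are hypothetical, and the MI formula is the standard pointwise definition, which is an assumption and may differ from the definition given in Table 4. The sketch shows a morpheme "A" shared by two unknown words, "AB" and "AC", so that the frequent candidate "AB" receives a lower MI than a candidate "DE" seen only once, while raw co-occurrence ranks "AB" first.

```python
import math

def mutual_information(pair_count, left_count, right_count, total):
    """Pointwise MI in bits: log2(P(x, y) / (P(x) * P(y))). Assumed definition."""
    p_pair = pair_count / total
    p_left = left_count / total
    p_right = right_count / total
    return math.log2(p_pair / (p_left * p_right))

TOTAL = 100  # hypothetical number of morpheme positions in the document

# Hypothetical counts: morpheme "A" begins two different unknown words,
# "AB" and "AC", each seen 5 times, so "A" itself is seen 10 times.
mi_ab = mutual_information(pair_count=5, left_count=10, right_count=5, total=TOTAL)

# A competing candidate "DE" seen only once, built from morphemes seen once each.
mi_de = mutual_information(pair_count=1, left_count=1, right_count=1, total=TOTAL)

# Raw co-occurrence counts of the same two candidates.
cooc_ab, cooc_de = 5, 1

print("MI:", round(mi_ab, 2), "for AB versus", round(mi_de, 2), "for DE")
print("co-occurrence:", cooc_ab, "for AB versus", cooc_de, "for DE")
```

    Under these assumed counts, an MI-based priority would favor the one-off candidate, whereas a co-occurrence-based priority favors the recurring word.</Paragraph>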
    <Paragraph position="2"> Based on the above discussion, we adopt co-occurrence as the priority measure in our unknown word extraction system.</Paragraph>
    <Paragraph position="3"> In our final system, we use morphological rules to extract regular types of unknown words and the general rules to extract the remaining irregular unknown words. The overall performance is a recall of 57% and a precision of 76%. An older system, which used only the morphological rules for names of people and for compounds with a prefix or suffix, without the general rules, had a recall of 25% and a precision of 80%. The general rules therefore raise recall by 32 percentage points without sacrificing much precision.</Paragraph>
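    <Paragraph> The comparison between the two systems can be checked with a short Python sketch. The recall and precision figures are those reported above; the F1 score (the harmonic mean of precision and recall) is not reported in this section and is computed here only as an illustrative summary, so it should be read as a derived quantity rather than a published result. The names used are hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Percentages reported in the text above.
old_system = {"recall": 25, "precision": 80}    # morphological rules only
final_system = {"recall": 57, "precision": 76}  # morphological plus general rules

# Differences in percentage points.
recall_gain = final_system["recall"] - old_system["recall"]
precision_loss = old_system["precision"] - final_system["precision"]

# Derived summary scores, not reported in the paper.
f1_old = f1(old_system["precision"], old_system["recall"])
f1_final = f1(final_system["precision"], final_system["recall"])

print("recall gain (points):", recall_gain)
print("precision loss (points):", precision_loss)
print("derived F1, old and final:", round(f1_old, 1), round(f1_final, 1))
```
    </Paragraph>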
  </Section>
</Paper>