<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1718">
  <Title>Single Character Chinese Named Entity Recognition</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiment results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Evaluation Methodology
</SectionTitle>
      <Paragraph position="0"> To achieve a reliable evaluation, we developed an annotated test set. First, we discuss the standard for Chinese NE and SCNE. Most previous studies define their own standards; hence the results of different systems are not comparable.</Paragraph>
      <Paragraph position="1"> Recently, two widely accepted standards were developed: (1) MET-2 (Multilingual Entity Task) for Chinese and Japanese NE, and (2) IEER-99 for Chinese NE. IEER-99 is a slightly modified version of MET-2. Our NE/SCNE standard is based on these two well-known standards.</Paragraph>
      <Paragraph position="2"> Second, we manually annotated a 10 MB training corpus and a 1 MB test corpus. The texts were randomly selected from the People's Daily, covering articles from 10 subjects and 5 writing styles. This test set is much larger than the MET-2 test data (about 106 KB) and contains more SCNE for evaluation.</Paragraph>
      <Paragraph position="3"> The evaluation metrics we use are precision (P), recall (R), and F-score, where F-score is the balanced harmonic mean F = 2PR/(P + R).</Paragraph>
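      <Paragraph position="4"> As a concrete reference, the three metrics over entity spans can be sketched as follows (a minimal illustration in Python; the function and variable names are ours, not the paper's):

```python
def prf(gold, predicted):
    """Precision, recall, and balanced F-score over entity spans.

    gold, predicted: sets of (start, end, type) tuples; a span counts
    as correct only if both its boundaries and its type match.
    """
    correct = len(gold.intersection(predicted))
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Example: two gold entities; the system finds one of them plus one
# spurious span, giving p = r = f = 0.5.
gold = {(0, 1, "SCL"), (5, 6, "SCP")}
pred = {(0, 1, "SCL"), (8, 9, "SCL")}
assert prf(gold, pred) == (0.5, 0.5, 0.5)
```
</Paragraph>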
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Results of Source-Channel Models
</SectionTitle>
      <Paragraph position="0"> We show the SCNE recognition results using the source-channel models described in Section 3.</Paragraph>
      <Paragraph position="1"> Two versions of the NE models are used: M1 is the original model described in Section 3.1, and M2 is the version adapted for SCNE, shown in Section 3.2. The results in Table 4 show that a clear improvement on SCL and SCP is achieved after adapting the source-channel models for SCNE. As shown in Table 5, the improvement on SCL and SCP has a significant impact on the performance of LN and PN.</Paragraph>
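      <Paragraph position="2"> The scoring idea behind the source-channel models can be pictured in a few lines: a source model supplies the prior probability of a class, a channel model supplies the probability of the observed string given that class, and candidates are compared by their combined log-probability. The sketch below uses invented toy probabilities and class names, not the paper's trained models:

```python
import math

# Toy source model: prior probability of each class (invented numbers).
SOURCE = {"SCL": 0.02, "SCP": 0.01, "WORD": 0.97}

# Toy channel model: P(string | class) for one string (invented numbers).
CHANNEL = {
    ("SCL", "zhong1"): 0.10,
    ("SCP", "zhong1"): 0.01,
    ("WORD", "zhong1"): 0.001,
}

def score(tag, s):
    """Log-probability of explaining string s with class tag."""
    return math.log(SOURCE[tag]) + math.log(CHANNEL[(tag, s)])

def best_tag(s, tags=("SCL", "SCP", "WORD")):
    """Pick the class whose source and channel models best explain s."""
    return max(tags, key=lambda t: score(t, s))

# With these toy numbers the character is best explained as a
# single-character location name.
assert best_tag("zhong1") == "SCL"
```

In the actual models, a Viterbi search performs this comparison jointly over a whole sentence rather than one string at a time.</Paragraph>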
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Results of Different Methods
</SectionTitle>
      <Paragraph position="0"> In Figures 3 and 4, we compare the results of the source-channel models with two classifiers described in Section 4: ME and VSM.</Paragraph>
      <Paragraph position="1"> We can see that the source-channel models achieve the best results. This can be interpreted as follows. The source-channel models use more information than the other two methods: the feature set of the ME and VSM classifiers includes only the six surrounding characters, while the source-channel models use much richer global and local information, as shown in Figure 1. Based on our analysis, we believe that even enlarging the local context window would yield only a very limited performance gain for these classifiers, because most tagging errors cannot be corrected using local context features alone. We are therefore confident that the source-channel models would achieve results at least comparable to those of ME and VSM even if the latter used more local context.</Paragraph>
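      <Paragraph position="2"> The six-surrounding-character feature set mentioned above can be made concrete with a small extractor (a sketch; the exact feature template of the ME and VSM classifiers may differ):

```python
def context_features(chars, i, window=3):
    """Features for position i: the `window` characters on each side
    (six characters in total for window=3), padded at the boundaries."""
    padded = ["PAD"] * window + list(chars) + ["PAD"] * window
    j = i + window                       # index of chars[i] in padded
    left = padded[j - window:j]
    right = padded[j + 1:j + 1 + window]
    feats = ["L%d=%s" % (window - k, c) for k, c in enumerate(left)]
    feats += ["R%d=%s" % (k + 1, c) for k, c in enumerate(right)]
    return feats

# The first position has only padding on its left.
assert context_features("abcde", 0) == [
    "L3=PAD", "L2=PAD", "L1=PAD", "R1=b", "R2=c", "R3=d",
]
```
</Paragraph>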
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.4 Comparison with Other State-of-The-Art
Systems
</SectionTitle>
      <Paragraph position="0"> This section compares the performance of the source-channel model M2 with three state-of-the-art systems: MSWS, LCWS and PBWS.</Paragraph>
      <Paragraph position="1"> 1. The MSWS system is one of the best available products. It is released by Microsoft® (as a set of Windows APIs). MSWS first conducts word breaking using MM (augmented by heuristic rules for disambiguation), and then conducts factoid detection and NER using rules.</Paragraph>
      <Paragraph position="2"> 2. The LCWS system is one of the best research systems in mainland China. It is released by Beijing Language University. The system works similarly to MSWS, but has a larger dictionary containing more PNs and LNs.</Paragraph>
      <Paragraph position="3"> 3. The PBWS system is a rule-based Chinese parser that can also output NER results. It exploits high-level linguistic knowledge, such as syntactic structure, for Chinese word segmentation and NER.</Paragraph>
      <Paragraph position="4"> To compare results across different systems, we have to consider that they might use different tagging formats or specifications. For example, the LCWS system tags the two-character string "a78a79" as a location name, and tags "a80a81a82" rather than the single character "a80" as a person name. We therefore manually converted all tagging results of these three systems to our specification. The results are shown in Table 6.</Paragraph>
      <Paragraph position="5">  We can see that our system (M2) achieves the best results in both SCL and SCP recognition. PBWS has the second best result in recognizing SCL (43.63%), and MSWS in SCP (43.48%).</Paragraph>
      <Paragraph position="6"> However, they achieved the worst result on SCP and SCL, respectively.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.5 Error Analysis
</SectionTitle>
      <Paragraph position="0"> Through manual checking, we identified the following typical errors: 1. Ambiguity between SCNE and multi-character NE: a character that is part of a multi-character NE is falsely tagged as SCL; on the contrary, "日中友好七团体" (ri4-zhong1-you3-hao3-qi1-tuan2-ti3, seven Japan-China friendship organizations) is falsely tagged as ON, so the SCNE "日" (ri4, Japan) and "中" (zhong1, China) are missed. 2. The SCNE list acquired from the training data does not cover some cases in the test data: in "中卡足球赛" (zhong1-ka3-zu2-qiu2-sai4, China-Qatar soccer match), "卡" (ka3, Qatar) stands for "卡塔尔" (ka3-ta1-er3, Qatar), which is not in the SCL list.</Paragraph>
      <Paragraph position="1"> 3. Other errors: in "中巴驶出大门" (zhong1-ba1-shi3-chu1-da4-men2, the middle bus drives out of the gate), "中" (zhong1, middle) and "巴" (ba1, bus) are falsely recognized as SCL, because "中" and "巴" can also stand for China and Pakistan; "巴" can even stand for other countries, such as "巴西" (ba1-xi1, Brazil).</Paragraph>
      <Paragraph position="2"> Errors in (1) account for about 40% of all errors. An SCNE is usually part of a multi-character NE, such as "a19" and "a78" in "a19a78a20a21a22a23a24a25a26". The Viterbi search has to decide between recognizing the multi-character NE and recognizing the SCNE, and the features we currently use do not seem powerful enough to resolve this ambiguity well. Errors in (3) come from another kind of ambiguity, such as that between an SCNE and a normal lexicon word. They are partly caused by noise in the training data: SCNE are very likely to be overlooked by annotators, which makes the training data sparser. Neither the errors in (1) nor those in (3) are easy to handle.</Paragraph>
      <Paragraph position="3"> Our immediate work is to cope with errors in (2), which account for about 8.9% of all errors. We can obtain additional SCNE entries from resources such as abbreviation dictionaries. However, SCNE entries should be selected with care, because characters we do not currently cover may act as SCNE only rarely and thus be difficult to recall. In addition, unsupervised methods can be applied to the task, given the insufficiency of its training data.</Paragraph>
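      <Paragraph position="4"> The cautious list extension suggested above can be sketched as a filtered merge; the threshold, entries, and counts below are entirely hypothetical:

```python
def extend_scne_list(scne, abbrev_entries, counts, min_count=5):
    """Add abbreviation-dictionary characters to the SCNE list only if
    they are seen acting as SCNE at least min_count times in held-out
    text, so that rare entries do not hurt precision."""
    added = {c for c in abbrev_entries
             if c not in scne and counts.get(c, 0) >= min_count}
    return scne | added

current = {"zhong1", "ri4"}
candidates = {"ka3", "e2", "fa3"}
observed = {"ka3": 12, "e2": 1}   # "fa3" never observed as SCNE
assert extend_scne_list(current, candidates, observed) == {"zhong1", "ri4", "ka3"}
```
</Paragraph>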
    </Section>
  </Section>
</Paper>