<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4010">
  <Title>Using N-best Lists for Named Entity Recognition from Chinese Speech</Title>
  <Section position="4" start_page="1" end_page="1" type="metho">
    <SectionTitle>
2
</SectionTitle>
    <Paragraph position="0"> [...] results for German were obtained by systems that use MaxEnt.</Paragraph>
    <Paragraph position="1"> Natural language can be viewed as a stochastic process. We can use p(y|x) to denote the probability distribution of what we try to predict y (.e.g. part-of-speech tag, Named Entity tag) conditioned on what we observe x (e.g. previous POS or the actual word). The Maximum Entropy principle can be stated as follows: given some set of constrains from observations, find the most uniform probability distribution (Maximum Entropy) p(y|x) that satisfies these constrains:</Paragraph>
    <Paragraph position="3"> In the above equations, f</Paragraph>
    <Paragraph position="5"> ) is a binary valued feature function, and l j is a weight that indicates how important feature f j is for the model. Z(x i ) is a normalization factor. We estimate the weights using the improved iterative scaling (IIS) algorithm. For our task, we first compare a character-based MaxEnt model to a word-based model. Since recognition errors also lead to segmentation errors which in turn have an adverse effect on the NER performance, we experiment with disregarding the word boundaries in the ASR hypothesis and instead resegment using a MaxEnt segmenter. We also compare an approach of one-pass identification/classification to a two-pass approach where the identified NE candidates are classified later. In addition, we propose a hybrid approach of using one-pass identification/classification results, discarding the extracted NE tags, and reclassifying the extracted NE in a second pass.  We exclude from the present focus the slight improvements that are usually possible to obtain by combination of multiple models, usually through ad hoc methods such as voting.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.3 Experimental setup
</SectionTitle>
      <Paragraph position="0"> We use two annotated corpora for training. One is a corpus of People's Daily newspaper from January 1998, annotated by the Institute of Computational Linguistics of Beijing University (the &amp;quot;PFR&amp;quot; corpus). This corpus consists of about 20k sentences, annotated with word segmentation, part-of-speech tags and three named-entity tags including person (PER), location (LOC) and organization (ORG) . We use the first 6k sentences to train our NER system. Our system is then evaluated on 2k sentences from People's Daily and 1k sentences from the BBN ASR output. The results are shown in Tables 1 and 3.</Paragraph>
      <Paragraph position="1"> To compare our system to the IBM baseline described in (Jing et al. 2003), we need to evaluate our system on the same corpus as they used. Among the data they used, the only publicly available corpus is a human-generated transcription of broadcast news, provided by NIST for the Information Extraction - Entity Recognition evaluation (the &amp;quot;IEER&amp;quot; corpus). This corpus consists of 10 hours of training data and 1 hour of test data. Ten categories of NEs were annotated, including person names, location, organization, date, duration, and measure. A comparison of results is shown in Table 2.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>