<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0423">
  <Title>Named Entity Recognition with a Maximum Entropy Approach</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
0 otherwise
</SectionTitle>
    <Paragraph position="0"> The parameters αj are estimated by a procedure called Generalized Iterative Scaling (GIS) (Darroch and Ratcliff, 1972). GIS is iterative: each iteration improves the estimates of the parameters.</Paragraph>
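    <Paragraph> As a rough illustration of the kind of update GIS performs, the following Python sketch fits a small conditional maximum entropy model. The toy data, the capitalization features, the two classes, and all names (gis, make_feature, predict) are assumptions made for this example and are not taken from the system described here. Parameters are stored as log-weights, that is, the logarithms of the parameters mentioned above.
<![CDATA[
```python
import math

def gis(samples, classes, features, iterations=20):
    """Estimate log-weights of a conditional maxent model with GIS.

    samples  : list of (history, cls) training pairs
    features : list of binary functions f(history, cls) -> 0 or 1
    """
    # GIS assumes the feature values of every (history, class) pair sum to the
    # same constant C, so a correction feature pads each pair up to C.
    C = max(sum(f(h, c) for f in features) for h, _ in samples for c in classes)

    def correction(h, c):
        return C - sum(f(h, c) for f in features)

    feats = list(features) + [correction]
    n = len(samples)
    # Empirical expectation of each feature over the training data.
    empirical = [sum(f(h, c) for h, c in samples) / n for f in feats]
    lambdas = [0.0] * len(feats)

    def predict(h):
        scores = [math.exp(sum(l * f(h, c) for l, f in zip(lambdas, feats)))
                  for c in classes]
        z = sum(scores)
        return [s / z for s in scores]

    for _ in range(iterations):
        # Expectation of each feature under the current model.
        expected = [0.0] * len(feats)
        for h, _ in samples:
            for c, p in zip(classes, predict(h)):
                for j, f in enumerate(feats):
                    expected[j] += p * f(h, c) / n
        # GIS update in log space. Features with a zero expectation are
        # skipped here for simplicity (an assumption of this sketch).
        for j in range(len(feats)):
            if empirical[j] > 0 and expected[j] > 0:
                lambdas[j] += math.log(empirical[j] / expected[j]) / C
    return lambdas, predict

# Hypothetical toy task: is a word part of a named entity ("NE") or not ("O")?
classes = ["NE", "O"]
samples = [("Paris", "NE"), ("London", "NE"), ("Smith", "NE"), ("The", "O"),
           ("eBay", "NE"), ("visited", "O"), ("the", "O"), ("city", "O")]

def make_feature(capitalized, cls):
    # Fires when the word's capitalization and the candidate class both match.
    return lambda h, c: int(h[0].isupper() == capitalized and c == cls)

features = [make_feature(cap, c) for cap in (True, False) for c in classes]
lambdas, predict = gis(samples, classes, features)
print(predict("Paris"))  # probabilities for "NE" and "O"
```
]]>
    In this toy setting exactly one feature fires for each (word, class) pair, so the fitted model reproduces the relative frequencies in the training data: three of the four capitalized words are labelled NE, and the model assigns a capitalized word a probability of about 0.75 of being NE.</Paragraph>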
    <Paragraph position="1"> The maximum entropy classifier is used to classify each word as one of the following: the first word of a named entity (NE) (B tag), a word inside an NE (C tag), the last word of an NE (L tag), or the only word in an NE (U tag).</Paragraph>
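    <Paragraph> The tagging scheme can be sketched in Python as follows. The function name tag_sentence, the span representation, the example sentence, and the "O" label for words outside any entity are assumptions made for illustration; only the B, C, L and U tags and the TYPE-TAG naming (as in PER-B) come from the text.
<![CDATA[
```python
def tag_sentence(words, entities, outside="O"):
    """Label each word with TYPE-B, TYPE-C, TYPE-L or TYPE-U.

    entities : list of (start, end, type) spans, with end exclusive
    outside  : label for words in no entity (an assumed placeholder)
    """
    tags = [outside] * len(words)
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = f"{etype}-U"      # the only word of the entity
            continue
        tags[start] = f"{etype}-B"          # first word of the entity
        for i in range(start + 1, end - 1):
            tags[i] = f"{etype}-C"          # words strictly inside the entity
        tags[end - 1] = f"{etype}-L"        # last word of the entity
    return tags

words = ["Mary", "Ann", "Smith", "visited", "New", "York", "and", "Paris"]
entities = [(0, 3, "PER"), (4, 6, "LOC"), (7, 8, "LOC")]
print(tag_sentence(words, entities))
# ['PER-B', 'PER-C', 'PER-L', 'O', 'LOC-B', 'LOC-L', 'O', 'LOC-U']
```
]]>
    </Paragraph>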
    <Paragraph position="2"> During testing, the classifier may produce an inadmissible sequence of classes (e.g., PER-B followed by LOC-L). To eliminate such sequences, we define a transition probability between word classes P(ci|cj) to be equal to 1 if the sequence is admissible, and 0 otherwise. The probability of the classes c1,...,cn assigned to the words in a sentence s in a document D is defined as follows:</Paragraph>
    <Paragraph position="3"> P(c_1, \ldots, c_n \mid s, D) = \prod_{i=1}^{n} P(c_i \mid s, D) \cdot P(c_i \mid c_{i-1})</Paragraph>
    <Paragraph position="4"> where P(ci|s,D) is determined by the maximum entropy classifier. The Viterbi algorithm is then used to select the sequence of word classes with the highest probability.</Paragraph>
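    <Paragraph> A minimal Python sketch of this decoding step is given below. It assumes that each word's classifier output is available as a dictionary from class to probability, that words outside any entity carry a hypothetical "O" class, and that the position before the first word behaves like "O". The function names admissible and viterbi and the example probabilities are illustrative only. The sketch checks transitions between adjacent words and does not separately require that the sentence ends on a closed entity.
<![CDATA[
```python
import math

def admissible(prev, cur):
    """Return True if class `cur` may directly follow class `prev`."""
    p_type, _, p_pos = prev.rpartition("-")
    c_type, _, c_pos = cur.rpartition("-")
    if p_pos in ("B", "C"):
        # An open entity must continue or end, and keep the same type.
        return c_pos in ("C", "L") and c_type == p_type
    # After L, U or a non-entity word, no entity is open, so C and L are ruled out.
    return c_pos not in ("C", "L")

def viterbi(probs, start="O"):
    """probs: one dict per word, mapping class -> classifier probability."""
    # best[c] = (log-probability, path) of the best admissible sequence ending in c
    best = {start: (0.0, [])}
    for dist in probs:
        new_best = {}
        for cur, p in dist.items():
            if p <= 0.0:
                continue
            # A transition probability of 0 is applied by dropping the candidate.
            candidates = [(score + math.log(p), path + [cur])
                          for prev, (score, path) in best.items()
                          if admissible(prev, cur)]
            if candidates:
                new_best[cur] = max(candidates, key=lambda t: t[0])
        best = new_best
    if not best:
        return None  # no admissible sequence with non-zero probability
    return max(best.values(), key=lambda t: t[0])[1]

# Hypothetical classifier output for a two-word sentence.
probs = [
    {"PER-B": 0.6, "LOC-B": 0.3, "O": 0.1},
    {"LOC-L": 0.5, "PER-L": 0.4, "O": 0.1},
]
print(viterbi(probs))  # ['PER-B', 'PER-L']
```
]]>
    Picking the most probable class for each word independently would give PER-B followed by LOC-L, which is inadmissible. The search instead returns PER-B followed by PER-L (0.6 x 0.4 = 0.24), which beats LOC-B followed by LOC-L (0.3 x 0.5 = 0.15).</Paragraph>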
  </Section>
</Paper>