File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/w03-0423_intro.xml
Size: 1,414 bytes
Last Modified: 2025-10-06 14:01:56
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0423"> <Title>Named Entity Recognition with a Maximum Entropy Approach</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 0 otherwise </SectionTitle> <Paragraph position="0"> The parameters α_j are estimated by a procedure called Generalized Iterative Scaling (GIS) (Darroch and Ratcliff, 1972). This is an iterative procedure that improves the estimate of the parameters at each iteration.</Paragraph> <Paragraph position="1"> The maximum entropy classifier is used to classify each word as one of the following: the first word of a named entity (NE) (B tag), a word inside a NE (C tag), the last word of a NE (L tag), or the only word of a single-word NE (U tag).</Paragraph> <Paragraph position="2"> During testing, the classifier may produce an inadmissible sequence of classes (e.g., PER-B followed by LOC-L). To eliminate such sequences, we define a transition probability between word classes, P(c_i|c_j), to be equal to 1 if the sequence is admissible, and 0 otherwise. The probability of the classes c_1,...,c_n assigned to the words in a sentence s in a document D is defined as follows:</Paragraph> <Paragraph position="3"> $$P(c_1, \ldots, c_n \mid s, D) = \prod_{i=1}^{n} P(c_i \mid s, D) \cdot P(c_i \mid c_{i-1})$$ </Paragraph> <Paragraph position="4"> where P(c_i|s,D) is determined by the maximum entropy classifier. The Viterbi algorithm is then used to select the sequence of word classes with the highest probability.</Paragraph> </Section> </Paper>
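The decoding step described in the section can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`split`, `admissible`, `viterbi`), the non-entity class `O`, the exact admissibility rules, and the requirement that a sentence must not end inside an entity are all assumptions made for the example. Because the transition probability is either 1 or 0, it is applied here as a filter on predecessors, and scores are summed in log space to avoid underflow.

```python
import math


def split(tag):
    """Split 'PER-B' into ('PER', 'B'); the assumed non-entity class 'O' gives (None, 'O')."""
    if tag == "O":
        return None, "O"
    entity, position = tag.rsplit("-", 1)
    return entity, position


def admissible(prev, cur):
    """Return True if class `cur` may directly follow class `prev` (None = sentence start).

    Assumed rules: this plays the role of the 0/1 transition probability.
    """
    cur_entity, cur_pos = split(cur)
    prev_entity, prev_pos = (None, "O") if prev is None else split(prev)
    if prev_pos in ("B", "C"):
        # An open entity must continue or end, with the same entity type.
        return cur_pos in ("C", "L") and cur_entity == prev_entity
    # No entity is open: only a new entity or a non-entity word may follow.
    return cur_pos in ("B", "U", "O")


def viterbi(class_probs):
    """Select the admissible class sequence with the highest probability.

    class_probs holds one dict per word, mapping class -> P(c_i | s, D),
    i.e. the (hypothetical) output of the classifier for that word.
    """
    if not class_probs:
        return []
    # best[c] = (log-probability, path) of the best admissible sequence ending in c
    best = {}
    for c, p in class_probs[0].items():
        if p > 0 and admissible(None, c):
            best[c] = (math.log(p), [c])
    for probs in class_probs[1:]:
        new_best = {}
        for c, p in probs.items():
            if p <= 0:
                continue
            # Keep only predecessors whose transition into c is admissible.
            candidates = [
                (score, path)
                for prev, (score, path) in best.items()
                if admissible(prev, c)
            ]
            if candidates:
                score, path = max(candidates, key=lambda item: item[0])
                new_best[c] = (score + math.log(p), path + [c])
        best = new_best
    # Assumption: a sentence may not end in the middle of an entity.
    finished = [v for c, v in best.items() if split(c)[1] not in ("B", "C")]
    if not finished:
        return []
    return max(finished, key=lambda item: item[0])[1]


# Per-word classifier outputs for a two-word sentence (made-up numbers).
example = [
    {"PER-B": 0.6, "PER-U": 0.3, "O": 0.1},
    {"LOC-L": 0.5, "PER-L": 0.3, "O": 0.2},
]
decoded = viterbi(example)
```

In this made-up example, choosing the most probable class for each word independently would give PER-B followed by LOC-L, which is the inadmissible sequence mentioned in the text. Decoding over whole sequences instead selects PER-B followed by PER-L, the most probable sequence among the admissible ones.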