<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4028">
  <Title>Confidence Estimation for Information Extraction</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Conditional Random Fields
</SectionTitle>
    <Paragraph position="0"> Conditional random fields (Lafferty et al., 2001) are undirected graphical models used to calculate the conditional probability of values on designated output nodes given values on designated input nodes. In the special case in which the designated output nodes are linked by edges in a linear chain, CRFs make a first-order Markov independence assumption among output nodes, and thus correspond to finite state machines (FSMs). In this case CRFs can be roughly understood as conditionally trained hidden Markov models, with additional flexibility to take advantage of complex, overlapping features.</Paragraph>
    <Paragraph position="1"> Let o = &lt;o_1, o_2, ..., o_T&gt; be some observed input data sequence, such as a sequence of words in a document (the values on T input nodes of the graphical model). Let S be a set of FSM states, each of which is associated with a label (such as COMPANYNAME). Let s = &lt;s_1, s_2, ..., s_T&gt; be some sequence of states (the values on T output nodes).</Paragraph>
    <Paragraph position="2"> CRFs define the conditional probability of a state sequence given an input sequence as</Paragraph>
    <Paragraph position="3"> $$ p(s \mid o) = \frac{1}{Z_o} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(s_{t-1}, s_t, o, t) \right) $$ </Paragraph>
    <Paragraph position="4"> where Z_o is a normalization factor over all state sequences, f_k(s_{t-1}, s_t, o, t) is an arbitrary feature function over its arguments, and λ_k is a learned weight for each feature function. Z_o is calculated efficiently using dynamic programming. Inference (very much like the Viterbi algorithm in this case) is also a matter of dynamic programming. Maximum a posteriori training of these models is performed efficiently by hill-climbing methods such as conjugate gradient or its improved second-order cousin, limited-memory BFGS.</Paragraph>
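    <Paragraph position="5"> The two dynamic programs mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the two-label state set, the three feature functions, their weights, and the example sentence are all hypothetical. The sketch computes log Z_o with the forward recursion, compares it against brute-force enumeration of all state sequences, and recovers the highest-scoring state sequence with a Viterbi pass.</Paragraph>
    <Paragraph position="6"><![CDATA[
```python
import itertools
import math

# Hypothetical label set; only COMPANYNAME appears in the text above.
STATES = ["OTHER", "COMPANYNAME"]


def score(prev, cur, obs, t):
    """Sum over k of lambda_k * f_k(prev, cur, obs, t) for three toy features.

    All features and weights here are assumptions made for illustration.
    `prev` is None at t == 0 (no predecessor state).
    """
    total = 0.0
    if cur == "COMPANYNAME" and obs[t][0].isupper():
        total += 1.5  # capitalized token labeled COMPANYNAME
    if cur == "OTHER" and obs[t][0].islower():
        total += 1.0  # lowercase token labeled OTHER
    if prev == cur:
        total += 0.5  # label persists across adjacent positions
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(obs):
    """Forward recursion: log Z_o in O(T * |S|^2) time."""
    alpha = {s: score(None, s, obs, 0) for s in STATES}
    for t in range(1, len(obs)):
        alpha = {
            cur: logsumexp([alpha[p] + score(p, cur, obs, t) for p in STATES])
            for cur in STATES
        }
    return logsumexp(list(alpha.values()))


def viterbi(obs):
    """Same recursion with max in place of sum, plus backpointers."""
    delta = {s: score(None, s, obs, 0) for s in STATES}
    backpointers = []
    for t in range(1, len(obs)):
        new_delta, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda p: delta[p] + score(p, cur, obs, t))
            new_delta[cur] = delta[best] + score(best, cur, obs, t)
            pointers[cur] = best
        delta = new_delta
        backpointers.append(pointers)
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


def sequence_score(states, obs):
    """Unnormalized log-score of one complete state sequence."""
    total, prev = 0.0, None
    for t, cur in enumerate(states):
        total += score(prev, cur, obs, t)
        prev = cur
    return total


def brute_force_log_partition(obs):
    """Reference value: enumerate all |S|^T state sequences."""
    all_sequences = itertools.product(STATES, repeat=len(obs))
    return logsumexp([sequence_score(seq, obs) for seq in all_sequences])


obs = ["shares", "of", "Acme", "Corp", "rose"]  # hypothetical input
log_z = log_partition(obs)
best_path, best_score = viterbi(obs)
best_prob = math.exp(best_score - log_z)  # p(s | o) for the Viterbi path

print("log Z_o (forward):    ", round(log_z, 6))
print("log Z_o (brute force):", round(brute_force_log_partition(obs), 6))
print("Viterbi path:", best_path)
print("p(path | o): ", round(best_prob, 6))
```
]]></Paragraph>
    <Paragraph position="7"> Both recursions take O(T |S|^2) time, whereas explicit enumeration grows as |S|^T. The only difference between them is that the forward pass sums over predecessor states where the Viterbi pass maximizes.</Paragraph>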
  </Section>
</Paper>