File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-3237_abstr.xml

Size: 1,423 bytes

Last Modified: 2025-10-06 13:44:06

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3237">
  <Title>Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> A novel technique for maximum &amp;quot;a posteriori&amp;quot; (MAP) adaptation of maximum entropy (MaxEnt) and maximum entropy Markov models (MEMM) is presented.</Paragraph>
    <Paragraph position="1"> The technique is applied to the problem of recovering the correct capitalization of uniformly cased text: a &amp;quot;background&amp;quot; capitalizer trained on 20Mwds of Wall Street Journal (WSJ) text from 1987 is adapted to two Broadcast News (BN) test sets -one containing ABC Primetime Live text and the other NPR Morning News/CNN Morning Edition text -- from 1996.</Paragraph>
    <Paragraph position="2"> The &amp;quot;in-domain&amp;quot; performance of the WSJ capitalizer is 45% better than that of the 1-gram baseline, when evaluated on a test set drawn from WSJ 1994. When evaluating on the mismatched &amp;quot;out-ofdomain&amp;quot; test data, the 1-gram baseline is outperformed by 60%; the improvement brought by the adaptation technique using a very small amount of matched BN data -- 25-70kwds -- is about 20-25% relative. Overall, automatic capitalization error rate of 1.4% is achieved on BN data.</Paragraph>
  </Section>
</Paper>
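
Note: the abstract names MAP adaptation of a MaxEnt model but does not spell out the adaptation objective. As a hedged sketch only (not the paper's exact equations), MAP adaptation of this kind is commonly formulated as regularized likelihood maximization on the small in-domain set, with a Gaussian prior centered on the background (WSJ-trained) weights; the symbols lambda^0, sigma_i, and D_BN below are illustrative assumptions.

% Illustrative MAP adaptation objective for a MaxEnt capitalizer (sketch, not from the paper).
% \lambda   : adapted feature weights
% \lambda^0 : background weights trained on WSJ text
% \sigma_i^2: prior variance limiting how far weight i may move given the small BN adaptation set
\mathcal{L}(\lambda) \;=\;
  \sum_{(x,y) \in \mathcal{D}_{\mathrm{BN}}} \log p_{\lambda}(y \mid x)
  \;-\; \sum_{i} \frac{(\lambda_i - \lambda_i^{0})^{2}}{2\sigma_i^{2}}

Under this view, a small amount of matched BN data (25-70 kwds) shifts only those weights the in-domain evidence supports, while the prior keeps the rest near the background model.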