<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1007">
  <Title>Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have contrasted two approaches to discriminative language model estimation on a difficult large vocabulary task, showing that they can indeed scale effectively to handle this size of a problem. Both algorithms have their benefits. The perceptron algorithm selects a relatively small subset of the total feature set, and requires just a couple of passes over the training data. The CRF algorithm does a better job of parameter estimation for the same feature set, and is parallelizable, so that each pass over the training set can require just a fraction of the real time of the perceptron algorithm.</Paragraph>
    <Paragraph position="1"> The best scenario from among those that we investigated was a combination of both approaches, with the output of the perceptron algorithm taken as the starting point for CRF estimation.</Paragraph>
    <Paragraph position="2"> As a final point, note that the methods we describe do not replace an existing language model, but rather complement it. The existing language model has the benefit that it can be trained on a large amount of text that does not have speech transcriptions. It has the disadvantage of not being a discriminative model. The new language model is trained on the speech transcriptions, meaning that it has less training data, but that it has the advantage of discriminative training - and in particular, the advantage of being able to learn negative evidence in the form of negative weights on n-grams which are rarely or never seen in natural language text (e.g., &amp;quot;the of&amp;quot;), but are produced too frequently by the recognizer. The methods we describe combines the two language models, allowing them to complement each other.</Paragraph>
  </Section>
class="xml-element"></Paper>