<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1021">
  <Title>Training Connectionist Models for the Structured Language Model</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> By using connectionist models in the SLM, we achieved significant improvement in PPL over the baseline trigram and SLM. The neural network enhenced SLM resulted in a language model that is much less correlated with the baseline Kneser-Ney smoothed trigram than the Kneser-Ney smoothed SLM. Overall, the best studied model gave a 21% relative reduction in PPL over the trigram and 8.7% relative reduction over the corresponding Kneser-Ney smoothed SLM. A new EM training procedure improved the performance of the SLM even further when applied to the neural network models.</Paragraph>
    <Paragraph position="1"> However, reduction in PPL for a language model does not always mean improvement in performance of a real application such as speech recognition.</Paragraph>
    <Paragraph position="2"> Therefore, future study on applying the neural network enhenced SLM to real applications needs to be carried out. A preliminary study in (Emami et al., 2003) already showed that this approach is promising in reducing the word error rate of a large vocabulary speech recognizer.</Paragraph>
    <Paragraph position="3"> There are still many interesting problems in applying the neural network enhenced SLM to real applications. Among those, we think the following are of most of interest:  Speeding up the stochastic gradient descent algorithm for neural network training: Since training the neural network models is very time-consuming, it is essential to speed up the training in order to carry out many more interesting experiments.</Paragraph>
    <Paragraph position="4">  Interpreting the word representations learned in this framework: For example, word clustering, context clustering, etc. In particular, if we use separate mapping matrices for word/NT/POS at different positions in the context, we may be able to learn very different representations of the same word/NT/POS.</Paragraph>
    <Paragraph position="5"> Bearing all the challenges in mind, we think the approach presented in this paper is potentially very powerful for using the entire partial parse structure as the conditioning context and for learning useful features automatically from the data.</Paragraph>
  </Section>
class="xml-element"></Paper>