<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2001">
  <Title>Factored Neural Language Models</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Neural Language Models
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> to a row in the matrixM. The output is next word's probability distribution.</Paragraph>
    <Paragraph position="3"> A standard NLM (Fig. 1) takes as input the previous n [?] 1 words, which select rows from a continuous word representation matrix M. The next layer's input i is the concatenation of the rows in M corresponding to the input words. From here, the network is a standard multi-layer perceptron with hidden layer h = tanh(i [?] Wih + bh) and output layer o = h [?] Who + bo. where bh,o are the biases on the respective layers. The vector o is normalized by the softmax function fsoftmax(oi) = eoiP|V |</Paragraph>
    <Paragraph position="5"> ters, including the M matrix, which is shared across input words. The training criterion maximizes the regularized log-likelihood of the training data.</Paragraph>
  </Section>
class="xml-element"></Paper>