File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/n06-2001_intro.xml
Size: 1,125 bytes
Last Modified: 2025-10-06 14:03:32
<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2001">
<Title>Factored Neural Language Models</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>2 Neural Language Models</SectionTitle>
<Paragraph position="2">Figure 1: Standard NLM. Each context word maps to a row in the matrix M; the output is the next word's probability distribution.</Paragraph>
<Paragraph position="3">A standard NLM (Fig. 1) takes as input the previous $n-1$ words, which select rows from a continuous word representation matrix M. The next layer's input i is the concatenation of the rows of M corresponding to the input words. From here, the network is a standard multi-layer perceptron with hidden layer $h = \tanh(i\,W_{ih} + b_h)$ and output layer $o = h\,W_{ho} + b_o$, where $b_h$ and $b_o$ are the biases on the respective layers. The vector o is normalized by the softmax function $f_{\mathrm{softmax}}(o_i) = e^{o_i} / \sum_{j=1}^{|V|} e^{o_j}$, where $|V|$ is the vocabulary size.</Paragraph>
<Paragraph position="5">All parameters are trained, including the matrix M, which is shared across input words. The training criterion maximizes the regularized log-likelihood of the training data.</Paragraph>
</Section>
</Paper>
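To make the forward pass concrete, here is a minimal NumPy sketch of the standard NLM described above: embedding lookup in the shared matrix M, concatenation, a tanh hidden layer, a linear output layer, and a softmax. The dimensions (vocabulary size V, embedding width d, hidden width H, n-gram order n), the random initialization, and the function name next_word_distribution are illustrative assumptions, not values or names from the paper.

import numpy as np

rng = np.random.default_rng(0)
V, d, H, n = 1000, 32, 64, 4  # assumed vocab size, embedding dim, hidden dim, n-gram order

M    = rng.normal(scale=0.1, size=(V, d))            # shared word representation matrix
W_ih = rng.normal(scale=0.1, size=((n - 1) * d, H))  # input-to-hidden weights
b_h  = np.zeros(H)                                   # hidden-layer bias
W_ho = rng.normal(scale=0.1, size=(H, V))            # hidden-to-output weights
b_o  = np.zeros(V)                                   # output-layer bias

def next_word_distribution(context):
    """One NLM forward pass for a context of n-1 word indices."""
    i = M[context].reshape(-1)      # select rows of M and concatenate them
    h = np.tanh(i @ W_ih + b_h)     # hidden layer: h = tanh(i W_ih + b_h)
    o = h @ W_ho + b_o              # output layer: o = h W_ho + b_o
    e = np.exp(o - o.max())         # softmax, shifted for numerical stability
    return e / e.sum()              # probability distribution over the vocabulary

p = next_word_distribution([17, 4, 256])  # three context words, since n = 4
print(p.shape, p.sum())                   # (1000,) 1.0

In training, all of M, W_ih, b_h, W_ho, and b_o would be updated against the regularized log-likelihood of the training data; the sketch above covers only inference.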