<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1047">
  <Title>Learning a syntagmatic and paradigmatic structure from language data with a bi-multigram model</Title>
  <Section position="8" start_page="139256" end_page="139256" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> An algorithm to derive variable-length phrases assuming bigram dependencies between the phrases has been proposed for a language modeling task. It has been shown how a paradigmatic element could</Paragraph>
    <Paragraph position="2"> cluster, with a model allowing up to 5-word phrases (clusters are delimited with curly brackets) be integrated within this framework, allowing to assign common labels to phrases having a different length. Experiments on a task oriented corpus have shown that structuring sentences into phrases results in large reductions in the bigram perplexity value, while still keeping the number of entries in the language model nmch lower than in a trigram model, especially when these models are interpolated with class based models. These results might be further improved by finding a more efficient pruning strategy, allowing the learning of even longer dependencies without over-training, and by further experimenting with the class version of the phrase-based model.</Paragraph>
    <Paragraph position="3"> Additionally, the semantic relevance of the clusters of phrases motivates the use of this approach in the areas of dialogue modeling and language understanding. In that case, semantic/pragmatic informations could be used to constrain the clustering of the phrases.</Paragraph>
    <Paragraph position="4"> Appendix: Forward-backward algorithm for the estimation of the bi-multigram parameters Equation (4) can be implemented at a complexity of O(n~T), with n the maximal length of a sequence and T the number of words in the corpus, using a forward-backward algorithm. Basically, it consists in re-arranging the order of the summations of the numerator and denominator of Eq. (4): the likelihood values of all the segmentations where sequence sj occurs after sequence si, with sequence si ending at the word at rank (t), are summed up first; and then the summation is completed by summing over t. The cumulated likelihood of all the segmentations where sj follows si, and si ends at (t), can be directly computed as a product of a forward and of a backward variable. The forward variable represents the likelihood of the first t words, where the last li words are constrained to form a sequence: = The backward variable represents the conditional likelihood of the last ( T -t) words, knowing that they are preceded by the sequence \[w(t_zi+l)...w(0\]: = Assuming that the likelihood of a parse is computed according to Eq. (2), then the reestimation equation (4) can be rewritten as shown in Tab. 7.</Paragraph>
    <Paragraph position="5"> The variables a and/3 can be calculated according to the following recursion equations (assuming a start and an end symbol at rank t = 0 and t = T+I):  p(k+l)(s j \[Si) .- ~T=I O~(t, It) p(k)(Sj ISi) ~(t &amp;quot;1- lj, lj) 6i(t -- li -}- 1) 6j(t + 1) E, ~(t, li) l~(t, It) 6i(t--li+l) li and lj refer respectively to the lengths of the sequences si and sj, and where the Kronecker function 5k(t) equals 1 if the word sequence starting at rank t is sk, and equals 0 if not.  for 1 &lt; t &lt; T+ 1, and 1 &lt; ii &lt;_ n:</Paragraph>
    <Paragraph position="7"> In the case where the likelihood of a parse is computed with the class assumption, i.e. according to (6), the term p(k)(sj \[st) in the reestimation equation shown in Table 7 should be replaced by its class equivalent, i.e. by p(k)(Cq(, D ICq(,,)) p(k)(sj \[ Cq(,D). In the recursion equation of ~, the term p(\[W~)_t,+l)\]l\[Wft_Tt'__~+l\]) is replaced by the corresponding class bigram probability multiplied by the class conditional probability of the sequence \[W~_)t,+l)\]. A similar change affects the recursion equation of ~, with P(tW~::~l\]ltW~:)b+,)\]) being replaced by the corresponding class bigram probability multiplied by the class conditional probability of the sequence</Paragraph>
  </Section>
class="xml-element"></Paper>