<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1047"> <Title>Learning a syntagmatic and paradigmatic structure from language data with a bi-multigram model</Title> <Section position="7" start_page="303" end_page="139256" type="evalu"> <SectionTitle> 4.2 Results </SectionTitle> <Paragraph position="0"> The perplexity scores obtained with the non-class, class-based and interpolated versions of a bi-multigram model (with phrases limited to 2 words), and with the bigram and trigram models, are given in Table 5. Linear interpolation with the class-based models improves each model's performance by about 2 points of perplexity: the Viterbi perplexity score of the interpolated bi-multigrams (43.5) remains intermediate between the bigram (54.7) and trigram (38.6) scores. In the trigram case, however, this gain comes at the expense of a large increase in the number of entries in the interpolated model (139256 entries).</Paragraph> <Paragraph position="1"> In the bi-multigram case, the increase in model size is much smaller (63972 entries). As a result, the interpolated bi-multigram model still has fewer entries than the word-based trigram model (75511 entries), while its Viterbi perplexity score comes even closer to the word trigram score (43.5 versus 40.4). Further experiments studying the influence of the threshold values and of the number of classes are still needed to optimize the performance of all models.</Paragraph> <Paragraph position="2"> Table 5: Test perplexity values and model size of the bi-multigram models and of class-word bigrams and trigrams.</Paragraph> <Section position="1" start_page="139256" end_page="139256" type="sub_section"> <SectionTitle> 4.3 Examples </SectionTitle> <Paragraph position="0"> Clustering variable-length phrases may provide a natural way of dealing with some of the language disfluencies that characterize spontaneous utterances, such as the insertion of hesitation words. 
To illustrate this point, phrases that were merged into a common cluster during the training of a model allowing phrases of up to n = 5 words are listed in Table 6 (the phrases containing the hesitation marker &quot;*uh*&quot; appear in the upper part of the table). Phrases that differ mainly by a speaker hesitation are often merged together.</Paragraph> <Paragraph position="1"> Table 6 also illustrates another motivation for phrase retrieval and clustering, apart from word prediction: addressing issues related to topic identification, dialogue modeling and language understanding (Kawahara et al., 1997). Indeed, although the clustered phrases in our experiments were derived fully blindly, i.e. with no semantic or pragmatic information, intra-class phrases often display a strong semantic correlation. To make this approach effectively usable for speech understanding, constraints derived from semantic or pragmatic knowledge (such as the speech act tag of the utterance) could be placed on the phrase clustering process.</Paragraph> </Section> </Section> </Paper>
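The linear interpolation evaluated in Section 4.2 can be sketched in a few lines. This is a minimal illustrative example, not the paper's implementation: the per-token probabilities and the interpolation weight `lam` are toy values chosen for illustration (in practice the weight would be estimated on held-out data, e.g. by EM).

```python
import math

def interpolate(p_word, p_class, lam):
    """Linear interpolation: p(w|h) = lam * p_word(w|h) + (1 - lam) * p_class(w|h)."""
    return lam * p_word + (1 - lam) * p_class

def perplexity(probs):
    """Perplexity = 2 ** (-(1/N) * sum_i log2 p_i) over a test sequence."""
    n = len(probs)
    return 2 ** (-sum(math.log2(p) for p in probs) / n)

# Toy per-token probabilities assigned by a word-based model and by a
# class-based model to the same hypothetical test sequence.
word_probs = [0.05, 0.10, 0.02, 0.08]
class_probs = [0.04, 0.12, 0.03, 0.06]
lam = 0.7  # hypothetical interpolation weight

mixed = [interpolate(pw, pc, lam) for pw, pc in zip(word_probs, class_probs)]
print(round(perplexity(mixed), 2))
```

The interpolated score lies between what either component model would achieve on tokens where they disagree, which is the mechanism behind the roughly 2-point perplexity gains reported above.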