<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1167">
  <Title>Statistical Language Modeling with Performance Benchmarks using Various Levels of Syntactic-Semantic Information</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusions and Research Direction
</SectionTitle>
    <Paragraph position="0"> We have presented the effect of incorporating various levels of syntactic information into a statistical language model built on the mathematical framework of syntactically enhanced LSA (SELSA).</Paragraph>
    <Paragraph position="1"> SELSA is an attempt to develop a unified framework in which syntactic and semantic dependencies are represented jointly. It generalizes the LSA framework by attaching syntactic information, at various levels of granularity, to the current word. This yields a statistical language modeling mechanism in which the probability of a word, given the semantics of the preceding words, is constrained by the adjacent syntax. The results on the Wall Street Journal (WSJ) corpus establish a set of benchmarks for the performance improvements achievable with these types of syntactic information. Supertag-based information is very fine-grained and therefore yields a large reduction in perplexity when the correct supertag is known. Knowledge of the phrase type also reduces perplexity relative to LSA. Even knowledge of the content/function word type gives an additional improvement in each of the SELSA-based language models. These benchmarks can be approached with better algorithms for predicting the required syntactic information. Our experiments are continuing in this direction, as well as toward a better understanding of the overall statistical language modeling problem and its application to speech recognition.</Paragraph>
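    <Paragraph position="2"> To make the conditioning described above concrete, here is a minimal Python sketch of an LSA-style word predictor in the spirit of SELSA; it is an illustration under stated assumptions, not the authors' implementation. It assumes that each token is a hypothetical word_tag unit (the word joined to a label standing in for its syntactic information), builds a unit-by-document count matrix without any term weighting, reduces it to k latent dimensions with a truncated SVD, and represents the history as a pseudo-document vector (the sum of its unit vectors). The syntactic constraint is modeled by letting only units that carry the given tag compete, and the mapping from cosine similarity to probability (a normalized exponential) is an illustrative choice. The toy corpus, the tag labels, and k = 2 are made up.

```python
import numpy as np

# Hypothetical toy corpus: each token is a "word_tag" unit, i.e. a word joined
# to an assumed label standing in for its syntactic information.
docs = [
    ["stocks_NP", "rose_VP", "sharply_ADVP"],
    ["stocks_NP", "fell_VP", "again_ADVP"],
    ["the_NP", "market_NP", "rose_VP"],
]

# Unit-by-document count matrix (rows: word_tag units, columns: documents).
units = sorted({u for d in docs for u in d})
index = {u: i for i, u in enumerate(units)}
counts = np.zeros((len(units), len(docs)))
for j, doc in enumerate(docs):
    for u in doc:
        counts[index[u], j] += 1

# Truncated SVD: keep k latent dimensions (k = 2 is an illustrative choice).
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
unit_vecs = U[:, :k] * s[:k]  # each row: a unit's coordinates, scaled by singular values


def history_vector(history):
    """Pseudo-document vector for the history: the sum of its unit vectors."""
    return sum(unit_vecs[index[u]] for u in history)


def predict(history, tag):
    """Distribution over words whose unit carries `tag`, given the history.

    Syntax constrains the candidate set (only units ending in _tag compete);
    semantics ranks them by cosine similarity to the history vector, turned
    into probabilities with a normalized exponential (an assumption).
    """
    h = history_vector(history)
    cands = [u for u in units if u.endswith("_" + tag)]
    sims = []
    for u in cands:
        v = unit_vecs[index[u]]
        denom = np.linalg.norm(v) * np.linalg.norm(h) + 1e-12  # guard against zero norms
        sims.append(float(v @ h) / denom)
    weights = np.exp(np.array(sims))
    probs = weights / weights.sum()
    return {u.rsplit("_", 1)[0]: float(p) for u, p in zip(cands, probs)}


# Which verb-phrase word is more compatible with a history about stocks?
print(predict(["stocks_NP"], "VP"))
```

In this sketch, moving between supertags, phrase types, or a content/function flag changes only how the units are labeled; the matrix, SVD, and prediction steps stay the same.</Paragraph>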
  </Section>
</Paper>