File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/97/w97-0605_concl.xml

Size: 1,786 bytes

Last Modified: 2025-10-06 13:57:58

<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0605">
  <Title>AUTOMATIC LEXICON ENHANCEMENT BY MEANS OF CORPUS TAGGING</Title>
  <Section position="9" start_page="31" end_page="31" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> The aim of this study was the automatic production of a lexicon from corpora dedicated to specific domains. The results obtained satisfy this goal: taking into account all the occurrences of the unknown words in a text corpus allows us to automatically produce lexicons containing, for each entry, a list of possible syntactic classes with frequency information.</Paragraph>
    <Paragraph position="1"> Integrating these lexicons into a linguistic module raises the problem of dynamically adapting the language model, which should be dealt with by means of a cache-based language model (Kuhn, 1990). The lexicons produced contain very few incorrect syntactic classes for any item that is represented in the corpus by a sufficient number of occurrences.</Paragraph>
    <Paragraph position="2"> This lexicon-extraction module has been used within the Text-To-Speech system developed at LIA: before the grapheme-to-phoneme transcription phase, we first extract a lexicon of all the out-of-vocabulary (OOV) words in the text to be processed. We then add this lexicon to our general lexicon and use the syntactic labels given to each word to constrain the grapheme-to-phoneme transcription rules as well as the liaison-generation rules.</Paragraph>
    <Paragraph position="3"> Finally, it is important to point out that the approach chosen in this study remains independent of the processed language, as long as the hypotheses made by the morpho-syntactic Devin are satisfied.</Paragraph>
  </Section>
</Paper>