<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2902">
  <Title>Analysis and Processing of Lecture Audio Data: Preliminary Investigations</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Ongoing and Future Activities
</SectionTitle>
    <Paragraph position="0"> The technical language of academic lectures and the lack of in-domain spoken data for training make lecture transcription a significant challenge that will require new methods for deriving a vocabulary and language model.</Paragraph>
    <Paragraph position="1"> To enable effective use of comparable textual material as a surrogate for in-domain spoken data, we plan to investigate techniques for transforming written text into a conversational style that can be used for language modelling. We are also exploring a lecture-independent recognizer structure that uses a small number of words common to lecture discourse along with a sub-word model to represent subject-specific words.</Paragraph>
    <Paragraph position="2"> Finally, we plan to continue to collect and compile lecture material into a comprehensive annotated corpus.</Paragraph>
    <Paragraph position="3"> We intend to make this resource available to the research community in the hope that it will facilitate speech and language processing research in this area.</Paragraph>
  </Section>
</Paper>