<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2902"> <Title>Analysis and Processing of Lecture Audio Data: Preliminary Investigations</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Ongoing and Future Activities </SectionTitle> <Paragraph position="0"> The technical language of academic lectures and the lack of in-domain spoken data for training make lecture transcription a significant challenge that will require new methods for deriving a vocabulary and language model.</Paragraph> <Paragraph position="1"> To enable effective use of comparable textual material as a surrogate for in-domain spoken data, we plan to investigate techniques for transforming written text into a conversational style that can be used for language modelling. We are also exploring a lecture-independent recognizer structure that uses a small number of words common to lecture discourse, along with a sub-word model to represent subject-specific words.</Paragraph> <Paragraph position="2"> Finally, we plan to continue collecting and compiling lecture material into a comprehensive annotated corpus.</Paragraph> <Paragraph position="3"> We plan to make this resource available to the research community, in the hope that it will facilitate speech and language processing research in this area.</Paragraph> </Section> </Paper>