<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0104"> <Title>Automatic Acquisition of Feature-Based Phonotactic Resources</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> An important prerequisite for the development of robust multilingual speech technology applications is the availability of language resources at varying levels of granularity. This paper has presented generic techniques for the acquisition of language-specific phonotactic resources. The techniques were exemplified using a small data set for Italian (see footnote 4), but they scale to larger data sets and can be applied to any language. Although the induction techniques as described here assume that data is annotated at the syllable level, very few corpora are actually annotated at this level; annotation at the phonemic level is more usual. For this reason, a cyclical learning procedure has been developed which learns as syllable annotation is being performed and uses the phonotactic automaton developed thus far to predict syllable boundaries for annotation support (Kelly, 2004b). The work presented in this paper represents one specific step towards the provision of fine-grained representations for speech recognition and synthesis based on a combination of data-driven and user-driven techniques.</Paragraph> <Paragraph position="1"> Footnote 4: Due to space constraints, this paper only includes selected examples of the acquired resources.</Paragraph> <Paragraph position="2"> Additional information is publicly available at http://muster.ucd.ie/sigphon/. This includes the complete annotation alphabet (phoneme and feature set), the typed feature system and complete state diagrams for all phonotactic automata.</Paragraph> </Section> </Paper>