<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1005">
  <Title>Automatic Acquisition of Names Using Speak and Spell Mode in Spoken Dialogue Systems</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> In the past, many researchers have worked on letter-to-sound algorithms for text-to-speech conversion (Damper et al., 1998). More recently, research has begun to emerge on bi-directional sound-letter generation and phoneme-to-grapheme conversion. These topics are important for speech recognition, where they serve to automatically transcribe out-of-vocabulary (OOV) words in the spoken input.</Paragraph>
    <Paragraph position="1"> In (Meng et al., 1996), a hierarchical approach was used for bi-directional sound-letter generation. On the Brown Corpus, it achieved word accuracies of 65% for spelling-to-pronunciation and 51% for pronunciation-to-spelling. Rentzepopoulos (Rentzepopoulos and Kokkinakis, 1996) describes a hidden Markov model approach for phoneme-to-grapheme conversion in seven European languages on a number of corpora. The algorithm gave high accuracies when applied to correctly transcribed words but was not applied to real recognition output. The work of Marchand and Damper (Marchand and Damper, 2000) addresses both phoneme-to-grapheme and grapheme-to-phoneme conversion using a fusion of data-driven and pronunciation-by-analogy methods, obtaining word accuracies of 57.7% for phoneme-to-grapheme and 69.1% for grapheme-to-phoneme conversion. These experiments were performed on a corpus of words from a general dictionary.</Paragraph>
    <Paragraph position="2"> Some work has focused on proper names, since names are a particularly challenging open set. In (Ngan et al., 1998), the problem of generating pronunciations for proper names is addressed. A 45.5% word error rate is reported on a set of around 4500 names using a decision tree method. Font Llitjos (Font Llitjos and Black, 2001) reports improvements in letter-to-sound performance on names by adding language origin features, achieving 61.72% word accuracy on 56000 names. Galescu (Galescu and Allen, 2002) addresses bi-directional sound-letter generation using a data-driven joint n-gram method on proper nouns, yielding around 41% word accuracy for letter-to-sound and 68% word accuracy for sound-to-letter.</Paragraph>
    <Paragraph position="3"> Few have attempted to convert a spoken waveform containing an unknown word to a grapheme sequence. Using a Dutch corpus, Decadt et al. (Decadt et al., 2002) use a memory-based phoneme-to-grapheme converter to derive graphemic output from phonemic recognition hypotheses. Results showed 46.3% accuracy on training data but only 7.9% accuracy on OOV recognition test data. In a German system, Schillo (Schillo et al., 2000) built a grapheme recognizer for isolated words, towards the goal of unconstrained recognition in German. It attained accuracies of up to 72.89% for city names.</Paragraph>
  </Section>
</Paper>