<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1111"> <Title>Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS</Title> <Section position="6" start_page="678" end_page="678" type="evalu"> <SectionTitle> 5 Implementation and Experiment </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="678" end_page="678" type="sub_section"> <SectionTitle> Results </SectionTitle> <Paragraph position="0"> We implemented simple phrase-break detection rules derived from a break- and POS-tagged corpus collected by recording and transcribing broadcast news. The rules reflect two observations: the average phrase length in Korean is 5.6 words, and over 90% of breaks occur after six specific POS tags, as described in the text.</Paragraph> <Paragraph position="1"> We constructed a 1,992-entry morpheme phonetic pattern dictionary for OOV morpheme processing using standard Korean phonological rules. The morpheme phonetic dictionary was constructed only for the morphemes that are difficult to handle with these standard rules.</Paragraph> <Paragraph position="2"> The two dictionaries are indexed by POS tag and morpheme pattern for fast access. To model the connectability of boundary phonemes to one another, the phoneme connectivity table encodes 626 pairs of phonologically connectable morphemes.</Paragraph> <Paragraph position="3"> The 2,030-entry rule set for CCV conversion was learned automatically from 9,773 phonetically transcribed sentences. An independent set of 4,973 phonetically transcribed sentences was used to test the performance of the grapheme-to-phoneme conversion. Of these 4,973 sentences, only 2.5% are processed incorrectly (120 sentences out of 4,973), and only 0.1% of the graphemes in the sentences are actually converted incorrectly.</Paragraph> </Section> </Section> </Paper>