<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1055"> <Title>Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation</Title>
<Section position="4" start_page="289" end_page="289" type="metho"> <SectionTitle> A HIERARCHICAL LEXICAL REPRESENTATION </SectionTitle>
<Paragraph position="0"> It has long been realized from research in speech synthesis that a variety of linguistic knowledge sources play an important role in determining English letter/sound correspondences [8]. For example, part of speech causes the noun and verb forms of "record" to be pronounced differently. A morphological boundary causes the letter sequence "sch" in "discharge" to be realized differently from that in "school" or "scheme". Stress changes the identity of vowels in a word, e.g. "define" vs. "definition". Also, syllabic constraints are expressible in terms of the sequential ordering of distinctive features - sonority sequencing in manner features, and phonotactic constraints in place and voicing features. Furthermore, there are graphemic constraints on letter-to-letter transitions.</Paragraph>
<Paragraph position="1"> A novel feature of our system is that multiple layers of representation are incorporated to capture short- and long-distance constraints. These include word class, morphs, syllables, manner classes, phonemes and graphemes.</Paragraph>
<Paragraph position="2"> We created a framework which describes the spelling and pronunciation of English words using only a small inventory of labels associated with the aforementioned morphological and phonological units. [Figure 1: hierarchical representation of the word "dedicated", with the layers indicated numerically: 1. top level (WORD); 2. morphology (PRE ROOT SUF ISUF); 3. syllables and stress; 4. sub-syllabic units; 5. broad classes (STOP, VOW, ...); 6. phonemes; 7. graphemes.]</Paragraph>
<Paragraph position="3"> These units are organized as a hierarchical tree structure, where the various levels of linguistic knowledge are collectively used to describe orthographic-phonological regularities. Figure 1 illustrates the description of the word "dedicated". The higher levels encode longer-distance constraints, while the lower levels carry more local constraints. By allowing the terminal nodes to be dual in nature (i.e., representing either phones or letters), we can create direct symmetry between the letter-to-sound and sound-to-letter generation tasks simply by swapping the input/output specification.</Paragraph>
<Paragraph position="4"> One should note in Figure 1 that [*] is a graphemic "place-holder" introduced to maintain consistency between the representations of the words "dedicate" and "dedicated", where an inflexional suffix [ISUF] has been attached to the latter word. Another noteworthy detail is the special [M-ONSET] category, which signifies that the letter 'c' should belong to the root "-dic-", but has become a moved onset of the next syllable due to syllabification principles such as the Maximal Onset Principle and the Stress Resyllabification Principle.</Paragraph>
</Section>
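To make the hierarchical representation concrete, the sketch below encodes a tree in the spirit of Figure 1 as a nested data structure whose terminals are dual (each carries a phoneme and a grapheme), so that the same tree can be read out in either direction. This is an editorial illustration, not the authors' implementation: the Node class, the helper functions, and the exact grouping of letters and phoneme symbols under each morph are assumptions; only the layer labels (WORD, PRE, ROOT, SUF, ISUF, M-ONSET, the [*] place-holder) come from the paper.

```python
# Illustrative sketch only (not the authors' code).  Layer labels such as
# WORD, PRE, ROOT, SUF, ISUF, M-ONSET and the [*] place-holder come from the
# paper; the phoneme symbols and exact letter groupings are assumptions.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                       # e.g. "WORD", "ROOT", "STOP", "VOW"
    children: List["Node"] = field(default_factory=list)
    phoneme: Optional[str] = None    # pronunciation side of a dual terminal
    grapheme: Optional[str] = None   # spelling side of a dual terminal

def term(broad_class: str, phoneme: str, grapheme: str) -> Node:
    """A dual terminal: the node carries both a phone and a letter, so the
    tree serves letter-to-sound and sound-to-letter by swapping input/output."""
    return Node(broad_class, phoneme=phoneme, grapheme=grapheme)

# A rough tree for "dedicated".  The 'c' is marked M-ONSET because it belongs
# to the root "-dic-" but is resyllabified as the onset of the next syllable;
# '*' is the graphemic place-holder introduced with the inflectional suffix.
dedicated = Node("WORD", [
    Node("PRE",  [term("STOP", "d", "d"), term("VOW", "eh", "e")]),
    Node("ROOT", [term("STOP", "d", "d"), term("VOW", "ih", "i")]),
    Node("SUF",  [Node("M-ONSET", [term("STOP", "k", "c")]),
                  term("VOW", "ey", "a"), term("STOP", "t", "t"),
                  term("VOW", "ih", "e")]),
    Node("ISUF", [term("-", "", "*"),     # [*] place-holder: no phoneme
                  term("STOP", "d", "d")]),
])

def spell(node: Node) -> str:
    """Read the spelling off the terminals (sound-to-letter output side)."""
    if node.grapheme is not None:
        return node.grapheme
    return "".join(spell(c) for c in node.children)

def pronounce(node: Node) -> str:
    """Read the phoneme string off the terminals (letter-to-sound output side)."""
    if node.phoneme is not None:
        return node.phoneme
    return " ".join(p for p in (pronounce(c) for c in node.children) if p)

print(spell(dedicated))      # "dedicate*d"  (place-holder retained, as in Figure 1)
print(pronounce(dedicated))  # "d eh d ih k ey t ih d"
```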
<Section position="5" start_page="289" end_page="290" type="metho"> <SectionTitle> THE PARSING ALGORITHM </SectionTitle>
<Paragraph position="0"> We are adopting a technique that represents a cross between explicit rule-driven strategies and strictly data-driven approaches. About 100 generalized context-free rules, such as those illustrated in Table 1, are written by hand, and training words are parsed using TINA [9] according to their marked linguistic specifications. Parse trees in the format shown in Figure 1 are then used to train the probabilities in a set of "layered bigrams" [10]. We have chosen a probabilistic parsing paradigm for four reasons. First, the probabilities serve to augment the known structural regularities that can be encoded in simple rules with other structural regularities which may be automatically discovered from a large body of training data. Second, since the more probable parse theories are distinguished from the less probable ones, search efforts can selectively concentrate on the high-probability theories, which is an effective mechanism for perplexity reduction. Third, probabilities are less rigid than rules, and adopting a probabilistic framework allows us to easily generate multiple parse theories. Fourth, the flexibility of a probabilistic framework also enables us to automatically relax constraints to attain better coverage of the data.</Paragraph>
<Section position="1" start_page="290" end_page="290" type="sub_section"> <SectionTitle> Training Procedure </SectionTitle>
<Paragraph position="0"> The layered-bigrams formalism attaches probabilities to sibling-sibling transitions in context-free grammar rules. It has been shown to achieve a low perplexity at the linguistic level within the ATIS domain [10]. For our current sub-word application, we have modified the layered bigrams in two ways: (1) parse trees are generated in a bottom-up fashion instead of top-down, and (2) the contextual information used in bottom-up prediction includes the complete history in the immediate left column.</Paragraph>
<Paragraph position="1"> Our experimental corpus consists of the 10,000 most frequent words appearing in the Brown Corpus [11], where each word entry contains a spelling and a single unaligned phoneme string. We used about 8,000 words for training, and a disjoint set of about 800 words for testing.</Paragraph>
<Paragraph position="2"> The set of training probabilities is estimated by tabulating counts over the training parse trees (see [1] for a more detailed description of this process). It includes bottom-up prediction probabilities for each category in the parse tree, and column advancement probabilities for extending a column to the next terminal. The same set of probabilities is used for both letter-to-sound and sound-to-letter generation.</Paragraph>
</Section>
<Section position="2" start_page="290" end_page="290" type="sub_section"> <SectionTitle> Testing Procedure </SectionTitle>
<Paragraph position="0"> In letter-to-sound generation, the system takes a spelling as input, generates a parse tree in a bottom-up, left-to-right fashion, and derives a phonemic pronunciation from the complete parse. In sound-to-letter generation, the system accepts a string of phonemes as input and generates letters. An inadmissible stack decoding search algorithm is adopted for its simplicity. If multiple hypotheses are desired, the algorithm can terminate after multiple complete hypotheses have been popped off the stack. These hypotheses are subsequently re-ranked according to their actual parse score. Though our search is inadmissible, we are able to obtain multiple hypotheses inexpensively with satisfactory performance.</Paragraph>
</Section> </Section> </Paper>
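As a concrete illustration of the training step, the sketch below tabulates sibling-sibling transition counts over a set of toy parse trees and normalizes them into probabilities. It is not the authors' implementation: the tree encoding, the conditioning context (parent category plus left sibling, rather than the full left-column history used in the paper's bottom-up variant), and the smoothing constant alpha are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' code) of estimating "layered bigram" style
# probabilities by counting sibling-sibling transitions over training parse
# trees.  Trees are (category, children) pairs; terminals are plain strings.
from collections import defaultdict

def count_sibling_transitions(tree, counts):
    """Accumulate counts of (parent, left sibling) -> right sibling events."""
    category, children = tree
    labels = [child[0] if isinstance(child, tuple) else child for child in children]
    for left, right in zip(["<start>"] + labels, labels + ["<end>"]):
        counts[(category, left)][right] += 1
    for child in children:
        if isinstance(child, tuple):
            count_sibling_transitions(child, counts)

def estimate_probabilities(training_trees, alpha=0.5):
    """Turn the tabulated counts into lightly smoothed conditional probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in training_trees:
        count_sibling_transitions(tree, counts)
    probs = {}
    for context, successors in counts.items():
        total = sum(successors.values()) + alpha * len(successors)
        probs[context] = {cat: (n + alpha) / total for cat, n in successors.items()}
    return probs

# Toy "parse trees" over made-up morph labels and letter terminals.
trees = [
    ("WORD", [("PRE", ["d", "e"]), ("ROOT", ["d", "i", "c"])]),
    ("WORD", [("ROOT", ["s", "ch", "ool"])]),
]
print(estimate_probabilities(trees)[("WORD", "<start>")])   # {'PRE': 0.5, 'ROOT': 0.5}
```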
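The testing step relies on a stack (best-first) search. The sketch below shows the general shape of such an inadmissible stack decoder: partial hypotheses sit on a priority queue, the best one is popped and extended by one input symbol, and the search stops once the requested number of complete hypotheses has been popped. The `extend` callback, the beam size, and the toy scoring example are assumptions; in the actual system the extension step would propose parse-tree columns scored with the layered-bigram probabilities, and the completed hypotheses would then be re-ranked by their full parse score.

```python
# Minimal sketch (not the authors' code) of an inadmissible stack decoder.
import heapq
import itertools
import math

def stack_decode(symbols, extend, n_best=5, beam=200):
    """Best-first ("stack") search over partial hypotheses.

    extend(state, symbol) yields (log_prob_increment, new_state) pairs.
    Partial scores are not upper bounds on completions, so the search is
    inadmissible, but it is cheap and easily yields multiple hypotheses.
    """
    counter = itertools.count()          # tie-breaker so states are never compared
    stack = [(0.0, next(counter), 0, None)]
    complete = []
    while stack and len(complete) < n_best:
        neg_score, _, pos, state = heapq.heappop(stack)
        if pos == len(symbols):          # a complete hypothesis has been popped
            complete.append((-neg_score, state))
            continue
        for delta, new_state in extend(state, symbols[pos]):
            heapq.heappush(stack, (neg_score - delta, next(counter), pos + 1, new_state))
        if len(stack) > beam:            # pruning: a further source of inadmissibility
            stack = heapq.nsmallest(beam, stack)
            heapq.heapify(stack)
    return complete                      # best-scoring hypotheses first

# Toy usage: "decode" each letter into a lower- or upper-case variant.
def toy_extend(state, letter):
    prefix = state or ""
    return [(math.log(0.7), prefix + letter), (math.log(0.3), prefix + letter.upper())]

print(stack_decode(list("abc"), toy_extend, n_best=3))
```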