<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3099"> <Title>A Syntactic and Morphological Analyzer for a Text-to-Speech System</Title> <Section position="4" start_page="0" end_page="444" type="metho"> <SectionTitle> 3 UTN Formalism </SectionTitle> <Paragraph position="0"> Morphographemic and morphophonetic rules are written in a Kimmo-style formalism \[4\]. Unlike the original two-level model, a word grammar is used to parse the lexical strings and to determine the category of the overall word formed by several morphs. To express word and sentence grammars, we have developed a grammar formalism called Unification-based Transition Networks (UTN). Its skeletons are nondeterministic recursive transition networks (RTNs), which are equivalent to context-free grammars. A transition network specifies the linear precedence and immediate dominance relations within a constituent. Each transition label denotes a preterminal, a constituent or an ε-transition. As opposed to labels in RTNs, which are monadic, labels in UTNs are complex categories (feature matrices). Each transition contains a set of attribute equations, which specify the constraints that must be satisfied between complex categories in a network. Our notation of attribute equations is very similar to that commonly used in unification-based rule formalisms such as PATR \[5\]. The UTN formalism is fully declarative. It is based on concatenation and recursion, which are reflected in the topology of the networks, and on unification, which is used for matching, equality testing and feature passing. Although the UTN formalism is somewhat similar to ATNs \[6\], it is much more concise and elegant because of its simplicity and declarativeness. The implementation of several grammars for German syntax and morphosyntax revealed that transition networks are well suited to design 1 and test grammars. 
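To make the role of unification more concrete, here is a minimal Python sketch of unifying feature matrices and enforcing one attribute equation on a transition. It is not the authors' Common Lisp implementation: it assumes that feature matrices are nested dictionaries with atomic leaf values, treats an empty matrix as unconstrained, writes an equation as a pair of feature paths, and ignores reentrancy (true structure sharing), which a PATR-style implementation would need. The names unify, apply_equation and UnificationFailure are hypothetical.

```python
# Minimal sketch, assuming feature matrices are nested dicts with atomic
# leaves; reentrancy (structure sharing) is deliberately not modelled.
from copy import deepcopy


class UnificationFailure(Exception):
    """Raised when two feature matrices carry conflicting values."""


def unify(a, b):
    """Return a new feature matrix combining a and b, or raise."""
    if a == {}:  # empty matrix: unconstrained, unifies with anything
        return deepcopy(b)
    if b == {}:
        return deepcopy(a)
    if not isinstance(a, dict) or not isinstance(b, dict):
        if a == b:  # atomic values must be identical
            return a
        raise UnificationFailure(f"{a!r} vs {b!r}")
    result = deepcopy(a)
    for feature, value in b.items():
        if feature in result:
            result[feature] = unify(result[feature], value)
        else:
            result[feature] = deepcopy(value)
    return result


def at(fm, path):
    """Value found at a feature path; missing features are unconstrained."""
    for feature in path:
        if not isinstance(fm, dict):
            raise UnificationFailure(f"atomic value inside path {path}")
        fm = fm.get(feature, {})
    return fm


def embed(path, value):
    """Build a feature matrix holding value at the given path."""
    for feature in reversed(path):
        value = {feature: value}
    return value


def apply_equation(fm, lhs, rhs):
    """Enforce the (hypothetical) path equation lhs = rhs on fm."""
    shared = unify(at(fm, lhs), at(fm, rhs))
    fm = unify(fm, embed(lhs, shared))
    return unify(fm, embed(rhs, shared))


if __name__ == "__main__":
    # Local structure of an S network after its NP and VP transitions.
    local = {"NP": {"agr": {"num": "sg"}}, "VP": {"agr": {"per": 3}}}
    print(apply_equation(local, ("NP", "agr"), ("VP", "agr")))
    clash = {"NP": {"agr": {"num": "sg"}}, "VP": {"agr": {"num": "pl"}}}
    try:
        apply_equation(clash, ("NP", "agr"), ("VP", "agr"))
    except UnificationFailure as err:
        print("transition blocked:", err)
```

In this sketch, a transition whose equations fail to unify is simply not taken, which is how the equations act as constraints on top of the network topology.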
We believe that this formalism meets the general criteria of linguistic naturalness and mathematical power.</Paragraph> <Paragraph position="1"> In addition, the parsing experiment reported below shows that efficient parsers can be implemented for the UTN formalism.</Paragraph> <Paragraph position="2"> The design of our TTS system requires efficient parsing algorithms and a flexible parser environment to compare several search and rule invocation strategies. Active chart parsing \[8\] is well suited for that purpose. We have implemented a general chart parser that can be parameterized for several search and rule invocation strategies. The aim of the experiment reported below was to investigate to what extent a parser can be directed by using the FIRST, FOLLOW and REACHABILITY relations \[9,8\] and combinations thereof, thereby reducing the number of edges, the number of applications of the fundamental rule and parsing time.</Paragraph> <Paragraph position="3"> Strategy T2 uses the FIRST relation to test whether the next input symbol is in the FIRST set of the active edge each time an empty active edge is created. Strategy T3, a top-down strategy with lookahead, uses the FOLLOW set to test whether the next input symbol belongs to the FOLLOW set of the inactive edge each time an inactive edge is created.</Paragraph> <Paragraph position="4"> Strategy T4 combines the selectivity of strategy T2 and the lookahead of strategy T3. Strategy B1 implements a left-corner algorithm \[9\]. Strategy B2 is a left-corner parser directed by a top-down filter based on the REACHABILITY relation \[10\]. Strategy B3 implements a left-corner algorithm with lookahead similar to that of strategy T3, while strategy B4 adds a top-down filter and lookahead to the left-corner algorithm. 
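The three relations that direct these strategies can be computed with the usual fixed-point iterations. The Python sketch below works on a hypothetical context-free toy grammar rather than on UTN networks (RTNs being equivalent to context-free grammars); GRAMMAR, END and all function names are assumptions for illustration. FIRST supports a T2-style check on new empty active edges, FOLLOW a T3-style lookahead check on new inactive edges, and REACHABILITY, sketched here as the reflexive-transitive closure of the left-corner relation, a B2-style top-down filter.

```python
# Sketch over a hypothetical CFG (RTNs are equivalent to CFGs); the
# grammar and all names are illustrative assumptions.
from collections import defaultdict

GRAMMAR = [  # (lhs, rhs); symbols without rules are preterminals
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N", "PP")),
    ("NP", ("Pron",)),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("VP", ("V",)),
    ("PP", ("P", "NP")),
]
END = "$"  # lookahead symbol after the last word


def nullable_set(grammar):
    """Nonterminals that can derive the empty string."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def first_sets(grammar, nullable):
    """FIRST(A): preterminals that can begin a constituent of category A."""
    nts = {lhs for lhs, _ in grammar}
    first, changed = defaultdict(set), True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for sym in rhs:
                new = first[sym] if sym in nts else {sym}
                if not new.issubset(first[lhs]):
                    first[lhs] |= new
                    changed = True
                if sym not in nullable:
                    break
    return first


def follow_sets(grammar, start, first, nullable):
    """FOLLOW(X): symbols that can come right after a constituent X."""
    nts = {lhs for lhs, _ in grammar}
    follow, changed = defaultdict(set), True
    follow[start].add(END)
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                new, rest_nullable = set(), True
                for nxt in rhs[i + 1:]:
                    new |= first[nxt] if nxt in nts else {nxt}
                    if nxt not in nullable:
                        rest_nullable = False
                        break
                if rest_nullable:
                    new |= follow[lhs]
                if not new.issubset(follow[sym]):
                    follow[sym] |= new
                    changed = True
    return follow


def reachability(grammar, nullable):
    """reach[A]: every symbol that can be a left corner of A (reflexive)."""
    symbols = {lhs for lhs, _ in grammar} | {s for _, r in grammar for s in r}
    reach = {s: {s} for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for sym in rhs:
                if not reach[sym].issubset(reach[lhs]):
                    reach[lhs] |= reach[sym]
                    changed = True
                if sym not in nullable:
                    break
    return reach


if __name__ == "__main__":
    nullable = nullable_set(GRAMMAR)
    first = first_sets(GRAMMAR, nullable)
    follow = follow_sets(GRAMMAR, "S", first, nullable)
    reach = reachability(GRAMMAR, nullable)
    # T2-style check: predict an empty NP edge when the next word is a V?
    print("V" in first["NP"])   # False, so the edge is not created
    # T3-style check: may an inactive NP edge be followed by a P?
    print("P" in follow["NP"])  # True, so the edge is kept
    # B2-style filter: can a Det start an S?
    print("Det" in reach["S"])  # True
```

All three tables depend only on the grammar, so in a setup like this they would be computed once before parsing and then consulted in constant time per edge.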
For the experiment presented here, we used a grammar (GI) for German syntax 2 that has been developed for our TTS system and a grammar (GII) for English syntax 3 to compare our results with those of other experiments \[7,11,10\]. Our sentence sets consist of 35 German sentences (set SI, with an average sentence length of 9.8 words) and 39 English sentences (set SII, with an average sentence length of 15.3 words) from Tomita \[7\], pp. 185-189.</Paragraph> <Section position="1" start_page="443" end_page="443" type="sub_section"> <SectionTitle> 4.1 Rule Invocation Strategies </SectionTitle> <Paragraph position="0"> We compared eight parsing strategies, i.e., four top-down (T1 to T4) and four bottom-up (B1 to B4) strategies. The top-down strategies are variants of Earley's algorithm, the bottom-up strategies are variants of the left-corner algorithm \[9\]. T1, a pure top-down strategy, implements Earley's algorithm without lookahead. Strategy T2 is a directed top-down strategy. 1 To compare the UTN formalism with rule-based formalisms, we translated several grammars to transition networks. As an example, the grammar GIII found in Tomita's book \[7\] with about 220 rules was translated to a strongly equivalent network grammar of 37 transition networks. We got the impression that it is easier to write and modify a network grammar of several dozen networks (that can be displayed and edited graphically) than one of several hundred rules.</Paragraph> </Section> <Section position="2" start_page="443" end_page="444" type="sub_section"> <SectionTitle> 4.3 Results </SectionTitle> <Paragraph position="0"> Tables 1 and 2 show the results of parsing sets SI and SII with grammars GI and GII, respectively. We measured for all strategies (T1 to B4) the number of active (AE) and inactive (IE) edges, the total number of edges (TOT = AE+IE) and parsing time 4 (TIME). 
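How the AE, IE and FR counts arise can be illustrated with a small agenda-driven active chart parser. This Python sketch corresponds only roughly to an undirected top-down strategy in the spirit of T1: it uses atomic categories instead of unification, a hypothetical toy grammar and lexicon, and counts one FR application each time an active edge is combined with an adjacent inactive edge; the names parse, GRAMMAR and LEXICON are assumptions, not taken from the authors' parser.

```python
# Sketch of an undirected, Earley-style active chart parser (roughly T1),
# using atomic categories instead of unification; grammar, lexicon and
# names are hypothetical.
from collections import deque

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Pron",)],
    "VP": [("V", "NP"), ("V",)],
}
LEXICON = {"he": "Pron", "sees": "V", "the": "Det", "dog": "N"}


def parse(words, start="S"):
    """Return (recognised?, counts of active/inactive edges and FR uses)."""
    chart, agenda, processed = set(), deque(), []
    stats = {"AE": 0, "IE": 0, "FR": 0}

    def add(edge):  # edge = (start, end, category, rhs, dot)
        if edge in chart:
            return
        chart.add(edge)
        agenda.append(edge)
        stats["IE" if edge[4] == len(edge[3]) else "AE"] += 1

    for i, word in enumerate(words):  # inactive lexical edges
        add((i, i + 1, LEXICON[word], (), 0))
    for rhs in GRAMMAR[start]:  # top-down initialisation
        add((0, 0, start, rhs, 0))

    while agenda:
        edge = agenda.popleft()
        i, j, cat, rhs, dot = edge
        if dot == len(rhs):  # inactive: extend processed active edges ending at i
            for ai, aj, acat, arhs, adot in processed:
                if aj == i and len(arhs) > adot and arhs[adot] == cat:
                    stats["FR"] += 1
                    add((ai, j, acat, arhs, adot + 1))
        else:  # active: predict, then combine with processed inactive edges
            nxt = rhs[dot]
            for r in GRAMMAR.get(nxt, []):
                add((j, j, nxt, r, 0))
            for bi, bj, bcat, brhs, bdot in processed:
                if bi == j and bdot == len(brhs) and bcat == nxt:
                    stats["FR"] += 1
                    add((i, bj, cat, rhs, dot + 1))
        processed.append(edge)

    found = any(e[:3] == (0, len(words), start) and e[4] == len(e[3])
                for e in chart)
    return found, stats


if __name__ == "__main__":
    found, stats = parse("he sees the dog".split())
    print(found, stats, "TOT =", stats["AE"] + stats["IE"])
```

Combining each popped edge only with already processed edges means every active/inactive pair is combined exactly once, so FR is not inflated by agenda order. In this sketch, a directed variant would add the FIRST test before each prediction and the FOLLOW test before adding an inactive edge.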
Since the UTN formalism is based on unification, a time- and space-consuming operation, we also indicate the number of applications of the fundamental rule (FR) to show the relation between parsing strategy and FR applications.</Paragraph> <Paragraph position="1"> Our experiments confirm the results of Shann and Wirén \[10,11\] that parsing efficiency depends heavily on the grammar, the language, the grammar formalism and the sentence set. Nevertheless, carefully tuning a parsing strategy yields a significant increase in efficiency.</Paragraph> <Paragraph position="2"> Undirected top-down parsing performs better than undirected bottom-up parsing. This coincides with the results of Wirén. Directed strategies 5 outperform undirected strategies with respect to parsing time and memory. This holds for both top-down and bottom-up strategies.</Paragraph> <Paragraph position="3"> Previous experiments \[11,10,7\] did not investigate the influence of lookahead in top-down parsing. However, using lookahead (the FOLLOW relation) significantly reduces the number of edges, the number of applications of the fundamental rule and parsing time.</Paragraph> <Paragraph position="4"> Directed top-down parsing with lookahead is as fast as left-corner parsing with top-down filtering and lookahead. The difference between the two strategies is statistically insignificant when considering all experiments conducted with all German grammars and several sentence sets. However, it is uncertain to what extent this statement can be generalized to other types of grammars and languages. Based on the results of our experiments, both strategies (T4 and B4) are suited as main strategies in our TTS system.</Paragraph> </Section> </Section> <Section position="5" start_page="444" end_page="444" type="metho"> <SectionTitle> 5 Concluding Remarks </SectionTitle> <Paragraph position="0"> We have presented a language-independent model for syntactic and morphological analysis. 
Special emphasis has been laid on the description of the UTN formalism and on a parser experiment that compared different rule invocation strategies. The analyzer is fully implemented in Common Lisp, and its application in a text-to-speech system has significantly improved the quality of the synthetic speech. Since the grapheme-to-phoneme conversion is bidirectional, our approach may also be promising for speech recognition.</Paragraph> </Section> </Paper>