<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2016"> <Title>Integrating Stress and Intonation into a Concept-to-Speech System</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. The Surface Generator </SectionTitle> <Paragraph position="0"> The deep structure which forms the input to the surface generator consists of a hierarchical structure of essentially two building blocks: CLAUSEs, which roughly correspond to entire sentences, and PHRASEs such as NPs, PPs or APs (fig. 1).</Paragraph> <Paragraph position="1"> [Fig. 1: feature structure of a PHRASE building block, listing type (noun, adj, ...), head (lexeme, word id), mods (phrase, clause) and feats (PHRASE-FEATURES), e.g. det: def, indef, ...; betont: t, nil; vorfeld: t, nil; pron: t, nil; case: e-zero, ...; num: sing, plur, ...]</Paragraph> <Paragraph position="5"> A PHRASE can be modified by other PHRASEs or CLAUSEs, thus forming a hierarchical structure for complex utterances (Dorffner, Kommenda & Frenkenberger 1988). Surface generation works on this hierarchical structure of building blocks and transfers it into a surface structure consisting of phonemic strings which are subsequently synthesized. Our generator differs from the often encountered two-step approach - generate the syntactic tree with lexical items as its leaves and morphological and other features attached to them, then scan all its leaves and synthesize the lexical elements (see e.g. McDonald 1983) - in an important way, for reasons of efficiency and plausibility. The deep structure, as introduced above, was designed to correspond directly to the surface structure of the sentence, except for aspects of order and function words. In other words, the (unordered) hierarchy of deep structure building blocks is isomorphic (after order has been imposed) to the syntactic tree structure of the surface sentence. 
This can easily be achieved in German, where constituent order is much less strict than in other languages, such as English. As a result of this property of German, the position of phrases within a sentence is not tied to their functional role and thus does not have to be reflected in the deep syntactic structure. This design of a deep structure that is isomorphic to surface structure implies a simplification in the surface generator compared to the two-step approach mentioned above: the surface tree does not have to be produced in its entirety before lexical items can be synthesized, but can unfold while the hierarchy of building blocks is scanned recursively.</Paragraph> <Paragraph position="6"> The process of surface generation is as follows: For each CLAUSE or PHRASE, a corresponding surface building block (e.g. an NP) is generated, depending on its features and lexical head (fig. 2). Such a building block contains slots, in their correct order, for either pointers to other building blocks or lexical items. Now each slot can be scanned and synthesized (if it contains a lexical item) or recursively treated like the other building blocks (if it contains a pointer). Thus only a small part of the syntactic structure is available during the process, but the syntactic tree never exists in its entirety. Furthermore, indices have to be produced (during synthesis of lexical items) before the remainder of the syntactic structure has unfolded. At first sight this looks like a major restriction and reduction of available information. As it turns out, however, the approaches of Kiparski and Bierwisch can both be modified so as to fit into this scheme. An interesting side-effect is that synthesis of speech, starting from deep structures, works in a strict left-to-right manner, which seems psychologically very plausible.</Paragraph> <Paragraph position="7"> 3. 
Insertion of Kiparski Stress Markers Kiparski (1973) introduced two rules for computing stress markers based on a syntactic tree:
(1) (a) Head stress rule: the first (left-most) node keeps its index, all others are incremented by 1
(b) Tail stress rule: the last (right-most) node keeps its index, all others are incremented by 1
The algorithm works as follows:
(2) - assign the index 1 to each stressable lexical item
- scan the tree bottom-up and apply rule (1a) or (1b) to each significant node
This algorithm works strictly bottom-up and thus requires the entire syntactic tree. As a result, it cannot be integrated into our generator in this form. It is, however, possible to rewrite the algorithm so that it works top-down and depth-first so as to fit into the generation scheme described above. The new algorithm is the following:
(3) Introduce a pair of indices and maintain it as follows while scanning the tree top down. At the root, start with the pair (1 1).</Paragraph> <Paragraph position="8"> - at each significant node that has at least two significant successor nodes, do the following, given the index pair (n m):
- with head stress rule: assign the pair (n m+1) to the first successor, assign the pair (n+m 1) to all the others
- with tail stress rule: assign the pair (n m+1) to the last successor, assign the pair (n+m 1) to all the others
- at the leaves of the tree (= lexical entry), with assigned pair (n m): n is the Kiparski marker for the lexical item
If one considers the preferred successor (head or tail, depending on the rule) as the winner of the rule and all others as losers, algorithm (3) can be interpreted as follows: The second index of a pair (m) counts how often a node is on the winning side. All losers have to increment their marker by that amount. Thus, at each decision, the winner keeps its marker (n), while the markers of all the others have to be increased by m (n+m). 
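As a minimal sketch of how the top-down reformulation (3) might be implemented (the code, the function name and the tree encoding are our illustration, not taken from the paper): significant nodes are encoded as ("head", children) or ("tail", children), stressable lexical items as plain strings, and a node with a single successor simply passes its pair through.

```python
# Illustrative sketch of the top-down algorithm (3); not the paper's code.
# Significant nodes: ("head", children) or ("tail", children).
# Stressable lexical items: plain strings.

def assign_markers(node, n=1, m=1, markers=None):
    """Scan the tree top-down, maintaining the index pair (n m).
    At a leaf, n is the Kiparski stress marker of the lexical item."""
    if markers is None:
        markers = {}
    if isinstance(node, str):              # leaf = lexical entry
        markers[node] = n
        return markers
    rule, children = node
    if len(children) < 2:                  # single successor: pass the pair on
        for child in children:
            assign_markers(child, n, m, markers)
        return markers
    winner = 0 if rule == "head" else len(children) - 1
    for i, child in enumerate(children):
        if i == winner:                    # winner keeps n and counts a win
            assign_markers(child, n, m + 1, markers)
        else:                              # losers are incremented by m
            assign_markers(child, n + m, 1, markers)
    return markers

# Structure of the example in section 6 (tail stress throughout; the
# post-head modifier "am Kondensator" is attached as a loser via a
# head-stress node, since Kiparski's rules do not cover post-head modifiers):
sentence = ("tail", ["betraegt",
                     ("tail", [("head", ["Spannung", "Kondensator"]),
                               "Volt"])])
print(assign_markers(sentence))
# {'betraegt': 2, 'Spannung': 3, 'Kondensator': 4, 'Volt': 1}
```

The markers agree with the worked example in section 6 (beträgt 2, Spannung 3, Kondensator 4), and exactly one item receives marker 1, as the text requires.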
As there can be only one leaf that is on the winning side each time, it is ensured that only one lexical item receives marker 1.</Paragraph> <Paragraph position="9"> A similar algorithm could be applied to yield the stress pattern within complex words (which are quite numerous in German). However, as the lexicon of the generator contains morphemes and complex lexemes with pointers to each morpheme, a decision about stress within a word can be stored lexically, and no algorithmic treatment is necessary. A syllable now receives a Kiparski marker if
- it is in a stressable morpheme (lexical feature)
- it is marked by the lexical entry of the (possibly complex) word AND
- algorithm (3) has assigned an index pair to the lexical entry
The marker so computed is inserted into the phonemic string during the morphological synthesis of the word.</Paragraph> <Paragraph position="10"> 4. Insertion of Bierwisch Boundary Indices Bierwisch (1973) suggests inserting a marker at each word boundary to express how many significant nodes dominate both words involved. His algorithm was designed in a bottom-up fashion. We show again that it can be formulated top-down (as required in our system):
(4) Assign an index to each node. At the root, start with 1. For each node with index i, for each successor do, left to right:
- if the successor is a lexical item, synthesize it and append i as boundary marker
- if the successor is a significant node, assign index i+1
- otherwise assign index i
- when all nodes on that level have been processed, overwrite the index that was written last with i
The problem that a left-to-right process cannot know whether the following word is on the same level in the tree is solved by permitting a marker already written to be overwritten.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 5. 
Acoustic Realization of Prosodic Patterns </SectionTitle> <Paragraph position="0"> Starting from the above stress and boundary markers, the prosodic structure of a sentence is derived by applying a phonological rule set. In particular, some of the previously computed boundaries are deleted, while others receive a pause marker. Furthermore, the resulting phrases are provided with an intonation contour, which, according to Bierwisch (1973), is specified in terms of so-called SON values. In a subsequent phonetic component the phrasal structure and the SON values are exploited to generate the acoustic correlates of the prosodic information, in particular the duration of phonetic segments and pauses and the pitch values for all voiced phones.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 6. An Example </SectionTitle> <Paragraph position="0"> An annotated example shall illustrate the process of generation. Take the following sentence: Beträgt die Spannung am Kondensator 10 Volt? (Is the voltage at the capacitor equal to 10 Volts?) The deep structure of the sentence, which is the input to the surface generator, is depicted in fig. 4. Each building block in the dependency structure (to the left) has a feature case which indicates the conceptual role of the element (adapted from Engel 1982). e-zero, for example, refers to the nominative phrase or subject of a sentence. The structure to the right consists of the surface building blocks. Each slot (drawn as a box) corresponds to a possible position which can be filled with a lexical item or another building block, depending on the features of CLAUSE and PHRASE. Slots that remain empty are ignored during synthesis. One can see in this example that the tree of CLAUSEs and PHRASEs has a corresponding isomorphic tree of S and NP-PPs (there are other surface elements, like AP, as well), with the exception that in the former case there is no order information yet. 
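To make the shape of this input concrete, the deep structure of the example sentence could be rendered as nested records along the following lines. This is our hypothetical illustration: the field names follow fig. 1 (head, mods, feats, case, num, det) and the case values are those mentioned in the text, but the system's actual data format is not shown in the paper.

```python
# Hypothetical rendering of the deep structure in fig. 4 (simplified).
# Field names follow fig. 1; e-zero marks the nominative phrase/subject,
# and the nested PHRASE carries the conceptual role "location".

deep_structure = {
    "block": "CLAUSE", "head": "betragen",
    "mods": [
        {"block": "PHRASE", "head": "Spannung",          # die Spannung ...
         "feats": {"det": "def", "case": "e-zero", "num": "sing"},
         "mods": [
             {"block": "PHRASE", "head": "Kondensator",  # am Kondensator
              "feats": {"case": "location"}, "mods": []},
         ]},
        {"block": "PHRASE", "head": "Volt",              # 10 Volt
         "feats": {"num": "plur"}, "mods": []},
    ],
}
```

Note that the hierarchy carries no order information; order is imposed only when the corresponding surface building blocks are generated.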
This illustrates the above-mentioned isomorphism between deep and surface structure.</Paragraph> <Paragraph position="1"> Generation starts at the root of the deep structure, the CLAUSE. A Kiparski pair (1 1) and a Bierwisch index 1 are assigned. The corresponding surface building block, S, is generated, filled with the lexical item beträgt (verb) and with the two PHRASEs in their correct position (which can be determined by looking at the features and using some default heuristics as in Engel 1982). The structure at this point looks like the one in fig. 5. [Fig. 5: the partially generated structure; the nested PHRASE carries lxm: Kondensator, case: location] Note that beträgt can already be synthesized, even though the rest of the syntactic structure has not unfolded yet. For algorithm (3), actually three nodes in Kiparski's notation are comprised in S: Satz, S and D. Therefore, for (3) the structure has to be viewed as if it looked like the one in fig. 6. (3) applied to Satz yields the pair (1+1 1) for beträgt and (1 1+1) for S (tail stress). S has only one successor, therefore (3) does not apply. It does, however, apply to D, where the pairs (1+2 1) and (1 2+1) are computed for the two PHRASEs (tail stress). The Bierwisch index is simply incremented by 1 for both PHRASEs. Thus the string in the lower left of fig. 6 can already be written (phonemes are given in an ASCII representation of IPA notation; stress markers are preceded by &quot;, boundary indices by #).</Paragraph> <Paragraph position="2"> [Fig. 6] The process now recursively continues by generating the left PHRASE (Kiparski pair (3 1), Bierwisch index 2). As above, a corresponding surface building block (NP-PP) is generated and filled with lexical items and the modifying PHRASE (&quot;am Kondensator&quot;). The structure so produced is shown in fig. 7.</Paragraph> <Paragraph position="3"> Algorithm (3) is applied once (tail stress) and yields a stress marker 3 for Spannung. 
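The boundary indices used in this example follow algorithm (4) of section 4. The sketch below is ours, not the paper's implementation, and it adopts one particular reading of the overwriting rule: whenever a significant successor of a node with index i has been fully processed, the boundary marker written last is overwritten with i, so that overwrites cascade at nested phrase ends. Non-significant intermediate nodes are omitted for brevity.

```python
# Illustrative sketch of the top-down boundary-index algorithm (4);
# not the paper's code. Nodes are ("lex", word) for lexical items or
# (label, [successors]) for significant nodes.

def boundary_markers(node, i=1, out=None):
    """Emit (word, boundary_index) pairs strictly left to right."""
    if out is None:
        out = []
    _label, successors = node
    for succ in successors:
        if succ[0] == "lex":               # synthesize word, append i
            out.append([succ[1], i])
        else:                              # significant node gets index i+1
            boundary_markers(succ, i + 1, out)
            if out:                        # back at level i: overwrite the
                out[-1][1] = i             # marker written last with i
    return out

# The example of section 6 (up to and including PHRASE 3):
tree = ("S", [("lex", "betraegt"),
              ("NP-PP", [("lex", "die"), ("lex", "Spannung"),
                         ("NP-PP", [("lex", "am"), ("lex", "Kondensator")])]),
              ("NP-PP", [("lex", "10"), ("lex", "Volt")])])
print(boundary_markers(tree))
# [['betraegt', 1], ['die', 2], ['Spannung', 2], ['am', 3],
#  ['Kondensator', 1], ['10', 2], ['Volt', 1]]
```

The marker after Kondensator is first written as 3, overwritten with 2 at the end of the nested PHRASE, and finally with 1 at the end of NP-PP 1, reproducing the cascade described for the output string below.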
The Bierwisch index is incremented once again for the nested PHRASE 2. (Note that the Kiparski pair for that PHRASE is the same as for a loser although it is behind the 'tail'; Kiparski, in his original article, did not mention post-head modifiers.) This PHRASE will subsequently be generated accordingly. The lower right of fig. 7 shows the result at this stage. The determiner die is not a stressable item and therefore does not receive a stress marker. The noun, on the other hand, is provided with the marker 3.</Paragraph> <Paragraph position="4"> After the final lexical item of PHRASE 2, Kondensator, a boundary marker 3 will be written. Now the last part of (4) comes to bear. As it is the end of the phrase, the marker is overwritten by the index of the dominating phrase (NP-PP 1), 2. It is also the end of NP-PP 1, so it is finally overwritten by the index assigned to S, which is 1. The output at this stage is the following: #0 b$tr&quot;2Egt #1 dl #2 Sp&quot;3an=N #2 Ham #3 k0nd$ns&quot;4Ator #1 After that, PHRASE 3 - the next one attached to S - is generated in an analogous fashion.</Paragraph> <Paragraph position="5"> 7. Discussion and Conclusion Experience with the described generator has shown that synthesis of German utterances in a concept-to-speech system is possible while both synthesizing intonation patterns using syntactic information and maintaining the efficient process structure of a generator designed for the specifics of the German language. It was applied under the assumption of a single-sentence system without contextual or pragmatic information.</Paragraph> <Paragraph position="6"> Problems rooted in the lack of such information have therefore not been solved. The speech produced this way shows considerable improvement over monotonous versions or over versions which cannot make full use of syntactic information. 
Furthermore, the approach can easily be extended to include additional aspects of intonation, such as the emphasis of some elements over others.</Paragraph> <Paragraph position="7"> Despite the success of the system described in this paper, some limitations have been discovered. In the test domain, long sentences with complex and multiply nested phrases were quite frequent. Some of them included post-head modifiers such as &quot;rechts unten&quot; (= &quot;to the lower right&quot;), in addition to other modifiers like several adjectives. The algorithm by Bierwisch produced boundary markers between the beginning and the end of &quot;rechts unten&quot; that were only slightly greater than the surrounding ones. Synthesis of the utterance, however, revealed that the modifier was spoken with an unnaturally high pitch and a pause that was too short. Manually altering the indices to lower values, which would mean that &quot;rechts unten&quot; is a constituent on sentence level rather than a noun modifier, led to better results. Thus, the top-down scheme of the algorithm would have to be broken in this case.</Paragraph> <Paragraph position="8"> Future work will be required to discover other limitations and to adapt the process to overcome them.</Paragraph> </Section> </Paper>