<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2160">
  <Title>THE PARSODY SYSTEM : AUTOMATIC PREDICTION OF PROSODIC BOUNDARIES FOR TEXT-TO-SPEECH</Title>
  <Section position="3" start_page="0" end_page="992" type="metho">
    <SectionTitle>
THE PARSODY SYSTEM
</SectionTitle>
    <Paragraph position="0"> Our approach has been to implement a rule-based method on top of a chart parser. This was a purely practical decision, as an efficient chart parser had been developed in-house. Also the larger and more detailed descriptions of the rule-based methods in the literature provided an adequate starting point on which to build.</Paragraph>
    <Paragraph position="1"> The resulting system, the Parsody (from Parser + Prosody) system, is designed to provide a test-bed for investigating the interface between syntactic parse structures and the performance structures of actual speech. Results from the Parsody system are fed directly into BT's TTS system, Laureate.</Paragraph>
    <Paragraph position="2"> The fact that Laureate will be a commercial TTS system places several requirements on the parser: it must be robust, it must be fast, and it must predict prosodic boundaries with a reasonable degree of accuracy. At present the emphasis is on predicting the location of prosodic boundaries rather than their strength.</Paragraph>
    <Paragraph position="3"> The Parsody system is implemented in C, under X Windows on a Sun SPARCstation, and allows for interactive editing of intermediate results throughout the parsing/prosodic marking process. This provides us with a useful tool for investigating our algorithms, as well as a debugging aid. A description of the main aspects of the parser and the prosodic marking components is now given.</Paragraph>
    <Paragraph position="4"> Footnote 2: A phonological word is one which effectively functions as one spoken item, as the internal word-word boundaries are resistant to pausing \[7\]. Typical examples are determiner-noun word groups, such as &quot;the+man&quot;. A typical tree is shown in Figure 1. The partition nodes are denoted by the two &quot;s&quot; labels in this tree.</Paragraph>
  </Section>
  <Section position="4" start_page="992" end_page="992" type="metho">
    <SectionTitle>
THE PARSER COMPONENT
</SectionTitle>
    <Paragraph position="0"> It is interesting to note that one of the sentences in the Bachenko and Fitzpatrick appendix of sentences was not parsed because of &quot;too many parse problems&quot;.</Paragraph>
    <Paragraph position="1"> Obviously this is not acceptable for a commercial text-to-speech system. The Parsody parser is designed always to produce one result through a combination of stochastic word tagging and partial parsing with a minimal grammar. All processing is performed on a chart data structure backbone incorporating packing.</Paragraph>
    <Paragraph position="2"> This overall approach results in a very fast and efficient parser.</Paragraph>
    <Paragraph position="3"> A word's part-of-speech is important for TTS as it may affect the word's pronunciation. Stochastic word tagging enables the parser always to choose one word tag, although this may or may not be the correct one (the current accuracy is approximately 95%, this figure being measured on the Bachenko and Fitzpatrick sentences and on other test sentences). Fortunately for pronunciation purposes, the number of words having multiple pronunciations is quite small: between 1 and 2% of words in our lexicon. Initial investigations have shown that there is less than a 0.3% chance of picking the wrong pronunciation for a word.</Paragraph>
    <Paragraph position="4"> Another important aspect of the Parsody word-tagging approach is that ill-formed input can be accommodated, and the prosodic marking component can still function to produce a result. Some speech is better than none, even if it sounds strange.</Paragraph>
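    <Paragraph> The "always one tag" property can be illustrated with a minimal sketch. The text does not describe the tagger's internals, so everything below is an assumption made for illustration: the function name tag_words, the count-based lexicon, and the default tag for unknown words are all hypothetical, and the sketch ignores context entirely, which a real stochastic tagger would normally use.
<![CDATA[
```python
def tag_words(words, lexicon, default_tag="NOUN"):
    """Assign exactly one tag to every word.

    lexicon maps a lower-cased word to a dict of {tag: count}.
    This is a hypothetical, context-free simplification: the most
    frequent tag wins, and unknown words fall back to default_tag,
    so even ill-formed input always receives a complete tagging.
    """
    tagged = []
    for word in words:
        candidates = lexicon.get(word.lower())
        if candidates:
            # Pick the tag with the highest count for this word.
            tag = max(candidates, key=candidates.get)
        else:
            # Unknown word: still commit to a single tag.
            tag = default_tag
        tagged.append((word, tag))
    return tagged


# Hypothetical toy lexicon with invented counts.
toy_lexicon = {
    "the": {"DET": 100},
    "man": {"NOUN": 90, "VERB": 10},
    "record": {"NOUN": 60, "VERB": 40},
}

tagged_example = tag_words(["The", "man", "blorf"], toy_lexicon)
```
]]>
    The point of the sketch is only the design choice: committing to one tag per word, right or wrong, means the later prosodic marking stage always has something to work with.</Paragraph>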
    <Paragraph position="5"> The minimal grammar also helps the parser to produce only one, and always one, output. The grammar is a simple LNP/PP grammar augmented by special 'partition' rules. An LNP is simply a 'longest noun phrase', which is an unambiguous interpretation of the longest NP in the parse result. A PP is a prepositional phrase. An example of a partition node is one which is inserted between two immediately adjacent 'longest NPs' in the parse structure (footnote 3).</Paragraph>
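    <Paragraph> As a rough illustration of the one partition rule mentioned above, the following sketch splits a flat sequence of chunks into partitions wherever two LNPs are immediately adjacent. The chunk representation, the label "OTHER", the function name and the example sentence are all hypothetical; Parsody's other partition rules are not described here and are not modelled.
<![CDATA[
```python
def split_into_partitions(chunks):
    """Group (label, words) chunks into partitions.

    A new partition is started between two immediately adjacent
    LNP chunks. This models only the single example rule given in
    the text; the data layout is an assumption for illustration.
    """
    partitions = []
    current = []
    for label, words in chunks:
        previous_is_lnp = bool(current) and current[-1][0] == "LNP"
        if previous_is_lnp and label == "LNP":
            # Two longest NPs meet: close the current partition.
            partitions.append(current)
            current = []
        current.append((label, words))
    if current:
        partitions.append(current)
    return partitions


# Hypothetical chunked sentence: "when the old man left the house the dog barked"
example_chunks = [
    ("OTHER", ["when"]),
    ("LNP", ["the", "old", "man"]),
    ("OTHER", ["left"]),
    ("LNP", ["the", "house"]),
    ("LNP", ["the", "dog"]),
    ("OTHER", ["barked"]),
]

example_partitions = split_into_partitions(example_chunks)
```
]]>
    In this toy input the split falls between "the house" and "the dog", giving two partitions.</Paragraph>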
    <Paragraph position="6"> [Figure 1. Example parse tree produced by Parsody.] Footnote 3: For this reason, each of the prosodic rules to be described works within the partition nodes. This is because the boundary between each partition node seems to mark the largest boundaries within the sentence, and the later in the analysis they are joined, the larger the prosodic boundary will be. It may well be that some analysis, perhaps verb adjacency, should take place across partition node boundaries. Further research will examine this.</Paragraph>
    <Paragraph position="7"> The partial syntactic tree is then passed to the prosodic marking system.</Paragraph>
  </Section>
  <Section position="5" start_page="992" end_page="993" type="metho">
    <SectionTitle>
THE PROSODY COMPONENT
</SectionTitle>
    <Paragraph position="0"> The prosodic marking algorithms are founded on the Bachenko and Fitzpatrick extensions to the Gee and Grosjean rules.</Paragraph>
    <Paragraph position="1"> There are essentially two main components in the Bachenko and Fitzpatrick model. The first, concerning boundary location, is basically adhered to in Parsody. Boundary location entails the grouping of words into phonological words, and then into phonological phrases. The boundaries separating prosodic phrases form potential prosodic boundary location sites.</Paragraph>
    <Paragraph position="2"> The second component seeks to determine the boundary strengths via a series of rules. Bachenko and Fitzpatrick describe a verb-balancing rule which attempts to balance material around a verb, and a verb adjacency rule which in effect extends the verb balancing rule, using 'bundling' (the adjoining of adjacent phrases) to continue to centre material round a verb. Here, Parsody employs two main departures from the Bachenko and Fitzpatrick rules. The first is in the domain of verb adjacency. Parsody's verb adjacency algorithm retains the notion of grouping nodes to form a balanced tree, but extends this rule to cover all nodes (with the exception of the very final PP). The basic algorithm is also different.</Paragraph>
    <Paragraph position="3"> By extending the grouping of nodes to cover all nodes, the confusion of Bachenko and Fitzpatrick's &quot;general bundling rule&quot; is avoided, since all nodes will have been grouped at completion. The change to the algorithm is more subtle, yielding the rule:</Paragraph>
    <Paragraph position="5"> This makes explicit the assumption in Bachenko and Fitzpatrick's algorithm that the adjoining of phrases produces a balanced tree. The above approach continues to balance the structure created so far with the phonological phrases which have not yet been joined into the structure. By doing this, the boundary values (strengths/salience) remain dependent on the values of the constituent prosodic phrases.</Paragraph>
    <Paragraph position="6"> In the example shown in Figure 2, the left-to-right nature of application of this rule ensures that earlier material will generally be grouped lower in the structure than later material. This ties in with Gee and Grosjean's work on discourse semantics: the later in the sentence the information, the greater the prosodic offset. It is at this stage that PPs which precede verbs are added into the structure (assuming they haven't been already). The proviso is continued that PPs should always join to the left, rather than the right. The exception to this is the PP at the end of the sentence, if there is one, which remains untouched.</Paragraph>
    <Paragraph position="7"> The second main change to the Bachenko and Fitzpatrick algorithm concerns boundary value assignment. Bachenko and Fitzpatrick choose to use the absolute boundary values as their reference. Parsody does not do this, since, according to the algorithm, the longer the sentence, the larger the values on each boundary (footnote 4), varying from a maximum of 5 in small sentences to 13 in the larger sample sentences. Does this mean that small sentences should have smaller boundaries, perhaps none? According to Gee and Grosjean \[7; footnote 10\], &quot;It turns out (importantly) that the actual pause duration of the longest pause in each sentence does not correlate all that well (is not a factor of) the overall length of the sentence (for example, it is possible for a short, less complex sentence to have a longer main break than a longer, more complex sentence)&quot;. For this reason, in Parsody a normalisation algorithm is applied, so that sentences of varying lengths may have their boundaries mapped to reasonable values.</Paragraph>
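    <Paragraph> The normalisation algorithm itself is not specified in this section. Purely as an illustration of the idea, one simple possibility is to rescale each sentence's boundary values relative to that sentence's own maximum, so that the strongest boundary in a short sentence and in a long sentence map to the same value. The function name, the target_max parameter and the linear scaling are all assumptions, not Parsody's actual method.
<![CDATA[
```python
def normalise_boundaries(values, target_max=5.0):
    """Linearly rescale one sentence's raw boundary values.

    The largest raw value in the sentence is mapped to target_max,
    and all other values are scaled proportionally. This is an
    assumed, illustrative scheme; the paper does not give one.
    """
    if not values:
        return []
    largest = max(values)
    if largest == 0:
        # No boundary has any strength: nothing to scale.
        return [0.0 for _ in values]
    return [v * target_max / largest for v in values]


# Hypothetical raw values: a short sentence peaking at 5 and a
# longer one peaking at 13 (the two maxima quoted in the text).
short_sentence = normalise_boundaries([2, 5, 1])
long_sentence = normalise_boundaries([4, 13, 6, 2])
```
]]>
    Under this assumed scheme both sentences end up with a main break of the same normalised strength, which is in keeping with the Gee and Grosjean observation that the longest pause does not track sentence length.</Paragraph>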
  </Section>
</Paper>