<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1207"> <Title>Semantic and Discourse Information for Text-to-Speech Intonation</Title> <Section position="9" start_page="52" end_page="53" type="evalu"> <SectionTitle> 5 Results and Conclusions </SectionTitle>
<Paragraph position="0"> The system was designed and debugged using a set of five single-paragraph texts. It was then tested using several new single-paragraph texts, excerpted from news articles and encyclopedia entries. Sample output is shown in Figures 4 and 5, where prominence, defined as a multiplier of the default nuclear accent, is shown directly below the associated pitch accent.</Paragraph>
<Paragraph position="1"> These preliminary test results indicate that using information structure in conjunction with WordNet can produce intonational patterns with context-appropriate variation in pitch accent type and prominence. In general, L+H* accents occur on items deemed to be thematic, and H* accents occur on rhematic items. WordNet proved to be fairly successful at identifying words which were &quot;given&quot; via inference, thus allowing the program to correctly reduce the pitch accent prominence assigned to these words. For example, in Figure 4, the prominence of the pitch accent on &quot;achievement&quot; is lowered because of its relationship to &quot;feat.&quot; In Figure 5, the prominence of the accent on &quot;soil&quot; is lowered because of its relationship to &quot;ground.&quot; To a lesser extent, WordNet was also able to identify appropriate contrastive relationships, such as the relationship between &quot;difficult&quot; and &quot;easy&quot; in Figure 5. Consequently, our program places a slightly more prominent accent on &quot;difficult&quot; than it would have if &quot;easy&quot; had not occurred within the same segment.</Paragraph>
<Paragraph position="2"> While quite encouraging, these preliminary results have also identified many opportunities for improvement. The current implementation is limited by the absence of a full parse tree.</Paragraph>
<Paragraph position="3"> It is also limited by the current heuristic approach to phrase segmentation, and therefore often produces L- phrasal tones in improper places. Substituting better tools for both parsing and phrase segmentation would improve the overall performance.</Paragraph>
<Paragraph position="4"> The system's accuracy for WordNet synonym and contrast identification can be improved in two ways: by incorporating word sense disambiguation, and by using a more sophisticated approach to generating a &quot;match.&quot; Presently, WordNet results are searched in order from most common to least common word sense, biasing matches towards common word senses rather than towards the sense most appropriate to the context. Incorporating a sense disambiguation algorithm, such as that discussed in (Resnik, 1995), is a logical next step. Word matches are also limited to comparisons between individual words within a single part-of-speech category.</Paragraph>
<Paragraph position="5"> Extending consideration to adjacent words and semantic roles would greatly reduce the number of spurious matches generated by the system.</Paragraph>
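As a concrete illustration of the matching strategy described above, the following minimal Python sketch (not the authors' implementation) uses NLTK's WordNet interface to pick an accent type from information structure and to adjust prominence when a word is &quot;given&quot; (related to an earlier word by synonymy or hypernymy) or contrastive (related by antonymy). The function names, the part-of-speech arguments, and the multiplier values are illustrative assumptions only.

    # Illustrative sketch; assumes NLTK and its WordNet corpus are installed.
    from nltk.corpus import wordnet as wn

    def related_lemmas(word, pos):
        """Lemmas of every synset of `word`, plus their hypernyms' lemmas."""
        lemmas = set()
        # Senses come back most-common-first, mirroring the bias noted above.
        for synset in wn.synsets(word, pos=pos):
            lemmas.update(l.name() for l in synset.lemmas())
            for hyper in synset.hypernyms():
                lemmas.update(l.name() for l in hyper.lemmas())
        return lemmas

    def antonym_lemmas(word, pos):
        """Antonym lemmas of every sense of `word` (used for contrast)."""
        out = set()
        for synset in wn.synsets(word, pos=pos):
            for lemma in synset.lemmas():
                out.update(a.name() for a in lemma.antonyms())
        return out

    def accent_for(word, is_theme, prior_words, pos=wn.NOUN):
        """Choose an accent type and a prominence multiplier for `word`.

        Prominence is a multiplier of the default nuclear accent; the values
        0.5 / 1.0 / 1.2 are hypothetical placeholders, not the paper's figures.
        """
        accent = 'L+H*' if is_theme else 'H*'   # theme vs. rheme
        prominence = 1.0
        if prior_words & related_lemmas(word, pos):
            prominence = 0.5    # "given" via WordNet inference: reduce accent
        elif prior_words & antonym_lemmas(word, pos):
            prominence = 1.2    # contrastive with an earlier word: boost accent
        return accent, prominence

    # In the spirit of Figure 5 (assuming the relevant WordNet links exist):
    # accent_for('soil', is_theme=False, prior_words={'ground'}) would lower
    # prominence; accent_for('difficult', True, {'easy'}, pos=wn.ADJ) would raise it.
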
<Paragraph position="6"> Another area for improvement concerns the prominence of pitch accents. Based on our preliminary results, we believe that the L+H* accents should be somewhat lower than those shown in Figures 4 and 5. Once we have completed our analysis of the Boston University radio news corpus (Ostendorf, Price, and Shattuck-Hufnagel, 1995), we expect to modify the accent prominences based on our findings.</Paragraph>
<Paragraph position="7"> Our assessment of system performance is based on human listeners' qualitative measurements of the &quot;comprehensibility&quot; of output from our system in comparison with the standard TrueTalk output. Although adequate for preliminary tests, better performance measurements are needed for future work. Possibilities include testing listener comprehension and recall of speech content, and comparing the system's output with that of several human speakers reading the same text.</Paragraph> </Section> </Paper>