<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1046">
  <Title>Pronouncing Text by Analogy</Title>
  <Section position="3" start_page="0" end_page="268" type="metho">
    <SectionTitle>
2 Psychological Background
</SectionTitle>
    <Paragraph position="0"> In the standard dual-route model of reading aloud (Coltheart, 1978), there is a lexical route for the pronunciation of known words and a parallel route utilising abstract letter-to-sound rules for the pronunciation of unknown ('novel') words. Arguments for dual-route theory cite the ability to pronounce pseudowords (non-words conforming to the spelling patterns of English), latency difference effects between regular and exception words, and apparent double dissociation between the two routes in dyslexia (see Humphreys and Evett, 1985). However, all these observations can arguably be explained by a single route. One pervasive idea is that pseudowords are pronounced by analogy with lexical words that they resemble (Baron, 1977; Brooks, 1977; Glushko, 1979; 1981; Brown and Besner, 1987). Glushko, for instance, showed that &amp;quot;exception pseudowords&amp;quot; like tave take longer to read than &amp;quot;regular pseudowords&amp;quot; such as taze. Here, taze is considered a &amp;quot;regular pseudoword&amp;quot; since all its orthographic 'neighbours' (raze, gaze, maze etc.) have the regular vowel pronunciation /eI/. By contrast, tave is considered an &amp;quot;exception pseudoword&amp;quot; since it has the exception word (have, /hav/) as an orthographic neighbour. Thus, according to Glushko (1979), the &amp;quot;assignment of phonology to non-words is open to lexical influence&amp;quot;. This is at variance with the notion of two independent routes to pronunciation.
Instead: &amp;quot;it appears that words and pseudowords are pronounced using similar kinds of orthographic and phonological knowledge: the pronunciation of words that share orthographic features with them, and specific spelling-to-sound rules for multiletter spelling patterns.&amp;quot; There are two forms of pronunciation by analogy (PbA): explicit analogy (Baron, 1977) is a conscious strategy of recalling a similar word and modifying its pronunciation, whereas in implicit analogy (Brooks, 1977) a pronunciation is derived from generalised phonographic knowledge about existing words. The latter has obvious commonalities with most single-route, connectionist models (e.g. Sejnowski and Rosenberg, 1987) in which the generalised knowledge is learned (e.g. by backpropagation) as a set of weights, and the network has no holistic notion of the concept 'word'.</Paragraph>
    <Paragraph position="1"> Until the recent advent of computational PbA models, analogy 'theory' could only be considered seriously underspecified. Clearly, its operation must depend critically on some measure of similarity, and &amp;quot;without a metric for similarity and without a specification of how similar is similar enough, the concept of analogy by similarity offers little insight&amp;quot; (Glushko, 1981, p. 72). Further, as detailed by Brown and Besner (1987), the operation of lexical analogy must be constrained by factors such as:
* the size of the segment shared between novel and lexical word;
* its position in the two strings;
* its frequency of occurrence in the language;
* and the frequency of occurrence of the words containing it;
none of which had then received serious consideration. Accordingly, they write: &amp;quot;Extant analogy models are not capable of predicting the outcome of assembly operations for all possible strings.&amp;quot; In particular, the 'theory' gives no principled way of deciding the orthographic neighbours of a novel word which are deemed to influence its pronunciation, whereas a computational model must (specifically or otherwise) do so.</Paragraph>
  </Section>
  <Section position="4" start_page="268" end_page="269" type="metho">
    <SectionTitle>
3 Existing PbA Programs
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="268" end_page="269" type="sub_section">
      <SectionTitle>
3.1 Dedina and Nusbaum's System
</SectionTitle>
      <Paragraph position="0"> The overall structure of PRONOUNCE is as shown in Fig. 1. The lexical database consists of &amp;quot;approximately 20,000 words based on Webster's Pocket Dictionary&amp;quot; in which text and phonemes have been automatically aligned. Dedina and Nusbaum acknowledge the crude nature of their alignment procedure, saying it &amp;quot;was carried out by a simple Lisp program that only uses knowledge about which phonemes are consonants and which are vowels.&amp;quot; An input string is matched in turn against all orthographic entries in the lexicon. The process starts with the input string and the current dictionary entry left-aligned. Information about matching letter substrings - and their corresponding phoneme substrings in the dictionary entry under consideration - is entered into a pronunciation lattice as detailed below. The shorter of the two strings is then shifted right by one letter and the process repeated. This continues until the two are right-aligned, i.e. the number of right shifts is equal to the difference in length between the two strings. The process is repeated for all words in the dictionary.</Paragraph>
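The shift-and-match pass just described can be sketched as follows. This is a minimal sketch in Python, not Dedina and Nusbaum's Lisp code: the function name, the one-phoneme-symbol-per-letter alignment convention (with a null symbol for silent letters), and the returned triples are our assumptions.

```python
def matched_substrings(novel, letters, phonemes):
    """Collect common substrings between a novel word and ONE aligned
    lexical entry, sliding the shorter string from left-aligned to
    right-aligned one letter at a time (hypothetical reconstruction).
    Returns (position_in_novel, letter_substring, phoneme_list) triples."""
    n, m = len(novel), len(letters)
    matches = []
    # Number of one-letter right shifts = difference in string lengths.
    for shift in range(abs(n - m) + 1):
        i0 = shift if m < n else 0   # offset into the novel word
        j0 = shift if n < m else 0   # offset into the lexical entry
        overlap = min(n, m)
        t = 0
        while t < overlap:
            if novel[i0 + t] == letters[j0 + t]:
                start = t
                # Extend the matched run as far as it goes.
                while t < overlap and novel[i0 + t] == letters[j0 + t]:
                    t += 1
                matches.append((i0 + start,
                                novel[i0 + start:i0 + t],
                                phonemes[j0 + start:j0 + t]))
            else:
                t += 1
    return matches
```

For example, matching the pseudoword taze against the entry gaze (aligned phonemes g, eI, z, null) yields the shared substring "aze" with its phonemes, starting at position 1 of the input.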
      <Paragraph position="1"> A node of the lattice represents a matched letter, Li, at some position, i, in the input, as illustrated in Fig. 2. The node is labelled with its position index i and with the phoneme which corresponds to Li in the matched substring, Pim say, for the mth matched substring. An arc is placed from node i to node j if there is a matched substring starting with Li and ending with Lj. The arc is labelled with the phonemes intermediate between Pim and Pjm in the phoneme part of the matched substring. Note that the empty string labels arcs corresponding to bigrams: the two symbols of the bigram label the nodes at either end. Additionally, arcs are labelled with a &amp;quot;frequency&amp;quot; count which is incremented by one each time that substring (with that pronunciation) is matched during the pass through the lexicon. Finally, there is a Start node at position 0 and an End node at position one greater than the length of the input string.</Paragraph>
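A minimal sketch of the lattice construction, under our own data layout (not the paper's): a node is a (position, phoneme) pair, each match is a (start position, letter substring, phoneme list) triple, and every sub-span of a match of length two or more contributes an arc, so that bigram arcs and longer arcs remain composable into paths.

```python
from collections import defaultdict

def build_lattice(matches):
    """Build nodes and frequency-counted arcs from matched substrings
    (hypothetical reconstruction of the structure described above)."""
    nodes = set()
    # (head_node, tail_node, interior_phonemes) -> "frequency" count
    arcs = defaultdict(int)
    for pos, _letters, phons in matches:
        k = len(phons)
        for i in range(k):
            nodes.add((pos + i, phons[i]))        # one node per matched letter
        for i in range(k):
            for j in range(i + 1, k):
                head = (pos + i, phons[i])
                tail = (pos + j, phons[j])
                interior = tuple(phons[i + 1:j])  # empty tuple for bigrams
                arcs[(head, tail, interior)] += 1 # increment frequency count
    return nodes, arcs
```

A matched substring of length three thus yields three nodes, two bigram arcs with empty labels, and one spanning arc labelled with the single interior phoneme.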
      <Paragraph position="2">  A possible pronunciation for the input corresponds to a complete path through its lattice from Start to End, with the output string assembled by concatenating in order the phoneme labels on the nodes/arcs. The set of candidate pronunciations is then passed to the decision function. Two (prioritised) heuristics are used to rank the pronunciations, and the top-ranking candidate selected as the output. The first is based on path length. If one candidate corresponds to a unique shortest path (in terms of number of arcs) through the lattice, this is selected as the output. Otherwise, candidates that tie are ranked on the sum of their arc &amp;quot;frequencies&amp;quot;. Dedina and Nusbaum tested PRONOUNCE on 70 of Glushko's (1979) pseudowords, which &amp;quot;were four or five characters long and were derived from mono-syllabic words by changing one letter&amp;quot;. Seven subjects with phonetics training were asked to read these and give a transcription for the first pronunciation which came to mind. A 'correct' pronunciation for a given pseudoword was considered to be one produced by any of the subjects. A word error rate of 9% is reported.</Paragraph>
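The two prioritised heuristics reduce to a few lines. This is our reading of the decision function; the candidate structure, a map from each assembled pronunciation to the list of arc frequencies along its path, is an assumption.

```python
def choose_pronunciation(candidates):
    """Rank candidate pronunciations by the two prioritised heuristics:
    1. fewest arcs on the path (a unique shortest path wins outright);
    2. among length ties, the largest sum of arc "frequencies".
    candidates: {pronunciation: [arc frequencies along its path]}"""
    return min(candidates,
               key=lambda p: (len(candidates[p]), -sum(candidates[p])))
```

For instance, a two-arc path always loses to any one-arc path, however large its frequencies; frequencies only decide among paths of equal length.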
    </Section>
    <Section position="2" start_page="269" end_page="269" type="sub_section">
      <SectionTitle>
3.2 Sullivan and Damper's System
</SectionTitle>
      <Paragraph position="0"> Sullivan and Damper employ a more principled alignment procedure based on the Lawrence and Kaye (1986) algorithm. By pre-computing mappings and their statistics, they implemented a considerably more 'implicit' form of PbA: there is no explicit matching of the input string with lexical entries. Their pronunciation lattice differs, with nodes representing junctures between symbols and arcs representing letter-phoneme mappings. They also examine different ways of numerically ranking candidates, taking into account probability estimates for the letter-phoneme mappings used in the assembled pronunciation.</Paragraph>
      <Paragraph position="1"> Given the improved alignment and candidate-ranking methods, better performance than Dedina and Nusbaum's might be expected. On the contrary, Sullivan and Damper's best result on the full set of 131 pseudowords from Glushko (1979) (plus another 5 words; see section 5.1) is only 70.6% (1993, p. 449). This is an error rate of almost 30%, as compared to Dedina and Nusbaum's 9% on the smaller test set of size 70.</Paragraph>
      <Paragraph position="2"> Differences in test-set size and between British and American English, the transcription standards of the phoneticians, and the lexicons employed seem insufficient to explain this.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="269" end_page="270" type="metho">
    <SectionTitle>
4 Re-Implementing PRONOUNCE
</SectionTitle>
    <Paragraph position="0"> Our purpose was to re-implement PRONOUNCE, assess its performance, and study the impact of various implementational choices on this performance.</Paragraph>
    <Paragraph position="1"> However, the described alignment algorithm is problematic (see pp. 71-73 of Sullivan, 1992) and needs to be replaced. Rather than re-implement a flawed algorithm, we have used manually-aligned data. Since manual alignment generally produces a better result than automatic alignment, we ought to produce an even lower error rate than Dedina and Nusbaum's claimed 9%.</Paragraph>
    <Paragraph position="2"> The performance on lexical words (temporarily removed from the lexicon) has not previously been assessed but seems worthwhile. Arguably, 'real' words form a much more sensible test set for a PbA system than pseudowords, not least because they are multisyllabic. Temporary removal from the lexicon means that the pronunciation must be assembled by the analogy process rather than merely retrieved in its entirety. Hence, we believe it is sensible and important to test any PbA system in this way.</Paragraph>
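The leave-one-out test just described can be sketched as a simple harness. Everything here is hypothetical scaffolding: `pronounce` stands for any PbA routine mapping a spelling and a lexicon to a pronunciation, and `lexicon` maps spellings to reference pronunciations.

```python
def leave_one_out(lexicon, pronounce):
    """Word error rate when each lexical word is temporarily removed
    from the lexicon and pronounced by analogy with the rest."""
    errors = 0
    for word, reference in lexicon.items():
        # Withhold the word itself so its pronunciation must be
        # assembled by analogy, not merely retrieved.
        rest = {w: p for w, p in lexicon.items() if w != word}
        if pronounce(word, rest) != reference:
            errors += 1
    return errors / len(lexicon)
```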
    <Section position="1" start_page="269" end_page="270" type="sub_section">
      <SectionTitle>
4.1 Lexical Databases
</SectionTitle>
      <Paragraph position="0"> To examine any impact that the specific lexical data-base might have on performance, we have used two in this work: the 20,009 words of Webster's Pocket Dictionary and the 16,280 words of the Teacher's Word Book (TWB) (Thorndike and Lorge, 1944).</Paragraph>
      <Paragraph position="1"> In both cases, letters and phonemes have previously been hand-aligned for the purposes of training back-propagation networks. The Webster's database is that used by Sejnowski and Rosenberg (1987) to train and test NETtalk. The TWB database is that used by McCulloch, Bedworth and Bridle (1987) for NETspeak.</Paragraph>
      <Paragraph position="2">  The phoneme inventory is of size 52 in both cases, including the null phoneme but excluding stress symbols. We leave the very important problem of stress assignment for later study.</Paragraph>
    </Section>
    <Section position="2" start_page="270" end_page="270" type="sub_section">
      <SectionTitle>
4.2 Re-Implementation Details
</SectionTitle>
      <Paragraph position="0"> The re-implementation was programmed in C on a Hewlett-Packard 712/80 workstation running HP-UX.</Paragraph>
      <Paragraph position="1"> A 'direct' version scores candidates using Dedina and Nusbaum's method with its two prioritised heuristics: we call this model D&amp;N. Two other methods for scoring have also been implemented. In one, we replace the second (maximum sum) heuristic with the maximum product of the arc frequencies: we call this model PROD. (It still selects primarily on the basis of shortest path length.) We have also implemented a version which uses a single heuristic. This takes the product, along each possible path from Start to End, of the mapping probabilities of its arcs. These are computed using Method 1 (a priori version) of Sullivan and Damper (1993, pp. 446-447). For all paths corresponding to the same pronunciation, these values are summed to give an overall score for that pronunciation. We call this the MP model. The final product score is not a proper probability for the assembled pronunciation, since the scores do not sum to one over all the candidates.</Paragraph>
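The three scoring rules can be summarised as follows, under our own data layout (an assumption, not the paper's code): for the first two models a path is a list of arc frequencies, already filtered to shortest paths; for MP, each pronunciation carries a list of paths, each a list of mapping probabilities.

```python
import math

def score_dn(freqs):
    """Dedina and Nusbaum's second heuristic: sum of arc frequencies
    (applied after the shortest-path filter)."""
    return sum(freqs)

def score_prod(freqs):
    """PROD: product of arc frequencies (again after the shortest-path
    filter)."""
    return math.prod(freqs)

def score_mp(paths):
    """MP: product of mapping probabilities along each path, summed
    over all paths yielding the same pronunciation.  Not a true
    probability, since scores do not sum to one over the candidates."""
    return sum(math.prod(p) for p in paths)
```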
      <Paragraph position="2"> The 'best' pronunciation is found by depth-first search of the lattice, implemented as a preorder tree traversal. For the D&amp;N and PROD models, paths were pruned when their length exceeded the shortest found so far for that input, leading to a useful reduction in run times. A similarly motivated pruning was carried out for the MP model. If any product fell below a threshold during traversal, its corresponding path was discarded.</Paragraph>
      <Paragraph position="3"> The threshold used was ε times the maximum product score found so far, with ε set to 10^-3. While this may have led to the pruning of a path contributing to the 'best' pronunciation, its contribution would be very small. Again, this gave a very significant improvement in run times for the testing of lexical words (section 5.2 below) but was unnecessary for the testing of pseudowords.</Paragraph>
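The depth-first traversal with the MP-style threshold pruning might be sketched as below. This is our reconstruction, simplified to track the single best-scoring path rather than summing over paths with identical pronunciations; the lattice layout, node names, and `eps` parameter are assumptions.

```python
def best_pronunciation(lattice, start, end, eps=1e-3):
    """Depth-first (preorder) traversal with threshold pruning:
    a path is abandoned when its product score falls below eps times
    the best product found so far (hypothetical reconstruction).
    lattice: node -> list of (next_node, phoneme_labels, probability)"""
    best = {"score": 0.0, "pron": None}

    def dfs(node, phonemes, prob):
        if prob < eps * best["score"]:
            return                      # threshold pruning
        if node == end:
            if prob > best["score"]:
                best["score"], best["pron"] = prob, "".join(phonemes)
            return
        for nxt, labels, p in lattice.get(node, []):
            dfs(nxt, phonemes + list(labels), prob * p)

    dfs(start, [], 1.0)
    return best["pron"], best["score"]
```

Because the threshold is relative to the best score so far, pruning becomes progressively more aggressive as good paths are found, which is where the run-time saving comes from.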
    </Section>
  </Section>
</Paper>