File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/88/c88-1023_abstr.xml
Size: 13,050 bytes
Last Modified: 2025-10-06 13:46:29
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1023"> <Title>LEXICON PHONOTACTI(&quot; Hit T</Title> <Section position="1" start_page="0" end_page="107" type="abstr"> <SectionTitle> 4800 Bielefeld </SectionTitle> <Paragraph position="0"> In this paper unification and transduction mechanisms are applied in a new approach to phonological parsing. It is shown that unification in the sense of Kay as used in unification grammars, and transduction, a process deriving from automata theory, are both valuable tools for use in computational phonology. By way of illustration, a brief outline of the allophonic parser described by Church is given.</Paragraph> <Paragraph position="1"> Then a linear unification parser for English syllables is introduced. This parser takes phonetic input in the form of feature bundles and uses phonological rules represented by networks of transduction relations together with unification, and an iterative finite-state process to produce phonemic output with marked syllable boundaries. A fundamental distinction is made between two domains: the</Paragraph> <Paragraph position="2"> representations at the phonetic and phonological levels, and the processing of these representations.</Paragraph> <Paragraph position="3"> On this basis, a distinction is made between networks of transduction relations (e.g. between allophones and phonemes), and a set of possible processors (i.e.</Paragraph> <Paragraph position="4"> parsers and transducers) for the interpretation of such networks.</Paragraph> <Paragraph position="5"> 1. Transduction and Unification in Phonology The proposal to use finite-state transducers in morphology and phonology has been advocated in recent years by Kaplan and Kay /1981/, Koskenniemi /1983/ and others. It has been suggested /Gibbon 1987/ that finite-state transducers are the most appropriate devices for use in other areas of computational phonology. 
In Koskenniemi's system, single finite-state transducers act as parallel filters in the analysis of Finnish morphology. However, in his morphophonological analysis Koskenniemi has been criticised for using monadic segments rather than the feature bundles which play such an important role in phonology /Gazdar 1985:601/. In the proposal presented below, segments regarded as feature bundles are essential components in the model. The question as to whether it is better to represent the phonological rules as a cascade of transducers or to incorporate them into a single transducer will not be considered here. Kaplan and Kay /1981/ have already put forward a method of compiling the series of transducers into a single transducer (described by Kay /1982/). Below, for discussion purposes, a single transducer is assumed.</Paragraph> <Paragraph position="6"> Furthermore I would like to stress that on the phonological level I will discuss network representations of phonotactic and allophonic constraints. The transitions in these networks consist of transduction relations. In the process domain a finite-state transducer will be used to interpret the networks. This is a distinction which is not always made but is beneficial for abstracting the attributes of the model from the processing of the model. Below more emphasis will be placed on the representation domain as it is this which is most interesting for discussion purposes. The actual implementation of the processing domain as a program is regarded, theoretically, as a secondary but by no means a minor issue.</Paragraph> <Paragraph position="7"> Unification is a concept which has become common in linguistics in recent years due to the important role it plays in current syntactic theories such as FUG, LFG and GPSG. However, it has not as yet played an explicit part in phonological analysis. 
Below I propose that, by employing elementary unification mechanisms, assimilation and dissimilation can be dealt with in a most satisfactory way. The unification used in this connection is based on the functional description unification described by Kay /1984/.</Paragraph> <Paragraph position="8"> Here I will give an informal definition of unification based on contradiction and set union and in terms of feature bundles, since this is the representation which will be used below. Two feature bundles composed of attribute-value pairs may be said to unify if for each attribute in their union there does not exist an attribute of the same name with a contradicting value. Where a variable, say X, is found in place of a value in one feature bundle, this variable will be assigned permanently the value from the corresponding attribute-value pair in the other bundle if this exists. This definition of unification, and its implementation, differs from Prolog term unification.</Paragraph> <Paragraph position="9"> 2. Allophone-Phoneme Transduction In the proposal presented here, segments regarded as feature bundles are essential components. The feature bundles used in this model are sets of attribute-value pairs in line with traditional distinctive feature terminology. The features are not complex and are generally based on those of Chomsky and Halle /1968/. A fully specified feature bundle contains all the features, together with their values, needed to describe one particular sound. Where a phonetic symbol occurs in the text below this is merely an abbreviation convention for a fully specified feature bundle. Rather than being fully specified, a feature bundle may be underspecified. 
That is to say, only those features appear in the feature bundle which are necessary to describe a class of sounds which participate in a particular phonetic process. For example, the underspecified feature bundle {[+ voc], [- cons]} describes all vowels. The feature bundles are generalisations for sets of input symbols, and resemble the classification in terms of N and V features found in syntax which allows generalisation over categories. They are thus termed C-features (for Category-features). In Church /1983/ the claim is made that allophonic cues can be extremely useful in phonological parsing. Selkirk /1982/ also maintains that investigation of allophonic variation may be advantageous for syllable analysis since the realisation of particular allophones of a language is strongly dependent on their position within the syllable. Thus in order to take advantage of allophonic cues a distinction must be made between variant and invariant features. Variant features, such as [± aspiration], occur when discussing allophones of /p/ for example. Thus underspecified feature bundles also contain variant features in order for us to incorporate allophonic information into our classification.</Paragraph> <Paragraph position="10"> Using variant and invariant features, following Church /1983/, the aim is, given phonetic input in the form of fully specified feature bundles, to discard allophonic information (variant features) and produce phonemic output also in feature bundle form with syllable boundaries marked. Church's /1983/ system has a number of stages from phonetic input to the point where phonemic output is matched with a syllable dictionary. A phonetic feature lattice incorporating generalisations about allophones is input to a bottom-up chart parser. 
This chart parser, which works on a similar basis to the CYK algorithm, provides the phonetic input with a syllable structure. A canonicaliser then discards the allophonic information and outputs a phonemic feature lattice preserving the syllable structure. It is this structure which then comprises the input to the lexical matcher.</Paragraph> <Paragraph position="11"> Taking a closer look at the canonicaliser the first thing which springs to mind is a simple transduction process, that is to say, a translation from phones to phonemes. The chart parser has the task of providing syllable structure using phonotactic and allophonic constraints. However, the question here is, are two separate procedures, namely parsing and canonicalisation, really necessary or can they be incorporated into a single process? Below I will sketch a proposal which, with the help of a finite-state transducer, does just this.</Paragraph> <Paragraph position="12"> 3. Phonotactic Nets Let us first consider the representation level. Following the outline of an input specification recogniser for English syllables presented in Gibbon /1985/, a syllable template was constructed as a discrimination network on the basis of phonotactic rules, thus working on the principle of &quot;allowable&quot; combinations of phonemes rather than limiting acceptable strings to those clusters which actually occur. Syllables are not discussed explicitly in terms of onset, peak and coda in this model. Rather these sub-structures and the phonotactic and allophonic rules which depend on them are implicit in the network. 
The structures, however, can be derived immediately from the topology of the network as represented in a transition diagram. This network is referred to as a phonotactic net. Allophonic constraints were then introduced as part of the input specifications. Each transition in the phonotactic net models a phonemic segment. The advantage of the feature bundle representation is that segments can be viewed in terms of natural classes, which simplifies the network considerably. The transition labels for the network consist of a pair of feature bundles each containing C-features. One of these bundles represents input specifications and the other output specifications; both are in general underspecified. For example, the bundle of C-features which describes the voiceless plosive consonants is {[- cont], [- voice], [- son], [- strid]}. However, where we need to deal with the aspirated allophones of the voiceless plosives the variant feature [+ asp] must be added: {[- cont], [- voice], [- son], [- strid], [+ asp]}. Therefore when a particular transition in the network is responsible for removing this allophonic information the input transition specification is {[- cont], [- voice], [- son], [- strid], [+ asp]}, and the output transition specification is {[- cont], [- voice], [- son], [- strid]} (see Fig. 1). When this phonotactic net is interpreted by a particular parser the phonetic input is generally a string of fully specified feature bundles and in order to use the output for recognition purposes the phonemic output will also be fully specified. It is here that unification plays an important role.</Paragraph> <Paragraph position="13"> indeed the features themselves may not be recognisable. 
This facility is advantageous for working with feature detectors at the front end as it is still possible to analyse what is known. This, of course, leads to underspecified output which may be used in connection with a lexicon for recognition hypothesising. In such cases the underspecified output, although representing classes of phonemes in the various positions, will only allow those combinations of such classes which actually exist, thus limiting the possibilities available for hypothesis.</Paragraph> <Paragraph position="14"> Thus it is not necessary to check the lexicon for forms which according to the rules of the language cannot exist.</Paragraph> <Paragraph position="15"> [Fig. 1: Transition accepting voiceless aspirated plosives] When attempting to traverse the network the fully specified input feature bundle must unify with the input transition specification (in terms of C-features) of the current transition. If unification succeeds, the fully specified output bundle must contain the output transition specifications together with all those features from the fully specified input bundle not contained in the input transition specification. In set theoretic terms, let us call the fully specified input feature bundle InFB, the input and output transition specifications ITS and OTS respectively; if unification of InFB with ITS succeeds, the fully specified output bundle OutFB is OTS ∪ (InFB \ ITS).</Paragraph> <Paragraph position="16"> The phonetic input feature bundles may be also underspecified however. This allows for circumstances where the values of some features may not be known or</Paragraph> </Section> </Paper>