File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/84/p84-1078_metho.xml
Size: 15,116 bytes
Last Modified: 2025-10-06 14:11:43
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1078"> <Title>Controlling Lexical Substitution in Computer Text Generation 1</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. Lexical Substitution </SectionTitle> <Paragraph position="0"> In [3], Halliday and Hasan catalog and discuss many devices used in English to achieve cohesion. These include reference, substitution, ellipsis, and conjunction. Another family of devices they discuss is known as lexical substitution. The lexical substitution devices incorporated into Paul are pronominalization, superordinate substitution, and definite noun phrase reiteration.</Paragraph> <Paragraph position="1"> Superordinate substitution is the replacement of an element with a noun or phrase that is a more general term for the element. As an example, consider Figure 1, a sample hierarchy the system uses to generate sentences.</Paragraph> <Paragraph position="2"></Paragraph> </Section> <Section position="5" start_page="0" end_page="381" type="metho"> <SectionTitle> ANIMAL: MAMMAL (POSSUM, SKUNK), REPTILE (TURTLE) </SectionTitle> <Paragraph position="0"> 2. HEPZIBAH IS A FEMALE SKUNK.</Paragraph> <Paragraph position="1"> 3. CHURCHY IS A MALE TURTLE.</Paragraph> <Paragraph position="2"> 4. POSSUMS ARE SMALL, GREY MAMMALS.</Paragraph> <Paragraph position="3"> 5. SKUNKS ARE SMALL, BLACK MAMMALS.</Paragraph> <Paragraph position="4"> 6. TURTLES ARE SMALL, GREEN REPTILES. 7. MAMMALS ARE FURRY ANIMALS.</Paragraph> <Paragraph position="5"> 8. REPTILES ARE SCALED ANIMALS. Figure 1b: A Sample Hierarchy for Paul will support.</Paragraph> <Paragraph position="6"> The mechanics for performing superordinate substitution are fairly easy. All one needs to do is to create a list of superordinates by tracing up the hierarchical tree, and arbitrarily choose from this list. However, 
there are several issues that must be addressed to prevent superordinate substitution from being ambiguous or making erroneous connotations. The erroneous connotations occur if the list of superordinates is allowed to extend too long. An example will make this clear. Let us assume that we have a hierarchy in which there is an entry FRED. The superordinate of FRED is MAN, of MAN is HUMAN, ANIMAL for HUMAN, and THING for ANIMAL. Therefore, the superordinate list for FRED is (MAN HUMAN ANIMAL THING). While referring to Fred as the man seems fine, calling him the human seems a little strange. And furthermore, using the animal or the thing to refer to Fred is actually insulting.</Paragraph> <Paragraph position="7"> The reason these superordinates have negative connotations is that there are essential qualities that humans possess that separate us from other animals. Calling Fred an &quot;animal&quot; implies that he lacks these qualities, and is therefore insulting. &quot;Human&quot; sounds strange because it is the highest entry in the semantic hierarchy that exhibits these qualities. Talking about &quot;the human&quot; gives one the feeling that there are other creatures in the discourse that aren't human.</Paragraph> <Paragraph position="8"> Paul is sensitive to the connotations that are possible through superordinate substitution. The system identifies an essential quality, usually intelligence, which acts as a block for further superordinate substitution. If the item to be replaced with a superordinate has the property of intelligence, either directly or through semantic inheritance, a superordinate list is made only of those entries that have themselves the quality of intelligence, again, either directly or through inheritance. 
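As a rough sketch of the rule being stated here, together with the unrestricted case described next, the list construction might look as follows. The dictionary layout and the names PARENT, DIRECT_PROPS, and superordinate_list are assumptions made for illustration and are not taken from Paul itself; only the FRED hierarchy comes from the text.

```python
# Hypothetical encoding of the FRED example; the data layout is assumed.
PARENT = {"FRED": "MAN", "MAN": "HUMAN", "HUMAN": "ANIMAL", "ANIMAL": "THING"}
# Qualities asserted directly on an entry (everything below inherits them).
DIRECT_PROPS = {"HUMAN": {"intelligent"}}


def ancestors(entry):
    """Return the superordinates of entry, nearest first."""
    chain = []
    while entry in PARENT:
        entry = PARENT[entry]
        chain.append(entry)
    return chain


def has_quality(entry, quality):
    """True if entry has the quality directly or inherits it from an ancestor."""
    lineage = [entry] + ancestors(entry)
    return any(quality in DIRECT_PROPS.get(e, set()) for e in lineage)


def superordinate_list(entry, quality="intelligent"):
    """Superordinates of entry, cut off at the essential quality if entry has it."""
    chain = ancestors(entry)
    if has_quality(entry, quality):
        # Keep only superordinates that themselves carry the quality.
        return [e for e in chain if has_quality(e, quality)]
    # Otherwise the list extends as far as the hierarchy allows.
    return chain


print(superordinate_list("FRED"))
print(superordinate_list("ANIMAL"))
```

Under these assumptions FRED's list stops at HUMAN, while an entry without the quality keeps its full chain.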
If the item doesn't have intelligence the list is allowed to extend as far as the hierarchical entries will allow. Once the proper list of superordinates is established, Paul randomly chooses one, preventing repetition by remembering previous choices.</Paragraph> <Paragraph position="9"> The other problem with superordinate substitution is that it may introduce ambiguity. Again consider Figure 1. If we wanted to perform a superordinate substitution for POGO, we would have the superordinate list (POSSUM MAMMAL ANIMAL) to choose from. But HEPZIBAH is also a mammal, so the mammal could refer to either POGO or HEPZIBAH. And not only are both POGO and HEPZIBAH animals, but so is CHURCHY, so the animal could be any one of them. Therefore, saying the mammal or the animal would form an ambiguous reference which the listener or reader would have no way to understand.</Paragraph> <Paragraph position="10"> Paul recognizes this ambiguity. Once the superordinate has been selected, it is tested against all the other nouns mentioned so far in the text. If any other noun is a member of the superordinate set in question, the reference is ambiguous. This reference can be disambiguated by using some feature of the element being replaced as a modifier. In our example of Figure 1, we find that all possums are grey, and therefore POGO is grey. Thus, the grey mammal can refer only to POGO, and is not ambiguous. In the Pogo world, the features the system uses to disambiguate these references are gender, size, color, and skin type (furry, scaled, or feathered). Once the feature is arbitrarily selected and the correct value has been determined, it is tested to see that it genuinely disambiguates the reference. If any of the nouns that were members of the superordinate set have the same value for this feature, it cannot be used to disambiguate the reference, and it is rejected. 
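A minimal sketch of this test, assuming a hypothetical dictionary encoding of Figure 1. The names PARENT, FEATURES, and disambiguate are illustrative and are not Paul's actual implementation. Features are tried in a fixed order rather than arbitrarily so that the result is reproducible, and POGO's gender is left unspecified because it does not appear in the figure as given here.

```python
# Hypothetical encoding of the Figure 1 hierarchy and its feature statements.
PARENT = {
    "POGO": "POSSUM", "HEPZIBAH": "SKUNK", "CHURCHY": "TURTLE",
    "POSSUM": "MAMMAL", "SKUNK": "MAMMAL", "TURTLE": "REPTILE",
    "MAMMAL": "ANIMAL", "REPTILE": "ANIMAL",
}
FEATURES = {
    "HEPZIBAH": {"gender": "female"},
    "CHURCHY": {"gender": "male"},
    "POSSUM": {"size": "small", "color": "grey"},
    "SKUNK": {"size": "small", "color": "black"},
    "TURTLE": {"size": "small", "color": "green"},
    "MAMMAL": {"skin": "furry"},
    "REPTILE": {"skin": "scaled"},
}


def lineage(entry):
    """Return entry followed by its superordinates, nearest first."""
    chain = [entry]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def feature_value(entry, feature):
    """Look up a feature directly or through inheritance; None if unknown."""
    for node in lineage(entry):
        if feature in FEATURES.get(node, {}):
            return FEATURES[node][feature]
    return None


def disambiguate(target, superordinate, mentioned,
                 features=("gender", "size", "color", "skin")):
    """Return a noun phrase for target, adding a modifier only when needed."""
    # Other mentioned nouns that also belong to the superordinate set.
    rivals = [n for n in mentioned
              if n != target and superordinate in lineage(n)]
    if not rivals:
        return superordinate.lower()
    for feature in features:
        value = feature_value(target, feature)
        if value is None:
            continue
        # Reject the feature if any rival shares the value.
        # A rival whose value is unknown is treated as not sharing it.
        if all(feature_value(r, feature) != value for r in rivals):
            return value + " " + superordinate.lower()
    return None  # no feature distinguishes the target


mentioned = ["POGO", "HEPZIBAH", "CHURCHY"]
print(disambiguate("POGO", "MAMMAL", mentioned))
```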
For instance, the size of POGO is small, but saying the small mammal is still ambiguous because HEPZIBAH is also small, and the phrase could just as likely refer to her. The search for a disambiguating feature continues until one is found.</Paragraph> <Paragraph position="11"> Pronominalization, the use of personal pronouns in place of an element, is mechanically simple. The selection of the appropriate personal pronoun is strictly grammatical. Once the syntactic case, the gender, and the number of the element are known, the correct pronoun is dictated by the language.</Paragraph> <Paragraph position="12"> The final lexical substitution available to Paul is the definite noun phrase, the use of a definite article, the in English, as opposed to an indefinite article, a or some. The definite article clearly marks an item as one that has been previously mentioned, and is therefore old information. The indefinite article similarly marks an item as not having been previously mentioned, and therefore is new information. This capacity of the definite article makes its use required with superordinates.</Paragraph> <Paragraph position="13"> (2) My collie is smart. The dog fetches my newspaper every day.</Paragraph> <Paragraph position="14"> *My collie is smart. A dog fetches my newspaper every day.</Paragraph> <Paragraph position="15"> While the mechanisms for performing the various lexical substitutions are conceptually straightforward, they don't solve the entire problem of using lexical substitution. Nothing has been said about how the system chooses which lexical substitution to use. This is a serious issue because lexical substitution devices are not interchangeable. This is true because lexical substitutions, as with most cohesive devices, create text by using presupposed dependencies for their interpretations, as we have seen. 
If those presupposed elements do not exist, or if it is not possible to correctly identify which of the many possible elements is the one presupposed, then it is impossible to correctly interpret the element, and the only possible result is confusion. A computer text generation system that incorporates lexical substitution in its output must insure that the presupposed element exists, and that it can be readily identified by the reader.</Paragraph> <Paragraph position="16"> Paul controls the selection of lexical substitution devices by conceptually dividing the problem into two tasks. The first is to identify the strength of antecedence recovery of the lexical substitution devices. The second is to identify the strength of potential antecedence of each element in the passage, and determine which if any lexical substitution would be appropriate.</Paragraph> </Section> <Section position="6" start_page="381" end_page="381" type="metho"> <SectionTitle> 4. Strength of Antecedence Recovery </SectionTitle> <Paragraph position="0"> Each time a cohesive device is used, a presupposition dependency is created. The item that is being presupposed must be correctly identified for the correct interpretation of the element. The relative ease with which one can recover this presupposed item from the cohesive element is called the strength of antecedence recovery. The stronger an element's strength of antecedence recovery, the easier it is to identify the presupposed element.</Paragraph> <Paragraph position="1"> The lexical substitution with the highest strength of antecedence recovery is the definite noun. This is because the element is actually a repetition of the original item, with a definite article to mark the fact that it is old information. 
There is no real need to refer to the presupposed element, since all the information is being repeated.</Paragraph> <Paragraph position="2"> Superordinate substitution is the lexical substitution with the next highest strength of antecedence recovery. Presupposition dependency genuinely does exist with the use of superordinates, because some information is lost. When we move up the semantic hierarchy, all the traits that are specific to the element in question are lost. To recover this and fully understand the reference at hand, we must trace back to the original element in the hierarchy. Fortunately, the manner in which Paul performs superordinate substitution facilitates this recovery. By insuring that the superordinate substitution will never be ambiguous, the system only generates superordinate substitutions that are readily recoverable.</Paragraph> <Paragraph position="3"> The third device used by Paul, the personal pronoun, has the lowest strength of antecedence recovery. Pronouns genuinely are nothing more than place holders, variables that maintain the positions of the elements they are replacing. A pronoun contains no real semantic information. The only readily available pieces of information from a pronoun are the syntactic role in the current sentence, the gender, and the number of the replaced item. For this reason, pronouns are the hardest to recover of the substitutions discussed.</Paragraph> </Section> <Section position="7" start_page="381" end_page="382" type="metho"> <SectionTitle> 5. 
Strength of Potential Antecedence </SectionTitle> <Paragraph position="0"> While the forms of lexical substitution provide clues (to various degrees) that aid the reader in recovering the presupposed element, the actual way in which the element is currently being used, how it was previously used, its circumstances within the current sentence and within the entire text, can provide additional clues. 
These factors combine to give the specific reference a strength of potential antecedence. Some elements, by the nature of their current and previous usage, will be easier to recover independent of the lexical substitution device selected.</Paragraph> <Paragraph position="1"> Strength of potential antecedence involves several factors. One is the syntactic role the element is playing in the current sentence, as well as in the previous reference. Another is the distance of the previous reference from the current. Here distance is defined as the number of clauses between the references, and Paul arbitrarily uses a distance of no more than two clauses as an acceptable distance. The current expected focus of the text also affects an element's potential strength of antecedence. In order to identify the current expected focus, Paul uses the detailed algorithm for focus developed by Sidner [10].</Paragraph> <Paragraph position="2"> Paul identifies five classes of potential antecedence strength, Class I being the strongest and Class V the weakest, as well as a sixth &quot;nonclass&quot; for elements being mentioned for the first time. These five classes are shown in Figure 2.</Paragraph> <Paragraph position="3"> Class I: 1. The sole referent of a given gender and number (singular or plural) last mentioned within an acceptable distance, OR 2. The focus or the head of the expected focus list for the previous sentence.</Paragraph> <Section position="1" start_page="382" end_page="382" type="sub_section"> <SectionTitle> Class II: </SectionTitle> <Paragraph position="0"> The last referent of a given gender and number last mentioned within an acceptable distance.</Paragraph> <Paragraph position="1"> Class III: An element that filled the same syntactic role in the previous sentence.</Paragraph> <Paragraph position="2"> Class IV: 1. A referent that has been previously mentioned, OR 2. 
A referent that is a member of a previously mentioned set that has been mentioned within an acceptable distance. Class V: A referent that is known to be a part of a previously mentioned item. Figure 2: The Five Classes of Potential Antecedence. Once an element's class of potential antecedence is identified, the selection of the proper lexical substitution is easy. The stronger an element's potential antecedence, the weaker the antecedence of the lexical substitution. Figure 3 illustrates the mappings from potential antecedence to lexical substitution devices. Note that Class III elements are unusual in that the device used to replace them can vary. If the previous instance of the element was of Class I, if it was replaced with a pronoun, then the current instance is replaced with a pronoun, too.</Paragraph> </Section> </Section> <Section position="8" start_page="382" end_page="382" type="metho"> <SectionTitle> 6. An Example </SectionTitle> <Paragraph position="0"> To see the effects of controlled lexical substitution, and to help clarify the ideas discussed, an example is provided. The following is an actual example of text generated by Paul. The domain is the so-called children's story, and the example discussed here is one about characters from Walt Kelly's Pogo comic strip, as shown in Figure 1 above.</Paragraph> <Paragraph position="1"> Figure 4 contains the semantic representation for the example story to be generated, in the syntax of NLP [4] records.</Paragraph> <Paragraph position="2"> If the story were to be generated without any lexical substitutions at all, it would look like the following.</Paragraph> </Section> </Paper>