File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/90/c90-2022_metho.xml
Size: 16,884 bytes
Last Modified: 2025-10-06 14:12:25
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2022"> <Title>Generating from a Deep Structure</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The basic algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 A brief introduction to UCG </SectionTitle> <Paragraph position="0"> In UCG the basic linguistic unit is a sign, which includes phonological, syntactic, semantic and ordering information. In what follows, a sign is represented either as a complex feature structure or as Pho:Synt:Sem:Order.</Paragraph> <Paragraph position="1"> The phonological field of a sign contains its orthographic string. The syntactic field is categorial, i.e. it is either basic (e.g. s, np, n) or complex, in which case it has the form C/Sign, where C is a syntactic field and Sign is a sign. Moreover, any basic category can carry morphosyntactic information. For instance, s[fin] denotes the category sentence with the morphological feature value finite. The semantic field contains the semantics of the expression; the semantic representation language is a linear version of Discourse Representation Theory in which each condition is preceded by a sorted variable called the index. As in most unification-based grammars, the semantics of an expression results from the unification of the semantics of its subparts. Finally, the Order field is a binary feature with value either pre or post, which constrains the applicability of grammar rules.</Paragraph> <Paragraph position="2"> Grammar rules in UCG are of two types: binary and unary. Binary rules include forward and backward functional application. These are stated below.</Paragraph> <Paragraph position="3"> … if the order value of Sign is post. Unary rules are of the form α → β, where α and β are signs. 
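The two binary rules can be illustrated with a minimal Python sketch of functional application. It is a hypothetical simplification, not the UCG formalism itself: Sign, Functor and apply are names introduced here, a complex category records only the category and Order value its argument must have (instead of a full argument sign), and semantics is an opaque string rather than a unified structure. We also assume that an argument marked pre follows its functor (forward application) and one marked post precedes it (backward application).

```python
# Hypothetical, simplified rendering of UCG signs and the two application rules.
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Functor:
    """Complex syntax C/Sign, simplified: result category plus argument requirements."""
    result: "Syntax"   # category obtained after application
    arg_cat: str       # basic category the argument must have
    arg_order: str     # Order value the argument must carry: "pre" or "post"


Syntax = Union[str, Functor]


@dataclass(frozen=True)
class Sign:
    """A sign Pho:Synt:Sem:Order; Sem is an opaque string in this sketch."""
    phon: str
    synt: Syntax
    sem: str
    order: str = "pre"


def apply(functor: Sign, arg: Sign) -> Optional[Sign]:
    """Functional application; returns None when the two signs cannot combine."""
    f = functor.synt
    if not isinstance(f, Functor):
        return None                      # only a functor can be applied
    if arg.synt != f.arg_cat or arg.order != f.arg_order:
        return None                      # category or Order clash
    if arg.order == "pre":               # assumed: forward application
        phon = f"{functor.phon} {arg.phon}"
    else:                                # assumed: backward application
        phon = f"{arg.phon} {functor.phon}"
    # Simplification: the result keeps the functor's semantics and Order value.
    return Sign(phon, f.result, functor.sem, functor.order)


vp_syn = Functor("s[fin]", "np", "post")
verb = Sign("misses", Functor(vp_syn, "np", "pre"), "miss(m,c)")
vp = apply(verb, Sign("the cat", "np", "cat(c)", "pre"))
s = apply(vp, Sign("the mouse", "np", "mouse(m)", "post"))
print(s.phon, s.synt)  # the mouse misses the cat s[fin]
```

The verb first combines with its object to the right and then with its subject to the left; offering the subject first fails on the Order check.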
Unary rules are used for the treatment of unbounded dependencies, syntactic forms of type-raising and subcategorisation for optional modifiers.</Paragraph> <Paragraph position="4"> active(Sign0, Active), apply(Sign0, Active, Result), retrieve(DS, SubDS, NewDS), generate(SubDS, Active), reduce(Result, Sign, NewDS).</Paragraph> <Paragraph position="5"> The algorithm presented above makes many simplifying assumptions which are incompatible with a wide-coverage UCG grammar. To produce a generator that is complete with respect to UCG, we need to extend the basic algorithm to account for type-raised NPs, identity semantic functors, lexical modifiers and unary rules. For more details on the general content of these extensions see [1]; for their implementation, cf. the listing of the generation algorithm given in the appendix.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 A sketch of the algorithm </SectionTitle> <Paragraph position="0"> Following work by [1], [5] and [3], the algorithm we present here follows a mixed top-down and bottom-up strategy.</Paragraph> <Paragraph position="1"> The generation process starts with a deep structure DS and a sign Sign whose syntax embodies the goal category (e.g. sentence(finite)). get_deepstr_info extracts from the deep structure some semantic (Sem) and syntactic (Synt) information on the next sign to be generated. create_sign creates a new sign Sign0 on the basis of Sem and Synt. Lexical look-up on Sign0 returns a sign with instantiated syntax and phonology. The call to reduce ensures that this lexical sign is reduced to the goal sign Sign, instantiating the generated string in the process.</Paragraph> <Paragraph position="3"> There are two main ways of reducing a sign Sign0 to a goal sign Sign. The base case occurs when Sign0 unifies with Sign and the deep structure is empty, i.e. all the input semantic material has been used in generating the result string. 
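Both reduce cases, the base case just described and the recursive case described next, can be sketched in Python, with generators standing in for Prolog backtracking. This is an illustrative toy rather than the algorithm's actual code: Head and Spec are hypothetical stand-ins for the deep-structure terms head(Sem, Args, Mods) and specifier(Det, Head) (modifiers are dropped), lookup is a hard-wired English lexicon playing the combined role of get_deepstr_info, create_sign and lexical look-up, NPs are not type-raised, and unification is reduced to comparing categories and semantic indices.

```python
# Hypothetical toy: generate/reduce with backtracking via Python generators.
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Head:
    sem: tuple           # e.g. ("miss", "m", "c") or ("mouse", "m")
    args: tuple = ()     # argument substructures, in no particular order


@dataclass(frozen=True)
class Spec:
    det: str             # e.g. "the"
    head: Head


@dataclass(frozen=True)
class Functor:
    result: object       # category after application (str or Functor)
    arg_cat: str         # category of the argument to generate
    arg_index: str       # semantic index the argument must carry
    order: str           # "pre": argument follows; "post": argument precedes


@dataclass(frozen=True)
class Sign:
    phon: str
    cat: Union[str, Functor]
    index: str


DS = Union[Head, Spec]


def lookup(ds: DS) -> Iterator[Tuple[Sign, tuple]]:
    """Lexical look-up: the lexical sign for ds plus the DS material still to use."""
    if isinstance(ds, Spec):             # the determiner heads NP generation
        idx = ds.head.sem[1]
        yield Sign(ds.det, Functor("np", "n", idx, "pre"), idx), (ds.head,)
    elif ds.sem[0] == "miss":            # miss(X, Y): object Y first, then subject X
        _, x, y = ds.sem
        vp = Functor("s[fin]", "np", x, "post")
        yield Sign("misses", Functor(vp, "np", y, "pre"), x), ds.args
    else:                                # common noun
        pred, idx = ds.sem
        yield Sign(pred, "n", idx), ds.args


def combine(functor: Sign, arg: Sign) -> Sign:
    f = functor.cat
    phon = f"{functor.phon} {arg.phon}" if f.order == "pre" else f"{arg.phon} {functor.phon}"
    return Sign(phon, f.result, functor.index)


def generate(ds: DS, goal: str) -> Iterator[Sign]:
    for sign0, rest in lookup(ds):
        yield from reduce(sign0, goal, tuple(rest))


def reduce(sign0: Sign, goal: str, rest: tuple) -> Iterator[Sign]:
    if sign0.cat == goal and not rest:   # base case: goal reached, DS used up
        yield sign0
    f = sign0.cat
    if isinstance(f, Functor):           # recursive case: sign0 is a functor
        for i, sub in enumerate(rest):   # retrieve: try every substructure
            new_rest = rest[:i] + rest[i + 1:]
            for arg in generate(sub, f.arg_cat):
                if arg.index == f.arg_index:  # otherwise an index clash: backtrack
                    yield from reduce(combine(sign0, arg), goal, new_rest)


ds = Head(("miss", "m", "c"),
          (Spec("the", Head(("mouse", "m"))), Spec("the", Head(("cat", "c")))))
print([s.phon for s in generate(ds, "s[fin]")])  # ['the mouse misses the cat']
```

Because retrieve tries every remaining substructure, swapping the two specifiers in ds produces the same sentence.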
The recursive case occurs when Sign0 is a syntactic functor. If the syntax of Sign0 is of the form Result/Active, we apply Result/Active to Active, thus obtaining a new sign Result. retrieve non-deterministically retrieves a substructure SubDS from the current deep structure DS and returns the remaining deep structure NewDS. The argument Active is then generated from the extracted substructure SubDS, with a new goal sign whose syntax is that predicted by the syntactic functor Sign0. The resulting sign Result is recursively reduced to the original goal sign Sign.</Paragraph> <Paragraph position="4"> reduce(Sign, Sign, [[], []]).</Paragraph> <Paragraph position="5"> reduce(Sign0, Sign, DS) :-</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Bilingual Generation </SectionTitle> <Paragraph position="0"> Consider the following synonymous sentences.</Paragraph> <Paragraph position="1"> (1) a. The mouse misses the cat. b. Le chat manque à la souris (lit. the cat misses to the mouse). There are two main differences between (1a) and (1b). First, an NP (the mouse) translates to a PP (à la souris). Second, a structural transfer occurs, i.e. the object NP in (1a) becomes a subject in (1b) and vice versa. For the generator described above, this poses no particular problem. Because DSs encode thematic rather than grammatical dependencies, structural transfer is not an issue. Further, since at DS all arguments are represented as NPs¹, the generation of (1a) is straightforward. Generating (1b) is a little more intricate but results naturally from the interaction of the generator with the grammar². Note that if the PP were represented as such in the DS, generation would fail for the English sentence. 
This suggests that the deep structures we generate from offer the right level of abstraction for generation to be possible in several languages.</Paragraph> <Paragraph position="2"> The case of structural transfer illustrated in (1) is a good example of the problems that occur with generators that are unable to deal with non-canonical input. To illustrate this, consider the following situation. Suppose that, given two grammars, one for English (GE) and one for French (GF), (1a) and (1b) each have one unique derivation with resulting semantics as in (2).</Paragraph> <Paragraph position="3"> (2) a. the(mouse(m), the(cat(c), miss(m,c))) b. the(cat(c), the(mouse(m), miss(m,c))) Furthermore, suppose (2a) is non-canonical with respect to GF (i.e. (2a) is not derivable under GF) and (2b) is non-canonical wrt GE. For any generator G that cannot deal with non-canonical input, this means that G cannot be used in a system where parsing occurs in one language and generation in another. ¹This is in accordance with the view that prepositions occurring within argumental PPs have no semantic content.</Paragraph> <Paragraph position="4"> More to the point, if G is coupled with the grammar GE, then it will fail to generate when given (2b) as input, and similarly when coupled with GF and given input (2a). To understand why deep structures allow for grammar-independent generation, let us first examine why traditional top-down/bottom-up generators such as the one described in [1] fail on non-canonical input.³ Consider the case where we try to generate, under GE, the English sentence (1a) from the semantics (2b), which, as already mentioned, is non-canonical wrt GE. The main steps of the generation process are as follows.⁴ Suppose the goal sign is Sign0 with category s[fin]. First, a sign Sign1 is created whose semantics is as in (2b). Lexical access on Sign1 returns the sign for 'the'. 
On the basis of the syntactic and semantic predictions made by Sign1, the sign Sign2 for 'cat' is then generated. Reduction of Sign1 with Sign2 yields a new sign Sign3 with phonology 'the cat' and syntax C/(C/np).⁵ In turn, Sign3 makes some predictions which lead to the generation of a new sign Sign4 with syntax C/(C/np) and phonology 'the mouse'. Finally, on the basis of Sign4, the sign Sign5 for 'miss' is generated. At this point in the generation process, the two signs in (3) must combine to reduce to a sign with category C/np.</Paragraph> <Paragraph position="6"> But under the UCG rules of combination (see 3.1), these two signs cannot combine because of the unification clash between the semantics of the accusative NP in the verbal sign (c.NP2) and that of the NP sign within the sign for 'the mouse' (m.mouse(m)). Hence generation fails. ³Note that in this case, reduction to normal form is no longer a possible solution, even if we were able to define a normal form for our semantic representation language. For suppose that (2a) is the normal form; then (1b) is not derivable, and if (2b) is, then (1a) is not derivable.</Paragraph> <Paragraph position="7"> ⁴For more information on the details of the generation procedure, see [1].</Paragraph> <Paragraph position="8"> ⁵For the sake of clarity, the syntactic part of Sign3 is simplified here in that non-syntactic fields (Phonology, Semantics etc.) are omitted. Note also that in UCG, NPs are type-raised, i.e. they are assigned the syntactic category C/(C/np) as opposed to just np.</Paragraph> <Paragraph position="9"> Consider now how the problem is dealt with when generating from deep structures. Rather than being as indicated in (2b), the input to the generator is⁶ head(miss(m, c), [specifier(the, head(mouse(m), [], [])), specifier(the, head(cat(c), [], []))], [])</Paragraph> <Paragraph position="11"> Roughly, generation proceeds as follows. Suppose the goal sign Sign0 has category s[fin]. 
First, the semantics corresponding to the head of the clause (i.e. miss(m, c)) is extracted from (3) and a sign Sign1 is created with semantics miss(m, c). Lexical access on Sign1 returns the sign given in (3) above. Sign1 must then be reduced to Sign0 with category s[fin]. At this stage, the remaining DS is [specifier(the, head(mouse(m), [], [])), specifier(the, head(cat(c), [], []))]. To generate the first argument of Sign1, we then have the choice between generating on the basis of specifier(the, head(mouse(m), [], [])) or of specifier(the, head(cat(c), [], [])).⁷ As demonstrated above, if we generate the sign for 'the mouse' first, reduction cannot apply and generation fails. Here, however, the failure is only temporary: on backtracking, the sign for 'the cat' is eventually generated, and it then reduces with Sign1 to produce Sign2 with phonology 'misses the cat'. At this point, the remaining DS is [specifier(the, head(mouse(m), [], []))]. This triggers the generation of Sign3 with phonology 'the mouse', which then combines with Sign2 to reduce to Sign0 with resulting phonology 'the mouse misses the cat'.</Paragraph> <Paragraph position="12"> To generate the French sentence 'le chat manque à la souris', the same generation process applies, but this time in connection with GF and in the reverse order, i.e. the sign for 'la souris' (the mouse) is generated before the sign corresponding to the NP 'le chat' (the cat). Further, because in the French lexicon 'manque' (miss) subcategorises for a dative NP, the preposition à is generated and combined with the sign for 'la souris' before the resulting PP is reduced with the verb. 
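The bilingual case can be illustrated with an even smaller toy. It is an assumption-laden sketch, not the authors' grammars: the hypothetical dictionary LEXICON gives each language a determiner table, a noun table with gender, and, for the verb realising miss(X, Y), the slots it combines with in combination order (which argument of miss the slot takes, the side on which it attaches, and an optional case marker). The same deep structure is fed to both languages, and a trace records every NP the generator builds: English first tries 'the mouse' as the object and backtracks, whereas French takes 'la souris' first as the dative argument.

```python
# Hypothetical toy: one deep structure, two lexicons, backtracking retrieval.
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Head:
    sem: tuple           # ("miss", "m", "c") or ("mouse", "m")
    args: tuple = ()


@dataclass(frozen=True)
class Spec:
    det: str
    head: Head


# Hypothetical toy lexicons.  Verb slots: (argument position in miss(X, Y),
# side relative to the verb string, case marker), in combination order.
LEXICON = {
    "en": {"dets": {"the": {"": "the"}},
           "nouns": {"mouse": ("mouse", ""), "cat": ("cat", "")},
           "verbs": {"miss": ("misses", [(1, "right", ""), (0, "left", "")])}},
    "fr": {"dets": {"the": {"f": "la", "m": "le"}},
           "nouns": {"mouse": ("souris", "f"), "cat": ("chat", "m")},
           "verbs": {"miss": ("manque", [(0, "right", "à"), (1, "left", "")])}},
}


def realise_np(spec: Spec, lang: str) -> str:
    word, gender = LEXICON[lang]["nouns"][spec.head.sem[0]]
    return f"{LEXICON[lang]['dets'][spec.det][gender]} {word}"


def fill(phon: str, slots: list, indices: tuple, rest: tuple,
         lang: str, trace: List[str]) -> Iterator[str]:
    if not slots:
        if not rest:                     # base case: all DS material used
            yield phon
        return
    (pos, side, marker), later = slots[0], slots[1:]
    for i, sub in enumerate(rest):       # retrieve: try each remaining argument
        np_phon = realise_np(sub, lang)
        trace.append(np_phon)
        if sub.head.sem[1] != indices[pos]:
            continue                     # index clash: backtrack
        arg = f"{marker} {np_phon}".strip()
        new = f"{phon} {arg}" if side == "right" else f"{arg} {phon}"
        yield from fill(new, later, indices, rest[:i] + rest[i + 1:], lang, trace)


def generate(ds: Head, lang: str, trace: List[str]) -> Iterator[str]:
    verb, slots = LEXICON[lang]["verbs"][ds.sem[0]]
    yield from fill(verb, slots, ds.sem[1:], ds.args, lang, trace)


ds = Head(("miss", "m", "c"),
          (Spec("the", Head(("mouse", "m"))), Spec("the", Head(("cat", "c")))))
for lang in ("en", "fr"):
    trace: List[str] = []
    print(next(generate(ds, lang, trace)), trace)
# the mouse misses the cat ['the mouse', 'the cat', 'the mouse']
# le chat manque à la souris ['la souris', 'le chat']
```

Nothing in the deep structure fixes which NP is realised first; the order falls out of each lexicon's subcategorisation and the backtracking search.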
Because DSs make no assumption about the linear ordering of the constituents to be generated, the problem raised by non-canonicity simply does not arise.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Comparisons with Related </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Research </SectionTitle> <Paragraph position="0"> To compare our algorithm with previous work, we first show how it can be adapted to phrase structure grammars. Consider the following extension to reduce.</Paragraph> <Paragraph position="1"> generate_sisters(Kids, DS, NewDS), reduce(Gem, Sign, NewDS).</Paragraph> <Paragraph position="2"> generate_sisters([], DS, DS).</Paragraph> <Paragraph position="3"> generate_sisters([H|T], DS, NewDS) :- index(H, Idx), match(Idx, DS, SubDS, NewDS1), generate(SubDS, H), generate_sisters(T, NewDS1, NewDS).</Paragraph> <Paragraph position="4"> This clause is very similar in structure to the second clause of reduce, the main difference being that the new clause makes fewer assumptions about the feature structures being manipulated. The predicate rule enumerates the rules of the grammar: its first argument represents the mother constituent, its second the head daughter and its third a list of non-head daughters, which are recursively generated by the predicate generate_sisters. The behaviour of this clause is just like that of the clause for reduce which implements the UCG rules of function application. 
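For phrase structure grammars, the extended reduce clause and generate_sisters can be sketched in Python as follows. Everything here is hypothetical and simplified: a category is a label plus a tuple of semantic indices, the function rule stands in for the grammar's rule/3 facts and enumerates (mother, non-head daughters) for a given head daughter, and match picks a substructure by the index of the daughter to be generated, as index/2 and match/4 do in the clause above.

```python
# Hypothetical toy: reduce + generate_sisters for a small phrase structure grammar.
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Cat:
    label: str           # e.g. "s", "vp", "v", "np", "det", "n"
    indices: tuple       # semantic indices, e.g. ("m", "c") for a transitive verb


@dataclass(frozen=True)
class Sign:
    cat: Cat
    phon: tuple          # word sequence


@dataclass(frozen=True)
class DS:
    sem: tuple           # predicate plus indices, e.g. ("miss", "m", "c")
    args: tuple = ()     # argument substructures
    det: str = ""        # specifier, if any


def rule(head: Cat) -> Iterator[Tuple[Cat, List[Tuple[str, Cat]]]]:
    """Hypothetical PSG rules keyed on the head daughter: (mother, [(side, sister)])."""
    if head.label == "v" and len(head.indices) == 2:      # vp -> v np
        x, y = head.indices
        yield Cat("vp", (x,)), [("right", Cat("np", (y,)))]
    elif head.label == "vp":                              # s -> np vp
        yield Cat("s", head.indices), [("left", Cat("np", head.indices))]
    elif head.label == "det":                             # np -> det n
        yield Cat("np", head.indices), [("right", Cat("n", head.indices))]


def lookup(ds: DS) -> Iterator[Tuple[Sign, tuple]]:
    idx = ds.sem[1]
    if ds.det:
        yield Sign(Cat("det", (idx,)), (ds.det,)), (DS(ds.sem),)
    elif ds.sem[0] == "miss":
        yield Sign(Cat("v", ds.sem[1:]), ("misses",)), ds.args
    else:
        yield Sign(Cat("n", (idx,)), (ds.sem[0],)), ds.args


def match(idx: str, rest: tuple) -> Iterator[Tuple[DS, tuple]]:
    for i, sub in enumerate(rest):
        if sub.sem[1] == idx:
            yield sub, rest[:i] + rest[i + 1:]


def generate(ds: DS, goal: Cat) -> Iterator[Sign]:
    for sign0, rest in lookup(ds):
        yield from reduce(sign0, goal, rest)


def generate_sisters(kids, rest) -> Iterator[Tuple[tuple, tuple, tuple]]:
    """Yields (words left of the head, words right of the head, unused DS)."""
    if not kids:
        yield (), (), rest
        return
    (side, goal), tail = kids[0], kids[1:]
    for sub, rest1 in match(goal.indices[0], rest):
        for sister in generate(sub, goal):
            for left, right, rest2 in generate_sisters(tail, rest1):
                if side == "left":
                    yield sister.phon + left, right, rest2
                else:
                    yield left, sister.phon + right, rest2


def reduce(sign0: Sign, goal: Cat, rest: tuple) -> Iterator[Sign]:
    if sign0.cat == goal and not rest:   # base case
        yield sign0
    for mom, kids in rule(sign0.cat):    # hypothesise a rule headed by sign0
        for left, right, new_rest in generate_sisters(kids, rest):
            yield from reduce(Sign(mom, left + sign0.phon + right), goal, new_rest)


ds = DS(("miss", "m", "c"), (DS(("mouse", "m"), det="the"), DS(("cat", "c"), det="the")))
print([" ".join(s.phon) for s in generate(ds, Cat("s", ("m",)))])  # ['the mouse misses the cat']
```

Unlike retrieve in the UCG version, match never tries a substructure whose index disagrees with the daughter being generated.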
On the basis of the generated lexical sign Sign0, an application of the rule is hypothesised, and we then attempt to prove that rule application will lead to a new sign Gem which reduces to the original goal Sign.</Paragraph> <Paragraph position="5"> Having generalised our basic algorithm to phrase structure grammars, we can now compare it to previous work by [5] and [3]. Van Noord's Bottom-Up Generator (BUG) is very similar in structure to our basic algorithm. Closer examination of the two programs, however, reveals two differences. The first is that the daughters in a rule are separated into those that precede the semantic head and those that follow it. The second, more meaningful difference involves the use of a 'link' predicate implementing the transitive closure of the semantic head relation over the grammar rules. The link predicate is similar in purpose to reachability tables in parsing algorithms and helps reduce the search space by providing some syntactic information on the sign to be generated. However, such a predicate is of little use when generating with a categorial grammar in particular, and with any strongly lexicalist linguistic theory in general, since in these the grammar rules are extremely schematised. Their information content is so impoverished that computing and consulting a link predicate cannot be expected to reduce the search space in any meaningful way. In the algorithm presented above, however, this shortcoming is redressed by exploiting the syntactic information contained in the deep structure we start from.</Paragraph> <Paragraph position="6"> In [5], Shieber et al. present a &quot;semantic-head-driven&quot; generation algorithm that is closely related to van Noord's. 
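The 'link' predicate of van Noord's BUG discussed above, the transitive closure of the semantic head relation over the grammar rules, can be sketched as a small fixpoint computation. The rule set and category names are hypothetical; each pair (mother, head) states that head can be the semantic head daughter of mother in some rule, and the closure adds the reflexive and transitive pairs. A generator would consult the resulting table to discard lexical heads that can never lead to the goal category.

```python
# Hypothetical sketch of a link table: reflexive-transitive closure of semantic heads.
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def link_closure(head_pairs: Iterable[Pair]) -> Set[Pair]:
    """Reflexive-transitive closure of the semantic-head relation.

    Each input pair (mother, head) says that `head` can be the semantic head
    daughter of `mother` in some rule.
    """
    pairs = set(head_pairs)
    link = {(c, c) for pair in pairs for c in pair} | pairs
    changed = True
    while changed:                       # naive fixpoint iteration
        changed = False
        for a, b in list(link):
            for c, d in list(link):
                if b == c and (a, d) not in link:
                    link.add((a, d))
                    changed = True
    return link


def candidates(goal: str, lexical_cats: Iterable[str], link: Set[Pair]) -> list:
    """Keep only lexical categories that can (transitively) head the goal."""
    return [c for c in lexical_cats if (goal, c) in link]


# Hypothetical phrase structure rules: s -> np vp, vp -> v np, np -> det n.
LINK = link_closure({("s", "vp"), ("vp", "v"), ("np", "n")})
print(candidates("s", ["det", "n", "v"], LINK))  # ['v']
```

With phrase structure rules like these the table prunes real alternatives; with the schematic application rules of a categorial grammar it records little beyond what each lexical category already states.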
In contrast to van Noord's algorithm, however, the algorithm of [5] also operates on grammars violating the semantic head constraint (SHC), according to which any semantic representation is a further instantiation of the semantic representation of one of its constituents, called the semantic head. This is achieved as follows. First, a distinction is made between chain-rules and non-chain-rules, where non-chain-rules are used to introduce semantic material syncategorematically. The distinction between the two types of rules can be sketched as follows: 1. Chain-rule (Sem, lhs --> Head(Sem), Sisters); 2. Non-chain-rule (Sem, lhs(Sem) --> Daughters). (1) indicates that, given a semantics Sem, a chain-rule is such that Sem unifies with the head daughter's semantics, whilst (2) shows that non-chain-rules are such that the input semantics must unify with the semantics of the lhs of the rule. The intuition is that non-chain-rules help find the lowest node in the derivation tree whose semantics unifies with the input semantics. Furthermore, the top-down base case for non-chain-rules corresponds to the case in which the lhs of the rule has no non-terminal daughters, i.e. to lexical look-up. Consider now the top call to generate.</Paragraph> <Paragraph position="7"> generate(Root) :- non_chain_rule(Root, Pivot, Rhs), generate_rhs(Rhs), connect(Pivot, Root).</Paragraph> <Paragraph position="8"> Two cases obtain with regard to the application of the non_chain_rule predicate. Either the base case occurs and lexical look-up takes place exactly as in our algorithm, or a non-chain-rule is triggered top-down before the constituents in the rhs are generated by a recursive call to generate. Hence the solution to the introduction of syncategorematic material is essentially a reintroduction of the top-down generation strategy. The result is that there is no guarantee that the algorithm will terminate. This point seems to have been overlooked in [5]. 
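The termination problem can be reproduced with a toy top-down generator. The grammar is hypothetical: a single non-chain rule s(Sem) → s(Sem) punct introduces punctuation syncategorematically, so its first daughter carries exactly the mother's semantics. Lexical look-up is tried first, as in the base case, but for an input with no lexical realisation the rule re-posts the same goal again and again; the depth limit and the NonTermination exception are there only to make that observable.

```python
# Hypothetical toy: top-down application of a non-chain rule need not terminate.
from typing import Iterator, List, Optional, Tuple

# Hypothetical fragment: lexical entries keyed on (category, semantics) and
# one non-chain rule s(Sem) -> s(Sem) punct, whose first daughter shares the
# mother's semantics ("same") while punct is syncategorematic (None).
LEXICON = {("s", "rain"): "it rains", ("punct", None): "!"}
NON_CHAIN = [("s", [("s", "same"), ("punct", None)])]


class NonTermination(Exception):
    pass


def generate(cat: str, sem: Optional[str], depth: int = 0,
             limit: int = 50) -> Iterator[str]:
    """Top-down, depth-first generation of strings for (cat, sem)."""
    if depth > limit:
        raise NonTermination(f"goal {cat}({sem}) still open after {limit} expansions")
    if (cat, sem) in LEXICON:            # base case: lexical look-up
        yield LEXICON[(cat, sem)]
    for mother, daughters in NON_CHAIN:  # non-chain rule applied top-down
        if mother == cat:
            yield from expand(daughters, sem, depth + 1, limit)


def expand(daughters: List[Tuple[str, Optional[str]]], sem: Optional[str],
           depth: int, limit: int) -> Iterator[str]:
    if not daughters:
        yield ""
        return
    (dcat, dsem), rest = daughters[0], daughters[1:]
    dsem = sem if dsem == "same" else dsem
    for first in generate(dcat, dsem, depth, limit):
        for tail in expand(rest, sem, depth, limit):
            yield f"{first} {tail}".strip()


gen = generate("s", "rain")
print(next(gen), "|", next(gen))         # it rains | it rains !
try:
    next(generate("s", "snow"))          # no lexical entry for "snow"
except NonTermination as err:
    print("gave up:", err)
```

Without the artificial limit, the call for "snow" would recurse until the interpreter's stack is exhausted instead of failing finitely.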
Therefore, the extension may be of less utility than it appears at first sight, although it may well be the case that termination problems never arise with linguistically motivated grammars.</Paragraph> </Section> </Section> </Paper>