<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0806"> <Title>An algorithm for efficiently generating summary paragraphs using tree-adjoining grammar</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Searching for concise, coherent paragraphs </SectionTitle> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Aggregation is a global optimisation problem </SectionTitle> <Paragraph position="0"> Our aim is to generate paragraphs which are well aggregated. This notion should be defined in terms of conciseness and coherence, neither of which is formally definable. However, a reasonable approximation to them can be achieved by specifying preferences for certain types of syntactic construction over others (Robin and McKeown, 1996), possibly by giving each generatable construction a score which reflects its relative preferability. We then define the best aggregated paragraph to be the one which achieves the best sum of its constituent constructions' preference scores.</Paragraph> <Paragraph position="1"> Robin and McKeown's (1996) system, STREAK, generates aggregated, fact-rich sentences. It adds facts in order of the preferability of their best possible realisation, and it revises its syntactic choices every time an extra fact is aggregated. This is computationally expensive, and makes multi-sentence generation by the same means prohibitively slow. They do suggest how to deal with this when many of the facts occur in fixed positions, but this is not the case in our corpus.</Paragraph> <Paragraph position="2"> CASPER (Shaw, 1998) delays syntactic choices until after it has decided where sentence boundaries should fall. 
It thereby gains computational efficiency, at the cost of its sentences being less optimally aggregated.</Paragraph> <Paragraph position="3"> Our algorithms are an attempt to avoid these problems and to achieve greater efficiency by precompiling detailed syntactic decisions, the results of which we store in the form of explicit mappings from fields to surface forms. At generation time, we search for the optimal selection of realisations. This approach deviates from the pipeline architecture of NLG systems, which, it has been observed, is not wholly suited to the generation of aggregated texts.</Paragraph> <Paragraph position="4"> The first author to discuss this was Meteer (1992), who showed that the microplanning stage of the pipeline is constrained by surface realisation in two ways. First, a microplan must be realisable in the target language; second, a realisable microplan must make best use of the capacities of the target language for concise expression.</Paragraph> <Paragraph position="5"> More generally, in order to generate aggregated text, the constraints imposed, and the opportunities afforded, by the surface form may be taken into account at any stage in the pipeline. Reape and Mellish (1999) provided examples of different systems, each of which takes aggregation decisions at a different stage. It may not be easy to determine what effect a decision taken at an early stage will have at the surface; and decisions taken at one stage may preclude, at a later stage, a choice which would result in a more aggregated surface form. Similarly, it may not be easy to make a decision at an early stage which makes best use of the surface possibilities.</Paragraph> <Paragraph position="6"> Consider the examples of figures 1 and 2. Both summarise the same set of fields; figure 2 additionally summarises the field &quot;subject = science&quot;. Both paragraphs summarise their fields in the most concise and coherent manner possible (although this is, of course, a subjective judgement). 
Note that they treat the fields they have in common differently with respect to ordering and distribution between the sentences.</Paragraph> <Paragraph position="7"> Various types of constraints cause this. Syntactic constraints include: &quot;science&quot; may be used as an adjective to pre-modify &quot;lesson plan&quot;. Semantic constraints include: &quot;Constellations&quot; lasts 4 hours, but ProLog does not. Stylistic constraints include: 'Maureen Ryff wrote ...' is preferable to '... was written by Maureen Ryff'. We suggest, as do Stone and Doran (1997), that integrating these constraints simultaneously is more efficient than pipelining them. We additionally suggest that representing these constraints in a unified form can provide further efficiency gains.</Paragraph> <Paragraph position="8"> &quot;Constellations&quot; is a 4-hour lesson plan published by online provider ProLog. Maureen Ryff wrote it for small group teaching.</Paragraph> <Paragraph position="9"> (Figure 1 caption: a paragraph which summarises the set of fields of figure 3 in an aggregated manner.)</Paragraph> <Paragraph position="10"> &quot;Constellations&quot; is a science lesson plan which lasts 4 hours. Maureen Ryff wrote it for small group teaching and ProLog, an ... (Figure 2 caption: a paragraph which summarises the set of fields of figure 1, together with the field subject = &quot;science&quot;.) Notice the non-linear effect the addition of a single extra proposition can have on the structure of the paragraph.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Modeling paragraphs with TAG </SectionTitle> <Paragraph position="0"> Our approach uses, as its primary representation, the TAG formalism extended to include unification-based feature structures (Vijay-Shanker and Joshi, 1991). Joshi (1986) describes the advantages TAG possesses as a syntactic formalism for NLG.</Paragraph> <Paragraph position="1"> In generation, the TAG model has usually been applied to generating clauses and sentences. Recently, Webber et al. 
(1999) outlined the benefits of modelling longer strings of text by the same means.</Paragraph> <Paragraph position="2"> The most important characteristic of TAG for our purposes is the local definability of dependencies: constraints between the nodes of elementary trees are preserved under adjoinings which increase the distances between them. For example, in the sentence fragment &quot;Springer published ...&quot;, which might be modelled by a single initial tree, the object is constrained to be some entity published by Springer. If an adjunction is made so that the fragment becomes &quot;Springer, the largest publishing company in Europe, published ...&quot;, this constraint is undisturbed.</Paragraph> <Paragraph position="3"> Our approach presupposes the existence of a TAG whose string set is exactly those paragraphs which are comprehensible summaries of subsets of fields. We do not discuss the creation of such a TAG here. We have made progress with designing one; we believe that it is the flatness of the input data which makes this possible.</Paragraph> <Paragraph position="4"> Let us restate the problem somewhat more formally. Suppose that we have a set F of n fields whose values we may be required to express.</Paragraph> <Paragraph position="5"> Suppose that for every f ⊆ F there is a template which expresses f. A template is a paragraph in which certain words are replaced by slots. A slot is a reference to a field-name. A template t expresses a set of fields f if the name of every element of f is referenced by a slot, and every slot refers to an element of f; the resulting paragraph is the expression of f with respect to t. See figure 3.</Paragraph> <Paragraph position="6"> Let b(f) denote the template which &quot;best&quot; expresses some f ⊆ F. 
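The definition of a template expressing a set of fields can be stated directly in code. The sketch below is illustrative only: the field names and the template are invented, and a template is simplified to a Python format string whose named placeholders play the role of slots.

```python
# Sketch of "a template t expresses a set of fields f": every field
# name in f must be referenced by a slot, and every slot must refer
# to an element of f.  Field names and the template are invented.
def expresses(slot_names, fields):
    """slot_names: the set of field names referenced by t's slots;
    fields: a dict mapping field names to values (the set f)."""
    return set(slot_names) == set(fields)

def expression(template, fields):
    """The expression of f with respect to t: fill each slot,
    written {name}, with the corresponding field value."""
    return template.format(**fields)

fields = {"title": "Constellations", "publisher": "ProLog"}
template = "{title} is published by {publisher}."
assert expresses({"title", "publisher"}, fields)
assert not expresses({"title"}, fields)  # 'publisher' would go unexpressed
assert expression(template, fields) == "Constellations is published by ProLog."
```

The symmetry of the set comparison captures both halves of the definition: no field of f may go unexpressed, and no slot may dangle.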
Suppose also that we have a TAG G_F with string set L(G_F) such that {b(f) | f ⊆ F} ⊆ L(G_F). The creation of G_F is not discussed here. Every string in L(G_F) is the yield of d, some derived tree of G_F.</Paragraph> <Paragraph position="7"> Each of a TAG's derived trees is represented by a unique derivation tree. Typically a derivation tree represents several (differently ordered) sequences of compositions of (the same set of) elementary trees, all of which result in the same derived tree.</Paragraph> <Paragraph position="8"> Hence, a derived tree d of a TAG G with elementary trees E is the result of some sequence of compositions of the elements of E which is equivalent to some derivation tree δ. We write d = δ(E), or just d = δ(G). Hence, our problem is, given G_F and some f ⊆ F, to find some δ such that δ(G_F) = b(f). There are two parts to the problem of finding b(f). First we must recognise b(f), which we may do, as described in section 2.1, by defining b(f) to be the paragraph which achieves the best sum of its constituent constructions' preference scores. Second, since the search space of derivation trees grows exponentially with the number of trees in the grammar, we must find its derivation in a reasonable amount of time.</Paragraph> <Paragraph position="9"> For each field to be expressed, the slot which refers to it may be expressed by means of one of several different syntactic constructions. So each slot will contribute one of several possible preference scores to the paragraph in which it occurs, depending on the syntactic form in which it is realised. However, the syntactic forms by which a slot may be expressed are an implicit property of the grammar: it requires search to discover what they are, and then further search to find their optimal configuration.</Paragraph> <Paragraph position="10"> (Figure 3 caption: a set of fields and a template which expresses it, with slots standing in place of words, e.g. &quot;[slot] is a [slot] published by [slot]. [slot] wrote it for [slot].&quot;)</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 The search space </SectionTitle> <Paragraph position="0"> We model possible paraphrases with a TAG, the paraphrases being the elements of its string set.</Paragraph> <Paragraph position="1"> The nodes in the search space are TAG derivation trees, and an arc from x to y represents the composition of an elementary tree into the partial derivation corresponding to x, with result the partial derivation corresponding to y. The size of the search space may be reduced by collapsing certain paths in it, and by pruning certain arcs.</Paragraph> <Paragraph position="2"> These operations correspond, respectively, to specific lexicalisation and to the removal of redundant trees from the grammar.</Paragraph> <Paragraph position="3"> A tree in the grammar is redundant if it cannot contribute to a paragraph which expresses the required fields, or if it cannot contribute to a 'best' paragraph which expresses those fields. We will expand on redundancy removal after describing the specific lexicalisation algorithm. The specific lexicalisation algorithm converts a TAG into a specific-lexicalised version of itself, in which certain paths in the search space, namely those which can be known in advance to be used in any derivation, are collapsed. 
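The first redundancy criterion, a tree which cannot contribute to a paragraph expressing the required fields, can be sketched as a grammar-pruning step. The encoding below (a tree name paired with the slot it is anchored on, or None) is invented for illustration and is not the system's actual representation.

```python
# Sketch of one pruning step: an elementary tree anchored on a slot
# which refers to a field outside the required set can never appear
# in a paragraph expressing those fields, so it is removed from the
# grammar before searching.  Tree names and fields are invented.
def prune_by_fields(trees, required_fields):
    """trees: list of (name, anchor) pairs; anchor is a field name,
    or None for supporting trees anchored on no slot."""
    return [(name, anchor) for (name, anchor) in trees
            if anchor is None or anchor in required_fields]

grammar = [("title_np", "title"),
           ("publisher_pred", "publisher"),
           ("duration_rel", "duration"),
           ("copula", None)]  # supporting tree, kept unconditionally
pruned = prune_by_fields(grammar, {"title", "publisher"})
assert [name for name, _ in pruned] == ["title_np", "publisher_pred", "copula"]
```

Supporting trees anchored on no slot survive unconditionally here; whether they are redundant is decided by the composition-based criteria discussed later.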
</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The algorithms </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Specific lexicalisation: creating a clausal lexicon </SectionTitle> <Paragraph position="0"> We begin by introducing some notation, and defining some properties of TAGs and their elementary trees. Let ⟨G, E⟩ denote a pair of G, a TAG, and E, some set of elementary trees, not necessarily in G. A leaf node of an elementary tree may be labelled as a slot, which is a special case of the lexical anchor.</Paragraph> <Paragraph position="1"> An elementary tree is specific-lexicalised if at least one of its leaf nodes is a slot. An elementary tree is α-lexicalised if it is specific-lexicalised or if it has no substitution nodes or foot nodes.</Paragraph> <Paragraph position="2"> A TAG is specific-lexicalised if all its elementary trees are α-lexicalised. Given some ⟨G, E⟩, let t be an element of G ∪ E which is not specific-lexicalised. Let t′ be an elementary tree of G. Suppose that there is some composition of t′ into t and that the resulting tree is specific-lexicalised. Then we say that t is single-step-lexicalisable in ⟨G, E⟩. We call any such resulting tree a single-step-lexicalisation at n of t in ⟨G, E⟩, where n is the node at which the composition occurred.</Paragraph> <Paragraph position="3"> We now present the algorithm for our transformation, Specific Lexicalisation. Our definition of specific-lexicalisation differs from the standard notion of lexicalisation in the literature: a TAG is lexicalised (Joshi and Schabes, 1991) if it is specific-lexicalised according to this definition. 
The implication does not necessarily hold in reverse.</Paragraph> <Paragraph position="4"> [...] then
6: if t ∈ G then
7: Remove t from G.
8: Add t to E.
9: end if
10: For some node n of t, add all the single-step-lexicalisations at n of t in ⟨G, E⟩ to G.
11: end if
12: end for
13: until ⟨G, E⟩ is unchanged
14: G is a specific-lexicalisation of G_F.</Paragraph> <Paragraph position="7"> To illustrate this procedure, we have provided some figures. Consider the TAG G_1, whose elementary trees are shown in figure 4. We have chosen, for reasons of space and simplicity, not to show the feature structures attached to each node of these trees. Their approximate form can perhaps be deduced by examination of the templates modelled by G_1, shown in figure 6. A specific-lexicalised version of the TAG, G_2, is shown in figure 5. We have named each elementary tree in G_2 by concatenating the names of its constituents from G_1. The templates generated by G_1 (and hence G_2) are shown in figure 6.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Redundancy removal </SectionTitle> <Paragraph position="0"> We can further remove redundancy in a specific-lexicalisation, G, of some TAG. Let ⟨G, E⟩ be a pair as in the previous section. The following three subsets of the elementary trees of G are redundant. First, those trees t ∈ G which are not rooted on the distinguished symbol and for which there is no t′ ∈ G ∪ E such that t can be composed into t′. Second, those t ∈ G which have a substitution node into which no t′ ∈ G ∪ E can be substituted. 
Third, those t ∈ G such that, for each tree u which is the result of the composition of some t′ ∈ G ∪ E into t, u ∈ G ∪ E. (Note that there is a choice at step 10 of the algorithm: our implementation chooses the node n such that the number of single-step-lexicalisations at n is maximised, but different choices result in transformed grammars with different properties. We claim that a specific-lexicalisation of a TAG is indeed specific-lexicalised. Note that there does not necessarily exist a specific-lexicalisation of a TAG: for certain pathological examples of TAGs, the algorithm does not terminate. Note also that if a specific-lexicalisation exists, it is not necessarily unique. Further work is required to discover the properties of the various specific-lexicalisations in these cases.)</Paragraph> <Paragraph position="1"> Our program which implements the algorithm in fact removes these redundancies, not only after completion, but also after every iteration.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Finding the (approximately) global optimum </SectionTitle> <Paragraph position="0"> Specific-lexicalisation causes the (previously implicit) grammatical constructions by which an element of f may be expressed to become explicit properties of the transformed grammar. Specifically, each element of f occurs as the anchor of each of a number of elementary trees. Let us refer to the set of elementary trees in the transformed grammar anchored by e ∈ f as A(e). Each of these trees corresponds to a grammatical form in which the element may be realised. 
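The mapping from an element e to its anchored trees A(e) can be sketched as a simple index over the transformed grammar. The tree names below are invented for illustration.

```python
from collections import defaultdict

# Sketch of A(e): for each element e of f, collect the elementary
# trees of the transformed grammar anchored on e.  Supporting trees
# (anchored on no slot) are left out.  All names are invented.
def anchored_index(trees):
    """trees: iterable of (name, anchor) pairs; anchor is a field
    element, or None for supporting trees."""
    A = defaultdict(list)
    for name, anchor in trees:
        if anchor is not None:
            A[anchor].append(name)
    return dict(A)

A = anchored_index([("own_predication", "publisher.name"),
                    ("as_adjunct", "publisher.name"),
                    ("title_np", "item.title"),
                    ("copula", None)])
assert A["publisher.name"] == ["own_predication", "as_adjunct"]
assert sorted(A) == ["item.title", "publisher.name"]
```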
Hence, rather than performing an exhaustive search of the space of derivation trees D(G_2), specific-lexicalisation allows us to perform a best-first search instead.</Paragraph> <Paragraph position="1"> That is, we choose exactly one element of A(e) for each e ∈ f. Let Select(f) denote the set of all sets which contain exactly one element of A(e) for each e ∈ f. Recall that we may assign to each syntactic form in which an element of f may be realised a preference score, and that each element of A(e) corresponds to some syntactic form. So, for each element of Select(f) we may sum the preference scores of its elements. Hence, we may impose an order on the elements of Select(f) according to their sums of preference scores. We may then refer to each element of Select(f) as Sel(f, i), where i is the element's position in the order, with Sel(f, 1) being first. We then search, in order, the spaces of possible compositions of the Sel(f, i)s combined with some necessary supporting trees which are not anchored by an element of f. Call these spaces Space′(f, i).</Paragraph> <Paragraph position="3"> Each Space′(f, i) is the corresponding space of compositions with those trees which might be redundant with respect to the search for b(f) removed. We begin the search with Space′(f, 1). It is not guaranteed that b(f) is in this space. If it is not, we repeat the search using Space′(f, 2), and so on. 
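The ordering of selections by summed preference score can be sketched with a cartesian product over the A(e) sets. The trees and scores below are invented for illustration, not taken from the system.

```python
import itertools

# Sketch of Select(f) and its ordering: each selection picks exactly
# one tree (with its preference score) from A(e) for each element e,
# and selections are tried in decreasing order of summed score.
# Tree names and scores are invented.
A = {"publisher": [("as_adjunct", 3), ("own_predication", 1)],
     "duration":  [("relative_clause", 2), ("own_sentence", 1)]}

def selections_by_score(A):
    combos = itertools.product(*A.values())
    return sorted((dict(zip(A, combo)) for combo in combos),
                  key=lambda sel: -sum(score for _, score in sel.values()))

ordered = selections_by_score(A)   # ordered[0] plays the role of Sel(f, 1)
best = ordered[0]
assert best["publisher"] == ("as_adjunct", 3)
assert best["duration"] == ("relative_clause", 2)
assert len(ordered) == 4           # 2 x 2 possible selections
```

Enumerating selections eagerly, as here, is only viable for small A(e) sets; the point of the ordering is that the search can stop as soon as a space containing b(f) is found.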
(Figure 4 caption: substitution nodes are indicated with a '+'; foot nodes are indicated with a '*'; the distinguished symbol is 's'. Slots are shown as '@<reference>', where &quot;reference&quot; is the field to which the slot refers. Note that the feature structures which are associated with each node, and which prohibit certain compositions, are not shown. Note also that this is not a lexicalised TAG (LTAG). This is somewhat unusual; we intend, as part of our ongoing work, to apply our techniques to an established LTAG, such as XTAG.)</Paragraph> <Paragraph position="4"> (Figure 5 caption: each tree's name is below it, in bold. Note that, since the feature structures are not shown, it is not apparent why certain trees which the algorithm seems to imply do not occur in this set.) (Figure 6 caption fragment: the expression of the set of fields in figure 3 with respect to template 6 is the first sentence of the paragraph of figure 1.) In fact, since the Space′(f, i)s do not partition D(G_2), in the worst cases this procedure is slower than an exhaustive search. However, b(f) is defined in terms of maximal preference scores, so it is likely to be found in Space′(f, i) for &quot;low&quot; i. For illustration, refer again to the specific-lexicalisation in figure 5. Notice that @<publisher.name> occurs as the anchor of more than one tree. These trees, predication part:participleclause:item name and adj pred partp:participleclause:item name, which we will refer to as t_1 and t_2 respectively, represent the forms in which that slot may be expressed. Hence, @<publisher.name> may be realised as a predication in its own right using t_1, as in templates 7 and 8 in figure 6, or as an adjunct to another predication using t_2, as in templates 2, 3, 5 and 6. Suppose that our preference scores rate t_2 more highly than t_1, and that we must include all four slots. 
Then the system would first search the space of compositions of the trees of G_2 without t_1, and generate template 6. The second choice, G_2 without t_2, leads to the generation of the concatenation of templates 4 and 8, which expresses the same fields but is less aggregated. This is as we would wish. (We are ignoring the tree item adjective adjoin:adjective:item title, which is not usable due to its features, which are not shown.)</Paragraph> <Paragraph position="5"> 3.4 Redundancies in the search space
Specific-lexicalisation is a transformation which operates on a complete TAG G_1; its result is another TAG G_2 whose string set is the same as G_1's. Also, the feature structures on the nodes of the elementary trees of G_2 contain fewer unbound variables.</Paragraph> <Paragraph position="6"> Unbound variables represent dependencies between parts of the grammar. A search of the space of compositions of elementary trees may make a long chain of compositions before discovering that the composed structure is forbidden by such a dependency. The forbidden chain of compositions is redundant, and specific-lexicalisation removes it from the search space.</Paragraph> <Paragraph position="7"> Importantly, specific-lexicalisation may also take as a parameter f, the set of fields to be expressed. It then removes from G_1 all elementary trees which are anchored on slots which do not refer to elements of f, and operates on this reduced TAG, with result G_2. And if b(f) ∈ L(G_1) then b(f) ∈ L(G_2). In effect, then, specific-lexicalisation, as well as removing generally redundant dependencies, specifically removes some of those parts of the grammar which are redundant with respect to the search for b(f).</Paragraph> <Paragraph position="8"> Redundancy occurs in a grammar for two reasons. 
First, it is written, by hand, with linguistic rather than computational efficiency concerns in mind. It is too complex for its writer to be able to spot redundancies arising from long chains of dependencies between its parts; so specific-lexicalisation may be regarded as automatic bug removal. Second, the grammar is written to be able to model all the templates which express some f ⊆ F. So for any particular f, the grammar will contain information about how to express items not in that set. Specific-lexicalisation highlights this redundancy.</Paragraph> <Paragraph position="9"> We have conducted some preliminary experiments using several small TAGs in which, for each TAG and for its specific-lexicalised equivalent, we measured the time our system takes to generate the modelled sentences. The results showed a decrease of orders of magnitude in generation time after lexicalisation, with the best observed reduction being a factor of about 3000.</Paragraph> <Paragraph position="10"> The specific-lexicalisation of a TAG has the property of having the same string set (and possibly the same tree set) as the original, but a smaller space of possible compositions. We have not proved either clause of this statement, but on the basis of experimental evidence we believe both to be true. Also, the following argument supports the case for the second.</Paragraph> <Paragraph position="11"> Recall that a feature structure attached to a non-terminal symbol in some rule (tree, in the case of TAG) of a grammar is an abbreviation for several similar rules. For example, if a node has associated with it a feature structure containing three features, each of which may be in one of two states and none of which is currently instantiated, then it abbreviates 2^3 = 8 nodes. 
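The counting argument can be checked directly: an uninstantiated node abbreviates one concrete node per configuration of its features, i.e. the product of the feature domain sizes.

```python
from math import prod

# A node whose feature structure has uninstantiated features
# abbreviates one concrete node per configuration; the count is the
# product of the features' domain sizes.
def configurations(domain_sizes):
    return prod(domain_sizes)

assert configurations([2, 2, 2]) == 8  # three binary features: 2^3 = 8
assert configurations([1, 2, 2]) == 4  # instantiating one feature fixes it
```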
So each tree in a TAG with feature structures is an abbreviation for m trees, where m is the number of possible configurations of the feature structures on its nodes.</Paragraph> <Paragraph position="12"> Hence, when we search the space of possible compositions of some number N of trees, we are in fact searching the space of compositions of kN trees, where k is some factor related to the number of possible configurations of the feature structures on the trees. Specific-lexicalisation identifies exactly which of the (non-featured) trees for which a tree with feature structures is an abbreviation are irrelevant to a search, by instantiating unbound variables in its features.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Further work and discussion </SectionTitle> <Paragraph position="0"> The precise circumstances under which the techniques described are effective are still to be established. In particular, it is our intention to repeat our experiments with a standard LTAG, and with TAGs induced automatically from our corpus.</Paragraph> <Paragraph position="1"> To summarise, we claim that the generation of an optimally aggregated summary paragraph requires the ability to move facts across sentence boundaries. A difficulty in achieving this is the exponential relationship between the number of possible paraphrases of a summary of a set of facts and the number of facts in that set. Our algorithm addresses this by transforming a TAG to better model the search space.</Paragraph> </Section> </Paper>