File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-1417_intro.xml
Size: 12,961 bytes
Last Modified: 2025-10-06 14:01:03
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1417"> <Title>OLAP context Dimensions</Title> <Section position="3" start_page="125" end_page="129" type="intro"> <SectionTitle> 2 Research focus: content aggregation in natural language generation </SectionTitle>
<Paragraph position="0"> Natural language generation is traditionally decomposed into the following subtasks: content determination, discourse-level content organization, sentence-level content organization, lexical content realization and grammatical content realization. The first three subtasks together are often referred to as content planning, and the last two together as linguistic realization. This separation is now fairly standard and most implementations encapsulate each task in a separate module (Robin 1995; Reiter 1994).</Paragraph>
<Paragraph position="1"> Another generation subtask that has recently received much attention is content aggregation.</Paragraph>
<Paragraph position="2"> However, there is still no consensus on the exact scope of aggregation and on its precise relation to the five standard generation tasks listed above. To avoid ambiguity, we define aggregation here as: grouping several content units, sharing various semantic features, inside a single linguistic structure, in such a way that the shared features are maximally factored out and minimally repeated in the generated text.</Paragraph>
<Paragraph position="3"> Defined as above, aggregation is essentially a key subtask of sentence planning. As such, aggregation choices are constrained by discourse planning decisions and they in turn constrain lexical choices.</Paragraph>
<Paragraph position="4"> In HYSSOP, aggregation is carried out by the sentence planner in three steps: 1. content factorization, which is performed on a tabular data structure called a Factorization Matrix (FM); 2. generation from the FM of a discourse tree representing the hypertext plan to pass down to the lexicalizer; 3. top-down traversal of the discourse tree to detect content units with shared features occurring in non-adjacent sentences and annotate them as anaphora. Such annotations are then used by the lexicalizer to choose the appropriate cue word to insert near or in place of the anaphoric item.</Paragraph>
<Section position="1" start_page="126" end_page="127" type="sub_section"> <SectionTitle> 2.1 Content factorization in HYSSOP </SectionTitle>
<Paragraph position="0"> The key properties of the factorization matrix that set it apart from previously proposed data structures on which to perform aggregation are that: * it fully abstracts from lexical and syntactic information; * it focuses on two types of information kept separate in most generators, (1) the semantic features of each sentence constituent (generally represented only before lexicalization), and (2) the linear precedence constraints between them (generally represented only late during syntactic realization); * it visually captures the interaction between the two, which underlies the factorization phenomenon at the core of aggregation.</Paragraph>
<Paragraph position="1"> In HYSSOP, the sentence planner receives as input from the discourse planner an FM representing the yet unaggregated content to be conveyed, together with an ordered list of candidate semantic dimensions to consider for outermost factoring. The pseudo-code of HYSSOP's aggregation algorithm is given in Fig. 10. We now illustrate this algorithm on the input example FM that appears inside the bold sub-frame of the overall HYSSOP input given in Fig. 3.</Paragraph>
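To make the FM concrete before walking through the example, here is a minimal sketch of such a matrix as plain Python data. This is an illustrative reconstruction, not HYSSOP's actual representation: the dimension names follow the running OLAP sales example, but the row values are invented stand-ins for the input of Fig. 3.

# Sketch of a Factorization Matrix: one row per yet-unaggregated content
# unit, one column per semantic dimension. Rows hold only semantic features
# (no lexical or syntactic choices); the left-to-right column order encodes
# the linear precedence constraints between sentence constituents.
COLUMNS = ["id", "except", "product", "place", "time", "var"]
FM_ROWS = [
    {"id": "m1", "except": "high",   "product": "Birch Beer", "place": "nation",   "time": "Sep-Oct", "var": "+42%"},
    {"id": "m2", "except": "high",   "product": "Diet Soda",  "place": "east",     "time": "Jul-Aug", "var": "-40%"},
    {"id": "m3", "except": "medium", "product": "Cola",       "place": "Colorado", "time": "Jul-Aug", "var": "-40%"},
    {"id": "m4", "except": "medium", "product": "Cola",       "place": "Colorado", "time": "Aug-Sep", "var": "-32%"},
    {"id": "m5", "except": "medium", "product": "Diet Soda",  "place": "east",     "time": "Sep-Oct", "var": "-33%"},
]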
<Paragraph position="2"> For this example, we assume that the discourse planner directive is to factor out first the exception dimension, followed by the product dimension, i.e., FactoringStrategy = [except, product]. This example illustrates the mixed-initiative choice of the aggregation strategy: part of it is dictated by the discourse planner, to ensure that aggregation will not adversely affect the high-level textual organization that it carefully planned. The remaining part, in our example factoring along the place and time dimensions, is left to the initiative of the sentence planner.</Paragraph>
<Paragraph position="3"> The first step of HYSSOP's aggregation algorithm is to shift the priority dimension D of the factoring strategy to the second leftmost column of the FM. The second step is to sort the FM rows in (increasing or decreasing) order of their D cell values. The third step is to horizontally slice the FM into row groups with identical D cell values. The fourth step is to merge these identical cells and annotate the merged cell with the number of cells that it replaced. The FM resulting from these first four steps on the input FM inside the bold sub-frame of Fig. 3, using exception as the factoring dimension, is given in Fig. 6.</Paragraph>
<Paragraph position="4"> The fifth step consists of recursively calling the entire aggregation algorithm inside each row group, on the sub-FM to the right of D, using the remaining dimensions of the factoring strategy. Let us now follow one such recursive call: the one on the sub-FM inside a bold sub-frame in Fig. 6, to the right of the exception column in the third row group. The result of the first four aggregation steps of this recursive call is given in Fig. 7. This time it is the product dimension that has been left-shifted and that provided the basis for row sorting, row grouping and cell merging. Further recursive calls are now triggered. These calls differ from the preceding ones, however, in that at this point all the input constraints provided by the discourse planner have already been satisfied. It is thus now up to the sentence planner to choose along which dimension to perform the next factorization step. In the current implementation, the column with the lowest number of distinct values is always chosen. In our example, this translates into factoring along the time dimension for some row groups and along the place dimension for the others. The result of the recursive aggregation call on the sub-FM inside the bold frame of Fig. 7 is given in Fig. 8. In this case, factoring occurred along the time dimension. The fully aggregated FM is given in Fig. 9; the left-to-right embedding of its cells reflects exactly the left-to-right embedding of the phrases in the natural language summary of Fig. 4 generated from it.</Paragraph>
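These five steps can be condensed into a short recursive sketch. The following Python function is an illustrative reconstruction, not HYSSOP's implementation: it assumes the list-of-dicts FM layout sketched earlier, collapses steps one to three into a sort-and-group, keeps the count annotation of step four (reused in section 2.2 for explicit-count cue phrases), and recurses for step five, falling back on the fewest-distinct-values heuristic once the planner-imposed strategy is exhausted.

from itertools import groupby

def distinct_count(rows, dim):
    return len({row[dim] for row in rows})

def factor(rows, strategy, dims):
    # rows: content units as dicts; strategy: dimensions imposed by the
    # discourse planner; dims: semantic dimensions not yet factored out.
    if not dims or len(rows) <= 1:
        return rows  # leaf: the remaining unaggregated message rows
    # Steps 1-2: pick the priority dimension and sort the rows on it; once
    # the imposed strategy is exhausted, the sentence planner chooses the
    # remaining column with the lowest number of distinct values.
    dim = strategy[0] if strategy else min(dims, key=lambda d: distinct_count(rows, d))
    groups = []
    # Step 3: slice the sorted rows into groups with identical dim values.
    for value, group in groupby(sorted(rows, key=lambda r: str(r[dim])),
                                key=lambda r: r[dim]):
        group = list(group)
        groups.append({
            # Step 4: merge the equal-valued cells, annotated with a count.
            "dim": dim, "value": value, "count": len(group),
            # Step 5: recurse on the sub-FM to the right of dim.
            "sub": factor(group, strategy[1:], [d for d in dims if d != dim]),
        })
    return groups

# E.g., mirroring the planner directive FactoringStrategy = [except, product]:
# tree = factor(FM_ROWS, ["except", "product"], ["except", "product", "place", "time"])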
</Section> <Section position="2" start_page="127" end_page="129" type="sub_section"> <SectionTitle> 2.2 Cue word generation in HYSSOP </SectionTitle>
<Paragraph position="0"> Once content factorization is completed, the sentence planner builds in two passes the discourse tree that the lexicalizer expects as input. In the first pass, the sentence planner patterns the recursive structure of the tree (which itself prefigures the linguistic constituent structure of the output text) after the left-to-right and narrowing embedding of sub-matrices inside the FM.</Paragraph>
<Paragraph position="1"> Fig. 6 - Left shift, row grouping and cell merging along the exception dimension</Paragraph>
<Paragraph position="3"> Fig. 9 - Final fully aggregated FM after all recursive calls</Paragraph>
<Paragraph position="4"> In the second pass, the sentence planner traverses this initial discourse tree to enrich it with the anaphoric annotations that the lexicalizer needs to generate cue words such as &quot;again&quot;, &quot;both&quot;, &quot;neither&quot;, &quot;except&quot;, etc. Planning cue words can be considered part of aggregation, since it makes the aggregation structures explicit to the reader and prevents ambiguities that may otherwise be introduced by aggressive content factorization. A fragment of the sentence planner output discourse tree built from the aggregated FM of Fig. 9 is given in Fig. 12. The discourse tree spans horizontally, with its root to the left of the feature structure and its leaves to the right. Note in Fig. 12 the cue word directive anaph = [occur = 2nd, repeated = [product, region]]. It indicates that this is the second mention in the text of a content unit with product = &quot;Birch Beer&quot; and region = nation. The lexicalizer uses this annotation to generate the cue word &quot;again&quot; before the second reference to &quot;nationwide Birch Beer sales&quot;.</Paragraph>
<Paragraph position="5">
factor(Matrix, FactoringStrategy)
  variables: Matrix = a factorization matrix
end.
buildFactoringStrategy(Matrix): returns inside a list a pair (Dim, increasing) where Dim is the matrix's dimension (i.e., column) with the lowest number of distinct values.
leftShiftColumn(Matrix, Dim1): moves Dim1 to the second leftmost column, next to the cell id column.
sortRows(Matrix, Dim1, Order): sorts the Matrix's rows in order of their Dim1 cell value; Order specifies whether the order should be increasing or decreasing.
horizSlice(Matrix, Dim1): horizontally slices the Matrix into row groups with equal value along Dim1.
mergeCells(RowGroup, Dim1): merges the (by definition equal-valued) cells of Dim1 in RowGroup.
cut(RowGroup, Dim1): cuts RowGroup into two sub-matrices, one to the left of Dim1 (including Dim1) and the other to the right of Dim1.
paste(LeftSubMatrix, FactoredRightSubMatrix, Dim1): pastes together the left and right sub-matrices.
update(Matrix, RowGroup): identifies the rows RM of Matrix whose cell ids match those of RowGroup RG and substitutes those RM by RG inside Matrix.
Fig. 10 - HYSSOP's aggregation algorithm</Paragraph>
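The anaphora-detection traversal of this second pass can likewise be approximated in a few lines. The sketch below is again an illustrative reconstruction under assumed data shapes (nested dicts with a distinct child list and msg leaves carrying an attr feature dict); HYSSOP's actual discourse trees are LIFE feature structures like the fragment in Fig. 12.

def annotate_anaphora(node, seen=None, keys=("product", "place")):
    # Top-down traversal that counts mentions of each repeated feature
    # combination, so the lexicalizer can realize cue words such as "again".
    if seen is None:
        seen = {}
    if node.get("cat") == "msg":
        mention = tuple(node["attr"].get(k) for k in keys)
        seen[mention] = seen.get(mention, 0) + 1
        if seen[mention] > 1:
            # cf. the directive anaph = [occur = 2nd, repeated = [product, place]]
            node["anaph"] = {"occur": seen[mention], "repeated": list(keys)}
    for child in node.get("distinct", []):
        annotate_anaphora(child, seen, keys)
    return node

# E.g., the second Diet Soda / east message below receives the annotation:
tree = {"cat": "aggr", "distinct": [
    {"cat": "msg", "attr": {"product": "Diet Soda", "place": "east", "time": 7}},
    {"cat": "msg", "attr": {"product": "Diet Soda", "place": "east", "time": 9}},
]}
annotate_anaphora(tree)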
<Paragraph position="6"> A special class of aggregation-related cue phrases involves not only the sentence planner and the lexicalizer but also the discourse planner. One discourse strategy option that HYSSOP implements is to precede each aggregation group by a cue phrase explicitly mentioning the group's cardinal. An example summary front page generated using such a strategy is given in Fig. 11; the count annotations in the cell merging function of HYSSOP's aggregation algorithm are computed for that purpose. While the decision to use an explicit count discourse strategy lies within the discourse planner, the counts are computed by the sentence planner and their realization as cue phrases is carried out by the lexicalizer.</Paragraph>
<Paragraph position="7"> Last year, there were 13 exceptions in the beverage product line. The most striking was Birch Beer's 42% national fall from Sep to Oct. The remaining exceptions, clustered around four products, were:
* Again, Birch Beer's sales, accounting for the other two national exceptions, both mild decreases: 1. a 12% from Jun to Jul; 2. a 10% from Nov to Dec;
* Cola's sales, accounting for four exceptions: 1. two medium in Colorado, a 40% from Jul to Aug and a 32% from Aug to Sep; 2. two mild, an 11% in Wisconsin from Jul to Aug and a 30% in Central region from Aug to Sep;
* Diet Soda, accounting for 5 exceptions: 1. one strong, a 40% slump in Eastern region from Jul to Aug; 2. one medium, a 33% slump in Eastern region from Sep to Oct; 3. three mild: two increasing, a 10% in Eastern region from Aug to Sep and a 19% in Southern region from Jul to Aug; and one falling, a 17% in Western region from Aug to Sep;
* Finally, Jolt Cola's sales, accounting for one mild exception, a 6% national fall from Aug to Sep.
Fig. 11 - HYSSOP's front page output using a discourse strategy with explicit counts</Paragraph>
<Paragraph position="8">
cat = aggr, level = 1, ngroup = 2, nmsg = 2
  common   | exceptionality = high
           | %% The most atypical sales variations from one month to the next occurred for
  distinct | cat = msg, attr = [product = &quot;Birch Beer&quot;, time = 9, place = nation, var = +42]
           |   %% Birch Beer with a 42% national increase from Sep to Oct
           | cat = msg, attr = [product = &quot;Diet Soda&quot;, time = 7, place = east, var = -40]
           |   %% Diet Soda with a 40% decrease in the Eastern region from Jul to Aug
cat = aggr, level = 1, ngroup = 2, nmsg = 3
  common   | exceptionality = medium
           | %% At next level of idiosyncrasy came:
  distinct | cat = aggr, level = 2, ngroup = 2, nmsg = 2
           |   common   | product = Cola, place = Colorado  %% Cola's sales
           |   distinct | cat = msg, attr = [time = 7, var = -40]  %% falling 40% from Jun to Jul
           |            | cat = msg, attr = [time = 9, var = -32]  %% and then a further 32% from Sep to Oct
           | cat = msg, attr = [product = &quot;Diet Soda&quot;, time = 9, place = east, var = -33],
           |   anaph = [occur = 2nd, repeated = [product, place]]
           |   %% again Diet Soda Eastern sales, falling 33% from Sep to Oct
cat = aggr, ...  %% Less aberrant but still notably atypical were: ...
Fig. 12 - Fragment of LIFE feature structure representing the discourse tree output of the sentence planner and input to the lexicalizer.</Paragraph> </Section> </Section> </Paper>