<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2093">
  <Title>Planning texts by constraint satisfaction</Title>
  <Section position="5" start_page="642" end_page="642" type="metho">
    <SectionTitle>
3 paragraph
4 section
</SectionTitle>
    <Paragraph position="0"> The meanings of 'section' and 'paragraph' are the usual ones, except that section titles are ignored: a section is simply a sequence of one or more paragraphs. Following Nunberg (1990), 'text-sentence' denotes a unit normally punctuated with a capital letter and a full stop; this is distinguished from the syntactic concept of 'sentence', which depends on syntactic formation rules. Thus the following paragraph consists of three text-sentences which contain, respectively, one, zero, and two syntactic sentences: He entered the room. Disaster. The safe was open and the money had gone.</Paragraph>
    <Paragraph position="1"> A text-clause is a unit that would normally be punctuated with a semicolon; the text-sentence you are now reading contains two text-clauses, but the second semicolon does not appear because it has been 'absorbed' into the full stop that marks the whole text-sentence. Within a text-clause, hierarchy is determined by syntax rather than text-structure, so all units within a text-clause are assigned the minimal TEXT-LEVEL of zero.</Paragraph>
    <Paragraph position="2"> The purpose of INDENTATION is to allow indented text structures like bulleted lists; the feature takes values in the range 0, 1, 2, ..., where * unindented text has INDENTATION = 0 * a list item has INDENTATION = 1 * a list item within a list item has INDENTATION = 2 and so forth.</Paragraph>
  </Section>
  <Section position="6" start_page="642" end_page="642" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> To simplify the presentation, we will assume for now that all nodes have INDENTATION = 0, so that text-categories are distinguished only by TEXT-LEVEL.</Paragraph>
    <Paragraph position="1"> Informally, a text structure is well-formed if it respects the hierarchy of textual levels, so that sections are composed of paragraphs, paragraphs of text-sentences, and so forth. An example of an ill-formed structure would be one in which a text-sentence contained a paragraph; such a structure can occur only when the paragraph is indented -- a possibility we are excluding here. Formally, the text-structure formation rules are as follows:  1. A text structure is an ordered tree in which each node i has a TEXT-LEVEL Li in the range 0..LMax.</Paragraph>
    <Paragraph position="2"> 2. If a node p has a daughter node d, then p must have a TEXT-LEVEL one rank higher than d, unless both nodes have the minimal level 0. In other words, either (a) Lp = Ld + 1, or (b) Lp = Ld = 0. (From this it follows that any nodes that are sisters must have the same level.) 3. All terminal nodes must have the minimal TEXT-LEVEL of 0.</Paragraph>
    <Paragraph position="3"> In most applications it would also make sense to set a lower limit on the root node. For instance, we might apply the constraint Lroot ≥ 2 to ensure that the whole text is at least a text-sentence.</Paragraph>
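    <Paragraph> The formation rules can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the Node class, the function names and the value L_MAX = 4 (section) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed level scale: 0 text-phrase, 1 text-clause, 2 text-sentence,
# 3 paragraph, 4 section.
L_MAX = 4


@dataclass
class Node:
    """Hypothetical text-structure node: a TEXT-LEVEL plus ordered daughters."""
    level: int
    daughters: List["Node"] = field(default_factory=list)


def _check(node: Node) -> bool:
    # Rule 1: every TEXT-LEVEL lies in the range 0..L_MAX.
    if node.level not in range(0, L_MAX + 1):
        return False
    # Rule 3: terminal nodes must have the minimal level 0.
    if not node.daughters:
        return node.level == 0
    for d in node.daughters:
        # Rule 2: the parent is one rank above the daughter,
        # unless both are at the minimal level 0.
        one_rank_higher = node.level == d.level + 1
        both_minimal = node.level == 0 and d.level == 0
        if not (one_rank_higher or both_minimal):
            return False
        if not _check(d):
            return False
    return True


def is_well_formed(root: Node, min_root_level: int = 0) -> bool:
    """Check rules 1-3, plus an optional lower limit on the root level."""
    if root.level not in range(min_root_level, L_MAX + 1):
        return False
    return _check(root)
```

Calling is_well_formed(root, min_root_level=2) corresponds to the optional root constraint Lroot ≥ 2 mentioned above.</Paragraph>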
  </Section>
  <Section position="7" start_page="642" end_page="643" type="metho">
    <SectionTitle>
3 Compatibility
</SectionTitle>
    <Paragraph position="0"> As well as being a well-formed text structure, a candidate solution must realize a rhetorical structure 'correctly', in a sense that we need to make precise.</Paragraph>
    <Paragraph position="1"> Roughly, a correct solution should satisfy three conditions:  1. The terminal nodes of the TS should express all the elementary propositions in the RS; they may also contain discourse connectives expressing rhetorical relations in the RS, although for some relations discourse connectives are optional. 2. The TS must respect rules of syntax when it combines propositions and discourse connectives within a text-clause; for instance, a conjunction such as 'but' linking two text-phrases must be coordinated with the second one.</Paragraph>
    <Paragraph position="2"> 3. The TS must be structurally compatible with the RS.</Paragraph>
    <Paragraph position="3">  The first two conditions are straightforward, but what is meant by 'structural compatibility'? We suggest the crucial criterion should be as follows: any grouping of the elementary propositions in the TS must also occur in the RS. In other words, the text-structurer is allowed to eliminate groupings, but not to add any. More formally: * If a node in the TS dominates terminal nodes expressing a set of elementary propositions, there must be a corresponding node in the RS dominating the same set of propositions.</Paragraph>
    <Paragraph position="4"> * The converse does not hold: for instance, an RS of the form R1(R2(p1,p2),p3) can be realized by a paragraph of three sentences, one for each proposition, even though this TS contains no node dominating the propositions (p1 and p2) that are grouped by R2. However, when this happens, the propositions grouped together in the RS must remain consecutive in the TS; solutions in which p3 comes in between p1 and p2 are prohibited.</Paragraph>
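    <Paragraph> The structural compatibility criterion can be sketched as a check over two trees. This is an illustrative sketch under stated assumptions: both the RS and the TS are represented as nested tuples whose leaves are proposition names, relation labels and discourse connectives are left out, and the function names are hypothetical.

```python
def leaves(tree):
    """Return the propositions of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def groupings(tree):
    """Collect the set of propositions dominated by every node of the tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {frozenset(leaves(tree))}
    for child in tree:
        found.update(groupings(child))
    return found


def compatible(ts, rs):
    """True if the TS adds no grouping and keeps every RS grouping consecutive."""
    # The TS must express exactly the elementary propositions of the RS.
    if sorted(leaves(ts)) != sorted(leaves(rs)):
        return False
    rs_groups = groupings(rs)
    # Any grouping of propositions in the TS must also occur in the RS.
    if not groupings(ts).issubset(rs_groups):
        return False
    # Propositions grouped in the RS must remain consecutive in the TS.
    order = leaves(ts)
    for group in rs_groups:
        positions = sorted(order.index(p) for p in group)
        if positions[-1] - positions[0] + 1 != len(group):
            return False
    return True
```

With rs = (("p1", "p2"), "p3"), the flat realization ("p1", "p2", "p3") is accepted, whereas ("p1", "p3", "p2") is rejected because p3 separates p1 and p2, and ("p1", ("p2", "p3")) is rejected because it adds a grouping that is absent from the RS.</Paragraph>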
  </Section>
  <Section position="8" start_page="643" end_page="644" type="metho">
    <SectionTitle>
4 Generating solutions
</SectionTitle>
    <Paragraph position="0"> Our procedure for generating candidate solutions is based on a technique for formulating text structuring as a constraint satisfaction problem (CSP) (Hentenryck, 1989). In general, a CSP is characterized by the following elements:  * A set of variables V1..VN.</Paragraph>
    <Paragraph position="1"> * For each variable Vi, a finite domain Di of possible values.</Paragraph>
    <Paragraph position="2"> * A set of constraints on the values of the variables. (For integer domains these often use 'greater than' and 'less than'; other domains usually rely on 'equal' or 'unequal'.) A solution assigns to each variable Vi a value from its domain Di while respecting all constraints. Depending on the constraints, there may be multiple solutions, or there may be no solution at all. The difficulty in formulating a configuration task as a CSP is that we usually do not know in advance how many variables the solution will contain. Problems of this kind are sometimes called dynamic (Dechter and Dechter, 1988), because the set of relevant variables changes as the search for a solution progresses. The solution in figure 2, for example, has nine TS nodes, each bearing a TEXT-LEVEL variable; different realizations of the same RS might have more nodes, or fewer. However, we have found that all candidate solutions can be generated by assigning four variables (TEXT-LEVEL, INDENTATION, ORDER and CONNECTIVE) to each node of rhetorical structure, so obtaining a partial description that determines a unique TS. Intuitively, the idea is that this description should specify a subset of the nodes in the target TS; further nodes are then added, by a deterministic procedure, in order to satisfy the formation rules and accommodate any discourse connectives.</Paragraph>
    <Paragraph position="4"> As an introduction to this method, we will begin by working through a very simple example. Suppose that our aim is to find all TSs that realize the RS in figure 3a in a paragraph, without using discourse connectives or indentation.</Paragraph>
    <Paragraph position="5"> Create solution variables: The first step is to add TEXT-LEVEL and ORDER variables to each RS node. Since ORDER represents the linear position of a text span in relation to its sisters, it can be omitted from the root.</Paragraph>
    <Paragraph position="6"> Assign domains: Each variable is assigned a finite domain of possible values (figure 3b). For TEXT-LEVEL variables, the domain is 0..LMax; for ORDER variables it is 1..N, where N is the number of sisters. Since we have decided that the whole text should be a paragraph, we can fix the TEXT-LEVEL on the root directly (assigning it the value 3).</Paragraph>
    <Paragraph position="7"> Apply constraints: Constraints over the solution variables are now applied. Informally, these are as follows: the root node should have a higher TEXT-LEVEL than its daughters; sister nodes should have the same values for TEXT-LEVEL but different values for ORDER; and since the 'cause' relation is not marked by a discourse connective, its arguments (the two propositions) cannot be realized by text-phrases (the result would be syntactically ill-formed) -- in other words, they must have TEXT-LEVEL ≠ 0. Collectively, these constraints reduce the TEXT-LEVEL domains for the terminal nodes to {1,2}.</Paragraph>
    <Paragraph position="8"> Enumerate solutions: The solutions can now be enumerated by computing all combinations of values that respect the constraints. One example of a solution is shown in figure 4a.</Paragraph>
    <Paragraph position="9"> Compute complete text structures: For each solution, a complete TS can be computed by adding any nodes that are required by the text-structure formation rules (figure 4b).</Paragraph>
    <Paragraph position="11"> In this simple case there are just four solutions, since the TEXT-LEVEL and ORDER variables on the nucleus both have the domains {1,2}, and any setting of these variables fixes the corresponding variables on the satellite. Here are texts that might result from the four solutions (L and O represent TEXT-LEVEL and ORDER; N and S represent nucleus and satellite):</Paragraph>
    <Paragraph position="13"> Elixir contains gestodene. It is banned by the FDA.</Paragraph>
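    <Paragraph> The worked example can be reproduced by brute-force enumeration. The sketch below is only illustrative: it assumes LMax = 4, uses hypothetical variable names (L_N, O_N, L_S, O_S), and tests every combination of values rather than using a finite-domain constraint solver.

```python
from itertools import product

L_MAX = 4          # assumed maximum TEXT-LEVEL (section)
ROOT_LEVEL = 3     # the whole text is to be a paragraph


def enumerate_solutions():
    """Enumerate TEXT-LEVEL and ORDER values for the nucleus and satellite."""
    levels = range(0, L_MAX + 1)   # TEXT-LEVEL domain 0..LMax
    orders = (1, 2)                # ORDER domain 1..N, with N = 2 sisters
    solutions = []
    for l_n, l_s, o_n, o_s in product(levels, levels, orders, orders):
        # The root must have a higher TEXT-LEVEL than its daughters.
        if not (ROOT_LEVEL > l_n and ROOT_LEVEL > l_s):
            continue
        # Sister nodes share the same TEXT-LEVEL.
        if l_n != l_s:
            continue
        # Sister nodes have different ORDER values.
        if o_n == o_s:
            continue
        # No discourse connective: the arguments cannot be text-phrases.
        if l_n == 0 or l_s == 0:
            continue
        solutions.append({"L_N": l_n, "O_N": o_n, "L_S": l_s, "O_S": o_s})
    return solutions
```

The function returns four assignments: the shared TEXT-LEVEL is 1 or 2, and the nucleus comes either first or second.</Paragraph>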
    <Paragraph position="14"> The method for including discourse connectives has been described elsewhere (Power et al., 1999). Briefly, the lexical entry for a discourse connective must specify its syntactic category (at present we cover subordinating conjunctions, coordinating conjunctions and conjunctive adverbs) and whether it is realized on the nucleus or the satellite. For example, the relation cause can be marked by the subordinating conjunction 'since' (realized on the satellite) or the conjunctive adverb 'consequently' (realized on the nucleus) -- among others. The choice of discourse connective strongly constrains the values of TEXT-LEVEL and ORDER for the arguments of the relation. If cause is expressed by 'since', the arguments may occur in any order, but they must be text-phrases: Since Elixir contains gestodene, it is banned by the FDA.</Paragraph>
    <Paragraph position="15"> Elixir is banned by the FDA since it contains gestodene.</Paragraph>
    <Paragraph position="16"> #Elixir is banned by the FDA; since it contains gestodene.</Paragraph>
    <Paragraph position="17"> #Elixir is banned by the FDA. Since it contains gestodene.</Paragraph>
    <Paragraph position="18"> If instead cause is expressed by 'consequently', the satellite must be placed before the nucleus, and unless the style is very informal the arguments should have TEXT-LEVEL values above text-phrase: Elixir contains gestodene; consequently, it is banned by the FDA.</Paragraph>
    <Paragraph position="19"> Elixir is banned by the FDA. Consequently, it contains gestodene.</Paragraph>
    <Paragraph position="20"> #Elixir is banned by the FDA, consequently it contains gestodene.</Paragraph>
  </Section>
  <Section position="9" start_page="644" end_page="646" type="metho">
    <SectionTitle>
5 Constraints
</SectionTitle>
    <Paragraph position="0"> We now state the text-structuring constraints precisely, including the feature CONNECTIVE but still omitting INDENTATION. Before applying these constraints, finite domains are assigned to each RS node</Paragraph>
    <Paragraph position="2"> CONNECTIVE: On the node cause (figure 3a), Ci = {∅, since, consequently}; on a proposition node, Ci = {∅}. The value ∅ represents the option of using no discourse connective.</Paragraph>
    <Paragraph position="3"> As an example, possible domain assignments for figure 1 are shown in figure 5. The constraints are as follows:</Paragraph>
    <Section position="1" start_page="644" end_page="644" type="sub_section">
      <SectionTitle>
Root Domination
</SectionTitle>
      <Paragraph position="0"> The TEXT-LEVEL of the root node r must exceed that of any daughter d.</Paragraph>
      <Paragraph position="1"> Lr &gt; Ld</Paragraph>
    </Section>
    <Section position="2" start_page="644" end_page="644" type="sub_section">
      <SectionTitle>
Parental Domination
</SectionTitle>
      <Paragraph position="0"> The TEXT-LEVEL of a parent node p must be equal to or greater than the TEXT-LEVEL of any daughter d.</Paragraph>
      <Paragraph position="1"> Lp ≥ Ld</Paragraph>
    </Section>
    <Section position="3" start_page="644" end_page="644" type="sub_section">
      <SectionTitle>
Sister Equality
</SectionTitle>
      <Paragraph position="0"> If nodes a and b are descended from the same parent, they must have the same TEXT-LEVEL.</Paragraph>
      <Paragraph position="1"> La = Lb</Paragraph>
    </Section>
    <Section position="4" start_page="644" end_page="644" type="sub_section">
      <SectionTitle>
Sister Order
</SectionTitle>
      <Paragraph position="0"> If nodes a and b are descended from the same parent, they must have different values of ORDER.</Paragraph>
      <Paragraph position="1"> Oa ≠ Ob</Paragraph>
    </Section>
    <Section position="5" start_page="644" end_page="646" type="sub_section">
      <SectionTitle>
Argument Order
</SectionTitle>
      <Paragraph position="0"> If Cp is a coordinating conjunction or conjunctive adverb, the argument d (nucleus or satellite) on which the connective will be realised (according to its lexical entry) must have Od =</Paragraph>
      <Paragraph position="2"> The algorithm for completing the TS cannot be described fully here, but as an example we comment on how the solution in figure 6a yields the TS in figure 6b.</Paragraph>
      <Paragraph position="3"> * If a parent is more than one level above its daughters (Lp - Ld &gt; 1), extra nodes are added beneath the parent to bridge the gap -- hence the paragraph node in figure 6b.</Paragraph>
      <Paragraph position="4"> * If a parent has the same level as its daughters (Lp = Ld), the daughters are raised to replace the parent. Thus in figure 6b, the paragraph has three sentences, and a rhetorical grouping has been left unrealized in the TS. Of course the reader might infer the intended RS from other evidence (e.g. semantic plausibility).</Paragraph>
      <Paragraph position="5"> * If a terminal node i has a level above text-phrase (Li &gt; 0), a chain of nodes is added to bring it 'down to earth' (e.g. the chain below the first text-sentence in figure 6b).</Paragraph>
      <Paragraph position="6">  * Discourse connectives are passed down to the text-clause in which they should be realized. This is decided (i) by passing the connective to the appropriate argument (nucleus or satellite), according to its lexical entry, and (ii) by thereafter passing it down to the first constituent if the argument is complex (Power et al., 1999).</Paragraph>
      <Paragraph position="7"> After tactical generation, we might obtain the following (rather poor) result: Elixir contains gestodene. Consequently, it is banned by the FDA. However, the FDA approves ElixirPlus.</Paragraph>
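      <Paragraph> The first three completion rules can be sketched as a recursive procedure. Since the full algorithm is not given here, the following is only one possible reading: the classes RSNode and TSNode and the functions chain_down and realize are hypothetical, gaps are bridged by a single chain of nodes containing all the daughters, and the passing down of discourse connectives is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSNode:
    """Hypothetical solved RS node: TEXT-LEVEL, ORDER among sisters, daughters."""
    level: int
    order: int = 1
    text: Optional[str] = None   # set on proposition (terminal) nodes only
    daughters: List["RSNode"] = field(default_factory=list)


@dataclass
class TSNode:
    """Hypothetical text-structure node."""
    level: int
    children: List["TSNode"] = field(default_factory=list)
    text: Optional[str] = None


def chain_down(level: int, text: str) -> TSNode:
    """Terminal above text-phrase: add a chain of nodes down to level 0."""
    node = TSNode(0, text=text)
    for lvl in range(1, level + 1):
        node = TSNode(lvl, [node])
    return node


def realize(rs: RSNode, is_root: bool = True) -> List[TSNode]:
    """Return the TS nodes realizing rs (several, if rs itself is raised away)."""
    if not rs.daughters:
        return [chain_down(rs.level, rs.text or "")]
    ordered = sorted(rs.daughters, key=lambda d: d.order)
    kids = [ts for d in ordered for ts in realize(d, is_root=False)]
    daughter_level = ordered[0].level   # sisters share one TEXT-LEVEL
    if rs.level == daughter_level and not is_root:
        # Same level as the daughters: raise them to replace the parent.
        return kids
    # More than one level above the daughters: bridge the gap.
    for lvl in range(daughter_level + 1, rs.level):
        kids = [TSNode(lvl, kids)]
    return [TSNode(rs.level, kids)]
```

For a root at section level whose daughters are a relation node and a proposition, all at text-sentence level, this yields a section containing one added paragraph node with three text-sentences, which matches the description of figure 6b given above.</Paragraph>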
    </Section>
  </Section>
  <Section position="10" start_page="646" end_page="646" type="metho">
    <SectionTitle>
7 Style
</SectionTitle>
    <Paragraph position="0"> Having designed a procedure that will generate all text structures meeting minimal standards of correctness, we need to apply further constraints in order to eliminate solutions that are stylistically eccentric or at least ill-suited to the purpose at hand.</Paragraph>
    <Paragraph position="1"> In ICONOCLAST, this can be done in two ways: * If a stylistic defect is regarded as fatal, it is excluded by a hard constraint on the solution variables, so that TSs with this defect are never generated.</Paragraph>
    <Paragraph position="2"> * If a stylistic defect is regarded as non-fatal (i.e. unwelcome but sometimes necessary), it is penalized, by a soft constraint, during a subsequent evaluation phase in which the enumerated solutions are ordered from best to worst.</Paragraph>
    <Paragraph position="3"> The user can impose stylistic preferences by switching hard constraints on/off, and also by weighting soft constraints (i.e. determining the importance of non-fatal defects).</Paragraph>
    <Paragraph position="4"> We cannot discuss stylistic control in detail here, but we will give one or two examples for each type of constraint.</Paragraph>
  </Section>
  <Section position="11" start_page="646" end_page="646" type="metho">
    <SectionTitle>
HARD CONSTRAINTS
</SectionTitle>
    <Paragraph position="0"> Multiple text-clauses: To obtain an informal style without semicolons, sentences containing more than one text-clause can be avoided by imposing the constraint Li ≠ 1 on all nodes i.</Paragraph>
    <Paragraph position="1"> Nucleus-satellite order: For some rhetorical relations it may be appropriate to fix the linear order of nucleus and satellite; for instance, the satellite of a background relation should precede the nucleus. This can be ensured by a constraint Os = 1 on the satellite node S.</Paragraph>
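    <Paragraph> Switching these two hard constraints on and off can be illustrated on a single nucleus-satellite pair realized in a paragraph without a connective. This is a sketch with hypothetical function names; for simplicity it filters an already enumerated list, whereas a constraint solver would prune such solutions during the search so that they are never generated.

```python
from itertools import product


def base_solutions():
    """Solutions for one nucleus-satellite pair under a paragraph root."""
    out = []
    for level, o_n in product((1, 2), (1, 2)):
        o_s = 3 - o_n   # sister order: the satellite takes the other position
        out.append({"L_N": level, "O_N": o_n, "L_S": level, "O_S": o_s})
    return out


def no_text_clauses(sol):
    """Hard constraint Li != 1: no node is realized as a text-clause."""
    return sol["L_N"] != 1 and sol["L_S"] != 1


def satellite_first(sol):
    """Hard constraint Os = 1: the satellite precedes the nucleus."""
    return sol["O_S"] == 1


def apply_hard_constraints(solutions, switched_on):
    """Keep only the solutions passing every constraint that is switched on."""
    return [s for s in solutions if all(check(s) for check in switched_on)]
```

With both constraints switched on, only the solution with two text-sentences and the satellite in first position remains.</Paragraph>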
  </Section>
  <Section position="12" start_page="646" end_page="646" type="metho">
    <SectionTitle>
SOFT CONSTRAINTS
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
</Paper>