<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1023"> <Title>Empirically Estimating Order Constraints for Content Planning in Generation</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Our data </SectionTitle> <Paragraph position="0"> Our research is part of MAGIC (Dalal et al., 1996; McKeown et al., 2000), a system that is designed to produce a briefing of patient status after undergoing a coronary bypass operation. Currently, when a patient is brought to the intensive care unit (ICU) after surgery, one of the residents who was present in the operating room gives a briefing to the ICU nurses and residents. Several of these briefings were collected and annotated for the aforementioned evaluation. The resident was 2These units can be loosely related to the concept of messages in (Reiter and Dale, 2000).</Paragraph> <Paragraph position="1"> equipped with a wearable tape recorder to tape the briefings, which were transcribed to provide the base of our empirical data. The text was subsequently annotated with semantic tags as shown in Figure 1. The figure shows that each sentence is split into several semantically tagged chunks.</Paragraph> <Paragraph position="2"> The tag-set was developed with the assistance of a domain expert in order to capture the different information types that are important for communication and the tagging process was done by two non-experts, after measuring acceptable agreement levels with the domain expert (see (McKeown et al., 2000)). The tag-set totalled over 200 tags. These 200 tags were then mapped to 29 categories, which was also done by a domain expert.</Paragraph> <Paragraph position="3"> These categories are the ones used for our current research.</Paragraph> <Paragraph position="4"> From these transcripts, we derive the sequences of semantic tags for each transcript. These sequences constitute the input and working material of our analysis, they are an average length of 33 tags per transcript (min = 13, max = 66, = 11:6). A tag-set distribution analysis showed that some of the categories dominate the tag counts.</Paragraph> <Paragraph position="5"> Furthermore, some tags occur fairly regularly towards either the beginning (e.g., date-of-birth) or the end (e.g., urine-output) of the transcript, while others (e.g., intraop-problems) are spread more or less evenly throughout.</Paragraph> <Paragraph position="6"> Getting these transcripts is a highly expensive task involving the cooperation and time of nurses and physicians in the busy ICU. Our corpus contains a total number of 24 transcripts. Therefore, it is important that we develop techniques that can detect patterns without requiring large amounts of data.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Methods </SectionTitle> <Paragraph position="0"> During the preliminary analysis for this research, we looked for techniques to deal with analysis of regularities in sequences of finite items (semantic tags, in this case). We were interested in developing techniques that could scale as well as work with small amounts of highly varied sequences.</Paragraph> <Paragraph position="1"> Computational biology is another branch of computer science that has this problem as one topic of study. We focused on motif detection techniques as a way to reduce the complexity of the overall setting of the problem. In biological terms, a motif is a small subsequence, highly conserved through evolution. 
</Section>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Methods </SectionTitle>
<Paragraph position="0"> During the preliminary analysis for this research, we looked for techniques to analyze regularities in sequences of finite items (semantic tags, in this case). We were interested in developing techniques that could scale as well as work with small amounts of highly varied sequences.</Paragraph>
<Paragraph position="1"> Computational biology is another branch of computer science that has this problem as one topic of study. We focused on motif detection techniques as a way to reduce the complexity of the overall setting of the problem. In biological terms, a motif is a small subsequence, highly conserved through evolution. From the computer science standpoint, a motif is a fixed-order pattern, simply because it is a subsequence. The problem of detecting such motifs in large databases has attracted considerable interest in the last decade (see (Hudak and McClure, 1999) for a recent survey). Combinatorial pattern discovery, one technique developed for this problem, promised to be a good fit for our task because it can be parameterized to operate successfully without large amounts of data and it is able to identify domain-swapped motifs, for example, a-b-c in one sequence and c-b-a in another.</Paragraph>
<Paragraph position="2"> The ability to detect such order differences is central to our current research, given that order constraints are our main focus.</Paragraph>
<Paragraph position="3"> TEIRESIAS (Rigoutsos and Floratos, 1998) and SPLASH (Califano, 1999) are good representatives of this kind of algorithm. We used an adaptation of TEIRESIAS.</Paragraph>
<Paragraph position="4"> The algorithm can be sketched as follows: we apply combinatorial pattern discovery (see Section 3.1) to the semantic sequences. The obtained patterns are refined through clustering (Section 3.2). Counting procedures are then used to estimate order constraints between those clusters (Section 3.3).</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Pattern detection </SectionTitle>
<Paragraph position="0"> In this section, we provide a brief explanation of our pattern discovery methodology. The explanation builds on the definitions below: ⟨L,W⟩ pattern. Given that Σ represents the semantic tag alphabet, a pattern is a string of the form Σ(Σ|?)*Σ, where ? represents a don't care (wildcard) position. The ⟨L,W⟩ parameters are used to further control the amount and placement of the don't cares: in every subsequence of length W, at least L positions must be filled (i.e., they are non-wildcard characters). This definition entails that L ≤ W and also that an ⟨L,W⟩ pattern is also an ⟨L,W+1⟩ pattern, etc.</Paragraph>
<Paragraph position="1"> Support. The support of a pattern p given a set of sequences S is the number of sequences that contain at least one match of p. It indicates how useful a pattern is in a certain environment. Offset list. The offset list records the matching locations of a pattern p in a list of sequences. It is a set of ordered pairs, where the first element records the sequence number and the second element records the offset in that sequence where p matches (see Figure 3).</Paragraph>
<Paragraph position="2"> Specificity. We define a partial order relation on the pattern space as follows: a pattern p is said to be more specific than a pattern q if (1) p is equal to q in the defined positions of q but has fewer undefined (i.e., wildcard) positions, or (2) q is a substring of p. Specificity provides a notion of complexity of a pattern (more specific patterns are more complex). See Figure 4 for an example.</Paragraph>
<Paragraph position="3"> [Figure (example): pattern AB?D]</Paragraph>
<Paragraph position="4"> Using the previous definitions, the algorithm reduces to the following problem: given a set of sequences, L, W, a minimum window size, and a support threshold, find the maximal ⟨L,W⟩ patterns with a support of at least the support threshold. Our implementation can be sketched as follows: Scanning. For a given window size n, all the possible subsequences (i.e., n-grams) occurring in the training set are identified. This process is repeated for different window sizes.</Paragraph>
<Paragraph position="5"> Generalizing. For each of the identified subsequences, patterns are created by replacing valid positions (i.e., any place but the first and last positions) with wildcards. Only ⟨L,W⟩ patterns with support greater than the support threshold are kept. Figure 5 shows an example.</Paragraph>
<Paragraph position="6"> Filtering. The above process is repeated, increasing the window size, until no patterns with enough support are found. The list of identified patterns is then filtered according to specificity: given two patterns in the list, one of them more specific than the other, if both have offset lists of equal size, the less specific one is pruned3. This gives us the list of maximal motifs (i.e., patterns) that are supported by the training data.</Paragraph>
<Paragraph position="7"> 3Since they match in exactly the same positions, we prune the less specific one, as it adds no new information.</Paragraph>
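<Paragraph> The following is a simplified Python sketch of this procedure. It is written for this description rather than taken from our adaptation of TEIRESIAS: the function and parameter names are illustrative, exhaustive enumeration is used instead of an efficient search, and the specificity-based pruning of the filtering step is only noted in a comment.
    from itertools import combinations

    def support(pattern, sequences):
        """Number of sequences containing at least one match of pattern ('?' matches any tag)."""
        def hits(seq):
            return any(all(p == '?' or p == seq[i + j] for j, p in enumerate(pattern))
                       for i in range(len(seq) - len(pattern) + 1))
        return sum(1 for seq in sequences if hits(seq))

    def is_lw(pattern, L, W):
        """Every window of length W contains at least L defined (non-wildcard) positions."""
        windows = range(max(1, len(pattern) - W + 1))
        return all(sum(1 for t in pattern[i:i + W] if t != '?') >= L for i in windows)

    def discover(sequences, L, W, support_threshold, min_window=2):
        """Scanning and generalizing: enumerate n-grams, replace interior positions
        with wildcards, keep well-supported patterns; grow the window until no
        pattern has enough support."""
        kept, n = set(), min_window
        while True:
            # Scanning: all n-grams occurring in the training set.
            grams = {tuple(seq[i:i + n]) for seq in sequences for i in range(len(seq) - n + 1)}
            found = False
            for gram in grams:
                interior = list(range(1, n - 1))   # first and last positions stay defined
                for k in range(len(interior) + 1):
                    for wild in combinations(interior, k):   # Generalizing
                        pat = tuple('?' if i in wild else t for i, t in enumerate(gram))
                        if is_lw(pat, L, W) and support(pat, sequences) >= support_threshold:
                            kept.add(pat)
                            found = True
            if not found:   # Filtering: stop growing the window
                break
            n += 1
        # Full filtering would additionally prune a less specific pattern whenever a more
        # specific one has an offset list of the same size (omitted here for brevity).
        return kept
With the settings reported in Section 4, such a procedure would be invoked as discover(sequences, 2, 3, 3). </Paragraph>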
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Clustering </SectionTitle>
<Paragraph position="0"> After the detection of patterns is finished, the number of patterns is relatively large. Moreover, as they have fixed length, they tend to be quite similar. In fact, many draw their support from the same subsequences in the corpus. We are interested in syntactic similarity as well as similarity in context.</Paragraph>
<Paragraph position="1"> A convenient solution was to further cluster the patterns, according to an approximate matching distance measure between patterns, defined in an appendix at the end of the paper.</Paragraph>
<Paragraph position="2"> We use agglomerative clustering with the distance between clusters defined as the maximum pairwise distance between elements of the two clusters. Clustering stops when no inter-cluster distance falls below a user-defined threshold.</Paragraph>
<Paragraph position="3"> Each of the resulting clusters is represented by a single pattern, the centroid of the cluster. This concept is useful for visualization of the cluster in qualitative evaluation.</Paragraph>
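<Paragraph> A minimal Python sketch of this clustering step is given below. The pattern distance shown is only a stand-in (a padded position-by-position mismatch count); the actual measure is the approximate matching distance defined in the appendix, and the threshold value is a free parameter.
    def pattern_distance(p, q):
        """Stand-in distance: number of positions where two (padded) patterns disagree,
        treating wildcards as matching anything."""
        n = max(len(p), len(q))
        p = tuple(p) + ('?',) * (n - len(p))
        q = tuple(q) + ('?',) * (n - len(q))
        return sum(1 for a, b in zip(p, q) if a != b and a != '?' and b != '?')

    def cluster(patterns, threshold, dist=pattern_distance):
        """Agglomerative clustering with complete linkage: the inter-cluster distance
        is the maximum pairwise distance; merging stops when no inter-cluster
        distance falls below the user-defined threshold."""
        clusters = [[p] for p in patterns]
        while len(clusters) > 1:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
                    if best is None or best[0] > d:
                        best = (d, i, j)
            if best[0] >= threshold:
                break
            _, i, j = best
            clusters[i].extend(clusters[j])
            del clusters[j]
        return clusters
The centroid used to represent a cluster could then be chosen, for instance, as the member pattern with the smallest summed distance to the other members. </Paragraph>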
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Constraints inference </SectionTitle>
<Paragraph position="0"> The last step of our algorithm measures the frequencies of all possible order constraints among pairs of clusters, retaining those that occur often enough to be considered important, according to some relevancy measure. We also discard any constraint that is violated in any training sequence. We do this in order to obtain clear-cut constraints. Using the number of times a given constraint is violated as a quality measure is a straightforward extension of our framework. The algorithm proceeds as follows: we build a table of counts that is updated every time a pair of patterns belonging to particular clusters is matched.</Paragraph>
<Paragraph position="1"> To obtain clear-cut constraints, we do not count overlapping occurrences of patterns.</Paragraph>
<Paragraph position="2"> From the table of counts we need some relevancy measure, as the distribution of the tags is skewed. We use a simple heuristic to estimate a relevancy measure over the constraints that are never contradicted. We are trying to obtain an estimate of the relevancy of each constraint A ≺ B, that is, that patterns from cluster A are matched before patterns from cluster B.</Paragraph>
<Paragraph position="3"> We normalize with these counts (where x ranges over all the patterns that match before/after A or B): e1 = count(A ≺ B) / Σ_x count(A ≺ x) and e2 = count(A ≺ B) / Σ_x count(x ≺ B).</Paragraph>
<Paragraph position="4"> The two estimates will in general yield different numbers. We use the arithmetic mean between both, e = (e1 + e2) / 2, as the final estimate for each constraint. It turns out to be a good estimate that predicts the accuracy of the generated constraints (see Section 4).</Paragraph>
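<Paragraph> A Python sketch of this counting and estimation step is shown below, assuming the formulation just given. The input format (one list of non-overlapping (cluster, offset) matches per training sequence) and the helper names are illustrative choices, not a description of the exact implementation.
    from collections import defaultdict

    def order_constraints(match_lists, relevancy_threshold):
        """match_lists: for each training sequence, a list of (cluster_id, offset)
        pairs, one per non-overlapping pattern match. Returns constraints
        (A, B, e) meaning 'A precedes B', kept only if never contradicted and
        if the averaged estimate e reaches the relevancy threshold."""
        before = defaultdict(int)            # before[(A, B)]: times A matched before B
        for matches in match_lists:
            ordered = sorted(matches, key=lambda m: m[1])
            for i in range(len(ordered)):
                for j in range(i + 1, len(ordered)):
                    a, b = ordered[i][0], ordered[j][0]
                    if a != b:
                        before[(a, b)] += 1

        constraints = []
        for (a, b), n in before.items():
            if before.get((b, a), 0):        # contradicted in some sequence: discard
                continue
            a_before_any = sum(c for (x, _), c in before.items() if x == a)
            any_before_b = sum(c for (_, y), c in before.items() if y == b)
            e1, e2 = n / a_before_any, n / any_before_b
            e = (e1 + e2) / 2                # arithmetic mean of the two estimates
            if e >= relevancy_threshold:
                constraints.append((a, b, e))
        return constraints
With the relevancy threshold of 0.1 mentioned in Section 4, a constraint is kept only when it is never contradicted and its averaged estimate reaches that value. </Paragraph>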
</Section> </Section>
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Results </SectionTitle>
<Paragraph position="0"> We use cross-validation to evaluate our results quantitatively, and a comparison against the plan of our existing system for a qualitative evaluation.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Quantitative evaluation </SectionTitle>
<Paragraph position="0"> We evaluated two items: how effective the learned patterns and constraints were on an unseen test set and how accurate the predicted constraints were. More precisely: Pattern Confidence. This figure measures the percentage of identified patterns that were able to match a sequence in the test set.</Paragraph>
<Paragraph position="1"> Constraint Confidence. An ordering constraint between two clusters is only checkable on a given sequence if at least one pattern from each cluster is present. We measure the percentage of the learned constraints that are indeed checkable over the set of test sequences. Constraint Accuracy. This is, from our perspective, the most important judgement. It measures the percentage of checkable ordering constraints that are correct, i.e., the order constraint was maintained in every pair of matching patterns from the two clusters in all the test-set sequences.</Paragraph>
<Paragraph position="2"> Using 3-fold cross-validation for computing these metrics, we obtained the results shown in Table 1 (averaged over 100 executions of the experiment). The different parameter settings were defined as follows: for the motif detection algorithm, ⟨L,W⟩ = ⟨2,3⟩ and a support threshold of 3. The algorithm will normally find around 100 maximal motifs. The clustering algorithm used a relative distance threshold of 3.5, which translates to an actual threshold of 120 for an average inter-cluster distance of 174. The number of produced clusters was around 25. Finally, a relevancy threshold of 0.1 was used in the constraint learning procedure. Given the amount of data available for these experiments, all these parameters were hand-tuned.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Qualitative evaluation </SectionTitle>
<Paragraph position="0"> The system was executed using all the available information, with the same parameter settings used in the quantitative evaluation, yielding a set of 29 constraints over 23 generated clusters.</Paragraph>
<Paragraph position="1"> These constraints were analyzed by hand and compared to the existing content-planner. We found that most of the learned rules were validated by our existing plan. Moreover, we gained placement constraints for two pieces of semantic information that are currently not represented in the system's plan. In addition, we found minor order variation in the relative placement of two different pairs of semantic tags. This leads us to believe that the fixed order on these particular tags can be relaxed to attain greater degrees of variability in the generated plans. The process of creating the existing content-planner was thorough, informed by multiple domain experts over a three-year period. The fact that the obtained constraints mostly occur in the existing plan is very encouraging.</Paragraph>
</Section> </Section>
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Related work </SectionTitle>
<Paragraph position="0"> As explained in (Hudak and McClure, 1999), motif detection is usually approached with alignment techniques (as in (Durbin et al., 1998)) or with combinatorial pattern discovery techniques such as the ones we used here. Combinatorial pattern discovery is more appropriate for our task because it allows for matching across patterns with permutations, for representation of wildcards, and for use on smaller data sets.</Paragraph>
<Paragraph position="1"> Similar techniques are used in NLP. Alignments are widely used in MT, for example (Melamed, 1997), but the crossing problem is a phenomenon that occurs repeatedly and at many levels in our task, and thus this is not a suitable approach for us.</Paragraph>
<Paragraph position="2"> Pattern discovery techniques are often used for information extraction (e.g., (Riloff, 1993; Fisher et al., 1995)), but most work uses data that contains patterns labelled with the semantic slot the pattern fills. Given the difficulty for humans in finding patterns systematically in our data, we needed unsupervised techniques such as those developed in computational genomics.</Paragraph>
<Paragraph position="3"> Other stochastic approaches to NLG normally focus on the problem of sentence generation, including syntactic and lexical realization (e.g., (Langkilde and Knight, 1998; Bangalore and Rambow, 2000; Knight and Hatzivassiloglou, 1995)). Concurrent work analyzing constraints on the ordering of sentences in summarization found a coherence constraint ensuring that blocks of sentences on the same topic tend to occur together (Barzilay et al., 2001). This results in a bottom-up approach for ordering that opportunistically groups sentences together based on content features. In contrast, our work attempts to automatically learn plans for generation based on the semantic types of the input clause, resulting in a top-down planner for selecting and ordering content.</Paragraph>
</Section> </Paper>