<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0306">
  <Title>A Transformational-based Learner for Dependency Grammars in Discharge Summaries</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Dependency Grammars
</SectionTitle>
    <Paragraph position="0"> One approach to semantic categorization is the use of syntactic features (Kokkinakis, 2001).</Paragraph>
    <Paragraph position="1"> This is based on the assumption that lexemes that share similar syntactic relations to other lexemes in the corpus will be semantically similar (Dorr, 2000). The idea of clustering words based on syntactic features has been well investigated in general language (Pereira, 1993; Li, 1998) However, (Harris, 1991) states that the syntactic relationships are more well-defined and have less variation in scientific languages (sublanguages), such as the ones used in medical texts. Identifying word classes using syntactic relationships should be simpler and potentially more useful in these types of languages.</Paragraph>
    <Paragraph position="2"> Dependency grammars (Hudson, 1991) generate parses where words in a sentence are related directly to the word which is its syntactic head. Each word, except for the root has exactly one head, and the structure is a tree. The analysis does not generate any intermediate syntactic structures. Figure 1 shows an example of a sentence with a dependency grammar parse.</Paragraph>
    <Paragraph position="3"> There has been interest in learning dependency grammars from corpora. Collins (Collins, 1996) used dependencies as the backbone for his probabilistic parser and there has been work on learning both probabilistic (Carroll, 1992; Lee, 1999; Paskin, 2001) and transformation based dependency grammars (Hajic, 1997).</Paragraph>
    <Paragraph position="4"> There are a number of attributes of dependency grammars which make them ideal for our goal of investigating medical sublanguage. First, the semantics of a word are often defined by a feature space of related words. The head-dependent relationships generated by a dependency parse can be used as the relationship for acquisition. Second, dependency grammars may be a better fit for parsing medical text. Medical text is frequently Association for Computational Linguistics.</Paragraph>
    <Paragraph position="5"> the Biomedical Domain, Philadelphia, July 2002, pp. 37-44. Proceedings of the Workshop on Natural Language Processing in include telegraphic omissions, run-on structures, improper use of conjunctions, left attaching noun modifiers etc (Sager, 1981). In many cases, many traditional phrase structures are absent or altered, making a phrase structure parse using traditional production rules difficult. A dependency grammar may still capture useful syntactic relationships when an accurate phrase grammar parse is not possible. In this way, a dependency parse may be compared to a shallow parse, in that it can return a partial analysis. However, even with a shallow parser, we would still interested in the dependency relationships inside the chunks. Third, the syntactic grammar of medical English, specifically regarding discharge summaries, is simpler overall (Campbell, 2001). We are not interested so much in the labeling of intermediate syntactic structures, such as noun phrases and prepositional phrases. Dependency grammars may allow us to capitalize on the relative syntactic simplicity of medical language without the overhead of generating and identifying structures which will not be used.</Paragraph>
    <Paragraph position="6">  sentence &amp;quot;In general she was sleeping quietly.&amp;quot; The dependency grammar used in this experiment did not allow crossing dependencies (projectivity). Crossing dependencies are ones where the parent and child of a relationship are on opposite sides of a common ancestor.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Transformational Based
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Learning
</SectionTitle>
      <Paragraph position="0"> Transformational Based Learning (TBL) has been applied to numerous language learning problems, including part-of-speech tagging (Brill, 1994) , and parsing (Florian, 1998). It also has been used for learning dependency grammars (Hajic, 1997). In general, TBL algorithms generate smaller rule sets and require less training material than probabilistic approaches. Brill produced a part-of-speech tagger which was comparable in accuracy to other tagging methods.</Paragraph>
      <Paragraph position="1"> In language, the general paradigm for TBL is to generate logical rules which apply transformations to the text. The training text is first annotated with the goal state. In this case, the sentences would be assigned a dependency parse. An initial state annotator is then applied to an unannotated copy of the text. For example, a right branching dependency tree was used in our experiment as the initial state (compare figure 1 and figure 2). The goal of TBL is to then generate rules which transform the naive training state into the goal state. In order to do so, the TBL algorithm will have templates which describe the environment in the training corpus where a transformation can occur. The algorithm also has a scoring function which allows the comparison of the training state to the goal state. After iterating through the training corpus and testing all combinations of templates and transformations, the paired template and transformation which has the highest score becomes a rule. In other words, the best rule is the one which results in a corpus closest to the goal state after applying the transformation at the locations indicated by the template. This best rule is applied to the training corpus to produce a refined corpus. The process is then repeated, using the refined corpus as the training corpus, until no more positively scoring rules are produced. The final product is an ordered set of rules which can be applied to any unannotated corpus.</Paragraph>
      <Paragraph position="2">  sentence &amp;quot;In general she was sleeping quietly.&amp;quot; TBL is a good choice for learning a dependency grammar of medical language.</Paragraph>
      <Paragraph position="3"> Assigning dependency heads is a task that is similar to part-of-speech tagging; each word in the text has exactly one dependency head, represented by the index of the head word.</Paragraph>
      <Paragraph position="4"> Transformations to this representation consist of</Paragraph>
      <Paragraph position="6"> changing a word's dependency head from one word to another.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="2" type="metho">
    <SectionTitle>
4 The Learning Algorithm
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
4.1 Template Design
</SectionTitle>
      <Paragraph position="0"> In TBL, transformations occur when a specific environment in the text is found. These environments, or triggers, are defined by the proximal relationship of two or more parts of speech within a sentence. For example, in Brill's early work with POS tagging, one trigger was the existence of another specific POS tag immediately preceding the one to be transformed. The triggers, therefore, compose the 'if' component of the 'if-then' transformational rules.</Paragraph>
      <Paragraph position="1"> When considering what triggers would be appropriate for dependency grammars, it was noted that many arcs in the grammar span a number of words. For example, the arc between a verb and the head of a noun phrase may span many words, especially in medical narratives where noun phrases can be especially lengthy.</Paragraph>
      <Paragraph position="2"> In previous attempts to parse language using TBL templates, the triggers have been tokens in the vicinity of the token to be transformed.</Paragraph>
      <Paragraph position="3"> While this has been successful for POS tagging, where the context necessary to correctly transform the tag may be found within two or three surrounding tokens, the distance of some dependency relationships can be much greater.</Paragraph>
      <Paragraph position="4"> In order to capture long distance relationships explicitly in a trigger, it would be necessary to expand the vicinity to be searched.</Paragraph>
      <Paragraph position="5"> In the case of a dependency grammar parse, words are related to each other not only through their left-to-right arrangement, but also through the dependency tree. We sought to design triggers that take advantage of the dependency tree itself. Using the dependency relationships directly in the trigger is in the spirit of TBL where learning must change the triggering environments in the corpus from one iteration to the next. For example, in the case of POS tag learning, newly learned POS tags are used in subsequent iterations of the algorithm as triggers. Similarly, by using the dependency relationship directly in the trigger, we would expect the learner to capitalize on parse improvements through the learning process.</Paragraph>
      <Paragraph position="6"> Each trigger used in this experiment had six parameters, which defined the vicinity around a target token, summarized in figure 3. Triggers can search using solely word distance, tree distance, or a combination of both. Any template can have multiple triggers, requiring  The parameters of direction and distance are self-explanatory. Scope defines whether or not the triggering token must be exactly at the location defined by the distance, or within that distance. The third setting for scope is a special case. If the scope is set to all the template will search all tokens in the direction set, regardless of distance (e.g. if the tree direction is set to left and the scope is set to all, the trigger will match all tokens to the left, regardless of distance). Trigger parameters  1. Word distance 2. Word direction (left, right, either) 3. Word scope (exactly at, within, all) 4. Tree distance 5. Tree direction (parent, child, either) 6. Tree scope (exactly at, within, all)  Two examples of triggers are given in figure 3. In both cases the triggers are searching for elements near token x which meet the correct criteria. In the first example, the trigger criteria will be met by any token within the shaded area of the tree, those tokens which are either one or two tokens to the right of x and are descendents of x with a tree distance of one. The second trigger will match a single token, shown as a black circle, that is exactly two tokens to the right of x and is also an ancestor of tree distance two.</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Transformations
</SectionTitle>
      <Paragraph position="0"> The second principal component of a TBL rule is the transformation, which defines a change to the structure of the sentence. For example, in the case of POS tagging, the transformation would be to change POS tag x to POS tag y. When TBL has been applied to parsing, the transformations have been on bracketed parse trees and have added or deleted brackets in a balanced method. Where the transformations seem intuitive for POS tagging, they are not as transparent for parsing. A rule for POS tagging may read, &amp;quot;If tag x is DT and tag y immediately to the right is VB, change tag y to NN.&amp;quot; (see figure 4) This makes sense, for we do not expect verbs to immediately trail determiners, and transforming the verb to a noun would likely correct an error. A rule for parsing may read &amp;quot;If a bracket is immediately left of NN, delete a bracket to the left of the NN.&amp;quot; This rule will combine a phrase which has a noun as the left-most component with the phrase which covers it. While this makes some sense, as many phrases do not have nouns as their left-most component, there are also many phrases which do. The linguistic motivation behind the transformation is not immediately obvious.</Paragraph>
      <Paragraph position="1"> We wanted to give our transformations the intuitive readability of the rules seen in the POS tagging rules. In the case of our dependency grammar, we wanted our transformations to describe changes made directly to the tree. We considered four ways in which one token in the tree could be moved in relation to another outlined in figure 5. All four of the transformations decompose to the first transform. These transformations make intuitive sense for dependency grammars. We want to identify tokens in the text which are in the incorrect tree configuration and transform the tree by changing the dependency relationships.</Paragraph>
      <Paragraph position="2"> For example, the transformations &amp;quot;Make a noun the child of a verb&amp;quot; or &amp;quot;Make adjectives siblings of each other&amp;quot; are both readable in English and are linguistically reasonable.</Paragraph>
      <Paragraph position="3">  Some transformations are disallowed in the special case that the root node is involved. The root node has no parent and can have no siblings and therefore transformations which would create these circumstances are not allowed. The shape of the dependency tree is restricted in other ways as described above, in that the trees have no crossing dependencies. These restrictions are not enforced by the transformations and it is possible that they could generate trees that violate these restrictions.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.3 Rule Scoring
</SectionTitle>
      <Paragraph position="0"> At every iteration, it is necessary to evaluate the goodness of the parse that results from the application of all tested rules. The rule which produces the best parse for that iteration is the one that is chosen and applied before continuing on to the next iteration. A number of measures for measuring parsing accuracy have been established, including bracketing sensitivity and specificity. Parsing accuracy for dependency  grammars is often measured as a function of the number of tokens which have the correct ancestors, or dependency accuracy. Keeping our goal of generating word-modifier pairs for subsequent machine learning, we chose an aggressive scoring function, counting only correct parent-child relationships. This also keeps the scoring function as simple as possible. Dependency grammar transformations 1. Make x the child of y 2. Make x the parent of y 3. Make x the sibling of y keeping x's parent 4. Make x the sibling of y keeping y's parent</Paragraph>
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.4 The Algorithm
</SectionTitle>
      <Paragraph position="0"> The general design of TBL algorithms has been well described (Brill, 1994). The essential components, outlined above, include the template design, the transformations used, and the scoring system. The initial state of the dependency tree is the right branching tree shown in figure 2. To improve efficiency, we use the indexed TBL method outlined by Ramshaw and Marcus (Ramshaw, 1994). Rules have pointers to the sentences to which they apply, and similarly each sentence has pointers to the rules which have applied to it in the past. Rules are held on a heap based on their score, allowing the best rule to be found immediately after each iteration. The rule is applied to the list of sentences to which it points, and this list is used in the next iteration so no sentences which have not been modified need be seen.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="2" end_page="2" type="metho">
    <SectionTitle>
5 Methods
</SectionTitle>
    <Paragraph position="0"> A corpus of 1000 sentences (16,949 words) of text from medical discharge summaries was split into a training set of 830 sentences (13,954 words) and a test set of 170 sentences (2,995 words). The entire corpus was first POS tagged using a tagger trained specifically for discharge summaries (Campbell, 2001). The corpus was then hand parsed with a dependency grammar, and the TBL learner was allowed to learn rules on the training set. The sentences in the corpus were not restricted by length. Three sets of increasingly complex templates were used to learn rules, summarized in figure 6.</Paragraph>
    <Paragraph position="1">  1. Word distance: 2. Word direction: 3. Word scope: 4. Tree distance: 5. Tree direction: 6. Tree scope: 1, 2, or 3  left, right, or either exactly at, within, or all</Paragraph>
    <Paragraph position="3"> 1. Word distance: 2. Word direction: 3. Word scope: 4. Tree distance: 5. Tree direction: 6. Tree scope:</Paragraph>
    <Paragraph position="5"> child, parent or either exactly at, within, or all  1. Word distance: 2. Word direction: 3. Word scope: 4. Tree distance: 5. Tree direction: 6. Tree scope: all of set 1, 2, and. . . 1,2 or 3 left, right, or either exactly at, within, or all 1, 2, or 3 child, parent or either exactly at, within, or all</Paragraph>
    <Paragraph position="7"/>
  </Section>
class="xml-element"></Paper>