<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0307"> <Title>Tagging Grammatical Functions</Title> <Section position="4" start_page="0" end_page="511" type="metho"> <SectionTitle> 3 Annotation Tool </SectionTitle> <Paragraph position="0"> Since syntactic annotation of corpora is time-consuming, a partially automated annotation tool has been developed in order to increase efficiency.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The User Interface </SectionTitle> <Paragraph position="0"> For optimal human-machine interaction, the tool supports immediate graphical representation of the structure being annotated.</Paragraph> <Paragraph position="1"> Since keyboard input is most efficient for assigning categories to words and phrases, cf. (Lehmann et al., 1996; Marcus et al., 1994), and structural manipulations are executed most efficiently using the mouse, both an elaborate keyboard interface and an optical interface are provided. As suggested by Robert MacIntyre (footnote 3), it is most efficient to use one hand for structural commands with the mouse and the other hand for short keyboard input.</Paragraph> <Paragraph position="2"> Footnote 2: 'Free' word order is a function of several interacting parameters such as category, case and topic-focus articulation. Varying the order of words in a sentence yields a continuum of grammaticality judgments rather than a simple right-wrong distinction.</Paragraph> <Paragraph position="3"> Footnote 3: Personal communication, Oct. 1996.</Paragraph> <Paragraph position="4"> By additionally offering online menus for commands and labels, the tool suits beginners as well as experienced users. Commands such as &quot;group words&quot;, &quot;group phrases&quot;, &quot;ungroup&quot;, &quot;change labels&quot;, &quot;re-attach nodes&quot;, &quot;generate PostScript output&quot;, etc. 
are available.</Paragraph> <Paragraph position="5"> The three tagsets (word, phrase, and edge labels) used by the annotation tool are variable. They are stored together with the corpus, which allows easy modification and exchange of tagsets. In addition, appropriateness checks are performed automatically.</Paragraph> <Paragraph position="6"> Comments can be added to structures.</Paragraph> <Paragraph position="7"> Figure 2 shows a screen dump of the graphical interface.</Paragraph> </Section> <Section position="2" start_page="0" end_page="511" type="sub_section"> <SectionTitle> 3.2 Automating Annotation </SectionTitle> <Paragraph position="0"> Existing treebank annotation tools are characterised by a high degree of automation. The task of the annotator is to correct the output of a parser, i.e., to eliminate wrong readings, complete partial parses, and adjust partially incorrect ones.</Paragraph> <Paragraph position="1"> Broad-coverage parsers for German, especially robust parsers that assign predicate-argument structure and allow crossing branches, are either not available or require an annotated training corpus (cf. (Collins, 1996), (Eisner, 1996)).</Paragraph> <Paragraph position="2"> As a consequence, we have adopted a bootstrapping approach and gradually increased the degree of automation, using already annotated sentences as training material for a stochastic processing module.</Paragraph> <Paragraph position="3"> This aspect of the work has led to a new model of human supervision. Here, automatic annotation and human supervision are combined interactively: annotators are asked to confirm the local predictions of the parser.</Paragraph> <Paragraph position="5"> 
The size of such 'supervision increments' varies from local trees of depth one to larger chunks, depending on the amount of training data available.</Paragraph> <Paragraph position="6"> We distinguish six degrees of automation: 0) Completely manual annotation.</Paragraph> <Paragraph position="7"> 1) The user determines phrase boundaries and syntactic categories (S, NP, VP, ...). The program automatically assigns grammatical functions. The annotator can alter the assigned tags (cf. figure 3).</Paragraph> <Paragraph position="8"> 2) The user only determines the components of a new phrase (local tree of depth 1), while both category and function labels are assigned automatically. Again, the annotator has the option of altering the assigned tags (cf. figure 4). 3) The user selects a substring and a category, whereas the entire structure covering the substring is determined automatically (cf. figure 5). 4) The program performs simple bracketing, i.e., finds 'kernel phrases' without the user having to explicitly mark phrase boundaries. The task can be performed by a chunk parser that is equipped with an appropriate finite state grammar (Abney, 1996). 5) The program suggests partial or complete parses.</Paragraph> <Paragraph position="9"> A set of 500 manually annotated training sentences (step 0) was sufficient for a statistical tagger to reliably assign grammatical functions, provided the user determines the elements of a phrase and its category (step 1). Approximately 700 additional sentences have been annotated this way. Annotation efficiency increased by 25%: the average annotation time fell from 4 minutes to 3 minutes per sentence (from 300 to 400 words per hour). The 1,200 sentences were used to train the tagger for automation step 2. Together with improvements in the user interface, this increased efficiency by another 33%, from approximately 3 to 2 minutes per sentence (600 words per hour). 
The fastest annotators cover up to 1000 words per hour.</Paragraph> <Paragraph position="10"> At present, the treebank comprises 3,000 sentences, each annotated independently by two annotators. Of these, 1,200 sentences have been compared with the corresponding second annotation and cleaned; the remaining 1,800 are currently being cleaned.</Paragraph> <Paragraph position="11"> In the following sections, automation steps 1 and 2 are presented in detail.</Paragraph> </Section> </Section> <Section position="5" start_page="511" end_page="511" type="metho"> <SectionTitle> Figure 3: das 1993 startende Bonusprogramm für Vielflieger (ART CARD ADJA NN APPR NN) </SectionTitle> <Paragraph position="0"> Example for automation level 1: the user has marked das, the AP, Bonusprogramm, and the PP as a constituent of category NP, and the tool's task is to determine the new edge labels (marked with question marks), which are, from left to right, NK, NK, NK, MNR.</Paragraph> </Section> <Section position="6" start_page="511" end_page="511" type="metho"> <SectionTitle> Figure 4: das 1993 startende Bonusprogramm für Vielflieger (ART CARD ADJA NN APPR NN) </SectionTitle> <Paragraph position="0"> Example for automation level 2: the user has marked das, the AP, Bonusprogramm, and the PP as a constituent, and the tool's task is to determine the new node and edge labels (marked with question marks).</Paragraph> </Section> <Section position="7" start_page="511" end_page="511" type="metho"> <SectionTitle> 4 Tagging Grammatical Functions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="511" end_page="511" type="sub_section"> <SectionTitle> 4.1 The Tagger </SectionTitle> <Paragraph position="0"> In contrast to a standard part-of-speech tagger, which estimates lexical and contextual probabilities of tags from sequences of word-tag pairs in a corpus (e.g. (Cutting et al., 1992; Feldweg, 1995)), the tagger for grammatical functions works with lexical and contextual probability measures $P_Q(\cdot)$ 
depending on the category of the mother node ($Q$). Each phrasal category (S, VP, NP, PP etc.) is represented by a different Markov model. The categories of the daughter nodes correspond to the outputs of the Markov model, while grammatical functions correspond to states.</Paragraph> </Section> </Section> <Section position="8" start_page="511" end_page="511" type="metho"> <SectionTitle> Figure 5: das 1993 startende Bonusprogramm für Vielflieger (ART CARD ADJA NN APPR NN) </SectionTitle> <Paragraph position="0"> Example for automation level 3: the user has marked the words as a constituent, and the tool's task is to determine simple sub-phrases (the AP and PP) as well as the new node and edge labels (cf. previous figures for the resulting structure).</Paragraph> <Paragraph position="3"> The structure of a sample sentence is shown in figure 6. Figure 7 shows those parts of the Markov models for sentences (S) and verb phrases (VP) that represent the correct paths for the example. Given a sequence of word and phrase categories $T = T_1 \ldots T_k$ and a parent category $Q$, we calculate the sequence of grammatical functions $G = G_1 \ldots G_k$ that maximizes $P_Q(G \mid T)$:</Paragraph> <Paragraph position="5"> $$\arg\max_{G} P_Q(G \mid T) = \arg\max_{G} \frac{P_Q(G)\, P_Q(T \mid G)}{P_Q(T)} = \arg\max_{G} P_Q(G)\, P_Q(T \mid G).$$</Paragraph> <Paragraph position="7"> Assuming the Markov property, the lexical and contextual probabilities factor as $$P_Q(T \mid G) = \prod_{i=1}^{k} P_Q(T_i \mid G_i) \quad \text{and} \quad P_Q(G) = \prod_{i=1}^{k} P_Q(G_i \mid C_i).$$</Paragraph> <Paragraph position="9"> The contexts $C_i$ are modeled by a fixed number of surrounding elements. Currently, we use two grammatical functions, which results in a trigram model: $$P_Q(G) = \prod_{i=1}^{k} P_Q(G_i \mid G_{i-2}, G_{i-1}).$$</Paragraph> <Paragraph position="11"> The contexts are smoothed by linear interpolation of unigrams, bigrams, and trigrams. Their weights are calculated by deleted interpolation (Brown et al., 1992).</Paragraph> <Paragraph position="12"> The predictions of the tagger are correct in approx. 94% of all cases. 
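</Paragraph> <Paragraph position="13"> The smoothing of contexts by linear interpolation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tagger's actual implementation: the function names `train_counts` and `interpolated_prob`, the padding symbol `BOS`, and the hand-set weights are hypothetical choices made for this example. The tagger described here estimates the weights by deleted interpolation, which the sketch does not implement, and it would keep one such set of counts per phrase category Q.

```python
from collections import Counter


def train_counts(sequences):
    """Count label n-grams in sequences of grammatical functions.

    Each sequence lists the function labels of the daughters of one phrase,
    e.g. ["NK", "NK", "NK", "MNR"]. "BOS" is an assumed padding symbol.
    """
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()  # how often each context was seen
    total = 0
    for seq in sequences:
        padded = ["BOS", "BOS"] + list(seq)
        for i in range(2, len(padded)):
            g2, g1, g = padded[i - 2], padded[i - 1], padded[i]
            uni[g] += 1
            bi[(g1, g)] += 1
            tri[(g2, g1, g)] += 1
            ctx1[g1] += 1
            ctx2[(g2, g1)] += 1
            total += 1
    return uni, bi, tri, ctx1, ctx2, total


def interpolated_prob(g, g1, g2, counts, lambdas):
    """Estimate P(g | g2, g1) as a weighted sum of relative frequencies.

    g1 is the preceding label and g2 the one before it. The weights in
    lambdas (unigram, bigram, trigram) are assumed to sum to one.
    """
    uni, bi, tri, ctx1, ctx2, total = counts
    l1, l2, l3 = lambdas
    f1 = uni[g] / total if total else 0.0
    f2 = bi[(g1, g)] / ctx1[g1] if ctx1[g1] else 0.0
    f3 = tri[(g2, g1, g)] / ctx2[(g2, g1)] if ctx2[(g2, g1)] else 0.0
    return l1 * f1 + l2 * f2 + l3 * f3
```

For instance, with counts collected from the two label sequences NK NK NK MNR and NK NK and the assumed weights (0.1, 0.3, 0.6), the estimates for MNR and NK after the context NK NK sum to one, because each of the three relative-frequency distributions does.</Paragraph> <Paragraph position="14">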
In section 4.3, we demonstrate how to cope with wrong predictions.</Paragraph> <Section position="1" start_page="511" end_page="511" type="sub_section"> <SectionTitle> 4.2 Serial Order </SectionTitle> <Paragraph position="0"> As the annotation format permits trees with crossing branches, we need a convention for determining the relative position of overlapping sibling phrases in order to assign them a position in a Markov model. For instance, in figure 6 the range of the terminal node positions of VP overlaps with those of the subject SB and the finite verb HD. Thus there is no single a-priori position for the VP node (footnote 5).</Paragraph> <Paragraph position="1"> The position of a phrase depends on the position of its descendants. We define the relative order of two phrases recursively as the order of their anchors, i.e., some specified daughter nodes. If the anchors are words, we simply take their linear order.</Paragraph> <Paragraph position="2"> The exact definition of the anchor is based on linguistic knowledge. We choose the most intuitive alternative and define the anchor as the head of the phrase (or some equivalent function). Noun phrases do not necessarily have a unique head; instead, we use the last element in the noun kernel (elements of the noun kernel are determiners, adjectives, and nouns) to mark the anchor position. Except for NPs, we employ a default rule that takes the leftmost element as the anchor in case the phrase has no (unique) head.</Paragraph> <Paragraph position="3"> Thus the position of the VP in figure 6 is defined as equal to the string position of besucht. 
The position of the VP node in figure 1 is equal to that of aufgegeben, and the position of the NP in figure 3 is equal to that of Bonusprogramm.</Paragraph> <Paragraph position="4"> Footnote 5: Without crossing edges, the serial order of phrases is trivial: phrase $Q_1$ precedes phrase $Q_2$ if and only if all terminal nodes derived from $Q_1$ precede those of $Q_2$. This suffices to uniquely determine the order of sibling nodes.</Paragraph> </Section> <Section position="2" start_page="511" end_page="511" type="sub_section"> <SectionTitle> 4.3 Reliability </SectionTitle> <Paragraph position="0"> Experience gained from the development of the Penn Treebank (Marcus et al., 1994) has shown that automatic annotation is useful only if it is absolutely correct, while wrong analyses are often difficult to detect and their correction can be time-consuming. To prevent the human annotator from missing errors, the tagger for grammatical functions is equipped with a measure for the reliability of its output. Given a sequence of categories, the tagger calculates the most probable sequence of grammatical functions. In addition, it computes the probabilities of the second-best functions of each daughter node. If some of these probabilities are close to that of the best sequence, the alternatives are regarded as equally suited and the most probable one is not taken to be the sole winner; the prediction is then marked as unreliable in the output of the tagger.</Paragraph> <Paragraph position="3"> These unreliable predictions can be further classified in that we distinguish &quot;unreliable&quot; sequences as opposed to &quot;almost reliable&quot; ones.</Paragraph> <Paragraph position="4"> The distance between two probabilities for the best and second-best alternative, $p_{best}$ and $p_{second}$, is measured by their quotient. The classification of reliability is based on thresholds. 
In the current implementation we employ three degrees of reliability, which are separated by two thresholds $\theta_1$ and $\theta_2$: $\theta_1$ separates unreliable decisions from those considered almost reliable, and $\theta_2$ marks the difference between almost and fully reliable predictions.</Paragraph> <Paragraph position="5"> Unreliable ($p_{best}/p_{second} &lt; \theta_1$): The probabilities of alternative assignments are within some small specified distance. In this case, it is the annotator who has to specify the grammatical function.</Paragraph> <Paragraph position="7"> Almost reliable ($\theta_1 \le p_{best}/p_{second} &lt; \theta_2$): The probability of an alternative is within some larger distance. In this case, the most probable function is displayed, but the annotator has to confirm it.</Paragraph> <Paragraph position="9"> Reliable ($p_{best}/p_{second} \ge \theta_2$): The probabilities of all alternatives are much smaller than that of the best assignment, thus the latter is assigned.</Paragraph> <Paragraph position="10"> For efficiency, an extended Viterbi algorithm is used. Instead of keeping track of the best path only (cf. (Rabiner, 1989)), we keep track of all paths that fall into the range marked by the probability of the best path and $\theta_2$, i.e., we keep track of all alternative paths with probability $p_{alt}$ for which $p_{alt} &gt; p_{best}/\theta_2$.</Paragraph> <Paragraph position="12"> Suitable values for $\theta_1$ and $\theta_2$ were determined empirically (cf. section 6).</Paragraph> </Section> </Section> <Section position="9" start_page="511" end_page="511" type="metho"> <SectionTitle> 5 Tagging Phrase Categories </SectionTitle> <Paragraph position="0"> The second level of automation (cf. 
section 3) automates the recognition of phrasal categories, and so frees the annotator from typing phrase labels.</Paragraph> <Paragraph position="1"> The task is performed by an extension of the tagger presented in the previous section, where different Markov models for each category were introduced.</Paragraph> <Paragraph position="2"> There, the annotator determines the category of the current phrase, and the tool runs the appropriate model to determine the edge labels.</Paragraph> <Paragraph position="3"> To assign the phrase label automatically, we run all models in parallel. Each model assigns grammatical functions and, more importantly for this step, a probability to the phrase. The model assigning the highest probability is assumed to be most adequate, and the corresponding label is assigned to the phrase.</Paragraph> <Paragraph position="4"> Formally, we calculate the phrase category $Q$ (and at the same time the sequence of grammatical functions $G = G_1 \ldots G_k$) on the basis of the sequence of daughters $T = T_1 \ldots T_k$ with $$\arg\max_{Q} \max_{G} P_Q(G \mid T).$$ This procedure is equivalent to a different view on the same problem involving one large (combined) Markov model that enables a very efficient calculation of the maximum.</Paragraph> <Paragraph position="5"> Let $\mathcal{G}_Q$ be the set of all grammatical functions that can occur within a phrase of type $Q$. Assume that these sets are pairwise disjoint. One can easily achieve this property by indexing all used grammatical functions with their associated phrases and, if necessary, duplicating labels, e.g., instead of using HD, MO, ..., use the indexed labels $HD_S$, $HD_{VP}$, $MO_{NP}$, ... This property makes it possible to determine a phrase category by inspecting the grammatical functions involved.</Paragraph> <Paragraph position="6"> When applied, the combined model assigns grammatical functions to the elements of a phrase (not knowing its category in advance). 
If transitions between states representing labels with different indices are forced to zero probability (together with smoothing applied to other transitions), all labels assigned to a phrase get the same index. This uniquely identifies a phrase category.</Paragraph> <Paragraph position="7"> The two additional conditions $$G \in \mathcal{G}_{Q_1} \Rightarrow G \notin \mathcal{G}_{Q_2} \quad (Q_1 \neq Q_2)$$ and $$G_1 \in \mathcal{G}_{Q_1} \wedge G_2 \in \mathcal{G}_{Q_2} \Rightarrow P(G_2 \mid G_1) = 0 \quad (Q_1 \neq Q_2)$$</Paragraph> <Paragraph position="9"> are sufficient to calculate $$\arg\max_{G} P(G \mid T)$$</Paragraph> <Paragraph position="11"> using the Viterbi algorithm and to identify both the phrase category and the respective grammatical functions.</Paragraph> <Paragraph position="12"> Again, as described in section 4, we calculate probabilities for alternative candidates in order to get reliability estimates.</Paragraph> <Paragraph position="13"> The overall accuracy of this approach is approx. 95%, and higher if we only consider the reliable cases. Details about the accuracy are reported in the next section.</Paragraph> </Section> </Paper>