File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/01/p01-1045_metho.xml
Size: 16,897 bytes
Last Modified: 2025-10-06 14:07:40
<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1045"> <Title>From Chunks to Function-Argument Structure: A Similarity-Based Approach</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The TüSBL Architecture </SectionTitle> <Paragraph position="0"> To ensure robustness and efficiency, TüSBL, a similarity-based chunk parser, is organized as a three-level architecture in which the output of each level serves as input to the next higher level. The first level performs part-of-speech (POS) tagging of the input string with the bigram tagger LIKELY (Feldweg, 1993).[2] The parts of speech serve as pre-terminal elements for the next step, the chunk analysis.</Paragraph> <Paragraph position="1"> Chunk parsing is carried out by an adapted version of Abney's (1996) CASS parser, which is realized as a cascade of finite-state transducers.</Paragraph> <Paragraph position="2"> The chunks, which extend to the simplex clause level where possible, are then remodeled into complete trees at the tree construction level.</Paragraph> <Paragraph position="3"> The tree construction level is similar to the DOP approach (Bod, 1998; Bod, 2000) in that it uses complete tree structures instead of rules.</Paragraph> <Paragraph position="4"> Unlike Bod, however, we use only complete trees and do not allow tree cuts, so the number of possible combinations of partial trees is strictly controlled. The resulting parser is highly efficient: parsing 3,770 English sentences took 106.5 seconds on an Ultra Sparc 10, i.e. roughly 28 ms per sentence.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Chunking and Tree Construction </SectionTitle> <Paragraph position="0"> The division of labor between the chunking and tree construction modules is best illustrated by an example.</Paragraph> <Paragraph position="1"> For sentences such as the input shown in Fig.
1, the chunker produces a structure in which some constituents remain unattached or only partially annotated, in keeping with the chunk-parsing strategy of factoring out recursion and resolving only unambiguous attachments.</Paragraph> <Paragraph position="2"> Since chunks are by definition non-recursive structures, a chunk of a given category cannot contain another chunk of the same type. [Footnote: ... fully automatic recognition of functional labels.] [Footnote 2: The inventory of POS tags is based on the STTS (Schiller et al., 1995) for German and on the Penn Treebank tagset (Santorini, 1990) for English.]</Paragraph> <Paragraph position="3"> Fig. 1. Input: alright and that should get us there about nine in the evening. Chunk parser output: [uh alright] [simpx_ind [cc and] [that that] [vp [md should] [vb get]] [pp us] [adv [rb there]] [prep_p [about about] [np [cd nine]]] [prep_p [in in] [np [dt the] [daytime evening]]]]. In the case at hand, the two prepositional phrases ('prep_p') about nine and in the evening cannot be combined into a single chunk, even though semantically these words constitute a single constituent. At the level of tree construction, as shown in Fig. 2, the prohibition against recursive phrases is suspended, so the proper PP attachment becomes possible. Additionally, the phrase about nine was wrongly categorized as a 'prep_p'. Such miscategorizations can arise if a word can be assigned more than one POS tag. For about, either 'in' (preposition) or 'rb' (adverb) would be appropriate. Since the POS tagger cannot resolve this ambiguity from local context, it assigns the underspecified tag 'about' instead, which can in turn lead to misclassification by the chunker.</Paragraph> <Paragraph position="4"> The most obvious deficiency of the chunk output shown in Fig. 1 is that it contains no information about the function-argument structure of the chunked phrases.
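The non-recursion property of chunks described above can be checked mechanically. The following Python sketch is our illustration, not part of TüSBL: it parses bracketed chunker output of the kind shown in Fig. 1 into (label, children) tuples and tests whether any node dominates another node with the same label. The bracket format "[label child ...]" and the function names parse_chunks, descendant_labels, and is_recursive are assumptions made for this example.

```python
def parse_chunks(text):
    """Parse bracketed chunker output such as Fig. 1 into (label, children) tuples.

    Assumes every bracket has the form "[label child child ...]", where a child
    is either a nested bracket or a plain word.
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                              # skip the opening bracket
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested chunk or preterminal
            else:
                children.append(tokens[pos])  # word
                pos += 1
        pos += 1                              # skip the closing bracket
        return (label, children)

    forest = []
    while pos != len(tokens):                 # top-level chunks, left to right
        forest.append(node())
    return forest


def descendant_labels(tree):
    """Yield the labels of all nodes properly dominated by `tree`."""
    _, children = tree
    for child in children:
        if isinstance(child, tuple):
            yield child[0]
            yield from descendant_labels(child)


def is_recursive(tree):
    """True if some node dominates another node with the same label."""
    label, children = tree
    if label in set(descendant_labels(tree)):
        return True
    return any(is_recursive(c) for c in children if isinstance(c, tuple))
```

Applied to the Fig. 1 output, is_recursive is False for both top-level chunks ([uh alright] and the simpx_ind chunk). A single prep_p chunk that embedded the second prep_p would be flagged as recursive, which is exactly the kind of structure that only the tree construction level may build.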
Once a (more) complete parse structure has been created, the grammatical function of each major constituent still needs to be identified. The labels SUBJ (subject), HD (head), ADJ (adjunct), COMP (complement), and SPR (specifier), which appear as edge labels between tree nodes in Fig. 2, signify the grammatical functions of the constituents in question. For example, the label SUBJ encodes that the NP 'that' is the subject of the whole sentence. The label ADJ above the phrase about nine in the evening signifies that this phrase is an adjunct of the verb get. TüSBL currently uses as its instance base two semi-automatically constructed treebanks of German and English consisting of approximately 67,000 and 35,000 fully annotated sentences, respectively.[3] Each treebank uses a different annotation scheme at the level of function-argument structure.[4] As shown in Table 1, the English treebank uses a total of 13 functional labels, while the German treebank has a richer set of 36. For German, the task of tree construction is therefore slightly more complex because of the larger set of functional labels. Fig. 3 gives an example of a German input sentence, dann würde ich vielleicht noch vorschlagen Donnerstag den elften und Freitag den zwölften August (then I would suggest maybe Thursday eleventh and Friday twelfth of August), and its corresponding chunk parser output.</Paragraph> <Paragraph position="5"> In this case, the subconstituents of the extraposed coordinated noun phrase are not attached to the simplex clause, which ends with the non-finite verb that typically occupies clause-final position in German declarative main clauses. Moreover, each conjunct of the coordinated noun phrase forms a completely flat structure. TüSBL's tree construction module enriches the chunk output as shown in Fig. 4. Here the internally recursive NP conjuncts have been coordinated and integrated correctly into the clause as a whole. In addition, function labels such as MOD (modifier), HD (head), ON (subject), OA (direct object), OV (verbal object), and APP (apposition) have been added; they encode the function-argument structure of the sentence. [Footnote: ... field-model standardly used in empirical studies of German syntax. The annotation for English is modeled after the theoretical assumptions of Head-Driven Phrase Structure Grammar.]</Paragraph> <Paragraph position="6"> Memory-based learning assumes that the classification of a given input should be based on its similarity to previously seen instances of the same type that have been stored in memory. This paradigm is an instance of lazy learning in the sense that these previously encountered instances are stored &quot;as is&quot; and, crucially, are not abstracted over, as is typically the case in rule-based systems and other learning approaches. [Footnote 5: Memory-based learning has recently been applied to a variety of NLP classification tasks, including part-of-speech tagging, noun phrase chunking, grapheme-phoneme conversion, word sense disambiguation, and PP attachment (see Daelemans et al., 1999; Veenstra et al., 2000; Zavrel et al., 1997 for details).]</Paragraph> <Paragraph position="7"> Previous applications of memory-based learning to NLP tasks consisted of classification problems in which the set of classes to be learnt was simple, in the sense that the class items had no internal structure and the number of distinct items was small. Since in the current application the classes are parse trees, the classification task is much more complex. The classification is simple only in those cases where a direct hit is found, i.e. where the input completely matches a stored instance. In all other cases, the most similar tree from the instance base needs to be modified to match the chunked input.
This means that the output tree will group together only those elements of the chunked input for which there is evidence in the instance base. If these matching strategies fail for a complete chunk, TüSBL attempts to match smaller subchunks.</Paragraph> <Paragraph position="8"> The algorithm used for tree construction is presented in slightly simplified form in Figs. 5-8. For readability, we assume here that chunks and complete trees share the same data structure, so that subroutines such as string_yield can operate on both of them indiscriminately.</Paragraph> <Paragraph position="9"> The main routine construct_tree in Fig. 5 separates the list of input chunks and passes each one to the subroutine process_chunk in Fig. 6, where the chunk is turned into one or more (partial) trees. process_chunk first checks whether a complete match with an instance from the instance base is possible.[6] If this is not the case, a partial match on the lexical level is attempted. If a partial tree is found, attach_next_chunk in Fig. 7 and extend_tree in Fig. 8 are used to extend the tree, either by attaching one more chunk or by comparing the missing parts of the chunk with tree extensions on the POS level. attach_next_chunk is necessary to ensure that the best possible tree is found even in the rare case that the original segmentation into chunks contains mistakes. If no partial tree is found, tree construction backs off to a complete match at the POS level or, failing that, processes each subchunk of the present chunk recursively.</Paragraph> <Paragraph position="10"> The application of memory-based techniques is implemented in the two subroutines complete_match and partial_match. They are presented as separate subroutines for expository purposes only; in the actual implementation, the search is carried out only once.
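The back-off order described above (lexical complete match, lexical partial match, complete match at the POS level, then subchunks) can be sketched in Python as follows. This is a simplified, hypothetical rendering, not TüSBL's implementation: the instance base is a pair of exact-lookup tables, partial_match is approximated by the longest stored word sequence that is a proper prefix of the input, stored trees are opaque values, and attach_next_chunk, extend_tree, and the postprocessing of matched trees are omitted. The routine names follow the figures, but Token, Chunk, InstanceBase, and tokens are illustrative names of our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str


@dataclass(frozen=True)
class Chunk:
    label: str
    children: tuple  # Token and/or nested Chunk objects


def tokens(chunk):
    """All Tokens below `chunk`, left to right."""
    out = []
    for child in chunk.children:
        out.extend([child] if isinstance(child, Token) else tokens(child))
    return out


def string_yield(chunk):
    return tuple(t.word for t in tokens(chunk))


def pos_yield(chunk):
    return tuple(t.pos for t in tokens(chunk))


class InstanceBase:
    """Hypothetical instance base: stored trees indexed by word and POS yield."""

    def __init__(self, entries):
        self.by_words, self.by_pos = {}, {}
        for words, tags, tree in entries:        # (words, tags, tree) triples
            self.by_words.setdefault(tuple(words), tree)
            self.by_pos.setdefault(tuple(tags), tree)

    def complete_match(self, key, level="words"):
        table = self.by_words if level == "words" else self.by_pos
        return table.get(key)                    # None if there is no direct hit

    def partial_match(self, words):
        """(stored words, tree) for the longest stored proper prefix of `words`."""
        best = None
        for stored, tree in self.by_words.items():
            if len(words) > len(stored) > 0 and words[:len(stored)] == stored:
                if best is None or len(stored) > len(best[0]):
                    best = (stored, tree)
        return best


def process_chunk(chunk, base, output):
    words = string_yield(chunk)
    tree = base.complete_match(words)
    if tree is not None:                         # direct hit
        output.append(tree)
        return
    hit = base.partial_match(words)
    if hit is not None:                          # partial match on the lexical level
        stored, tree = hit
        output.append(tree)
        rest = tokens(chunk)[len(stored):]       # process the remaining part of the chunk
        process_chunk(Chunk(chunk.label, tuple(rest)), base, output)
        return
    tree = base.complete_match(pos_yield(chunk), level="pos")
    if tree is not None:                         # back off to the POS sequence
        output.append(tree)
        return
    for child in chunk.children:                 # back off to subchunks
        if isinstance(child, Chunk):
            process_chunk(child, base, output)
        else:
            output.append(child)                 # no evidence: token stays unattached


def construct_tree(chunk_list, base):
    output = []
    for chunk in chunk_list:
        process_chunk(chunk, base, output)
    return output
```

For example, with stored trees for about nine and in the evening, a chunk yielding about nine in the evening has no direct hit. The partial match picks up the tree for about nine, and the remaining words are then found as a direct hit. Leaving unmatched tokens unattached mirrors the principle, stated above, that output trees group together only elements for which the instance base provides evidence.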
The two subroutines exist because of the postprocessing of the chosen tree, which is necessary for partial matches and which also deviates from standard memory-based applications. Postprocessing mainly consists of shortening the tree from the instance base so that it covers only those parts of the chunk that could be matched. If the match is done on the lexical level, however, tagging errors can also be corrected when there is enough evidence in the instance base. As its similarity metric, TüSBL currently uses an overlap metric, the most basic metric for instances with symbolic features. [Footnote 6: string_yield returns the sequence of words included in the input structure, pos_yield the sequence of POS tags.] This overlap metric is based on either lexical or POS features. Instead of applying a more sophisticated metric such as the weighted overlap metric, TüSBL uses a backing-off approach that heavily favors similarity of the input with pre-stored instances on the basis of substring identity. Splitting the classification and adaptation process into separate stages allows TüSBL to prefer analyses with a higher likelihood of being correct. This strategy also enables the correction of tagging and segmentation errors that may occur in the chunked input.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Quantitative Evaluation </SectionTitle> <Paragraph position="0"> Quantitative evaluations of robust parsers typically focus on the three PARSEVAL measures: labeled precision, labeled recall, and crossing accuracy. It has frequently been pointed out that these measures provide little or no information as to whether a parser assigns the correct semantic structure to a given input if the set of category labels comprises only syntactic categories in the narrow sense, i.e. only names of lexical and phrasal categories.
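For reference, the three PARSEVAL measures named above can be computed for a single sentence as in the following Python sketch. Constituents are assumed to be (label, start, end) triples over half-open word spans, and crossing accuracy is represented here by the count of output constituents that cross at least one gold constituent. The function name parseval is hypothetical, and corpus-level scores would be aggregated over sentences.

```python
from collections import Counter


def parseval(gold, test):
    """Labeled precision, labeled recall and crossing brackets for one sentence.

    gold, test: lists of (label, start, end) constituents over half-open word spans.
    """
    g, t = Counter(gold), Counter(test)
    matched = sum(min(n, g[c]) for c, n in t.items())  # multiset intersection
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0

    def crosses(a, b):
        # The spans overlap, but neither contains the other.
        (_, s1, e1), (_, s2, e2) = a, b
        return e2 > e1 > s2 > s1 or e1 > e2 > s1 > s2

    crossing = sum(1 for c in test if any(crosses(c, gc) for gc in gold))
    return {"labeled_precision": precision,
            "labeled_recall": recall,
            "crossing_brackets": crossing}
```

For gold constituents S(0,5), NP(0,1), VP(1,5), PP(3,5) and output constituents S(0,5), NP(0,1), VP(1,5), NP(2,4), the function returns a labeled precision and recall of 0.75 each and one crossing bracket (NP(2,4) against PP(3,5)). Because only the label slot is compared, the same function can score functional labels if each constituent carries its function label instead of its category. The exact node alignment used in the evaluation reported here is not described in this section.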
The criticism is justified: a measure of semantic accuracy can only be obtained if the gold standard includes annotations of syntactic-semantic dependencies between bracketed constituents. To answer this criticism, the evaluation of TüSBL presented here focuses on the correct assignment of functional labels. For an in-depth evaluation that focuses on syntactic categories, we refer the interested reader to (Kübler and Hinrichs, 2001).</Paragraph> <Paragraph position="1"> The quantitative evaluation of TüSBL was conducted on the German and English treebanks described in section 3. Each treebank uses a different annotation scheme at the level of function-argument structure. As shown in Table 1, the English treebank uses a total of 13 functional labels, while the German treebank has a richer set of 36 function labels.</Paragraph> <Paragraph position="2"> The evaluation consisted of a ten-fold cross-validation test, in which the training data provide an instance base of already seen cases for TüSBL's tree construction module. The evaluation was performed for both the German and the English data.</Paragraph> <Paragraph position="3"> For each language, the following parameters were measured: 1. labeled precision for syntactic categories [...]
Fig. 5:
construct_tree(chunk_list, treebank):
  while (chunk_list is not empty) do
    remove first chunk from chunk_list
    process_chunk(chunk, treebank)</Paragraph> <Paragraph position="5">
Fig. 6 (process_chunk):
if (tree is not empty)                       // direct hit,
then output(tree)                            // i.e. complete chunk found in treebank
else tree := partial_match(words, treebank)
  if (tree is not empty)
  then if (tree = postfix of chunk)
    then</Paragraph> <Paragraph position="7">
      if (tree is not empty) then tree := tree1
    if ((chunk - tree) is not empty)         // if attach_next_chunk succeeded,
    then tree := extend_tree(chunk - tree, tree, treebank)   // chunk might consist of both chunks
    output(tree)
    if ((chunk - tree) is not empty)         // chunk might consist of both chunks (s.a.)
    then process_chunk(chunk - tree, treebank)   // i.e. process remaining chunk
  else                                       // back off to POS sequence</Paragraph> <Paragraph position="9">
    if (tree is not empty)
    then output(tree)
    else                                     // back off to subchunks
      while (chunk is not empty) do
        remove first subchunk c1 from chunk
        process_chunk(c1, treebank)
The results of the quantitative evaluation are shown in Tables 2 and 3.
language | parameter | minimum | maximum | average
German | labeled precision for synt. cat. | 81.28 % | 82.08 % | 81.56 %
German | labeled precision for funct. cat. | 89.26 % | 90.13 % | 89.73 %
English | labeled precision for synt. cat. | 66.15 % | 67.34 % | 66.84 %
English | labeled precision for funct. cat. | 90.07 % | 90.93 % | 90.40 %
The results for labeled recall underscore the difficulty of applying the classical PARSEVAL measures to a partial parsing approach like ours. We have therefore divided the incorrectly matched nodes into three categories: genuine false positives, where a tree structure matching the gold standard is found but assigned the wrong label; nodes that, relative to the gold standard, remain unattached in the output tree; and nodes in the gold standard for which no match could be found in the parser output. Because our approach posits and attaches nodes only if sufficient evidence can be found in the instance base, the latter two categories cannot really be considered errors in the strict sense. Nevertheless, in future research we will attempt to significantly reduce the proportion of unattached and unmatched nodes by exploring matching algorithms that permit a higher level of generalization when matching the input against the instance base.
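One possible way to operationalize the three error categories above is sketched below in Python. It is an assumption of this example, not a description of the evaluation actually used. Each gold constituent (label, start, end) is classified as unmatched if the output has no constituent with the same span, as a wrong label (a genuine false positive) if the span exists only with other labels, as unattached if it is correctly labeled and embedded in the gold tree but not dominated by any output constituent, and as correct otherwise. The function names contains and categorize_nodes are hypothetical.

```python
def contains(outer, inner):
    """True if half-open span `outer` properly contains span `inner`."""
    (s1, e1), (s2, e2) = outer, inner
    return s2 >= s1 and e1 >= e2 and outer != inner


def categorize_nodes(gold, test):
    """Classify each gold (label, start, end) constituent against the parser output."""
    test_labels = {}
    for label, s, e in test:
        test_labels.setdefault((s, e), set()).add(label)
    gold_spans = [(s, e) for _, s, e in gold]
    counts = {"correct": 0, "wrong_label": 0, "unattached": 0, "unmatched": 0}
    for label, s, e in gold:
        span = (s, e)
        if span not in test_labels:
            counts["unmatched"] += 1        # no output node covers this span
        elif label not in test_labels[span]:
            counts["wrong_label"] += 1      # genuine false positive
        elif any(contains(g, span) for g in gold_spans) and not any(
                contains(t, span) for t in test_labels):
            counts["unattached"] += 1       # embedded in gold, a loose fragment in output
        else:
            counts["correct"] += 1
    return counts
```

Consider a gold analysis S(0,5), NP(0,1), V(1,3), PP(3,5), NP(4,5) and an output that lacks the S node and labels the (3,5) span ADVP. In that case, S counts as unmatched, PP as a wrong label, the first NP and V as unattached, and the embedded NP as correct.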
What is encouraging about the recall results reported in Table 2 is that the parser produces genuine false positives for an average of only 3.03 % (German) and 3.25 % (English).</Paragraph> <Paragraph position="10"> For German, labeled precision for syntactic categories reached 81.56 %. While these results do not reach the performance reported for other parsers (cf. Collins, 1999; Charniak, 1997), it is important to note that the two treebanks consist of transliterated spontaneous speech data. The fragmentary and partially ill-formed nature of such spoken data makes them harder to analyze than written data such as the Penn Treebank, which is typically used as the gold standard.</Paragraph> <Paragraph position="11"> It should also be kept in mind that the basic PARSEVAL measures were developed for parsers whose main goal is a complete analysis spanning the entire input. This runs counter to the basic philosophy underlying an amended chunk parser such as TüSBL, whose main goal is robustness of partially analyzed structures. Labeled precision of functional labels for the German data was 89.73 %; for English, it was 90.40 %. The slightly lower rate for German reflects the larger set of function labels used for German. This raises more general questions about the trade-off between accuracy and granularity of functional annotations.</Paragraph> </Section> </Paper>