<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0625"> <Title>Generalized Inference with Multiple Semantic Role Labeling Systems</Title> <Section position="3" start_page="0" end_page="182" type="metho"> <SectionTitle> 1 SRL System Architecture </SectionTitle> <Paragraph position="0"> Our SRL system consists of four stages: pruning, argument identification, argument classification, and inference. The goal of pruning and argument identification is to identify argument candidates for a given verb predicate; the candidates are assigned argument types only in the argument classification stage. Linguistic and structural constraints are incorporated in the inference stage to resolve inconsistent global predictions. The inference stage can take as input the argument classification output of either a single system or multiple systems. We explain inference for multiple systems in Sec. 2.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Pruning </SectionTitle> <Paragraph position="0"> Only the constituents in the parse tree are considered as argument candidates. In addition, our system exploits the heuristic introduced by (Xue and Palmer, 2004) to filter out very unlikely constituents. The heuristic is a recursive process starting from the verb whose arguments are to be identified. It first returns the siblings of the verb; it then moves to the verb's parent and collects the parent's siblings. The process repeats, moving one level up each time, until it reaches the root. In addition, if a collected constituent is a PP (prepositional phrase), its children are also collected. Candidates consisting of only a single punctuation mark are not considered.</Paragraph> <Paragraph position="1"> This heuristic works well with correct parse trees. However, one error made by automatic parsers, incorrect PP attachment, can lead to missing arguments.
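A minimal Python sketch of the pruning heuristic described above, for illustration only and not the authors' implementation. The Node class, the attach helper, and the set of punctuation tags are hypothetical assumptions. Starting from the verb, the code collects the siblings of the current node, adds the children of any PP sibling, and moves one level up until it reaches the root; single punctuation marks are dropped at the end.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed subset of Penn Treebank punctuation tags (hypothetical choice).
PUNCT_TAGS = {".", ",", ":", "-LRB-", "-RRB-"}


@dataclass
class Node:
    """Hypothetical parse-tree constituent used only for this sketch."""
    label: str                                   # phrase type or POS tag
    words: List[str]                             # words covered by the node
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def attach(parent: Node, *kids: Node) -> Node:
    """Make each kid a child of parent; return parent for chaining."""
    for kid in kids:
        kid.parent = parent
        parent.children.append(kid)
    return parent


def prune(verb: Node) -> List[Node]:
    """Collect argument candidates for the given verb node."""
    candidates: List[Node] = []
    current = verb
    while current.parent is not None:            # stop once the root is reached
        for sib in current.parent.children:
            if sib is current:
                continue
            candidates.append(sib)               # siblings at this level
            if sib.label == "PP":                # a PP also contributes its children
                candidates.extend(sib.children)
        current = current.parent                 # move up one level
    # Discard candidates made of a single punctuation mark.
    return [c for c in candidates
            if not (len(c.words) == 1 and c.label in PUNCT_TAGS)]


# Toy tree for "The man ate pasta with a fork ."
verb = Node("VBD", ["ate"])
pp = attach(Node("PP", ["with", "a", "fork"]),
            Node("IN", ["with"]), Node("NP", ["a", "fork"]))
vp = attach(Node("VP", ["ate", "pasta", "with", "a", "fork"]),
            verb, Node("NP", ["pasta"]), pp)
root = attach(Node("S", ["The", "man", "ate", "pasta", "with", "a", "fork", "."]),
              Node("NP", ["The", "man"]), vp, Node(".", ["."]))

print([" ".join(c.words) for c in prune(verb)])
# prints ['pasta', 'with a fork', 'with', 'a fork', 'The man']
```

Only punctuation is filtered here, so the single-word IN node for "with" survives as a candidate; in the full system, later stages decide whether it is an argument.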
To compensate for incorrect PP attachment, we also consider as argument candidates the combination of any consecutive NP and PP, as well as the inner NP and PP obtained by splitting an NP of that form selected by the previous heuristic.</Paragraph> </Section> <Section position="2" start_page="0" end_page="181" type="sub_section"> <SectionTitle> 1.2 Argument Identification </SectionTitle> <Paragraph position="0"> The argument identification stage uses binary classification to decide whether each candidate is an argument. We train and apply the binary classifiers on the constituents supplied by the pruning stage. Most of the features used in our system are standard features, which include: * Predicate and POS tag of predicate indicate the lemma of the predicate and its POS tag.</Paragraph> <Paragraph position="1"> * Voice indicates the voice of the predicate.</Paragraph> <Paragraph position="2"> * Phrase type of the constituent.</Paragraph> <Paragraph position="3"> * Head word and POS tag of the head word are the head word of the constituent and its POS tag. We use the rules introduced by (Collins, 1999) to extract this feature. * First and last words and POS tags of the constituent. * Two POS tags before and after the constituent.</Paragraph> <Paragraph position="4"> * Position feature describes whether the constituent appears before or after the predicate in the sentence. * Path records the traversal path in the parse tree from the predicate to the constituent.</Paragraph> <Paragraph position="5"> * Subcategorization feature describes the phrase structure around the predicate's parent. It records the phrase structure rule that expands the predicate's parent node in the parse tree.
* Verb class feature is the class of the active predicate as described in the PropBank Frames.</Paragraph> <Paragraph position="6"> * Lengths of the target constituent, measured separately in number of words and in number of chunks.</Paragraph> <Paragraph position="7"> * Chunk tells whether the target argument is, embeds, overlaps, or is embedded in a chunk, along with the chunk's type.</Paragraph> <Paragraph position="8"> * Chunk pattern length feature counts the number of chunks between the predicate and the argument.</Paragraph> <Paragraph position="9"> * Clause relative position is the position of the target word relative to the predicate in a pseudo-parse tree constructed only from clause constituents. There are four configurations: the target constituent and the predicate share the same parent; the target constituent's parent is an ancestor of the predicate; the predicate's parent is an ancestor of the target word; or none of these.</Paragraph> <Paragraph position="10"> * Clause coverage describes how much of the local clause (from the predicate) is covered by the argument. It is rounded to multiples of 1/4.</Paragraph> </Section> <Section position="3" start_page="181" end_page="181" type="sub_section"> <SectionTitle> 1.3 Argument Classification </SectionTitle> <Paragraph position="0"> This stage assigns the final argument labels to the argument candidates supplied by the previous stage.</Paragraph> <Paragraph position="1"> A multi-class classifier is trained to classify the types of the arguments supplied by the argument identification stage. To discard excess candidates mistakenly output by the previous stage, the classifier can also label an argument as NULL (&quot;not an argument&quot;).</Paragraph> <Paragraph position="2"> The features used here are the same as those used in the argument identification stage, with the following additional features.</Paragraph> <Paragraph position="3"> * Syntactic frame describes the sequential pattern of the noun phrases and the predicate in the sentence.
This is the feature introduced by (Xue and Palmer, 2004).</Paragraph> <Paragraph position="4"> * Prepositional phrase head is the head of the first phrase after the preposition inside a PP.</Paragraph> <Paragraph position="5"> * NEG and MOD features indicate whether the argument is labeled AM-NEG or AM-MOD by baseline rules. These rules are those used in the baseline SRL system developed by Erik Tjong Kim Sang (Carreras and Màrquez, 2004).</Paragraph> <Paragraph position="6"> * NE indicates whether the target argument is, embeds, overlaps, or is embedded in a named entity, along with the entity's type.</Paragraph> </Section> <Section position="4" start_page="181" end_page="182" type="sub_section"> <SectionTitle> 1.4 Inference </SectionTitle> <Paragraph position="0"> The purpose of this stage is to incorporate prior linguistic and structural knowledge, such as &quot;arguments do not overlap&quot; or &quot;each verb takes at most one argument of each type.&quot; This knowledge is used to resolve inconsistencies in argument classification so that the final predictions are legitimate. We use the inference process introduced by (Punyakanok et al., 2004). The process is formulated as an integer linear programming (ILP) problem whose input is the argument classifier's confidence for each argument type of each candidate. The output is the optimal solution that maximizes the linear sum of the confidence scores (e.g., the conditional probabilities estimated by the argument classifier), subject to the constraints that encode the domain knowledge.</Paragraph> <Paragraph position="1"> Formally, the argument classifier attempts to assign labels to a set of arguments, $S^{1:M}$, indexed from 1 to M. Each argument $S^i$ can take any label from a set of argument labels, $\mathcal{P}$, so the indexed set of arguments takes a label assignment $c^{1:M} \in \mathcal{P}^M$.
If we assume that the argument classifier returns an estimated conditional probability distribution, $\mathrm{Prob}(S^i = c^i)$, then, given a sentence, the inference procedure seeks a global assignment that maximizes the following objective function:</Paragraph> <Paragraph position="2"> $$\hat{c}^{1:M} = \arg\max_{c^{1:M} \in \mathcal{P}^M} \sum_{i=1}^{M} \mathrm{Prob}(S^i = c^i)$$ </Paragraph> <Paragraph position="3"> subject to linguistic and structural constraints. In other words, this objective function reflects the expected number of correct argument predictions, subject to the constraints. The constraints are encoded as follows.</Paragraph> <Paragraph position="4"> * No overlapping or embedding arguments.</Paragraph> <Paragraph position="5"> * No duplicate argument classes for A0-A5.</Paragraph> <Paragraph position="6"> * Exactly one V argument per predicate considered. * If there is a C-V argument, then there has to be a V-A1-C-V pattern. * If there is an R-arg argument, then there has to be an arg argument.</Paragraph> <Paragraph position="7"> * If there is a C-arg argument, there must be an arg argument; moreover, the C-arg argument must occur after arg. * Given the predicate, some argument types are illegal (e.g., predicate 'stalk' can take only A0 or A1). The illegal types may consist of A0-A5 and their corresponding C-arg and R-arg arguments. For each predicate, we find the minimum index i and the maximum index j such that the classes $A_i$ and $A_j$ are mentioned in its frame file. All argument types $A_k$ with $k \notin [i, j]$ are considered illegal.</Paragraph> </Section> </Section> <Section position="4" start_page="182" end_page="182" type="metho"> <SectionTitle> 2 Inference with Multiple SRL Systems </SectionTitle> <Paragraph position="0"> The inference process offers a natural way to combine the outputs of multiple argument classifiers. Specifically, suppose we are given k argument classifiers that perform classification on k argument sets, $\{S_1, \ldots, S_k\}$.
The inference process aims to optimize the objective function:</Paragraph> <Paragraph position="1"> $$\hat{c}^{1:N} = \arg\max_{c^{1:N} \in \mathcal{P}^N} \sum_{i=1}^{N} \sum_{j=1}^{k} \mathrm{Prob}_j(S^i = c^i), \qquad S^{1:N} = \bigcup_{j=1}^{k} S_j$$ </Paragraph> <Paragraph position="2"> where $S^{1:N}$ is the union of the k argument sets and $\mathrm{Prob}_j$ is the probability output by system j.</Paragraph> <Paragraph position="3"> Note that the systems may not all output the same set of argument candidates, owing to differences in pruning and argument identification. When a system does not output a given candidate, we treat it as a phantom candidate for that system and assign it a prior probability distribution. In particular, the probability of the NULL class is set to 0.6 based on empirical tests, and the probabilities of the other classes are set in proportion to their occurrence frequencies in the training data.</Paragraph> <Paragraph position="4"> For example, Figure 1 shows the two candidate sets for a fragment of a sentence, &quot;..., traders say, unable to cool the selling panic in both stocks and futures.&quot; In this example, system A has two argument candidates, a1 = &quot;traders&quot; and a4 = &quot;the selling panic in both stocks and futures&quot;; system B has three argument candidates, b1 = &quot;traders&quot;, b2 = &quot;the selling panic&quot;, and b3 = &quot;in both stocks and futures&quot;. Phantom candidates are created for a2, a3, and b4, and their probabilities are set to the prior.</Paragraph> <Paragraph position="5"> For this implementation specifically, we first train two SRL systems, one using Collins' parser and one using Charniak's parser; these two parsers produce noticeably different output. In evaluation, we run the system trained with Charniak's parser 5 times, once on each of the top-5 parse trees output by Charniak's parser. Together with the Collins-based system, this gives six different outputs per predicate. For each parse tree output, we run the first three stages, namely pruning, argument identification, and argument classification.
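A minimal Python sketch of multi-system joint inference on a toy version of the Figure 1 example, for illustration only and not the authors' implementation. Several pieces are assumptions. The label set, training counts, activations, and all function names (softmax, phantom_prior, legal, joint_inference) are hypothetical. Only two of the constraints from Sec. 1.4 are enforced (no overlapping or embedding arguments, no duplicate A0-A5). An exhaustive search over all labelings replaces the ILP solver of (Punyakanok et al., 2004), so the sketch only scales to a handful of candidates. The 0.6 NULL prior comes from the text; spreading the remaining 0.4 over the other labels in proportion to their training counts is our reading of "proportionally". Raw activations are converted to probabilities with the softmax described in Sec. 3.

```python
import itertools
import math
from typing import Dict, List, Tuple

Span = Tuple[int, int]                       # (start, end) token offsets, end exclusive

# Toy setup: every value is a hypothetical stand-in except NULL_PRIOR.
LABELS = ["NULL", "A0", "A1", "AM-LOC"]      # label set P, in activation order
CORE = {"A0", "A1", "A2", "A3", "A4", "A5"}  # labels that may not be duplicated
NULL_PRIOR = 0.6                             # NULL prior for phantom candidates
TRAIN_FREQ = {"A0": 50, "A1": 40, "AM-LOC": 10}  # made-up training counts


def softmax(acts: List[float]) -> List[float]:
    """Softmax over raw activations, shifted by the max for numerical stability."""
    top = max(acts)
    exps = [math.exp(a - top) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]


def phantom_prior() -> Dict[str, float]:
    """NULL gets NULL_PRIOR; the rest is split in proportion to TRAIN_FREQ."""
    total = sum(TRAIN_FREQ.values())
    prior = {lab: (1.0 - NULL_PRIOR) * n / total for lab, n in TRAIN_FREQ.items()}
    prior["NULL"] = NULL_PRIOR
    return prior


def overlaps(a: Span, b: Span) -> bool:
    """True if the spans share at least one token (this also covers embedding)."""
    return not set(range(*a)).isdisjoint(range(*b))


def legal(assign: Dict[Span, str]) -> bool:
    """Enforce two constraints: no overlap or embedding, no duplicate A0-A5."""
    args = [(s, lab) for s, lab in assign.items() if lab != "NULL"]
    core = [lab for _, lab in args if lab in CORE]
    if len(core) != len(set(core)):
        return False
    return not any(overlaps(s, t)
                   for (s, _), (t, _) in itertools.combinations(args, 2))


def joint_inference(systems: List[Dict[Span, List[float]]]) -> Dict[Span, str]:
    """Pick the legal labeling that maximizes probabilities summed over systems."""
    spans = sorted({s for out in systems for s in out})   # union of all candidates
    prior = phantom_prior()
    score = {s: dict.fromkeys(LABELS, 0.0) for s in spans}
    for out in systems:
        for s in spans:
            # A span this system did not propose is a phantom candidate.
            probs = dict(zip(LABELS, softmax(out[s]))) if s in out else prior
            for lab in LABELS:
                score[s][lab] += probs[lab]
    best: Dict[Span, str] = {}
    best_val = -math.inf
    # Exhaustive search stands in for the ILP solver; only viable for tiny inputs.
    for labels in itertools.product(LABELS, repeat=len(spans)):
        assign = dict(zip(spans, labels))
        if legal(assign):
            val = sum(score[s][lab] for s, lab in assign.items())
            if val > best_val:
                best, best_val = assign, val
    return best


# Tokens: traders(0) say(1) ,(2) unable(3) to(4) cool(5) the(6) selling(7)
# panic(8) in(9) both(10) stocks(11) and(12) futures(13); predicate "cool".
system_a = {(0, 1): [0.0, 3.0, 0.0, 0.0],     # a1 "traders"
            (6, 14): [0.0, 0.0, 3.0, 0.0]}    # a4 "the selling panic in ... futures"
system_b = {(0, 1): [0.0, 3.0, 0.0, 0.0],     # b1 "traders"
            (6, 9): [0.0, 0.0, 2.0, 0.0],     # b2 "the selling panic"
            (9, 14): [0.0, 0.0, 0.0, 1.0]}    # b3 "in both stocks and futures"
result = joint_inference([system_a, system_b])
print(result)
# prints {(0, 1): 'A0', (6, 9): 'NULL', (6, 14): 'A1', (9, 14): 'NULL'}
```

With these made-up activations, the sketch labels the span for traders as A0 and the long span a4 as A1. It sets the overlapping split candidates b2 and b3 to NULL, since they cannot coexist with a4.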
A joint inference stage then resolves the inconsistencies among the argument classification outputs of these systems.</Paragraph> </Section> <Section position="5" start_page="182" end_page="183" type="metho"> <SectionTitle> 3 Learning and Evaluation </SectionTitle> <Paragraph position="0"> The learning algorithm used is a variation of the Winnow update rule incorporated in SNoW (Roth, 1998; Roth and Yih, 2002), a multi-class classifier tailored for large-scale learning tasks. SNoW learns a sparse network of linear functions, in which the targets (in this case, argument boundary predictions or argument type predictions) are represented as linear functions over a common feature space. It extends the basic Winnow multiplicative update rule with a regularization term, which has the effect of separating the data with a large-margin separator (Grove and Roth, 2001; Hang et al., 2002), and with a voted (averaged) weight vector (Freund and Schapire, 1999).</Paragraph> <Paragraph position="1"> The softmax function (Bishop, 1995) is used to convert raw activations into conditional probabilities. If there are n classes and the raw activation of class i is $act_i$, the posterior estimate for class i is</Paragraph> <Paragraph position="2"> $$p_i = \frac{\exp(act_i)}{\sum_{j=1}^{n} \exp(act_j)}$$ </Paragraph> <Paragraph position="3"> In summary, training used both full and partial syntactic information as described in Section 1. In training, SNoW's default parameters were used, with the exceptions of a separator thickness of 1.5, the use of the averaged weight vector, and 5 training cycles. These parameter values were tuned on the development set.</Paragraph> <Paragraph position="4"> Training for each system took about 6 hours. Evaluation on both test sets, which included running the systems with all six different parse trees (assumed already given) and the joint inference, took about 4.5 hours.</Paragraph> <Paragraph position="5"> Overall results on the development and test sets are shown in Table 1.
Table 2 shows the results of individual systems and the improvement gained by the joint inference on the development set.</Paragraph> </Section> </Paper>