File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-0633_metho.xml
Size: 10,874 bytes
Last Modified: 2025-10-06 14:09:54
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0633"> <Title>Semantic Role Labeling Using Lexical Statistical Information</Title> <Section position="4" start_page="0" end_page="215" type="metho"> <SectionTitle> 2 System description </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="214" type="sub_section"> <SectionTitle> 2.1 Preprocessing </SectionTitle> <Paragraph position="0"> During preprocessing the predicates' semantic arguments are mapped to the nodes in the parse trees, a set of hand-crafted shallow tree pruning rules are applied, probability distributions for feature representation are generated from training data1, and feature vectors are extracted. Those are finally fed into the classifier for semantic role classification.</Paragraph> <Paragraph position="1"> 2.1.1 Tree node mapping of semantic arguments and named entities Following Gildea & Jurafsky (2002), (i) labels matching more than one constituent due to non-branching nodes are taken as labels of higher constituents, (ii) in cases of labels with no corresponding parse constituent, these are assigned to the partial match given by the constituent spanning the shortest portion of the sentence beginning at the label's span left boundary and lying entirely within it. We drop the role or named entity label if such suit- null The tagged trees are further processed by applying the following pruning rules: * All punctuation nodes are removed. This is for removing punctuation information, as well as for aligning spans of the syntactic nodes with PropBank constituents3.</Paragraph> <Paragraph position="2"> * If a node is unary branching and its daughter is also unary branching, the daughter is removed.</Paragraph> <Paragraph position="3"> This allows to remove redundant nodes spanning the same tokens in the sentence.</Paragraph> <Paragraph position="4"> * If a node has only preterminal children, these are removed. This allows to internally collapse base phrases such as base NPs.</Paragraph> <Paragraph position="5"> Tree pruning was carried out in order to reduce the number of nodes from which features were to be extracted later. This limits the number of candidate constituents for role labeling, and removes redundant information produced by the pipeline of previous components (i.e. PoS tags of preterminal labels), as well as the sparseness and fragmentation of the input data. These simple rules reduce the number of constituents given by the parser output by 38.4% on the training set, and by 38.7% on the development set, at the cost of limiting the coverage of the system by removing approximately 2% of the target role labeled constituents. On the development set, the number of constituents remaining on top of pruning is 81,193 of which 7,558 are semantic arguments, with a performance upper-bound of 90.6% F1.</Paragraph> <Paragraph position="6"> Given the pruned tree structures, we traverse the tree bottom-up left-to-right. For each non-terminal node whose span does not overlap the predicate we extract the following features: Phrase type: the syntactic category of the constituent (NP, PP, ADVP, etc.). In order to reduce the number of phrase labels, we retained only 3We noted during prototyping that in many cases no tree node fully matching a role constituent could be found, as the latter did not include punctuation tokens, whereas in Collins' trees the punctuation terminals are included within the preceding phrases. 
<Paragraph position="6"> Given the pruned tree structures, we traverse the tree bottom-up, left-to-right. For each non-terminal node whose span does not overlap the predicate, we extract the following features: Phrase type: the syntactic category of the constituent (NP, PP, ADVP, etc.). In order to reduce the number of phrase labels, we retained only those labels which account for at least 0.1% of the overall available semantic arguments in the training data. We replace the label for every phrase type category below this threshold with a generic UNK label. This reduces the number of labels from 72 to 18.</Paragraph>
<Paragraph position="8"> Position: the position of the constituent with respect to the target predicate (before or after).</Paragraph>
<Paragraph position="9"> Adjacency: whether the right (if before) or left (if after) boundary of the constituent is adjacent to, non-adjacent to, or inside the predicate's chunk.</Paragraph>
<Paragraph position="10"> Clause: whether or not the constituent belongs to the clause of the predicate.</Paragraph>
<Paragraph position="11"> Proposition size: measures relative to the proposition size, namely (i) the number of constituents and (ii) the number of predicates in the proposition.</Paragraph>
<Paragraph position="12"> Constituent size: measures relative to the constituent size, namely (i) the number of tokens and (ii) the number of subconstituents (viz., non-leaf rooted subtrees) of the constituent.</Paragraph>
<Paragraph position="13"> Predicate: the predicate lemma, represented as the probability distribution P(r|p) of the predicate p taking one of the available semantic roles r. For unseen predicates we assume a uniform distribution.</Paragraph>
<Paragraph position="14"> Voice: whether the predicate is in active or passive form. Passive voice is identified if the predicate's PoS tag is VBN and either it follows a form of to be or to get, or it does not belong to a VP chunk, or it is immediately preceded by an NP chunk.</Paragraph>
<Paragraph position="15"> Head word: the head word of the constituent, represented as the probability distribution P(r|hw) of the head word hw heading a phrase filling one of the available semantic roles r. For unseen words we back off to a phrasal model by using the probability distribution P(r|pt) of the phrase type pt filling a semantic slot r.</Paragraph>
<Paragraph position="16"> Head word PoS: the PoS of the head word of the constituent, similarly represented as the probability distribution P(r|pos) of a PoS pos belonging to a constituent filling one of the available semantic roles r.</Paragraph>
<Paragraph position="17"> Local lexical context: the words in the constituent other than the head word, represented as the averaged probability distributions of each i-th non-head word wi occurring in one of the available semantic roles r, namely (1/n) Σ_{i=1..n} P(r|wi), where n is the number of non-head words in the constituent. For each unseen word we back off by using the probability distribution P(r|posi) of the PoS posi filling a semantic role r4. 4 This feature was introduced since the information provided by lexical heads does not seem to suffice in many cases. This is shown by head word ambiguities, such as LOC and TMP arguments occurring in similar prepositional syntactic configurations -- e.g. the preposition in, which can be the head of both AM-TMP and AM-LOC constituents, as in "in October" and "in New York". The idea is therefore to look at the words in the constituent other than the head and build up an overall constituent representation, thus making use of statistical lexical information for role disambiguation.</Paragraph>
<Paragraph position="20"> Named entities: the label of the named entity which spans the same words as the constituent, as well as the label of the largest named entity embedded within the constituent. Both values are set to NULL if such labels could not be found.</Paragraph>
<Paragraph position="21"> Path: the number of intervening NPB, NP, VP, VP-A, PP, PP-A, S, S-A and SBAR nodes along the path from the constituent to the predicate.</Paragraph>
<Paragraph position="22"> Distance: the distance from the target predicate, measured as (i) the number of nodes from the constituent to the lowest node in the tree dominating both the constituent and the predicate, (ii) the number of nodes from the predicate to this common dominating node5, (iii) the number of chunks between the base phrase of the constituent's head and the predicate's chunk, and (iv) the number of tokens between the head of the constituent and the predicate.</Paragraph>
</Section>
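Several of the features above (Predicate, Head word, Head word PoS, Local lexical context) reduce to relative-frequency estimates of P(r|x) collected from the training data, with a back-off when x is unseen. The following is a minimal sketch of how such distributions might be estimated and queried; the role inventory, class and function names, and the absence of smoothing are illustrative assumptions, not the authors' code.

```python
from collections import Counter, defaultdict

# Illustrative role inventory (assumed); the real system uses the 16 most frequent roles.
ROLES = ["A0", "A1", "A2", "AM-TMP", "AM-LOC", "NULL"]

class RoleDistribution:
    """Relative-frequency estimate of P(r | key), e.g. key = predicate lemma or head word."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, key, role):
        self.counts[key][role] += 1

    def prob(self, key, role):
        total = sum(self.counts[key].values())
        if total == 0:
            return None          # unseen key: the caller decides how to back off
        return self.counts[key][role] / total

def predicate_feature(pred_dist, predicate, role):
    """P(r | predicate); a uniform distribution is assumed for unseen predicates."""
    p = pred_dist.prob(predicate, role)
    return p if p is not None else 1.0 / len(ROLES)

def head_word_feature(hw_dist, pt_dist, head_word, phrase_type, role):
    """P(r | head word), backing off to the phrasal model P(r | phrase type)."""
    p = hw_dist.prob(head_word, role)
    return p if p is not None else (pt_dist.prob(phrase_type, role) or 0.0)

def context_feature(word_dist, pos_dist, non_head_words, non_head_tags, role):
    """Local lexical context: average of P(r | w_i) over the non-head words,
    backing off to P(r | pos_i) for each unseen word."""
    if not non_head_words:
        return 0.0
    probs = []
    for w, t in zip(non_head_words, non_head_tags):
        p = word_dist.prob(w, role)
        if p is None:
            p = pos_dist.prob(t, role) or 0.0
        probs.append(p)
    return sum(probs) / len(probs)
```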
<Section position="2" start_page="214" end_page="214" type="sub_section">
<SectionTitle> 2.2 Classifier </SectionTitle>
<Paragraph position="0"> We used the YaDT6 implementation of the C4.5 decision tree algorithm (Quinlan, 1993). Parameter selection (99% pruning confidence, at least 10 instances per leaf node) was carried out by performing 10-fold cross-validation on the development set.</Paragraph>
<Paragraph position="1"> Data preprocessing and feature vector generation took approximately 2.5 hours (training set, including probability distribution generation), 5 minutes (development set) and 7 minutes (test set) on a 2GHz Opteron dual processor server with 2GB memory7. Training time was approximately 17 minutes. The final system was trained using all of the available training data from sections 2-21 of the Penn Treebank. This amounts to 2,250,887 input constituents, of which 10% are non-NULL examples. Interestingly, during prototyping we first limited ourselves to training and drawing probability distributions for feature representation from sections 15-18 only. This yielded a very low performance (57.23% F1, development set). A substantial performance increase was obtained by still training on sections 15-18, but using the probability distributions generated from sections 2-21 (64.43% F1, development set). This suggests that the system is only marginally sensitive to the training dataset size, but pivotally relies on taking probability distributions from a large amount of data.</Paragraph>
<Paragraph position="3"> In order to make the task easier and overcome the uneven role class distribution, we limited the learner to classifying only those 16 roles accounting for at least 0.5% of the total number of semantic arguments in the training data8.</Paragraph>
</Section>
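For illustration, a rough sketch of the classification step. The paper uses YaDT, a C4.5 implementation; scikit-learn's CART-style DecisionTreeClassifier is used below only as a freely available stand-in, so the 99% pruning-confidence setting has no direct equivalent here (min_samples_leaf mirrors the "at least 10 instances per leaf node" constraint). Feature vectors and labels are randomly generated placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_role_classifier(X_train, y_train):
    """X_train: one feature vector per candidate constituent; y_train: role label or 'NULL'."""
    clf = DecisionTreeClassifier(min_samples_leaf=10, random_state=0)
    clf.fit(X_train, y_train)
    return clf

# Toy usage with random vectors standing in for the real constituent features.
X = np.random.rand(100, 20)
y = np.random.choice(["A0", "A1", "NULL"], size=100)
model = train_role_classifier(X, y)
predicted_roles = model.predict(X[:5])
```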
<Section position="3" start_page="214" end_page="215" type="sub_section">
<SectionTitle> 2.3 Post-processing </SectionTitle>
<Paragraph position="0"> As our system does not build an overall sentence contextual representation, it systematically produced errors such as embedded role labeling. In particular, since no embedding is observed for the semantic arguments of predicates, in case of (multiple) embeddings the classifier output was automatically post-processed to retain only the largest embedding constituent. Evaluation on the development set has shown that this does not significantly improve performance, but it provides a much more consistent output. Besides, we make use of a simple technique for avoiding multiple A0 or A1 role assignments within the same proposition, based on constituent position and predicate voice. In case of multiple A0 labels, if the predicate is in active form, the second A0 occurrence is replaced with A1; otherwise we replace the first occurrence. Similarly, in case of multiple A1 labels, if the predicate is in active form, the first A1 occurrence is replaced with A0; otherwise we replace the second occurrence.</Paragraph>
</Section>
</Section>
</Paper>
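As an illustration of the duplicate-role heuristic described in Section 2.3, a small sketch follows. The function name, the data layout, and the handling of the passive A1 case (the original text is truncated at that point, so the symmetric completion is assumed) are assumptions, not the authors' code.

```python
def resolve_duplicate_core_roles(labels, voice):
    """Resolve multiple A0 or A1 assignments within one proposition, following the
    position-and-voice heuristic of Section 2.3. `labels` is the list of predicted
    role labels in sentence order; `voice` is 'active' or 'passive'."""
    for role, other in (("A0", "A1"), ("A1", "A0")):
        positions = [i for i, r in enumerate(labels) if r == role]
        if len(positions) < 2:
            continue
        if role == "A0":
            # Active: the second A0 becomes A1; passive: the first A0 becomes A1.
            target = positions[1] if voice == "active" else positions[0]
        else:
            # Active: the first A1 becomes A0; passive (assumed symmetric case):
            # the second A1 becomes A0.
            target = positions[0] if voice == "active" else positions[1]
        labels[target] = other
    return labels

# Example: two A0s predicted for an active predicate -> the second becomes A1.
print(resolve_duplicate_core_roles(["A0", "V", "A0"], voice="active"))  # ['A0', 'V', 'A1']
```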