<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2303"> <Title>Robust Parsing of the Proposition Bank</Title> <Section position="3" start_page="11" end_page="12" type="metho"> <SectionTitle> 2 The Basic Parsing Architecture </SectionTitle> <Paragraph position="0"> To achieve the complex task of assigning semantic role labels while parsing, we use a family of statistical parsers, the Simple Synchrony Network (SSN) parsers (Henderson, 2003), which do not make any explicit independence assumptions, and are therefore likely to adapt without much modification to the current problem. This architecture has shown state-of-the-art performance.</Paragraph> <Paragraph position="1"> SSN parsers comprise two components, one which estimates the parameters of a stochastic model for syntactic trees, and one which searches for the most probable syntactic tree given the parameter estimates. As with many other statistical parsers (Collins, 1999; Charniak, 2000), SSN parsers use a history-based model of parsing. Events in such a model are derivation moves. The set of well-formed sequences of derivation moves in this parser is defined by a Predictive LR push-down automaton (Nederhof, 1994), which implements a form of left-corner parsing strategy. The derivation moves include: projecting a constituent with a specified label, attaching one constituent to another, and shifting a tag-word pair onto the pushdown stack.</Paragraph> <Paragraph position="2"> Unlike standard history-based models, SSN parsers do not state any explicit independence assumptions between derivation steps. They use a neural network architecture, called Simple Synchrony Network (Henderson and Lane, 1998), to induce a finite history representation of an unbounded sequence of moves. 
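To make the three derivation moves concrete, here is a minimal toy sketch (our own illustrative code, not the parser's implementation) that builds a small tree by shifting tag-word pairs, projecting labelled constituents, and attaching over a pushdown stack; the labels and the move sequence are invented for the example.

```python
# Toy sketch of the SSN parser's three derivation moves over a
# pushdown stack: shift a tag-word pair, project a constituent,
# attach the top constituent to the one below it.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def __repr__(self):
        if not self.children:
            return self.label
        return f"({self.label} {' '.join(map(repr, self.children))})"

def shift(stack, tag, word):
    """Push a tag-word pair onto the stack as a preterminal node."""
    stack.append(Node(tag, [Node(word)]))

def project(stack, label):
    """Wrap the top node in a new constituent with the given label."""
    stack.append(Node(label, [stack.pop()]))

def attach(stack):
    """Attach the top node as the most recent child of the node below."""
    child = stack.pop()
    stack[-1].children.append(child)

# Derive (S (NP (NNP Kim)) (VP (VBZ sleeps))) with one move sequence.
stack = []
shift(stack, "NNP", "Kim"); project(stack, "NP"); project(stack, "S")
shift(stack, "VBZ", "sleeps"); project(stack, "VP"); attach(stack)
print(stack[0])
```

This is only a schematic of the move inventory; the actual well-formed move sequences are defined by the Predictive LR automaton cited above.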
The history representation of a parse history d1,...,di-1, which we denote h(d1,...,di-1), is assigned to the constituent that is on the top of the stack before the ith move.</Paragraph> <Paragraph position="3"> The representation h(d1,...,di-1) is computed from a set f of features of the derivation move di-1 and from a finite set D of recent history representations h(d1,...,dj), where j < i - 1. Because the history representation computed for the move i - 1 is included in the inputs to the computation of the representation for the next move i, virtually any information about the derivation history could flow from history representation to history representation and be used to estimate the probability of a derivation move. However, the recency preference exhibited by recursively defined neural networks biases learning towards information which flows through fewer history representations. (Henderson, 2003) exploits this bias by directly inputting information which is considered relevant at a given step to the history representation of the constituent on the top of the stack before that step. In addition to history representations, the inputs to h(d1,...,di-1) include hand-crafted features of the derivation history that are meant to be relevant to the move to be chosen at step i. For each of the experiments reported here, the set D that is input to the computation of the history representation of the derivation moves d1,...,di-1 includes the most recent history representation of the following nodes: topi, the node on top of the pushdown stack before the ith move; the left-corner ancestor of topi (that is, the second top-most node on the parser's stack); the leftmost child of topi; and the most recent child of topi, if any. The set of features f includes the last move in the derivation, the label or tag of topi, the tag-word pair of the most recently shifted word, and the left-most tag-word pair that topi dominates.
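A small numerical sketch may help fix ideas: each history representation squashes a combination of the feature vector f and the representations selected by D, and a normalized exponential output over it scores the possible next moves. All names, dimensions, and the tanh nonlinearity are our own illustrative assumptions, not the paper's actual parameterization.

```python
# Illustrative sketch (not the authors' code) of computing a history
# representation from features f and a set D of earlier representations,
# then a softmax over candidate next derivation moves.
import numpy as np

rng = np.random.default_rng(0)
H, F, M = 8, 5, 4                # hidden size, #features, #move types (toy)
W_f = rng.normal(size=(H, F))    # weights on the feature vector f
W_d = rng.normal(size=(H, H))    # weights shared over the representations in D
W_out = rng.normal(size=(M, H))  # output layer scoring the next moves

def history_rep(f_vec, D_reps):
    """h(d1..di-1): squash the features of the last move together with
    the recent history representations selected by the locality set D."""
    return np.tanh(W_f @ f_vec + sum(W_d @ h for h in D_reps))

def next_move_probs(h):
    """Normalized exponential (softmax) output over the next moves."""
    z = W_out @ h
    e = np.exp(z - z.max())
    return e / e.sum()

h0 = np.zeros(H)                       # representation before any move
f = rng.normal(size=F)                 # features of the last derivation move
h1 = history_rep(f, [h0, h0, h0, h0])  # D: top, ancestor, two children
p = next_move_probs(h1)
assert abs(p.sum() - 1.0) < 1e-9       # a proper probability distribution
```

The recurrence (h1 feeding into the next step's D) is what lets information flow through the unbounded derivation history.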
Given the hidden history representation h(d1,...,di-1) of a derivation, a normalized exponential output function is computed by SSNs to estimate a probability distribution over the possible next derivation moves di.2 The second component of SSN parsers, which searches for the best derivation given the parameter estimates, implements a severe pruning strategy. Such pruning handles the high computational cost of computing probability estimates with SSNs, and renders the search tractable. The space of possible derivations is pruned in two different ways. The first pruning occurs immediately after a tag-word pair has been pushed onto the stack: only a fixed beam of the 100 best derivations ending in that tag-word pair is expanded. For training, the width of this beam is set to five.</Paragraph> <Paragraph position="4"> A second reduction of the search space prunes the space of possible project or attach derivation moves: a best-first search strategy is applied to the five best alternative decisions only.</Paragraph> <Paragraph position="5"> The next section describes our model, extended to produce richer output parse trees annotated with semantic role labels.</Paragraph> </Section> <Section position="4" start_page="12" end_page="14" type="metho"> <SectionTitle> 3 Learning Semantic Role Labels </SectionTitle> <Paragraph position="0"> Previous work on learning function labels during parsing (Merlo and Musillo, 2005; Musillo and Merlo, 2005) assumed that function labels represent the interface between lexical semantics and syntax. We extend this hypothesis to the semantic role labels assigned in PropBank, as they are an exhaustive extension of function labels, which have been reorganised in a coherent inventory of labels and assigned exhaustively to all sentences in the PTB. Because PropBank is built on the PTB, it inherits in part its notion of function labels, which is directly integrated into the AM-X role labels.</Paragraph> <Paragraph position="1"> SSN parsing models.
It performs a gradient descent with a maximum likelihood objective function and weight decay regularization (Bishop, 1995).</Paragraph> <Paragraph position="2"> receiving a syntactic functional label such as SBJ (subject) or DTV (dative).</Paragraph> <Paragraph position="3"> Because they are projections of the lexical semantics of the elements in the sentence, semantic role labels are projected bottom-up: they tend to appear low in the tree and are infrequently found on the higher levels of the parse tree, where projections of grammatical, as opposed to lexical, elements usually reside. Because they are the interface level with syntax, semantic labels are also subject to distributional constraints that govern syntactic dependencies, such as argument structure or subcategorization. We attempt to capture such constraints by modelling the c-command relation. Recall that the c-command relation relates two nodes in a tree, even if they are not close to each other, provided that the first node dominating one node also dominates the other. This notion of c-command captures both linear and hierarchical constraints and defines the domain in which semantic role labelling applies.</Paragraph> <Paragraph position="4"> While PTB function labels appear to overlap to a large extent with PropBank semantic role labels, work by (Ye and Baldwin, 2005) on the semantic labelling of prepositional phrases indicates that the function labels in the Penn Treebank are assigned more sporadically and heterogeneously than in PropBank. Apparently only the &quot;easy&quot; cases have been tagged functionally, because assigning these function tags was not the main goal of the annotation. PropBank instead was annotated exhaustively, taking all cases into account, annotating multiple roles, coreferences and discontinuous constituents.
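The c-command test described above can be made concrete with a small sketch. This is our own simplification: we encode the tree with parent pointers and take the "first node dominating" a node to be its parent, ignoring non-branching refinements of the definition.

```python
# Sketch of a c-command check over a parent-pointer encoding of a tree.

def ancestors(node, parent):
    """Proper ancestors of a node, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def c_commands(a, b, parent):
    """a c-commands b iff neither dominates the other and the first
    node properly dominating a also dominates b."""
    anc_a, anc_b = ancestors(a, parent), ancestors(b, parent)
    if a in anc_b or b in anc_a:   # dominance excludes c-command
        return False
    return bool(anc_a) and anc_a[0] in anc_b

# Toy tree:  S -> NP VP ;  VP -> V NP2
parent = {"NP": "S", "VP": "S", "V": "VP", "NP2": "VP"}
assert c_commands("NP", "NP2", parent)       # subject c-commands into VP
assert not c_commands("NP2", "NP", parent)   # but not vice versa
```

Note how the relation is asymmetric for the two NPs: this is exactly the kind of linear-plus-hierarchical constraint the text appeals to.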
It is therefore of interest to test our hypothesis that, like function labels, semantic role labels are the interface between syntax and semantics, and that they need to be recovered by applying constraints that model both higher level nodes and lower level ones.</Paragraph> <Paragraph position="5"> We assume that semantic roles are very often projected by the lexical semantics of the words in the sentence. We introduce this bottom-up lexical information by fine-grained modelling of semantic role labels. Extending a technique presented in (Klein and Manning, 2003) and adopted in (Merlo and Musillo, 2005; Musillo and Merlo, 2005) for function labels, we split some part-of-speech tags into tags marked with semantic role labels. The semantic role labels attached to a non-terminal directly projected by a preterminal and belonging to a few selected categories (DIR, EXT, LOC, MNR, PNC, CAUS and TMP) were propagated down to the pre-terminal part-of-speech tag of its head. To affect only labels that are projections of lexical semantic properties, the propagation takes into account the distance of the projection from the lexical head to the label, and distances greater than two are not included. Figure 2 illustrates the result of this operation.</Paragraph> <Paragraph position="6"> In our augmented model, inputs to each history representation are selected according to a linguistically motivated notion of structural locality over which dependencies such as argument structure or subcategorization could be specified.</Paragraph> <Paragraph position="7"> In SSN parsing models, the set D of nodes that are structurally local to a given node on top of the stack defines the structural distance between this given node and other nodes in the tree. Such a notion of distance determines the number of history representations through which information is required
to flow from the representation of a node i to the representation of a node j. By adding nodes to the set D, one can shorten the structural distance between two nodes and enlarge the locality domain over which dependencies can be specified.</Paragraph> <Paragraph position="8"> To capture a locality domain appropriate for semantic role parsing, we add the most recent child of topi labelled with a semantic role label to the set D. These additions yield a model that is sensitive to regularities in structurally defined sequences of nodes bearing semantic role labels, within and across constituents. This modification of the biases is illustrated in Figure 3.</Paragraph> <Paragraph position="9"> This figure displays two constituents, S and VP, with some of their respective child nodes. The VP node is assumed to be on the top of the parser's stack, and the S node to be its left-corner ancestor. The directed arcs represent the information that flows from one node to another.</Paragraph> <Paragraph position="10"> According to the original SSN model in (Henderson, 2003), only the information carried over by the leftmost child and the most recent child of a constituent directly flows to that constituent. In the figure above, only the information conveyed by the nodes a and d is directly input to the node S. Similarly, the only bottom-up information directly input to the VP node is conveyed by the child nodes o and th. In the original SSN models, nodes bearing a function label such as ph1 and ph2 are not directly input to their respective parents.</Paragraph> <Paragraph position="11"> In our extended model, information conveyed by ph1 and ph2 directly flows to their respective parents. So the distance between the nodes ph1 and ph2, which stand in a c-command relation, is shortened.
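The extended locality set can be sketched as follows. The node encoding, the field names, and the substring test for role-bearing labels are all our own illustrative assumptions, not the model's actual machinery.

```python
# Sketch of assembling the locality set D for the node on top of the
# stack: the original SSN inputs (top, left-corner ancestor, leftmost
# child, most recent child) plus, in the extended model, the most
# recent child carrying a semantic role label.

def locality_set(stack):
    """Inputs to the history representation of top = stack[-1]."""
    top = stack[-1]
    kids = top["children"]
    D = {
        "top": top,
        "left_corner_ancestor": stack[-2] if len(stack) > 1 else None,
        "leftmost_child": kids[0] if kids else None,
        "most_recent_child": kids[-1] if kids else None,
    }
    # Extension: also input the most recent child bearing a semantic
    # role label, shortening the distance between c-commanding
    # role-bearing nodes (crude label test, for illustration only).
    role_kids = [c for c in kids if "-A" in c["label"]]
    D["last_semrole_child"] = role_kids[-1] if role_kids else None
    return D

vp = {"label": "VP", "children": [
    {"label": "VBD", "children": []},
    {"label": "NP-A1", "children": []},
    {"label": "PP-AM-TMP", "children": []},
]}
s = {"label": "S", "children": []}
D = locality_set([s, vp])
assert D["last_semrole_child"]["label"] == "PP-AM-TMP"
```

Without the extension, information from NP-A1 could reach the VP's representation only indirectly, through intervening history representations.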
For more information on this technique to capture domains induced by the c-command relation, see (Musillo and Merlo, 2005).</Paragraph> <Paragraph position="12"> We report the effects of these augmentations on parsing results in the experiments described below.</Paragraph> </Section> <Section position="5" start_page="14" end_page="15" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> Our extended semantic role SSN parser was trained on sections 2-21 and validated on section 24 from the PropBank. Training, validating and testing data sets consist of the PTB data annotated with PropBank semantic role labels, as provided in the CoNLL-2005 shared task (Carreras and Marquez, 2005).</Paragraph> <Paragraph position="1"> Our augmented model has a total of 613 non-terminals to represent both the PTB and PropBank labels of constituents, instead of the 33 of the original SSN parser. The 580 newly introduced labels consist of a standard PTB label followed by one or more PropBank semantic role labels, such as PP-AM-TMP or NP-A0-A1. As a result of lowering the six AM-X semantic role labels, 240 new part-of-speech tags were introduced to partition the original tag set, which consisted of 45 tags. SSN parsers do not tag their input sentences. To provide the augmented model with tagged input sentences, we trained an SVM tagger whose features and parameters are described in detail in (Gimenez and Marquez, 2004). Trained on sections 2-21, the tagger reaches a performance of 95.45% on the test set (section 23) using our new tag set.</Paragraph> <Paragraph position="2"> As already mentioned, argumental labels A0-A5 are specific to a given verb or a given verb sense; thus their distribution is highly variable. To reduce this variability, we add some of the tag-verb pairs licensing these argumental labels to the vocabulary of our model.</Paragraph> <Paragraph position="3">
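The label-lowering step that produces the new part-of-speech tags (section 3, Figure 2) can be sketched as follows. The bracketed-list tree encoding is our own, and for simplicity the role is copied onto every directly dominated preterminal rather than onto the head's tag only, as the paper does.

```python
# Sketch of tag splitting: a role from the selected categories on a
# nonterminal is copied down onto preterminal tags it directly projects.

LOWERED = {"DIR", "EXT", "LOC", "MNR", "PNC", "CAUS", "TMP"}

def split_tags(tree):
    """tree = [label, child, ...]; a preterminal is [tag, word]."""
    label, kids = tree[0], list(tree[1:])
    role = next((r for r in label.split("-")[1:] if r in LOWERED), None)
    new_kids = []
    for k in kids:
        is_pre = len(k) == 2 and isinstance(k[1], str)
        if is_pre and role:
            new_kids.append([k[0] + "-" + role, k[1]])  # e.g. IN -> IN-TMP
        elif is_pre:
            new_kids.append(list(k))
        else:
            new_kids.append(split_tags(k))
    return [label] + new_kids

t = ["PP-AM-TMP", ["IN", "after"], ["NP", ["NN", "lunch"]]]
out = split_tags(t)
assert out[1] == ["IN-TMP", "after"]   # role lowered onto the POS tag
```

Restricting the propagation distance, as the text describes, would amount to only copying roles whose bearing nonterminal sits at most two levels above the lexical head.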
We reach a total of 4970 tag-word pairs.3 This vocabulary comprises the 512 pairs of the original SSN model and our added pairs, which must occur at least 10 times in the training data. Our vocabulary, as well as the new 240 POS tags and the new 580 non-terminal labels, is included in the set f of features input to the history representations, as described in section 2.</Paragraph> <Paragraph position="4"> We perform two different evaluations on our model trained on PropBank data. Recall that we distinguish between two parsing tasks: the PropBank parsing task and the PTB parsing task.</Paragraph> <Paragraph position="5"> To evaluate the first parsing task, we compute the standard Parseval measures of labelled recall and precision of constituents, taking into account not only the 33 original labels but also the 580 newly introduced PropBank labels. This evaluation gives us an indication of how accurately and exhaustively we can recover this richer set of non-terminal labels. The results, computed on the testing data set from the PropBank, are shown on the first line of Table 1.</Paragraph> <Paragraph position="6"> To evaluate the PTB task, we compute the labelled recall and precision of constituents, ignoring the set of PropBank semantic role labels that our model assigns to constituents.
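The two evaluations can be sketched over sets of labelled spans: labelled precision and recall are computed once on the full PropBank-augmented labels, and once after stripping the semantic roles back to plain PTB categories. The toy spans below are invented for illustration.

```python
# Sketch of the two evaluations: Parseval-style labelled precision and
# recall over (label, start, end) triples, with and without the
# PropBank role suffixes on the constituent labels.

def strip_roles(label):
    """NP-A0-A1 -> NP ; PP-AM-TMP -> PP (keep only the PTB category)."""
    return label.split("-")[0]

def labelled_pr(gold, guess):
    """Labelled precision and recall over sets of labelled spans."""
    correct = len(gold & guess)
    return correct / len(guess), correct / len(gold)

gold  = {("NP-A0", 0, 2), ("VP", 2, 5), ("PP-AM-TMP", 3, 5)}
guess = {("NP-A0", 0, 2), ("VP", 2, 5), ("PP-AM-LOC", 3, 5)}
p, r = labelled_pr(gold, guess)                  # PropBank task: roles count
gold_ptb  = {(strip_roles(l), i, j) for l, i, j in gold}
guess_ptb = {(strip_roles(l), i, j) for l, i, j in guess}
p2, r2 = labelled_pr(gold_ptb, guess_ptb)        # PTB task: roles ignored
assert (p, r) == (2/3, 2/3) and (p2, r2) == (1.0, 1.0)
```

A mislabelled role (AM-LOC vs AM-TMP) counts as an error in the PropBank evaluation but not in the PTB one, which is exactly the difference between the first two lines of Table 1.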
This evaluation indicates how well we perform on the standard PTB parsing task alone, and its results on the testing data set from the PTB are shown on the second line of Table 1.</Paragraph> <Paragraph position="7"> The third line of Table 1 gives the performance on the simpler PTB parsing task of the original SSN parser (Henderson, 2003), which was trained on the PTB data sets, unlike our SSN model, which was trained on the PropBank data sets.</Paragraph> </Section> <Section position="6" start_page="15" end_page="16" type="metho"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> These results clearly indicate that our model can perform the PTB parsing task at levels of performance comparable to state-of-the-art statistical parsing, by extensions that take the nature of the richer labels to be recovered into account. (Footnote 3: such pairs consist of a tag and a word token; no attempt at collecting word types was made.)</Paragraph> <Paragraph position="1"> They also suggest that the relationship between syntactic PTB parsing and semantic PropBank parsing is strict enough that an integrated approach to the problem of semantic role labelling is beneficial.</Paragraph> <Paragraph position="2"> In particular, recent models of semantic role labelling separate input indicators of the correlation between the structural position in the tree and the semantic label, such as path, from those indicators that encode constraints on the sequence, such as the previously assigned role (Kwon et al., 2004).</Paragraph> <Paragraph position="3"> In this way, they can never directly encode the constraining power of a certain role in a given structural position onto a following node in its structural position. In our augmented model, we attempt to capture these constraints by directly modelling syntactic domains defined by the notion of c-command.</Paragraph> <Paragraph position="4"> Our results also confirm the findings in (Palmer et al., 2005).
They take a critical look at some commonly used features in the semantic role labelling task, such as the path feature. They suggest that the path feature is not very effective because it is sparse. Its sparseness is due to the occurrence of intermediate nodes that are not relevant for the syntactic relations between an argument and its predicate. Our model of domains is less noisy, and consequently more robust, because it can focus only on c-commanding nodes bearing semantic role labels, thus abstracting away from those nodes that obscure the pertinent relations.</Paragraph> <Paragraph position="5"> (Yi and Palmer, 2005) share the motivation of our work. Like the current work, they observe that the distributions of semantic labels could potentially interact with the distributions of syntactic labels and redefine the boundaries of constituents, thus yielding trees that reflect generalisations over both these sources of information.</Paragraph> <Paragraph position="6"> To our knowledge, no results have yet been published on parsing the PropBank. Accordingly, it is not possible to draw a straightforward quantitative comparison between our PropBank SSN parser and other PropBank parsers. (Footnote: state-of-the-art semantic role labelling systems on the PropBank parsing task; 1267 sentences from the PropBank validating data sets; PropBank data sets are available at http://www.lsi.upc.edu/~srlconll/st05/st05.html.) However, state-of-the-art semantic role labelling systems (CoNLL, 2005) use parse trees output by state-of-the-art parsers (Collins, 1999; Charniak, 2000), both for training and testing, and return partial trees annotated with semantic role labels. An indirect way of comparing our parser with semantic role labellers suggests itself. We merge the partial trees output by a semantic role labeller with the output of a parser it was trained on, and compute PropBank parsing performance measures on the resulting parse trees.
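The merging step just described can be sketched by representing a parse as a map from spans to labels and the labeller's output as spans with roles; matching by identical spans is our simplifying assumption (real systems must also handle spans with no exactly matching constituent).

```python
# Sketch of merging a semantic role labeller's spans back onto the
# constituents of a parse tree, so PropBank parsing measures can be
# computed on the combined output.

def merge_roles(constituents, role_spans):
    """constituents: {(start, end): label}; role_spans: {(start, end): role}.
    Appends each role to the label of the constituent with the same
    span; unmatched role spans are dropped (a simplification)."""
    merged = dict(constituents)
    for span, role in role_spans.items():
        if span in merged:
            merged[span] = merged[span] + "-" + role
    return merged

parse = {(0, 2): "NP", (2, 5): "VP", (3, 5): "PP"}
roles = {(0, 2): "A0", (3, 5): "AM-TMP", (2, 3): "A2"}  # (2, 3) has no match
out = merge_roles(parse, roles)
assert out[(0, 2)] == "NP-A0" and out[(3, 5)] == "PP-AM-TMP"
```

The merged spans can then be scored with the same labelled precision/recall used for the PropBank parsing task.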
The first five lines of Table 2 report such measures for the five best semantic role labelling systems (Haghighi et al., 2005; Pradhan et al., 2005; Punyakanok et al., 2005; Marquez et al., 2005; Surdeanu and Turmo, 2005) according to (CoNLL, 2005). The partial trees output by these systems were merged with the parse trees returned by (Charniak, 2000)'s parser. These systems use (Charniak, 2000)'s parse trees both for training and testing, as well as various other information sources, including sets of n-best parse trees (Punyakanok et al., 2005; Haghighi et al., 2005) or chunks (Marquez et al., 2005; Pradhan et al., 2005) and named entities (Surdeanu and Turmo, 2005). While our preliminary results reported in the last line of Table 2 are not state-of-the-art, they do demonstrate the viability of SSN parsers for joint inference of syntactic and semantic representations.</Paragraph> </Section> </Paper>