<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2415">
  <Title>Hierarchical Recognition of Propositional Arguments with Perceptrons</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Semantic Role Labeling Strategy
</SectionTitle>
    <Paragraph position="0"> The strategy for recognizing propositional arguments in sentences is based on two main observations about argument structure in the data. The first observation is the relation of the arguments of a proposition with the chunk and clause hierarchy: a proposition places its arguments in the clause directly containing the verb (local clause), or in one of the ancestor clauses. Given a clause, we define the sequence of top-most syntactic elements as the words, chunks or clauses which are directly rooted at the clause. Then, arguments are formed as subsequences of top-most elements of a clause. Finally, for local clauses arguments are found strictly to the left or to the right of the target verb, whereas for ancestor clauses arguments are usually to the left of the verb. This observation holds for most of the arguments in the data. A general exception are arguments of type V, which are found only in the local clause, starting at the position of the target verb.</Paragraph>
    <Paragraph position="1"> The second observation is that the arguments of all propositions of a sentence do not cross their boundaries, and that arguments of a particular proposition are usually found strictly within an argument of a higher level proposition. Thus, the problem can be thought of as finding a hierarchy of arguments in which arguments are embedded inside others, and each argument is related to a number of propositions of a sentence in a particular role. If an argument is related to a certain verb, no other argument linking to the same verb can be found within it.</Paragraph>
    <Paragraph position="2"> The system presented in this paper translates these observations into constraints which are enforced to hold in a solution, and guide the recognition strategy. A limitation of the system is that it makes no attempt to recognize arguments which are split in many phrases.</Paragraph>
    <Paragraph position="3"> In what follows, x is a sentence, and xi is the i-th word of the sentence. We assume a mechanism to access the input information of x (PoS tags, chunks and clauses), as well as the set of target verbs V , represented by their position. A solution y 2 Y for a sentence x is a set of arguments of the form (s;e)kv, where (s;e) represents an argument spanning from word xs to word xe, playing a semantic role k 2 K with a verb v 2 V . Finally, [S;E] denotes a clause spanning from word xS to word sE.</Paragraph>
    <Paragraph position="4"> The SRL(x) function, predicting semantic roles of a sentence x, implements the following strategy:  1. Initialize set of arguments, A, to empty.</Paragraph>
    <Paragraph position="5"> 2. Define the level of each clause as its distance to the root clause.</Paragraph>
    <Paragraph position="6"> 3. Explore clauses bottom-up, i.e. from deeper levels to the root clause. For a clause [S;E]: A := A[ arg search(x;[S;E]) 4. Return A</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Building Argument Hierarchies
</SectionTitle>
      <Paragraph position="0"> Here we describe the function arg search, which builds a set of arguments organized hierarchically, within a clause [S;E] of a sentence x. The function makes use of two learning-based components, defined here and described below. First, a filtering function F, which, given a candidate argument, determines its plausible categories, or rejects it when no evidence for it being an argument is found. Second, a set of k-score functions, for each k 2 K, which, given an argument, predict a score of plausibility for it being of role type k of a certain proposition. The function arg search searches for the argument hierarchy which optimizes a global score on the hierarchy.</Paragraph>
      <Paragraph position="1"> As in earlier works, we define the global score ( ) as the summation of scores of each argument in the hierarchy.</Paragraph>
      <Paragraph position="2"> The function explores all possible arguments in the clause formed by contiguous top-most elements, and selects the subset which optimizes the global score function, forcing a hierarchy in which the arguments linked to the same verb do not embed.</Paragraph>
      <Paragraph position="3"> Using dynamic programming, the function can be computed in cubic time. It considers fragments of top-most elements, which are visited bottom-up, incrementally in length, until the whole clause is explored. While exploring, it maintains a two-dimensional matrix A of partial solutions: each position [s;e] contains the optimal argument hierarchy for the fragment from s to e. Finally, the solution is found at A[S;E]. For a fragment from s to e the algorithm is as follows:  1. A := A[s;r] [ A[r+1;e] where r := arg maxs r&lt;e A[s;r] + A[r+1;e] 2. For each prop v 2 V : (a) K := F((s;e);v) (b) Compute k0 such that k0 := arg maxk2K k-score((s;e);v;x) Set to the score of category k0.</Paragraph>
      <Paragraph position="4"> (c) Set Av as the arguments in A linked to v.</Paragraph>
      <Paragraph position="5"> (d) If (Av) &lt; then A := AnAv [f(s;e)k0v g 3. A[s;e] := A  Note that an argument is visited once, and that its score can be stored to efficiently compute the global score.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Start-End Filtering
</SectionTitle>
      <Paragraph position="0"> The function F determines which categories in K are plausible for an argument (s;e) to relate to a verb v.</Paragraph>
      <Paragraph position="1"> This is done via start-end filters (FkS and FkE), one for each type in K1. They operate on words, independently of verbs, deciding whether a word is likely to start or end some argument of role type k.</Paragraph>
      <Paragraph position="2"> The selection of categories is conditional to the relative level of the verb and the clause, and to the relative position of the verb and the argument. The conditions are: v is local to the clause, and (v=s) and FVE(xe):</Paragraph>
      <Paragraph position="4"> 1Actually, we share start-end filters for A0-A5 arguments.</Paragraph>
      <Paragraph position="5"> v is at deeper level, and (e&lt;v):</Paragraph>
      <Paragraph position="7"> where K(v) is the set of categories already assigned to the verb in deeper clauses.</Paragraph>
      <Paragraph position="8"> Otherwise, K is set to empty.</Paragraph>
      <Paragraph position="9"> Note that setting K to empty has the effect of filtering out the argument for the proposition. Note also that Start-End classifications do not depend on the verb, thus they can be performed once per candidate word, before entering the exploration of clauses. Then, when visiting a clause, the Start-End filtering can be performed with stored predictions.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Learning with Perceptrons
</SectionTitle>
    <Paragraph position="0"> In this section we describe the learning components of the system, namely start, end and score functions, and the Perceptron-based algorithm to train them together online.</Paragraph>
    <Paragraph position="1"> Each function is implemented using a linear separator, hw : Rn ! R, operating in a feature space defined by a feature extraction function, : X ! Rn, for some instance space X. The start-end functions (FkS and FkE) are formed by a prediction vector for each type, noted as wkS or wkE, and a shared representation function w which maps a word in context to a feature vector. A prediction is computed as FkS(x) = wkS w(x), and similarly for the FkE, and the sign is taken as the binary classification.</Paragraph>
    <Paragraph position="2"> The score functions compute real-valued scores for arguments (s;e)v. We implement these functions with a prediction vector wk for each type k 2 K, and a shared representation function a which maps an argument-verb pair to a feature vector. The score prediction for a type k is then given by the expression:</Paragraph>
    <Paragraph position="4"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Perceptron Learning Algorithm
</SectionTitle>
      <Paragraph position="0"> We describe a mistake-driven online algorithm to train prediction vectors together. The algorithm is essentially the same as the one introduced in (Collins, 2002). Let W be the set of prediction vectors: Initialize: 8w2W w := 0 For each epoch t := 1::: T, for each sentence-solution pair (x;y) in training:  1. ^y = SRLW (x) 2. learning feedback(W;x;y; ^y)</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Learning Feedback for Filtering-Ranking
</SectionTitle>
      <Paragraph position="0"> We now describe the learning feedback rule, introduced in earlier works (Carreras and M`arquez, 2004b). We differentiate two kinds of global errors in order to give feed-back to the functions being learned: missed arguments and over-predicted arguments. In each case, we identify the prediction vectors responsible for producing the incorrect argument and update them additively: vectors are moved towards instances predicted too low, and moved away from instances predicted too high.</Paragraph>
      <Paragraph position="1"> Let y be the gold set of arguments for a sentence x, and ^y those predicted by the SRL function. Let goldS(xi;k) and goldE(xi;k) be, respectively, the perfect indicator functions for start and end boundaries of arguments of type k. That is, they return 1 if word xi starts/ends some k-argument in y and -1 otherwise. The feedback is as follows: Missed arguments: 8(s;e)kv 2 y n^y: 1. Update misclassified boundary words: if (wkS w(xs) 0) then wkS = wkS + w(xs) if (wkE w(xe) 0) then wkE = wkE + w(xe) 2. Update score function, if applied:</Paragraph>
      <Paragraph position="3"> 2. Update words misclassified as S or E: if (goldS(xs;k)= 1) then wkS = wkS w(xs) if (goldE(xe;k)= 1) then wkE =wkE w(xe)</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Kernel Perceptrons with Averaged Predictions
</SectionTitle>
      <Paragraph position="0"> Our final architecture makes use of Voted Perceptrons (Freund and Schapire, 1999), which compute a prediction as an average of all vectors generated during training. Roughly, each vector contributes to the average proportionally to the number of correct positive training predictions the vector has made. Furthermore, a prediction vector can be expressed in dual form as a combination of training instances, which allows the use of kernel functions. We use standard polynomial kernels of degree 2.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML