<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1043">
  <Title>A Study on Convolution Kernels for Shallow Semantic Parsing</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Predicate Argument Extraction: a standard approach
</SectionTitle>
    <Paragraph position="0"> Given a sentence in natural language and the target predicates, all arguments have to be recognized. This problem can be divided into two subtasks: (a) the detection of the argument boundaries, i.e. all its compounding words, and (b) the classification of the argument type, e.g. Arg0 or ArgM in PropBank or Agent and Goal in FrameNet.</Paragraph>
    <Paragraph position="1"> The standard approach to learn both detection and classification of predicate arguments is summarized by the following steps:
1. given a sentence from the training-set, generate a full syntactic parse-tree;
2. let P and A be the set of predicates and the set of parse-tree nodes (i.e. the potential arguments), respectively;
3. for each pair ⟨p, a⟩ ∈ P × A: extract the feature representation set F_{p,a}; if the subtree rooted in a covers exactly the words of one argument of p, put F_{p,a} in T⁺ (positive examples), otherwise put it in T⁻ (negative examples).</Paragraph>
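The example-generation step (3) can be sketched as follows. This is a minimal illustration, not the authors' code, and it assumes hypothetical interfaces: extract_features builds F_{p,a}, gold_args maps each predicate to the word spans of its annotated arguments, and parse-tree nodes expose a leaves() method returning the words they cover.

def build_training_sets(parse_nodes, predicates, gold_args, extract_features):
    """Step 3 above: label each <p, a> pair as a positive or negative example."""
    T_pos, T_neg = [], []
    for p in predicates:                          # each target predicate
        arg_spans = {tuple(span) for span in gold_args[p]}
        for a in parse_nodes:                     # each candidate parse-tree node
            F_pa = extract_features(p, a)         # feature representation of <p, a>
            if tuple(a.leaves()) in arg_spans:    # node covers exactly one argument of p
                T_pos.append(F_pa)
            else:
                T_neg.append(F_pa)
    return T_pos, T_neg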
    <Paragraph position="2"> For example, in Figure 1, for each combination of the predicate give with the nodes N, S, VP, V, NP, PP, D or IN, the instances F_{"give",a} are generated. In case the node a exactly covers "Paul", "a lecture" or "in Rome", it will be a positive instance, otherwise it will be a negative one, e.g. F_{"give","IN"}.</Paragraph>
    <Paragraph position="3"> To learn the argument classifiers, the T⁺ set can be re-organized as positive T⁺_{argi} and negative T⁻_{argi} examples for each argument i. In this way, an individual ONE-vs-ALL classifier for each argument i can be trained. We adopted this solution as it is simple and effective (Hacioglu et al., 2003). In the classification phase, given a sentence of the test-set, all its F_{p,a} are generated and classified by each individual classifier. As a final decision, we select the argument associated with the maximum value among the scores provided by the SVMs, i.e. argmax_{i∈S} C_i, where S is the target set of arguments.</Paragraph>
    <Paragraph position="4"> - Phrase Type: This feature indicates the syntactic type of the phrase labeled as a predicate argument, e.g. NP for Arg1.</Paragraph>
    <Paragraph position="5"> - Parse Tree Path: This feature contains the path in the parse tree between the predicate and the argument phrase, expressed as a sequence of nonterminal labels linked by direction (up or down) symbols, e.g. V↑VP↓NP for Arg1.</Paragraph>
    <Paragraph position="6"> - Position: Indicates whether the constituent, i.e. the potential argument, appears before or after the predicate in the sentence, e.g. after for Arg1 and before for Arg0. - Voice: This feature distinguishes between active and passive voice for the predicate phrase, e.g. active for every argument.</Paragraph>
    <Paragraph position="7"> - Head Word: This feature contains the headword of the evaluated phrase. Case and morphological information are preserved, e.g. lecture for Arg1.</Paragraph>
    <Paragraph position="8"> - Governing Category indicates if an NP is dominated by a sentence phrase or by a verb phrase, e.g. the NP associated with Arg1 is dominated by a VP.</Paragraph>
    <Paragraph position="9"> - Predicate Word: This feature consists of two components: (1) the word itself, e.g. gives for all arguments; and (2) the lemma, which represents the verb normalized to lower case and infinitive form, e.g. give for all arguments. [Table 1: standard features extracted from the parse-tree in Figure 1.]</Paragraph>
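As an illustration of how some of the standard features can be read off a parse tree, the sketch below computes Phrase Type, Parse Tree Path and Position with nltk; it is an assumption-laden toy (Head Word, Governing Category and Voice would additionally require head-percolation rules and a passive-voice heuristic, omitted here). The node positions are nltk tree positions of the predicate's preterminal and of the candidate argument node.

from nltk import Tree

def parse_tree_path(tree, pred_pos, arg_pos):
    """Sequence of non-terminal labels linking predicate and argument, e.g. V↑VP↓NP."""
    common = 0                                    # longest shared prefix = lowest common ancestor
    while (common < min(len(pred_pos), len(arg_pos))
           and pred_pos[common] == arg_pos[common]):
        common += 1
    up = [tree[pred_pos[:i]].label() for i in range(len(pred_pos), common - 1, -1)]
    down = [tree[arg_pos[:i]].label() for i in range(common + 1, len(arg_pos) + 1)]
    return "↑".join(up) + "".join("↓" + label for label in down)

def position(tree, pred_pos, arg_pos):
    """'before' if the argument precedes the predicate in the sentence, 'after' otherwise."""
    leaves = tree.treepositions("leaves")
    first = lambda pos: min(i for i, leaf in enumerate(leaves) if leaf[:len(pos)] == pos)
    return "before" if first(arg_pos) < first(pred_pos) else "after"

t = Tree.fromstring("(S (NP (N Paul)) (VP (V gives) (NP (DT a) (N lecture)) (PP (IN in) (NP (N Rome)))))")
print(t[(1, 1)].label())                   # Phrase Type of the Arg1 node: NP
print(parse_tree_path(t, (1, 0), (1, 1)))  # V↑VP↓NP
print(position(t, (1, 0), (1, 1)))         # after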
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Standard feature space
</SectionTitle>
      <Paragraph position="0"> The discovery of relevant features is, as usual, a complex task; nevertheless, there is a common consensus on the basic features that should be adopted. These standard features, firstly proposed in (Gildea and Jurafsky, 2002), refer to flat information derived from parse trees, i.e.</Paragraph>
      <Paragraph position="1"> Phrase Type, Predicate Word, Head Word, Governing Category, Position and Voice. Table 1 presents the standard features and exemplifies how they are extracted from the parse tree in Figure 1.</Paragraph>
      <Paragraph position="2"> For example, the Parse Tree Path feature represents the path in the parse-tree between a predicate node and one of its argument nodes.</Paragraph>
      <Paragraph position="3"> It is expressed as a sequence of nonterminal labels linked by direction symbols (up or down), e.g. in Figure 1, V↑VP↓NP is the path between the predicate to give and the argument 1, a lecture. Two pairs ⟨p1, a1⟩ and ⟨p2, a2⟩ have two different Path features even if the paths differ by only one node in the parse-tree. This prevents the learning algorithm from generalizing well on unseen data. In order to address this problem, the next section describes a novel kernel space for predicate argument classification.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Support Vector Machine approach
</SectionTitle>
      <Paragraph position="0"> Given a vector space in ℜⁿ and a set of positive and negative points, SVMs classify vectors according to a separating hyperplane, H(x) = w·x + b = 0, where w ∈ ℜⁿ and b ∈ ℜ are learned by applying the Structural Risk Minimization principle (Vapnik, 1995).</Paragraph>
      <Paragraph position="1"> To apply the SVM algorithm to Predicate Argument Classification, we need a function φ: F → ℜⁿ to map our feature space F = {f1, ..., f_|F|} and our predicate/argument pair representation, F_{p,a} = F_z, into ℜⁿ, such that F_z → φ(F_z) = (φ1(F_z), ..., φn(F_z)). From the kernel theory we have that:</Paragraph>
      <Paragraph position="3"> H(F_z) = Σ_{i=1..l} α_i y_i K(F_i, F_z) + b, where F_i, ∀i ∈ {1, ..., l}, are the training instances and the product K(F_i, F_z) = ⟨φ(F_i)·φ(F_z)⟩ is the kernel function associated with the mapping φ. The simplest mapping that we can apply is φ(F_z) = z = (z1, ..., zn), where zi = 1 if fi ∈ F_z and zi = 0 otherwise, i.e.</Paragraph>
      <Paragraph position="4"> the characteristic vector of the set F_z with respect to F. If we choose as kernel function the scalar product, we obtain the linear kernel:</Paragraph>
      <Paragraph position="6"> K_L(F_x, F_z) = x·z. Another function, which is the current state-of-the-art for predicate argument classification, is the polynomial kernel K_p(F_x, F_z) = (c + x·z)^d, where c is a constant and d is the degree of the polynomial.</Paragraph>
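A compact sketch of the two kernels under the characteristic-vector mapping φ described above: since the mapping is binary, the scalar product x·z is simply the size of the intersection of the two feature sets. The feature strings below are invented placeholders, used only to exercise the functions.

def linear_kernel(Fx, Fz):
    """K_L(Fx, Fz) = x·z, with x, z the characteristic vectors of the feature sets."""
    return len(set(Fx) & set(Fz))

def polynomial_kernel(Fx, Fz, c=1.0, d=3):
    """K_p(Fx, Fz) = (c + x·z)^d."""
    return (c + linear_kernel(Fx, Fz)) ** d

Fa = {"PhraseType=NP", "Position=after", "Voice=active"}
Fb = {"PhraseType=NP", "Position=before", "Voice=active"}
print(linear_kernel(Fa, Fb), polynomial_kernel(Fa, Fb, c=1, d=3))   # 2 27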
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Convolution Kernels for Semantic Parsing
    </SectionTitle>
    <Paragraph position="0"> We propose two different convolution kernels associated with two different predicate argument sub-structures: the first includes the target predicate with one of its arguments. We will show that it contains almost all the standard feature information. The second relates to the sub-categorization frame of verbs. In this case, the kernel function aims to cluster together verbal predicates which have the same syntactic realizations. This provides the classification algorithm with important clues about the possible set of arguments suited for the target syntactic structure.</Paragraph>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Predicate/Argument Feature (PAF)
</SectionTitle>
      <Paragraph position="0"> We consider the predicate argument structures annotated in PropBank or FrameNet as our semantic space. The smallest sub-structure which includes one predicate with only one of its arguments defines our structural feature. For example, Figure 2 illustrates the parse-tree of the sentence "Paul delivers a talk in formal style". The circled substructures in (a), (b) and (c) are our semantic objects associated with the three arguments of the verb to deliver, i.e. ⟨deliver, Arg0⟩, ⟨deliver, Arg1⟩ and ⟨deliver, ArgM⟩. Note that each predicate/argument pair is associated with only one structure, i.e. F_{p,a} contains only one of the circled sub-trees. Other important properties are the following: (1) The overall semantic feature space F contains sub-structures composed of syntactic information embodied by parse-tree dependencies and semantic information in the form of predicate/argument annotation.</Paragraph>
      <Paragraph position="1"> (2) This solution is efficient as we have to classify only as many nodes as the number of predicate arguments.</Paragraph>
      <Paragraph position="2"> (3) A constituent cannot be part of two different arguments of the target predicate, i.e. there is no overlapping between the words of two arguments. Thus, two semantic structures F_{p1,a1} and F_{p2,a2}¹, associated with two different arguments, cannot be included one in the other. This property is important because a convolution kernel would not be effective in distinguishing between an object and its sub-parts. [¹ F_{p,a} was defined as the set of features of the object ⟨p, a⟩. Since in our representations we have only one element in F_{p,a}, with an abuse of notation we use it to indicate the object itself.]</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Sub-Categorization Feature (SCF)
</SectionTitle>
      <Paragraph position="0"> The above object space aims to capture all the information between a predicate and one of its arguments. Its main drawback is that important structural information related to inter-argument dependencies is neglected. In order to solve this problem we define the Sub-Categorization Feature (SCF). This is the sub-parse tree which includes the sub-categorization frame of the target verbal predicate. For example, Figure 3 shows the parse tree of the sentence "He flushed the pan and buckled his belt". The solid line describes the SCF of the predicate flush, i.e. F_flush, whereas the dashed line tailors the SCF of the predicate buckle, i.e. F_buckle. Note that SCFs are features for predicates (i.e. they describe predicates), whereas PAF characterizes predicate/argument pairs.</Paragraph>
      <Paragraph position="1"> Once the semantic representations are defined, we need to design a kernel function to estimate the similarity between our objects. As suggested in Section 2, we can map them into vectors in ℜⁿ and implicitly evaluate the scalar product among them.</Paragraph>
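Purely as an illustration of the idea (the exact shape of the SCF structure is given by Figure 3, which is not reproduced here), one plausible rendering is to keep the VP production of the target verb together with its preterminal, while stripping the internal structure of the sister constituents:

from nltk import Tree

def extract_scf(tree, pred_pos):
    """One reading of the SCF of the verb at pred_pos: its VP production,
    with the verb kept and the sister constituents reduced to bare labels."""
    vp_pos = pred_pos[:-1]                        # parent of the verb's preterminal
    vp = tree[vp_pos]
    children = []
    for i, child in enumerate(vp):
        if vp_pos + (i,) == pred_pos:
            children.append(child)                # the predicate node itself
        else:
            children.append(Tree(child.label(), []))   # complement label only
    return Tree(vp.label(), children)

t = Tree.fromstring("(S (NP (PRP He)) (VP (VB flushed) (NP (DT the) (NN pan))))")
print(extract_scf(t, (1, 0)))                     # (VP (VB flushed) (NP ))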
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Predicate/Argument structure
Kernel (PAK)
</SectionTitle>
      <Paragraph position="0"> Given the semantic objects defined in the previous section, we design a convolution kernel in a way similar to the parse-tree kernel proposed in (Collins and Duffy, 2002). We divide our mapping into two steps: (1) from the semantic structure space F (i.e. PAF or SCF objects) to the set of all their possible sub-structures F′ = {f′1, ..., f′_|F′|} and (2) from F′ to ℜ^|F′|.</Paragraph>
      <Paragraph position="1"> [Figure 4: all valid fragments of the semantic structure associated with Arg1 of Figure 2.]</Paragraph>
      <Paragraph position="2"> An example of features in F′ is given in Figure 4, where the whole set of fragments, F′_{deliver,Arg1}, of the argument structure F_{deliver,Arg1}, is shown (see also Figure 2).</Paragraph>
      <Paragraph position="3"> It is worth noting that the allowed sub-trees contain entire (not partial) production rules. For instance, the sub-tree [NP [D a]] is excluded from the set of Figure 4 since only a part of the production NP → D N is used in its generation. However, this constraint does not apply to the production VP → V NP PP along with the fragment [VP [V NP]], as the subtree [VP [PP [...]]] is not considered part of the semantic structure. Thus, in step 1, an argument structure F_{p,a} is mapped into a fragment set F′_{p,a}. In step 2, this latter is mapped into x = (x1, ..., x_|F′|) ∈ ℜ^|F′|, where xi is equal to the number of times that f′i occurs in F′_{p,a}².</Paragraph>
      <Paragraph position="4"> In order to evaluate K(φ(F_x), φ(F_z)) without evaluating the feature vectors x and z, we define the indicator function I_i(n) = 1 if the sub-structure i is rooted at node n and 0 otherwise. It follows that φ_i(F_x) = Σ_{n∈N_x} I_i(n), where N_x is the set of F_x's nodes. Therefore, the kernel can be written as:</Paragraph>
      <Paragraph position="6"> K(φ(F_x), φ(F_z)) = Σ_{n_x∈N_x} Σ_{n_z∈N_z} Σ_i I_i(n_x) I_i(n_z), where N_x and N_z are the nodes in F_x and F_z, respectively. In (Collins and Duffy, 2002), it has been shown that Σ_i I_i(n_x) I_i(n_z) = Δ(n_x, n_z) can be computed in O(|N_x| × |N_z|) by the following recursive relation: (1) if the productions at n_x and n_z are different then Δ(n_x, n_z) = 0; [² A fragment can appear several times in a parse-tree, thus each fragment occurrence is considered as a different element in F′_{p,a}.]</Paragraph>
      <Paragraph position="7"> (2) if the productions at n_x and n_z are the same, and n_x and n_z are pre-terminals, then Δ(n_x, n_z) = 1; (3) if the productions at n_x and n_z are the same, and n_x and n_z are not pre-terminals, then</Paragraph>
      <Paragraph position="9"> Δ(n_x, n_z) = Π_{i=1}^{nc(n_x)} (1 + Δ(ch(n_x, i), ch(n_z, i))), where nc(n_x) is the number of children of n_x and ch(n, i) is the i-th child of the node n. Note that, as the productions are the same, ch(n_x, i) = ch(n_z, i).</Paragraph>
      <Paragraph position="10"> This kind of kernel has the drawback of assigning more weight to larger structures, while the argument type does not strictly depend on the size of the argument (Moschitti and Bejan, 2004). To overcome this problem we can scale the relative importance of the tree fragments using a parameter λ for cases (2) and (3), i.e. Δ(n_x, n_z) = λ and</Paragraph>
      <Paragraph position="12"> Δ(n_x, n_z) = λ Π_{i=1}^{nc(n_x)} (1 + Δ(ch(n_x, i), ch(n_z, i))), respectively.</Paragraph>
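The recursion above translates almost line-for-line into code. The sketch below is a plain re-implementation over nltk trees (not the authors' implementation inside SVM-light); production() and is_preterminal() are small helpers introduced here, and lam = 1.0 recovers the unscaled Collins and Duffy kernel. The two example trees are rough stand-ins for the structures of Figure 2.

from nltk import Tree

def production(node):
    """Label of a node plus the labels (or words) of its children."""
    return (node.label(),
            tuple(c.label() if isinstance(c, Tree) else c for c in node))

def is_preterminal(node):
    return all(not isinstance(c, Tree) for c in node)

def delta(nx, nz, lam=1.0):
    if production(nx) != production(nz):
        return 0.0                                       # case (1)
    if is_preterminal(nx):
        return lam                                       # case (2)
    value = lam                                          # case (3)
    for cx, cz in zip(nx, nz):                           # same production: children align
        value *= 1.0 + delta(cx, cz, lam)
    return value

def pak(tx, tz, lam=1.0):
    """K_PAK = sum of Delta over all pairs of nodes of the two structures."""
    nodes_x = [tx[p] for p in tx.treepositions() if isinstance(tx[p], Tree)]
    nodes_z = [tz[p] for p in tz.treepositions() if isinstance(tz[p], Tree)]
    return sum(delta(nx, nz, lam) for nx in nodes_x for nz in nodes_z)

# rough stand-ins for F_"delivers",Arg0 and F_"delivers",Arg1 of Figure 2
f_arg0 = Tree.fromstring("(S (NP (N Paul)) (VP (V delivers)))")
f_arg1 = Tree.fromstring("(VP (V delivers) (NP (DT a) (N talk)))")
print(pak(f_arg0, f_arg1))                               # 1.0: only [V delivers] is shared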
      <Paragraph position="13"> It is worth noting that even if the above equations define a kernel function similar to the one proposed in (Collins and Duffy, 2002), the sub-structures on which it operates are different from those of the parse-tree kernel. For example, Figure 4 shows that structures such as [VP [V] [NP]], [VP [V delivers] [NP]] and [VP [V] [NP [DT] [N]]] are valid features, but these fragments (and many others) are not generated by a complete production, i.e. VP → V NP PP. As a consequence they would not be included in the parse-tree kernel of the sentence.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Comparison with Standard
Features
</SectionTitle>
      <Paragraph position="0"> In this section we compare the standard features with the kernel-based representation in order to derive useful indications for their use. First, PAK estimates the similarity between two argument structures (i.e., PAF or SCF) by counting the number of sub-structures that they have in common. As an example, the similarity between the two structures in Figure 2, F_{"delivers",Arg0} and F_{"delivers",Arg1}, is equal to 1 since they have in common only the [V delivers] substructure. Such a low value depends on the fact that different arguments tend to appear in different structures.</Paragraph>
      <Paragraph position="1"> On the contrary, if two structures differ by only a few nodes (especially terminal or near-terminal nodes), the similarity remains quite high. For example, if we change the tense of the verb to deliver (Figure 2) into delivered, the [VP [V delivers] [NP]] subtree will be transformed into [VP [VBD delivered] [NP]], where the NP is unchanged. Thus, the similarity with the previous structure will be quite high as: (1) the NP with all its sub-parts will be matched and (2) the small difference will not highly affect the kernel norm and consequently the final score. The above property also holds for the SCF structures. For example, in Figure 3, K_PAK(φ(F_flush), φ(F_buckle)) is quite high as the two verbs have the same syntactic realization of their arguments. In general, flat features do not possess this conservative property. For example, the Parse Tree Path is very sensitive to small changes in the parse-trees, e.g. two predicates, expressed in different tenses, generate two different Path features.</Paragraph>
      <Paragraph position="2"> Second, some information contained in the standard features is embedded in PAF: Phrase Type, Predicate Word and Head Word explicitly appear as structure fragments. For example, Figure 4 shows fragments like [NP [DT] [N]] or [NP [DT a] [N talk]] which explicitly encode the Phrase Type feature NP for the Arg1 in Figure 2.b. The Predicate Word is represented by the fragment [V delivers] and the Head Word is encoded in [N talk]. The same is not true for SCF since it does not contain information about a specific argument. SCF, in fact, aims to characterize the predicate with respect to the overall argument structure rather than a specific pair ⟨p, a⟩.</Paragraph>
      <Paragraph position="3"> Third, the Governing Category, Position and Voice features are not explicitly contained in either PAF or SCF. Nevertheless, SCF may allow the learning algorithm to detect the active/passive form of verbs.</Paragraph>
      <Paragraph position="4"> Finally, from the above observations it follows that the PAF representation may be used with PAK to classify arguments. On the contrary, SCF lacks important argument-specific information, thus, alone, it may be used only to classify verbs into syntactic categories. This suggests that SCF should be used in conjunction with standard features to boost their classification performance.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 The Experiments
</SectionTitle>
    <Paragraph position="0"> The aims of our experiments are twofold: On the one hand, we study whether the PAF representation produces an accuracy higher than standard features. On the other hand, we study whether SCF can be used to classify verbs according to their syntactic realization. Both of the above aims can be pursued by combining PAF and SCF with the standard features. For this purpose we adopted two ways to combine kernels³: (1) the sum of the (normalized) kernels and (2) their product. The resulting set of kernels used in the experiments is the following: K_pd is the polynomial kernel with degree d over the standard features.</Paragraph>
    <Paragraph position="3"> K_PAF is obtained by using the PAK function over the PAF structures.</Paragraph>
    <Paragraph position="5"> K_{PAF+P} = K_PAF/|K_PAF| + K_pd/|K_pd|, i.e. the sum between the normalized⁴ PAF-based kernel and the normalized polynomial kernel.</Paragraph>
    <Paragraph position="6"> K_{PAF·P} = (K_PAF · K_pd)/(|K_PAF| · |K_pd|), i.e. the normalized product between the PAF-based kernel and the polynomial kernel.</Paragraph>
    <Paragraph position="8"> K_{SCF+P} = K_SCF/|K_SCF| + K_pd/|K_pd|, i.e. the sum between the normalized SCF-based kernel and the normalized polynomial kernel.</Paragraph>
    <Paragraph position="9"> K_{SCF·P} = (K_SCF · K_pd)/(|K_SCF| · |K_pd|), i.e. the normalized product between the SCF-based kernel and the polynomial kernel.</Paragraph>
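Reading the definitions above operationally: the normalization written |K| is taken here to be the usual cosine-style kernel normalization K(x, z)/sqrt(K(x, x)·K(z, z)), and the normalized values are then summed or multiplied. A small sketch, assuming each example carries both views needed by the two component kernels (e.g. the PAF/SCF tree and the flat feature set; the .scf and .features attributes are hypothetical):

import math

def normalized(k, x, z):
    """K(x, z) / sqrt(K(x, x) * K(z, z))."""
    return k(x, z) / math.sqrt(k(x, x) * k(z, z))

def kernel_sum(k1, k2):
    """K1/|K1| + K2/|K2|, as in K_PAF+P and K_SCF+P."""
    return lambda x, z: normalized(k1, x, z) + normalized(k2, x, z)

def kernel_product(k1, k2):
    """(K1 · K2) / (|K1| · |K2|), as in K_PAF·P and K_SCF·P."""
    return lambda x, z: normalized(k1, x, z) * normalized(k2, x, z)

# e.g., reusing pak (Section 3.3 sketch) and polynomial_kernel (Section 2.2 sketch):
# k_scf_p = kernel_product(lambda x, z: pak(x.scf, z.scf),
#                          lambda x, z: polynomial_kernel(x.features, z.features, c=1, d=3))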
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Corpora set-up
</SectionTitle>
      <Paragraph position="0"> The above kernels were evaluated on two corpora: PropBank (www.cis.upenn.edu/~ace) along with Penn TreeBank 2⁵ (Marcus et al., 1993), and FrameNet.</Paragraph>
      <Paragraph position="1"> PropBank contains about 53,700 sentences and a fixed split between training and testing which has been used in other studies, e.g. (Gildea and Palmer, 2002; Surdeanu et al., 2003; Hacioglu et al., 2003). In this split, sections from 02 to 21 are used for training, section 23 for testing and sections 1 and 22 as development set. We considered all PropBank arguments⁶ from Arg0 to Arg9, ArgA and ArgM, for a total of 122,774 and 7,359 arguments in training and testing respectively. It is worth noting that in the experiments we used the gold standard parsing from Penn TreeBank, thus our kernel structures are derived with high precision. [⁵ ... the function tags like SBJ and TMP, as parsers usually are not able to provide this information.]</Paragraph>
      <Paragraph position="2"> ⁶ We noted that only Arg0 to Arg4 and ArgM contain enough training/testing data to affect the overall performance.</Paragraph>
      <Paragraph position="3"> For FrameNet (www.icsi.berkeley.edu/~framenet) we extracted all 24,558 sentences from the 40 frames of the Senseval 3 task (www.senseval.org) for the Automatic Labeling of Semantic Roles. We considered 18 of the most frequent roles and we mapped together those having the same name. Only verbs are selected to be predicates in our evaluations. Moreover, as a fixed split between training and testing does not exist, we randomly selected 30% of the sentences for testing and 70% for training.</Paragraph>
      <Paragraph position="4"> Additionally, 30% of training was used as a validation-set. The sentences were processed using Collins' parser (Collins, 1997) to generate parse-trees automatically.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Classification set-up
</SectionTitle>
      <Paragraph position="0"> The classifier evaluations were carried out using the SVM-light software (Joachims, 1999), available at svmlight.joachims.org, with the default polynomial kernel for the standard-feature evaluations. To process PAF and SCF, we implemented our own kernels and used them inside SVM-light.</Paragraph>
      <Paragraph position="1"> The classification performance was evaluated using the f1 measure⁷ for single arguments and the accuracy for the final multi-class classifier. This latter choice allows us to compare the results with previous work in the literature, e.g.</Paragraph>
      <Paragraph position="2"> (Gildea and Jurafsky, 2002; Surdeanu et al., 2003; Hacioglu et al., 2003).</Paragraph>
      <Paragraph position="3"> For the evaluation of SVMs, we used the default regularization parameter (e.g., C = 1 for normalized kernels) and we tried a few cost-factor values (i.e., j ∈ {0.1, 1, 2, 3, 4, 5}) to adjust the rate between Precision and Recall. We chose the parameters by evaluating the SVM with the K_p3 kernel over the validation-set. Both the λ (see Section 3.3) and c parameters were evaluated in a similar way, by maximizing the performance of the SVM using K_PAF and K_SCF/|K_SCF| + K_pd/|K_pd| respectively. These parameters were also adopted for all the other kernels.</Paragraph>
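For concreteness, the two measures used in the evaluation (per-argument f1, footnote 7, and the accuracy of the final multi-class decision obtained with the argmax of Section 2) can be sketched as follows; the score dictionaries are invented toy values.

def f1(precision, recall):
    """f1 = 2PR / (P + R); equal weight to Precision and Recall (footnote 7)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def multiclass_accuracy(gold_labels, svm_scores):
    """svm_scores[i] maps each argument label to the score of its ONE-vs-ALL SVM;
    the prediction is the argmax over labels, as described in Section 2."""
    correct = sum(1 for gold, scores in zip(gold_labels, svm_scores)
                  if max(scores, key=scores.get) == gold)
    return correct / len(gold_labels)

print(f1(0.9, 0.8))                                          # ~0.847
print(multiclass_accuracy(["Arg0", "Arg1"],
                          [{"Arg0": 1.2, "Arg1": -0.3},
                           {"Arg0": 0.1, "Arg1": 0.4}]))     # 1.0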
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Kernel evaluations
</SectionTitle>
      <Paragraph position="0"> To study the impact of our structural kernels we first derived the maximal accuracy reachable with standard features along with polynomial kernels. The multi-class accuracies, for PropBank and FrameNet using K_pd with d = 1, ..., 5, are shown in Figure 5. We note that (a) the highest performance is reached for d = 3, (b) for PropBank our maximal accuracy (90.5%) is substantially equal to the SVM performance (88%) obtained in (Hacioglu et al., 2003) with degree 2, and (c) the accuracy on FrameNet (85.2%) is higher than the best result obtained in the literature, i.e. 82.0% in (Gildea and Palmer, 2002). This different outcome is due to a different task (we classify different roles) and a different classification algorithm. Moreover, we did not use the Frame information, which is very important⁸. [⁷ f1 assigns equal importance to Precision P and Recall R, i.e. f1 = 2PR/(P+R).]</Paragraph>
      <Paragraph position="1"> [Figure 5: multi-class accuracy for different degrees of the polynomial kernel.]</Paragraph>
      <Paragraph position="2"> It is worth noting that the difference between the linear and the polynomial kernel is about 3-4 percentage points for both PropBank and FrameNet.</Paragraph>
      <Paragraph position="3"> This remarkable difference can be easily explained by considering the meaning of the standard features. For example, let us restrict the classification function C_Arg0 to the two features Voice and Position. Without loss of generality we can assume: (a) Voice = 1 if active and 0 if passive, and (b) Position = 1 when the argument is after the predicate and 0 otherwise. To simplify the example, we also assume that if an argument precedes the target predicate it is a subject, otherwise it is an object⁹. It follows that a constituent is Arg0, i.e. C_Arg0 = 1, if only one feature at a time is 1, otherwise it is not an Arg0, i.e. C_Arg0 = 0. In other words, C_Arg0 = Position XOR Voice, which is the classical example of a non-linearly separable function that becomes separable in a superlinear space (Cristianini and Shawe-Taylor, 2000).</Paragraph>
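A toy verification of this argument (not from the paper; it assumes scikit-learn is available): a linear SVM cannot fit the four XOR configurations of (Voice, Position) perfectly, whereas a degree-2 polynomial kernel can.

from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]      # (Voice, Position)
y = [0, 1, 1, 0]                          # C_Arg0 = Voice XOR Position

for params in ({"kernel": "linear"},
               {"kernel": "poly", "degree": 2, "coef0": 1}):
    clf = SVC(C=1000, **params).fit(X, y)
    print(params, clf.score(X, y))        # linear stays below 1.0; poly reaches 1.0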
      <Paragraph position="4"> After establishing that the best kernel for standard features is K_p3, we carried out all the other experiments using it in the kernel combinations. Tables 2 and 3 show the single-class (f1 measure) as well as the multi-class classifier (accuracy) performance for PropBank and FrameNet respectively. Each column of the two tables refers to a different kernel defined in the previous section. [⁸ Preliminary experiments indicate that SVMs can reach 90% by using the frame feature.]</Paragraph>
      <Paragraph position="5"> The overall meaning is discussed in the following points: First, PAF alone has good performance, since in the PropBank evaluation it outperforms the linear kernel (K_p1), 88.7% vs. 86.7%, whereas in FrameNet it shows a similar performance, 79.5% vs. 82.1% (compare the tables with Figure 5). This suggests that PAF generates the same information as the standard features in a linear space. However, when a degree greater than 1 is used for standard features, PAF is outperformed¹⁰. [⁹ Indeed, this is true in most of the cases.]</Paragraph>
      <Paragraph position="6"> Second, SCF improves the polynomial kernel (d = 3), i.e. the current state-of-the-art, by about 3 percentage points on PropBank (column SCF·P). This suggests that (a) PAK can measure the similarity between two SCF structures and (b) the sub-categorization information provides effective clues about the expected argument type. The interesting consequence is that SCF together with PAK seems suitable to automatically cluster different verbs that have the same syntactic realization. We also note that to fully exploit the SCF information it is necessary to use a kernel product (K1 · K2) combination rather than the sum (K1 + K2), e.g. column SCF+P.</Paragraph>
      <Paragraph position="7"> Finally, the FrameNet results are completely different. No kernel combination with either PAF or SCF produces an improvement. [¹⁰ Unfortunately, the use of a polynomial kernel on top of the tree fragments to generate the XOR functions seems not successful.]</Paragraph>
      <Paragraph position="8"> On the contrary, the performance decreases, suggesting that the classifier is confused by this syntactic information. The main reason for the different outcome is that PropBank arguments are different from semantic roles as they are an intermediate level between syntax and semantics, i.e. they are nearer to grammatical functions. In fact, in PropBank, arguments are annotated consistently with syntactic alternations (see the annotation guidelines for PropBank at www.cis.upenn.edu/~ace). On the contrary, FrameNet roles represent the final semantic product and they are assigned according to semantic considerations rather than syntactic aspects. For example, the Cause and Agent semantic roles have identical syntactic realizations. This prevents SCF from distinguishing between them.</Paragraph>
      <Paragraph position="9"> Another minor reason may be the use of automatic parse-trees to extract PAF and SCF, even if preliminary experiments on automatic semantic shallow parsing of PropBank have shown no important differences versus semantic parsing which adopts Gold Standard parse-trees.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>