<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0407">
  <Title>Engineering of Syntactic Features for Shallow Semantic Parsing</Title>
  <Section position="2" start_page="0" end_page="48" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The design of features for natural language processing tasks is, in general, a critical problem. The inherent complexity of linguistic phenomena, often characterized by structured data, makes difficult to find effective linear feature representations for the target learning models.</Paragraph>
    <Paragraph position="1"> In many cases, the traditional feature selection techniques (Kohavi and Sommerfield, 1995) are not so useful since the critical problem relates to feature generation rather than selection. For example, the design of features for a natural language syntactic parse-tree re-ranking problem (Collins, 2000) cannot be carried out without a deep knowledge about automatic syntactic parsing. The modeling of syntactic/semantic based features should take into account linguistic aspects to detect the interesting context, e.g. the ancestor nodes or the semantic dependencies (Toutanova et al., 2004).</Paragraph>
    <Paragraph position="2"> A viable alternative has been proposed in (Collins and Duffy, 2002), where convolution kernels were used to implicitly define a tree substructure space.</Paragraph>
    <Paragraph position="3"> The selection of the relevant structural features was left to the voted perceptron learning algorithm. Another interesting model for parsing re-ranking based on tree kernel is presented in (Taskar et al., 2004).</Paragraph>
    <Paragraph position="4"> The good results show that tree kernels are very promising for automatic feature engineering, especially when the available knowledge about the phenomenon is limited.</Paragraph>
    <Paragraph position="5"> Along the same line, automatic learning tasks that rely on syntactic information may take advantage of a tree kernel approach. One of such tasks is the automatic boundary detection of predicate arguments of the kind defined in PropBank (Kingsbury and Palmer, 2002). For this purpose, given a predicate p in a sentence s, we can define the notion of predicate argument spanning trees (PASTs) as those syntactic subtrees of s which exactly cover all and only the p's arguments (see Section 4.1). The set of nonspanning trees can be then associated with all the remaining subtrees of s.</Paragraph>
    <Paragraph position="6"> An automatic classifier which recognizes the spanning trees can potentially be used to detect the predicate argument boundaries. Unfortunately, the application of such classifier to all possible sentence subtrees would require an exponential execution time. As a consequence, we can use it only to decide for a reduced set of subtrees associated with a corresponding set of candidate boundaries. Notice how these can be detected by previous approaches  (e.g. (Pradhan et al., 2004)) in which a traditional boundary classifier (tbc) labels the parse-tree nodes as potential arguments (PA). Such classifiers, generally, are not sensitive to the overall argument structure. On the contrary, a PAST classifier (pastc) can consider the overall argument structure encoded in the associated subtree. This is induced by the PA subsets.</Paragraph>
    <Paragraph position="7"> The feature design for the PAST representation is not simple. Tree kernels are a viable alternative that allows the learning algorithm to measure the similarity between two PASTs in term of all possible tree substructures.</Paragraph>
    <Paragraph position="8"> In this paper, we designed and experimented a boundary classifier for predicate argument labeling based on two phases: (1) a first annotation of potential arguments by using a high recall tbc and (2) a PAST classification step aiming to select the correct substructures associated with potential arguments. Both classifiers are based on Support Vector Machines learning. The pastc uses the tree kernel function defined in (Collins and Duffy, 2002). The results show that the PAST classification can be learned with high accuracy (the f-measure is about 89%) and the impact on the overall boundary detection accuracy is good.</Paragraph>
    <Paragraph position="9"> In the remainder of this paper, Section 2 introduces the Semantic Role Labeling problem along with the boundary detection subtask. Section 3 defines the SVMs using the linear kernel and the parse tree kernel for boundary detection. Section 4 describes our boundary detection algorithm. Section 5 shows the preliminary comparative results between the traditional and the two-step boundary detection.</Paragraph>
    <Paragraph position="10"> Finally, Section 7 summarizes the conclusions.</Paragraph>
  </Section>
class="xml-element"></Paper>