<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3211">
  <Title>Mixing Weak Learners in Semantic Parsing</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Shallow semantic parsing is the process of finding sentence constituents that play a semantic role relative to a target predicate and then labeling those constituents according to their respective roles. Specifying an event's agent, patient, location, time of occurrence, etc., can be useful for NLP tasks such as information extraction (cf. Surdeanu et al., 2003), dialog understanding, question answering, text summarization, and machine translation. Example 1 depicts a semantic parse.</Paragraph>
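The labeling step in Example 1 can be viewed as mapping each constituent to a role. The following is a minimal, purely illustrative sketch of that mapping; the function name and the table of gold roles are hypothetical, not part of the paper's system.

```python
# Toy illustration of shallow semantic parsing output for Example 1:
# each constituent of the sentence is paired with its semantic role
# relative to the predicate. The gold table is hard-coded for the example.
from typing import List, Tuple

def label_roles(constituents: List[str], predicate: str) -> List[Tuple[str, str]]:
    """Stand-in for a role classifier: returns (constituent, role) pairs."""
    gold = {
        "She": "Agent",
        "the vase": "Patient",
        "in Egypt": "Locative",
    }
    return [(c, gold.get(c, "None")) for c in constituents]

parse = label_roles(["She", "the vase", "in Egypt"], predicate="bought")
# parse == [("She", "Agent"), ("the vase", "Patient"), ("in Egypt", "Locative")]
```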
    <Paragraph position="1"> (1) [Agent She] [P bought] [Patient the vase] [Locative in Egypt] We expand on previous semantic parsing work (Gildea and Jurafsky, 2002; Pradhan et al., 2003; Surdeanu et al., 2003) by presenting a novel algorithm worthy of further exploration, describing a technique to drastically reduce feature space size, and presenting statistically significant new features. The accuracy of the final system is 88.3% on the classification task using the PropBank (Kingsbury et al., 2002) corpus. This is just 0.6% off the best accuracy reported in the literature.</Paragraph>
    <Paragraph position="2"> The classification algorithm used here is a variant of Random Forests (RFs) (Breiman, 2001).</Paragraph>
    <Paragraph position="3"> This choice was motivated by Breiman's empirical studies of numerous datasets showing that RFs often have lower generalization error than AdaBoost (Freund and Schapire, 1997), are less sensitive to noise in the training data, and learn well from weak inputs, while taking much less time to train. RFs are also simpler to understand and implement than SVMs, leading to, among other things, easier interpretation of feature importance and interactions (cf. Breiman, 2004), easier multi-class classification (requiring only a single training session versus one for each class), and easier problem-specific customization (e.g., by introducing prior knowledge).</Paragraph>
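As a point of reference, a standard Random Forest handles the multi-class role-labeling setup in a single training session. The sketch below uses scikit-learn's stock implementation of Breiman (2001); it is not the paper's modified variant, and the three categorical features and training rows are invented for illustration.

```python
# Baseline Random Forest (Breiman, 2001) on categorical inputs via
# scikit-learn. Feature columns (phrase type, position relative to the
# predicate, predicate lemma) and the tiny training set are illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

X_raw = [["NP", "left", "buy"],
         ["NP", "right", "buy"],
         ["PP", "right", "buy"]]
y = ["Agent", "Patient", "Locative"]

enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)

# One multi-class forest covers all roles at once (a single training
# session, versus one binary model per class).
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = clf.predict(enc.transform([["NP", "left", "buy"]]))
```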
    <Paragraph position="4"> The algorithm described here differs considerably from those of Breiman (2001). It was significantly revised to better handle high-dimensional categorical inputs and, as a result, provides much better accuracy on the shallow semantic parsing problem.</Paragraph>
    <Paragraph position="5"> The experiments reported here focus on the classification task: given a parsed constituent known to play a semantic role relative to a given predicate, decide which role is appropriate to assign to that constituent. Gold-standard sentence parses for training and test are taken from the PropBank dataset. We report results on two feature sets from the literature and a new feature set described here.</Paragraph>
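A classifier in this setup consumes a per-constituent feature vector. The record below is a hedged sketch in the spirit of the Gildea and Jurafsky (2002) feature set; the exact feature sets used in the experiments are described later in the paper, so these field names are illustrative only.

```python
# Illustrative per-constituent feature record, loosely following the
# kinds of features used in prior semantic-parsing work (phrase type,
# position relative to the predicate, voice, head word, predicate lemma).
from dataclasses import dataclass, astuple

@dataclass
class ConstituentFeatures:
    phrase_type: str   # e.g. "NP", "PP"
    position: str      # "before" or "after" the predicate
    voice: str         # "active" or "passive"
    head_word: str
    predicate: str

# Features for the constituent "She" in Example 1.
feats = ConstituentFeatures("NP", "before", "active", "She", "buy")
```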
    <Paragraph position="6"> In section 2, we describe the data used in the experiments. Section 3 details the classification algorithm. Section 4 presents the experimental results and describes each experiment's feature set. Section 5 provides a discussion and thoughts on future work.</Paragraph>
  </Section>
</Paper>