<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1030"> <Title>Shallow Semantic Parsing using Support Vector Machines</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Semantic Annotation and Corpora </SectionTitle> <Paragraph position="0"> We will be reporting on results using PropBank (Kingsbury et al., 2002), a 300k-word corpus in which predicate argument relations are marked for part of the verbs in the Wall Street Journal (WSJ) part of the Penn TreeBank (Marcus et al., 1994). The arguments of a verb are labeled ARG0 to ARG5, where ARG0 is the PROTO-AGENT (usually the subject of a transitive verb), ARG1 is the PROTO-PATIENT (usually its direct object), etc.</Paragraph> <Paragraph position="1"> PropBank attempts to treat semantically related verbs consistently. In addition to these CORE ARGUMENTS, additional ADJUNCTIVE ARGUMENTS, referred to as ARGMs, are also marked. Some examples are ARGM-LOC, for locatives, and ARGM-TMP, for temporals. Figure 1 shows the syntax tree representation along with the argument labels for an example structure extracted from the PropBank corpus.</Paragraph> <Paragraph position="2"> Most of the experiments in this paper, unless specified otherwise, are performed on the July 2002 release of PropBank. A larger, cleaner, completely adjudicated version of PropBank was made available in Feb 2004.</Paragraph> <Paragraph position="3"> We will also report some final best performance numbers on this corpus. PropBank was constructed by assigning semantic arguments to constituents of the hand-corrected TreeBank parses. The data comprise several sections of the WSJ, and we follow the standard convention of using Section-23 data as the test set. Section-02 to Section-21 were used for training. In the July 2002 release, the training set comprises about 51,000 sentences, instantiating about 132,000 arguments, and the test set comprises 2,700 sentences, instantiating about 7,000 arguments. 
The Feb 2004 release training set comprises about 85,000 sentences instantiating about 250,000 arguments and the test set comprises 5,000 sentences instantiating about 12,000 arguments.</Paragraph> <Paragraph position="4"> [ARG0 He] [predicate talked] for [ARGM TMP about</Paragraph> </Section> </Paper>