<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2415">
  <Title>Hierarchical Recognition of Propositional Arguments with Perceptrons</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Features
</SectionTitle>
    <Paragraph position="0"> The features of the system are extracted from three types of elements: words, target verbs, and arguments. They are formed using the PoS tags, chunks and clauses of the sentence. The functions w and a are defined in terms of a collection of feature extraction patterns, which the functions binarize: each extracted pattern forms a binary dimension indicating whether the pattern occurs in a learning instance.</Paragraph>
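The binarization described above can be illustrated with a minimal sketch. It assumes that each extracted pattern is represented as a string; the FeatureIndex class and its method names are hypothetical helpers introduced here for illustration and are not taken from the paper.

```python
class FeatureIndex:
    """Maps each distinct extracted pattern to one binary dimension.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self):
        self.index = {}

    def dimension(self, pattern, grow=True):
        # Allocate a new dimension the first time a pattern is seen.
        if pattern not in self.index and grow:
            self.index[pattern] = len(self.index)
        return self.index.get(pattern)

    def binarize(self, patterns, grow=True):
        # Sparse binary vector, stored as the set of active dimensions.
        dims = (self.dimension(p, grow) for p in patterns)
        return {d for d in dims if d is not None}
```

With grow=False, patterns never seen before are simply dropped, which is one plausible way to treat unseen patterns at prediction time.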
    <Paragraph position="1"> Extraction on Words. The list of features extracted from a word xi is the following: PoS tag.</Paragraph>
    <Paragraph position="2"> Form, if the PoS tag does not match the Perl regexp /^(CD|FW|J|LS|N|POS|SYM|V)/.</Paragraph>
    <Paragraph position="3"> Chunk type, of the chunk containing the word.</Paragraph>
    <Paragraph position="4"> Binary-valued flags: (a) Its chunk is one-word or multi-word; (b) Starts and/or ends, or is strictly within a chunk (3 flags); (c) Starts and/or ends clauses (2 flags); (d) Aligned with a target verb; and (e) First and/or last word of the sentence (2 flags). Given a word xi, the w function implements a ±3 window, that is, it returns the features of the words xi+r, with -3 ≤ r ≤ +3, each with its relative position r.</Paragraph>
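A minimal sketch of the word-level extraction and its ±3 window follows. It assumes each word is a dict with the keys "form", "pos" and "chunk"; these keys, the function names and the string encoding of features are illustrative assumptions, and the binary-valued flags are left out for brevity.

```python
import re

# PoS prefixes for which the word form is NOT extracted (regexp from the text).
NO_FORM = re.compile(r"^(CD|FW|J|LS|N|POS|SYM|V)")


def word_features(word):
    """Features of a single word; binary flags are omitted in this sketch."""
    feats = ["pos=" + word["pos"], "chunk=" + word["chunk"]]
    if not NO_FORM.match(word["pos"]):
        feats.append("form=" + word["form"])
    return feats


def window_features(sentence, i, width=3):
    """Features of the words at i+r, for r from -width to +width.

    Each feature is prefixed with its relative position r.
    Positions falling outside the sentence are skipped.
    """
    feats = []
    for r in range(-width, width + 1):
        j = i + r
        if j >= 0 and len(sentence) > j:
            feats.extend(f"{r:+d}:{f}" for f in word_features(sentence[j]))
    return feats
```

For a determiner at relative position -1, this yields features such as "-1:pos=DT" and "-1:form=The", whereas a noun contributes its PoS tag and chunk type but not its form.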
    <Paragraph position="5"> Extraction on Target Verbs. Given a target verb v, we extract the following features from the word xv: Form, PoS tag, and target verb infinitive form.</Paragraph>
    <Paragraph position="6"> Voice: passive, if xv has PoS tag VBN, and either its chunk is not VP or xv is preceded by a form of &quot;to be&quot; or &quot;to get&quot; within its chunk; otherwise active. Chunk type.</Paragraph>
    <Paragraph position="7"> Binary-valued flags: (a) Its chunk is multi-word or not; and (b) Starts and/or ends clauses (2 flags).</Paragraph>
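The voice rule above can be sketched as follows. The sketch assumes each word is a dict with "form", "pos", "chunk" (the chunk type) and "chunk_id" (an identifier shared by the words of one chunk); these keys are hypothetical. The list of inflected forms of "to be" and "to get" is also an assumption, since the text does not enumerate them.

```python
# Assumed inflection lists; the text does not enumerate them.
BE_GET_FORMS = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "get", "gets", "got", "gotten", "getting",
}


def voice(sentence, v):
    """Return 'passive' or 'active' for the target verb at index v."""
    verb = sentence[v]
    if verb["pos"] != "VBN":
        return "active"
    if verb["chunk"] != "VP":
        return "passive"
    # Scan leftwards while still inside the verb's own chunk.
    j = v - 1
    while j >= 0 and sentence[j]["chunk_id"] == verb["chunk_id"]:
        if sentence[j]["form"].lower() in BE_GET_FORMS:
            return "passive"
        j -= 1
    return "active"
```

Contracted auxiliaries (for example "'s") are not handled here; covering them would require extending the assumed form list.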
    <Paragraph position="8"> Extraction on Arguments. The a function performs the following feature extraction for an argument (s,e) linked to a verb v: Target verb features, of verb v.</Paragraph>
    <Paragraph position="9"> Word features, of words s-1, s, e, and e+1, each anchored with its relative position.</Paragraph>
    <Paragraph position="10"> Distance of v to s and to e: for both pairs, a flag indicating if the distance is {0, 1, -1, &gt;1, &lt;-1}.</Paragraph>
    <Paragraph position="11"> PoS Sequence, of PoS tags from s to e: (a) n-grams of size 2, 3 and 4; and (b) the complete PoS pattern, if it is less than 5 tags long.</Paragraph>
    <Paragraph position="12"> TOP sequence: tags of the top-most elements found strictly from s to e. The tag of a word is its PoS. The tag of a chunk is its type. The tag of a clause is its type (S) enriched as follows: if the PoS tag of the first word matches /^(IN|W|TO)/ the tag is enriched with the form of that word (e.g. S-to); if that word is a verb, the tag is enriched with its PoS (e.g. S-VBG); otherwise, it is just S. The following features are extracted: (a) n-grams of sizes 2, 3 and 4; (b) The complete pattern, if it is less than 5 tags long; and (c) Anchored tags of the first, second, penultimate and last elements.</Paragraph>
    <Paragraph position="13"> PATH sequence: tags of elements found between the argument and the verb. It is formed by a concatenation of horizontal tags and vertical tags. The horizontal tags correspond to the TOP sequence of elements at the same level of the argument, from it to the phrase containing the verb, both excluded. The vertical part is the list of tags of the phrases which contain the verb, from the phrase at the level of the argument to the verb. The tags of the PATH sequence are extracted as in the TOP sequence, with an additional mark indicating whether an element is horizontal to the left or to the right of the argument, or vertical. The following features are extracted: (a) n-grams of sizes 4 and 5; and (b) The complete pattern, if it is less than 5 tags long.</Paragraph>
    <Paragraph position="14"> Bag of Words: we consider the top-most elements of the argument which are not clauses, and extract all nouns, adjectives and adverbs. We then form a separate bag for each category.</Paragraph>
    <Paragraph position="15"> Lexicalization: we extract the form of the head of the first top-most element of the argument, via common head word rules; if the first element is a PP chunk, we also extract the head of the first NP found.</Paragraph>
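Two of the argument features listed above, the distance flags and the n-gram features over a tag sequence, can be sketched as below. The sketch assumes that distance means the signed difference between token positions, which the text does not state explicitly. The function names and the flag labels "gt1" and "lt-1" (standing for the buckets above 1 and below -1) are illustrative choices.

```python
def distance_flag(v, pos):
    """Bucket the signed offset pos - v into one of five flags.

    Assumption: distance is the difference of token positions.
    """
    d = pos - v
    if d > 1:
        return "gt1"
    if -1 > d:
        return "lt-1"
    return str(d)  # one of "-1", "0", "1"


def sequence_features(tags, sizes=(2, 3, 4), max_full=5):
    """n-grams of the given sizes over a tag sequence, plus the complete
    pattern when the sequence is shorter than max_full tags."""
    feats = []
    for n in sizes:
        for k in range(len(tags) - n + 1):
            feats.append(f"{n}gram=" + "_".join(tags[k:k + n]))
    if max_full > len(tags):
        feats.append("full=" + "_".join(tags))
    return feats
```

The same sequence_features function could serve the PoS sequence and the TOP sequence with the default sizes, and the PATH sequence with sizes=(4, 5), since the three differ only in how the tag list is built.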
  </Section>
</Paper>