<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0634">
  <Title>Semantic Role Chunking Combining Complementary Syntactic Views</Title>
  <Section position="5" start_page="218" end_page="218" type="metho">
    <SectionTitle>
FIRST AND LAST WORD/POS IN CONSTITUENT
ORDINAL CONSTITUENT POSITION
CONSTITUENT TREE DISTANCE
</SectionTitle>
    <Paragraph position="0">CONSTITUENT RELATIVE FEATURES: Nine features representing the phrase type, head word, and head word part of speech of the parent and of the left and right siblings of the constituent.</Paragraph>
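The nine constituent relative features can be illustrated with a minimal Python sketch. It assumes a toy `Node` class whose head words and head POS tags are already filled in (head finding is not shown), and it uses a `NULL` placeholder when the parent or a sibling is missing. Neither detail comes from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Parse-tree node; head word and head POS are assumed to be precomputed."""
    label: str
    head_word: str
    head_pos: str
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def attach(parent, *children):
    """Link each child to the parent (in order) and return the parent."""
    for child in children:
        child.parent = parent
        parent.children.append(child)
    return parent


def relative_features(node, null="NULL"):
    """Phrase type, head word and head POS of the parent, left sibling and right sibling."""
    parent, left, right = node.parent, None, None
    if parent is not None:
        idx = next(i for i, c in enumerate(parent.children) if c is node)
        if idx != 0:
            left = parent.children[idx - 1]
        if idx + 1 != len(parent.children):
            right = parent.children[idx + 1]
    feats = {}
    # Assumption: a missing parent or sibling is encoded with the `null` placeholder.
    for name, rel in (("parent", parent), ("left", left), ("right", right)):
        feats[name + "_phrase"] = rel.label if rel is not None else null
        feats[name + "_head"] = rel.head_word if rel is not None else null
        feats[name + "_head_pos"] = rel.head_pos if rel is not None else null
    return feats


# Toy tree (S (NP John) (VP sold ...)) with heads filled in by hand.
subj = Node("NP", "John", "NNP")
pred_vp = Node("VP", "sold", "VBD")
attach(Node("S", "sold", "VBD"), subj, pred_vp)
print(relative_features(subj))
```

The subject NP has no left sibling, so its three left-sibling features fall back to the placeholder.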
  </Section>
  <Section position="6" start_page="218" end_page="218" type="metho">
    <SectionTitle>
SYNTACTIC FRAME
</SectionTitle>
    <Paragraph position="0">CONTENT WORD FEATURES: The content word, its POS, and the named entities in the content word.</Paragraph>
  </Section>
  <Section position="7" start_page="218" end_page="218" type="metho">
    <SectionTitle>
CLAUSE-BASED PATH VARIATIONS:
</SectionTitle>
    <Paragraph position="0"> I. Replacing all the nodes in a path other than clause nodes with an "*".</Paragraph>
    <Paragraph position="1"> For example, the path NP|S|VP|SBAR|NP|VP|VBD becomes NP|S|*|S|*|*|VBD. II. Retaining only the clause nodes in the path, which for the above example would produce NP|S|S|VBD. III. Adding a binary feature that indicates whether the constituent is in the same clause as the predicate. IV. Collapsing the nodes between S nodes, which gives NP|S|NP|VP|VBD. PATH N-GRAMS: This feature decomposes a path into a series of trigrams. For example, the path NP|S|VP|SBAR|NP|VP|VBD becomes NP|S|VP, S|VP|SBAR, VP|SBAR|NP, SBAR|NP|VP, etc. We used the first ten trigrams as ten features; shorter paths were padded with nulls.</Paragraph>
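Path variations I and II, together with the path n-gram feature, can be sketched as string operations on the `|`-separated path. This is illustrative only. The set of labels treated as clause nodes (`CLAUSE_LABELS`) and the `NULL` padding symbol are assumptions. Clause labels are normalized to `S` so that variant II reproduces the NP|S|S|VBD example. Variants III and IV are not covered because they need the parse tree.

```python
# Assumption: these Penn Treebank labels count as clause nodes and are
# all written as "S" in the variant paths.
CLAUSE_LABELS = {"S", "SBAR", "SINV", "SQ", "SBARQ"}


def star_non_clause(path):
    """Variant I: keep the endpoints and clause nodes, replace other nodes with '*'."""
    nodes = path.split("|")
    inner = ["S" if n in CLAUSE_LABELS else "*" for n in nodes[1:-1]]
    return "|".join([nodes[0]] + inner + [nodes[-1]])


def clause_only(path):
    """Variant II: keep the endpoints and the clause nodes only."""
    nodes = path.split("|")
    inner = ["S" for n in nodes[1:-1] if n in CLAUSE_LABELS]
    return "|".join([nodes[0]] + inner + [nodes[-1]])


def path_trigrams(path, k=10, pad="NULL"):
    """Path n-grams: the first k trigrams of the path, padded with a null symbol."""
    nodes = path.split("|")
    grams = ["|".join(nodes[i:i + 3]) for i in range(len(nodes) - 2)][:k]
    return grams + [pad] * (k - len(grams))


example = "NP|S|VP|SBAR|NP|VP|VBD"
print(star_non_clause(example))   # NP|S|*|S|*|*|VBD
print(clause_only(example))       # NP|S|S|VBD
print(path_trigrams(example))     # five trigrams followed by five NULLs
```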
    <Paragraph position="2"> SINGLE CHARACTER PHRASE TAGS: Each phrase category is clustered into a coarser category defined by the first character of the phrase label. PREDICATE CONTEXT: The words and POS tags in a five-word window centered on the predicate (two words on each side plus the predicate itself) were added as ten new features. PUNCTUATION: The punctuation before and after the constituent was added as two new features.</Paragraph>
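A possible reading of these three features is sketched below. The assumptions are that the predicate context pads positions outside the sentence with `NULL` and that PUNCTUATION inspects only the token directly adjacent to each edge of the constituent. The paper does not specify either detail.

```python
import string

PAD = "NULL"  # assumed placeholder for positions outside the sentence


def single_char_tag(phrase_label):
    """Cluster a phrase label by its first character (e.g. SBAR, SINV, SQ all become S)."""
    return phrase_label[0]


def predicate_context(words, tags, pred_idx, width=2):
    """Words and POS tags from pred_idx - width to pred_idx + width (10 features for width=2)."""
    feats = {}
    for offset in range(-width, width + 1):
        i = pred_idx + offset
        inside = i in range(len(words))
        feats["word_%+d" % offset] = words[i] if inside else PAD
        feats["pos_%+d" % offset] = tags[i] if inside else PAD
    return feats


def punctuation_features(words, start, end):
    """Assumed: the punctuation token just before / just after a constituent covering words[start:end]."""
    def punct_at(i):
        if i in range(len(words)) and all(ch in string.punctuation for ch in words[i]):
            return words[i]
        return PAD
    return {"punct_before": punct_at(start - 1), "punct_after": punct_at(end)}


words = ["Yesterday", ",", "John", "sold", "the", "car", "."]
tags = ["NN", ",", "NNP", "VBD", "DT", "NN", "."]
print(single_char_tag("SBAR"))              # S
print(predicate_context(words, tags, 3))    # word_-2 is "," and word_+2 is "car"
print(punctuation_features(words, 2, 3))    # constituent "John"
```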
    <Paragraph position="3"> FEATURE CONTEXT: Features for argument-bearing constituents were added as features to the constituent being classified.</Paragraph>
    <Paragraph position="4"> and predicate sub-categorization). For example, when assigning labels to constituents in a Charniak parse, all of the features in Table 1 were extracted from the Charniak tree, and in addition the phrase type, head word, head word POS, path, and predicate sub-categorization were extracted from the Collins tree. We have previously determined that using a different set of features for each argument (role) achieves better results than using the same set of features for all argument classes. Simple feature selection was implemented by adding features one at a time to an initial feature set and keeping those that contributed significantly to performance. As described in Pradhan et al. (2004), we post-process lattices of n-best decisions using a trigram language model of argument sequences.</Paragraph>
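The greedy forward selection described above might look like the following sketch. `evaluate` is a hypothetical callback, for example one that trains an SVM for a single argument class and returns its development-set F1. `min_gain` stands in for the paper's unspecified notion of a significant contribution. The toy evaluator uses invented numbers only to exercise the loop; they are not results.

```python
def forward_select(initial, candidates, evaluate, min_gain=0.1):
    """Greedy forward selection: try candidates one by one and keep a feature only
    if it improves the score by more than min_gain (assumed significance threshold)."""
    selected = list(initial)
    best = evaluate(selected)
    for feat in candidates:
        score = evaluate(selected + [feat])
        if score - best > min_gain:
            selected.append(feat)
            best = score
    return selected, best


# Toy evaluator with invented, additive gains, used only to exercise the loop.
GAINS = {"path": 5.0, "head_word": 3.0, "punct": 0.05, "predicate_context": 1.2}


def toy_eval(feats):
    return 60.0 + sum(GAINS.get(f, 0.0) for f in feats)


print(forward_select(["phrase_type"], ["path", "punct", "head_word", "predicate_context"], toy_eval))
# keeps phrase_type, path, head_word and predicate_context; drops punct
```

Running the loop separately for each argument class, with a per-class `evaluate`, would give one feature set per role.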
    <Paragraph position="5"> Table 2 lists the features used by the chunker.</Paragraph>
    <Paragraph position="6"> These are the same features that were used in the CoNLL-2004 semantic role labeling task by Hacioglu et al. (2004), with the addition of the two semantic argument (IOB) features. For each token (base phrase) to be tagged, a set of features is created from a fixed-size context that surrounds the token.</Paragraph>
    <Paragraph position="7"> In addition to the features in Table 2, the chunker also uses the previous semantic tags that have already been assigned to tokens in the linguistic context. A 5-token sliding window is used for the context.</Paragraph>
    <Paragraph position="8"> SVMs were trained for begin (B) and inside (I) classes of all arguments and an outside (O) class.</Paragraph>
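The chunker's use of a 5-token window plus previously assigned tags can be pictured as left-to-right tagging in which each decision is fed back as a feature for the next token (compare the DYNAMIC CLASS CONTEXT feature). The sketch below also shows the resulting B/I/O class inventory. Several choices here are illustrative rather than taken from the paper. `classify` is a hypothetical stand-in for the trained SVMs. Decoding is shown as greedy. The toy tokens are words rather than base phrases, and the `NULL` padding is assumed.

```python
def label_inventory(arg_types):
    """One B- and one I- class per argument type, plus a single O class."""
    return [prefix + arg for arg in arg_types for prefix in ("B-", "I-")] + ["O"]


def window_features(tokens, i, prev_tags, half=2, pad="NULL"):
    """Tokens in a (2*half+1)-token window centered at i, plus the tags already
    assigned to the tokens left of i inside that window (dynamic features)."""
    feats = {}
    for off in range(-half, half + 1):
        j = i + off
        feats["tok_%+d" % off] = tokens[j] if j in range(len(tokens)) else pad
    for off in range(-half, 0):
        j = i + off
        feats["tag_%+d" % off] = prev_tags[j] if j in range(len(prev_tags)) else pad
    return feats


def tag_sequence(tokens, classify):
    """Greedy left-to-right tagging; each decision feeds the next token's features."""
    tags = []
    for i in range(len(tokens)):
        tags.append(classify(window_features(tokens, i, tags)))
    return tags


def toy_classify(f):
    """Hand-written stand-in for the trained SVMs, for illustration only."""
    if f["tok_+1"] == "sold":
        return "B-A0"
    if f["tok_-1"] == "sold":
        return "B-A1"
    if f["tag_-1"] in ("B-A1", "I-A1"):
        return "I-A1"
    return "O"


print(label_inventory(["A0", "A1"]))  # ['B-A0', 'I-A0', 'B-A1', 'I-A1', 'O']
print(tag_sequence(["John", "sold", "the", "old", "car"], toy_classify))
# ['B-A0', 'O', 'B-A1', 'I-A1', 'I-A1']
```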
  </Section>
  <Section position="8" start_page="218" end_page="218" type="metho">
    <SectionTitle>
WORDS
PREDICATE LEMMAS
PART OF SPEECH TAGS
</SectionTitle>
    <Paragraph position="0"> BP POSITIONS: The position of a token in a BP using the IOB2 representation (e.g., B-NP, I-NP, O). CLAUSE TAGS: The tags that mark token positions in a sentence with respect to clauses.</Paragraph>
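The following sketch shows how the IOB2 tags in the BP POSITIONS feature are obtained. It converts base-phrase spans into B-/I-/O tags. The `(start, end, label)` span format with an exclusive end is an assumption made for illustration.

```python
def spans_to_iob2(n_tokens, chunks):
    """IOB2 tags from (start, end, label) chunks, end exclusive; words outside every chunk get O."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


# [NP The old car] [VP broke] [PRT down] .
print(spans_to_iob2(6, [(0, 3, "NP"), (3, 4, "VP"), (4, 5, "PRT")]))
# ['B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PRT', 'O']
```

In IOB2, unlike the older IOB1 scheme, every chunk starts with a B- tag, even when the preceding chunk has a different label.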
    <Paragraph position="1"> NAMED ENTITIES: The IOB tags of named entities.</Paragraph>
    <Paragraph position="2"> TOKEN POSITION: The position of the phrase with respect to the predicate. It has three values: "before", "after", and "-" (for the predicate). PATH: A flat path between the token and the predicate. HIERARCHICAL PATH: Since we have the syntax tree for each sentence, we also use the hierarchical path from the phrase being classified to the base phrase containing the predicate.</Paragraph>
    <Paragraph position="3"> CLAUSE BRACKET PATTERNS. CLAUSE POSITION: A binary feature that identifies whether the token is inside or outside the clause containing the predicate. HEADWORD SUFFIXES: Suffixes of headwords of length 2, 3, and 4. DISTANCE: The distance of the token from the predicate, measured both as a number of base phrases and as a number of VP chunks. LENGTH: The number of words in a token.</Paragraph>
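Several of these token-level features come down to simple index arithmetic over the sequence of base phrases. The sketch assumes the sentence is given as a list of BP labels, with the predicate's BP at index `p`. It counts only VP chunks strictly between the token and the predicate, because the paper does not say whether the endpoints are included.

```python
def token_position(i, p):
    """'before', 'after', or '-' when token i is the predicate's own base phrase p."""
    if i == p:
        return "-"
    return "before" if i == min(i, p) else "after"


def distances(bp_labels, i, p):
    """(distance in base phrases, number of VP chunks strictly between i and p; assumed)."""
    lo, hi = min(i, p), max(i, p)
    return abs(i - p), bp_labels[lo + 1:hi].count("VP")


def headword_suffixes(head):
    """Suffixes of length 2, 3 and 4; a word shorter than n yields the whole word."""
    return [head[-n:] for n in (2, 3, 4)]


def length_in_words(bp_words):
    """LENGTH: number of words in the token (base phrase)."""
    return len(bp_words)


# [NP John] [VP said] [NP Mary] [VP bought] [NP a car]; predicate "bought" at p = 3
labels = ["NP", "VP", "NP", "VP", "NP"]
print(token_position(0, 3), distances(labels, 0, 3))  # before (3, 1)
print(headword_suffixes("bought"))                     # ['ht', 'ght', 'ught']
print(length_in_words(["a", "car"]))                   # 2
```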
    <Paragraph position="4"> PREDICATE POS TAG: The part-of-speech category of the predicate. PREDICATE FREQUENCY: Frequent or rare, using a threshold of 3. PREDICATE BP CONTEXT: The chain of BPs centered at the predicate within a window of size -2/+2.</Paragraph>
    <Paragraph position="5"> PREDICATE POS CONTEXT: POS tags of words immediately preceding and following the predicate.</Paragraph>
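The predicate-level features can be sketched in the same style. Several details below are assumptions rather than specifications from the paper. A lemma seen at least three times in training counts as "frequent". The BP chain is joined with `-`. Positions outside the sentence are padded with `NULL`.

```python
from collections import Counter


def predicate_frequency(lemma, train_counts, threshold=3):
    """'frequent' if the lemma occurs at least `threshold` times in training (assumed), else 'rare'."""
    return "frequent" if train_counts[lemma] >= threshold else "rare"


def predicate_bp_context(bp_labels, p, width=2, pad="NULL"):
    """Chain of BP labels from p - width to p + width, joined with '-' (assumed format)."""
    chain = [bp_labels[j] if j in range(len(bp_labels)) else pad
             for j in range(p - width, p + width + 1)]
    return "-".join(chain)


def predicate_pos_context(tags, p, pad="NULL"):
    """POS tags of the words immediately before and after the predicate at index p."""
    before = tags[p - 1] if p - 1 in range(len(tags)) else pad
    after = tags[p + 1] if p + 1 in range(len(tags)) else pad
    return before, after


counts = Counter(["sell", "sell", "sell", "buy"])
print(predicate_frequency("sell", counts), predicate_frequency("buy", counts))  # frequent rare
print(predicate_bp_context(["NP", "VP", "NP", "PP", "NP"], 1))                  # NULL-NP-VP-NP-PP
print(predicate_pos_context(["NNP", "VBD", "DT", "NN"], 1))                     # ('NNP', 'DT')
```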
  </Section>
  <Section position="9" start_page="218" end_page="218" type="metho">
    <SectionTitle>
PREDICATE ARGUMENT FRAMES
</SectionTitle>
    <Paragraph position="0">Left and right core argument patterns around the predicate.</Paragraph>
    <Paragraph position="1"> DYNAMIC CLASS CONTEXT: Hypotheses generated for the two preceding phrases.</Paragraph>
    <Paragraph position="2"> NUMBER OF PREDICATES: This is the number of predicates in the sentence.</Paragraph>
  </Section>
  <Section position="10" start_page="218" end_page="218" type="metho">
    <SectionTitle>
CHARNIAK-BASED SEMANTIC IOB TAG
</SectionTitle>
    <Paragraph position="0">The IOB tag generated using the tagger trained on Charniak trees.</Paragraph>
  </Section>
  <Section position="11" start_page="218" end_page="218" type="metho">
    <SectionTitle>
COLLINS-BASED SEMANTIC IOB TAG
</SectionTitle>
    <Paragraph position="0">The IOB tag generated using the tagger trained on Collins trees.</Paragraph>
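On the chunker side, the two semantic IOB tags become two extra per-token features. The sketch assumes that each token's features are a dictionary and that the two IOB sequences have already been produced by upstream taggers, which are not shown. The feature names `charniak_iob` and `collins_iob` are hypothetical.

```python
def add_semantic_iob_features(token_feats, charniak_tags, collins_tags):
    """Copy each token's feature dict and add the two upstream IOB tags as extra features."""
    if not (len(token_feats) == len(charniak_tags) == len(collins_tags)):
        raise ValueError("expected one Charniak tag and one Collins tag per token")
    merged = []
    for feats, ch_tag, co_tag in zip(token_feats, charniak_tags, collins_tags):
        row = dict(feats)  # copy so the caller's dicts are left unchanged
        row["charniak_iob"] = ch_tag
        row["collins_iob"] = co_tag
        merged.append(row)
    return merged


tokens = [{"tok": "John"}, {"tok": "sold"}, {"tok": "the car"}]
rows = add_semantic_iob_features(tokens, ["B-A0", "O", "B-A1"], ["B-A0", "O", "O"])
print(rows[2])  # {'tok': 'the car', 'charniak_iob': 'B-A1', 'collins_iob': 'O'}
```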
  </Section>
</Paper>