<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0626">
  <Title>Semantic Role Labeling via Consensus in Pattern-Matching</Title>
  <Section position="4" start_page="0" end_page="186" type="metho">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"> An overview of the system architecture is shown in  sentence. We convert a sentence with words, and Charniak's information into a parsed tree as the input of GT-PARA. GT-PARA then converts the parse tree into a flat representation with all predicates and arguments expressed in [GPLVR] format; where G: Grammatical function - 5 denotes subject, 3 object, and 2 others; P: Phrase type of this boundary - 00 denotes ADJP, 01 ADVP, 02 NP, 03 PP, 04 S, 05 SBAR, 06 SBARQ, 07 SINV, 08 SQ, 09 VP, 10 WHADVP, 11 WHNP, 12 WHPP, and 13 Others L: Distance (and position) of the argument with respect to the predicate that follows V: Voice of the predicate, 0: active 1: passive R: Distance (and position) of the argument with respect to the preceding predicate (n.b. L and R are mutually exclusive).</Paragraph>
    <Paragraph position="1"> An example of the output of GT-PARA is shown in Figure 2. There is one predicate &amp;quot;take&amp;quot; in the sample input sentence. There are 4 arguments for that predicate, denoted as &amp;quot;302110&amp;quot;, &amp;quot;AM-MOD&amp;quot;, &amp;quot;203011&amp;quot;, and &amp;quot;302012&amp;quot; respectively. &amp;quot;302110&amp;quot; symbolizes the NP Object of distance 1 prior to the passive predicate. &amp;quot;203011&amp;quot; symbolizes an undefined PP argument (which  means it can be a core argument or an adjunct) with distance 1 after the passive predicate. And &amp;quot;302012&amp;quot; symbolizes a NP Object with distance 2 after the passive predicate.</Paragraph>
    <Paragraph position="2"> For all boundaries extracted by GT-PARA, we simply denote all boundaries with noun phrases (NP) or similar phrases, such as WHNP, SBAR, and so on, as core pattern candidates and all boundaries with prepositional phrases (PP), ADJP, ADVP, or similar phrases, such as WHADJP, WHADVP, and so on, as adjunct candidates. But there is no exact rule for defining a core role or an adjunct explicitly in a boundary span, for example, given a sentence where (1) P1 is done by P2. (P1 and P2 are two groups of words or phrases) We can guess P1 might be labeled with &amp;quot;A1&amp;quot;, and P2 with &amp;quot;A0&amp;quot; if there is no further feature information. But if the &amp;quot;head word&amp;quot; feature of P2 is &amp;quot;hour&amp;quot;, for example, P2 can be labeled with &amp;quot;AM-TMP&amp;quot; instead. Because there are some uncertainties between core roles and adjuncts before labeling, we use the Member Generator (in Figure 1) to create all possible combinations, called members, from the output of GT-PARA by changing ANs (Core Role Candidates) into AMs (Adjunct Candidates), or AMs into ANs, except core candidates before predicates. All possible combinations (members) for the example in Figure 1 are M1: [AN1, AM-MOD, V, AM1&lt;points&gt;(from), AN2]</Paragraph>
    <Paragraph position="4"> (change AM1 as AN3 and one AN2 as AM2) The output from the Member Generator is passed to the Role Classifier, which finds all possible roles for each member with suitable core roles and adjuncts according to a Database built up by training data, in which each predicate has different patterns associated with it, each pattern has different semantic roles, and each role has the following format.</Paragraph>
    <Paragraph position="5"> Role {Phrase type} &lt; Head Word&gt; (preposition) There is an additional Boolean voice for a predicate to show if the predicate is passive or active (0: denotes active, 1: denotes passive). Each pattern includes a count on the number of the same patterns learned from the training data (denoted as &amp;quot;[statistical figure]&amp;quot;). For example, eight patterns for a predicate lemma &amp;quot;take&amp;quot; are</Paragraph>
    <Paragraph position="7"> ments and adjuncts respectively. AN classifier finds a suitable core pattern for labeled core pattern candidates in each member generated by Member Generator according to  (1) the same numbers of core roles (2) the same prepositions for each core role (3) the same phrase types for each core role (4) the same voice (active or passive) AM classifier finds a suitable adjunct role for any labeled adjunct candidate in each member generated by Member Generator according to (1) the same Head Word (2) the same Phrase type (3) the highest statistical probability learned from  the training data The followings are the results for each member</Paragraph>
    <Section position="1" start_page="186" end_page="186" type="sub_section">
      <SectionTitle>
Role Classifier
</SectionTitle>
      <Paragraph position="0"> a 1 ,a 2 ,and a 3 are weights (a 1 &gt;&gt;a 2 &gt;&gt;a 3) used to rank the relative contribution of Rk , Vk , and Sk.</Paragraph>
      <Paragraph position="1"> Empirical studies led to the use of a so-called Maxlabeled-role Heuristic to derive suitable values for these weights.</Paragraph>
      <Paragraph position="2"> The final consensus decision for role classification is determined by calculating There are 3 roles labeled in M3, which are AN1 as A1, AM-MOD, AM2 as AM-TMP respectively.</Paragraph>
      <Paragraph position="3"> And there are 4 roles labeled in M4, which are</Paragraph>
      <Paragraph position="5"> So the pattern [A1 AM-MOD V A2(from) AM-TMP&lt;week&gt;] in M4 applied by Pattern 6 and Pattern 5 is selected due to the most roles labeled.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="186" end_page="186" type="metho">
    <SectionTitle>
3 Data and Evaluation
</SectionTitle>
    <Paragraph position="0"> We extracted patterns from the training data (WSJ Section 02 to 21) to build up a pattern database.</Paragraph>
    <Paragraph position="1"> Table 1 reveals sparseness of the pattern database.</Paragraph>
    <Paragraph position="2"> Twenty-six percent of predicates contain only one pattern, and fifteen two patterns. Seventy-five percents of predicates contain no more than 10 patterns. null  collected from training, WSJ Section 02-21 The evaluation software, srl-eval.pl, is available from CoNLL2005 Shared Task1 , which is the official script for evaluation of CoNLL-2005 Shared Task systems. In order to test boundary performance of GT-PARA, we simply convert all correct propositional arguments into A0s, except AM-</Paragraph>
  </Section>
  <Section position="6" start_page="186" end_page="187" type="metho">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"> The results of classification on the development, and test data of the CoNLL2005 shared task are outlined in Table 2. The overall results on the Development, Test-WSJ, Test-Brown, and Test-WSJ+Brown datasets for F-score are 65.78, 67.91, 58.58 and 66.72 respectively, which are moderate compared to the best result reported in CoNLL2004 Shared Task (Carreras et al., 2004) using partial trees and the result in (Pradhan et al., 2004). The results for boundary recognition via GT-PARA are summarized in Table 3.</Paragraph>
    <Paragraph position="2"> on the WSJ test (bottom), obtained by the system.</Paragraph>
    <Paragraph position="3"> The overall performance (F1: 76.43) on the WSJ Section 24 is not as good as on the WSJ Section 21 (F1: 85.78). The poor performance for the development was caused by more parser errors in the WSJ Section 24. Most parser errors are brought on by continuous phrases with commas and/or quotation marks.</Paragraph>
    <Paragraph position="4"> One interesting fact is that when we tested our system using the data in CoNLL2004 shared task, we found the result with the train data WSJ 15-18 on the WSJ 21 is 73.48 shown in Table 4, which increases about 7 points in the F1 score, compared to WSJ 24 shown in Table 2. We found the labeling accuracy for WSJ 24 is 87.73, which is close to 89.30 for WSJ Section 21. But the results of boundary recognition in Table 3 for the two data are 9.14 points different, which leads to the better performance in WSJ Section 21. Boundary recognition as mentioned in CoNLL004 does play a very important role in this system as well.</Paragraph>
  </Section>
class="xml-element"></Paper>