<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1622">
  <Title>Semantic Role Labeling via Instance-Based Learning</Title>
  <Section position="5" start_page="180" end_page="180" type="metho">
    <SectionTitle>
1. Instance base:
</SectionTitle>
    <Paragraph position="0"> All the training data is stored in a format similar to Bosch et al., (2004)--specifically, &amp;quot;Role, Predicate, Voice, Phrase type, Distance, Head Word, Path&amp;quot;. As an example instance, the second argument of a predicate &amp;quot;take&amp;quot; in the training data is stored as: A0 take active NP -1 classics NP |S |VP |VBD This format maps each argument to six feature dimensions + one classification.</Paragraph>
    <Paragraph position="1"> 2. Distance metric (Euclidean distance) is defined as: D(xi, xj) = [?][?][?] [?] S (ar(xi))-ar(xj))2 where r=1 to n (n = number of different classifications), and ar(x) is the r-th feature of instance x. If instances xi and xj are identical, then D(xi , xj )=0 otherwise D(xi , xj ) represents the vector distance between xi and xj .</Paragraph>
  </Section>
  <Section position="6" start_page="180" end_page="181" type="metho">
    <SectionTitle>
3. Classification function
</SectionTitle>
    <Paragraph position="0"> Given a query/test instance xq to be classified, let x1, ... xk denote the k instances from the training data that are nearest to xq. The classification function is</Paragraph>
    <Paragraph position="2"> where i =1 to k, v =1 to m (m = size of training data), d (a,b)=1 if a=b, 0 otherwise; and v denotes a semantic role for each instance of training data.</Paragraph>
    <Paragraph position="3"> Computational complexity for kNN is linear, such that TkNN -&gt; O( m * n ), which is proportional to the product of the number of features (m) and the number of training instances (n).</Paragraph>
    <Section position="1" start_page="180" end_page="181" type="sub_section">
      <SectionTitle>
2.2 Priority Maximum Likelihood (PML)
Estimation
</SectionTitle>
      <Paragraph position="0"> Gildea &amp; Jurafsky (2002), Gildea &amp; Hockenmaier (2003) and Palmer et al., (2005) use a statistical approach based on Maximum Likelihood method for SRL, with different backoff combina-</Paragraph>
      <Paragraph position="2"> tion methods in which selected probabilities are combined with linear interpolation. The probability estimation or Maximum Likelihood is based on the number of known features available.</Paragraph>
      <Paragraph position="3"> If the full feature set is selected the probability is calculated by P (r  |pr, vo, pt, di, hw, pa, pp) = # (r, pr, vo, pt, di, hw, pa, pp) / # (pr, vo, pt, di, hw, pa, pp) Gildea &amp; Jurafsky (2002) claims &amp;quot;there is a trade-off between more-specific distributions, which have higher accuracy but lower coverage, and less-specific distributions, which have lower accuracy but higher coverage&amp;quot; and that the selection of feature subsets is exponential; and that selection of combinations of different feature subsets is doubly exponential, which is NPcomplete. Gildea &amp; Jurafsky (2002) propose the backoff combination in a linear interpolation for both coverage and precision. Following their lead, the research presented here uses Priority Maximum Likelihood Estimation modified from the backoff combination as follows:</Paragraph>
      <Paragraph position="5"> where S il i = 1.</Paragraph>
      <Paragraph position="6"> Figure 2 depicts a graphic organization of the priority combination with more-specific distribution toward the top, similar to Palmer et al. (2005) but adding another preposition feature. The backoff lattice is consulted to calculate probabilities for whichever subset of features is available to combine. As Gildea &amp; Jurasksy (2002) state, &amp;quot;the less-specific distributions were used only when no data were present for any more-specific distribution. Thus, the distributions selected are arranged in a cut across the lattice representing the most-specific distributions for which data are  PML system originated from Gildea et al., (2002) The classification decision is made by the following calculation for each argument in a sentence: argmax r1 .. n P(r1...n  |f1,..n) This approach is described in more detail in Gildea and Jurasky (2002).</Paragraph>
      <Paragraph position="7"> The computational complexity of PML is hard to calculate due to the many different distributions at each priority level. In Figure 2, the two calculations P(r  |hw, pp), and P(r  |pt, di, vo, pp) belong to the global search, while the rest belong to a local search which can reduce the computational complexity. Examination of the details of execution time (described in the results section of this paper) show that a plot of the execution time exhibits logarithmic characteristics, implying that the computational complexity for PML is log-linear, such that TPML -&gt; O( m * log n ) where m denotes the size of features and n denotes the size of training data.</Paragraph>
    </Section>
    <Section position="2" start_page="181" end_page="181" type="sub_section">
      <SectionTitle>
2.3 Predicate-Argument Recognition Algorithm (PARA)
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> predicate-argument recognition algorithm (PARA). PARA simply finds all boundaries for given predicates by browsing input parse-trees, such as given by Charniak's parser or hand-corrected parses. There are three major types of phrases including given predicates, which are VP, NP, and PP. Boundaries can be recognized within boundary areas or from the top levels of clauses (as in Xue &amp; Palmer, 2004). Figure 3 shows the basic algorithm of PARA, and more details can be found in Lin &amp; Smith (2006). The best state-of-the-art ML technique using the same syntactic information (Moschitti, 2005) only just outperforms a preliminary version of PARA in F1 from 80.72 to 81.52 for boundary recognition tasks. But PARA is much faster than all other existing techniques, and is therefore used for preprocessing in this study to minimize query time when applying instance-based learning to SRL. The computational complexity of PARA is constant.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="181" end_page="182" type="metho">
    <SectionTitle>
3 System Architecture
</SectionTitle>
    <Paragraph position="0"> There are two stages to this system: the building stage (comparable to training for inductive systems) and testing (or classification). The building stage shown in Figure 4 just stores all feature representations of training instances in memory without any calculations. All instances are stored in memory in the format described earlier, denoting {Role (r), Predicate (pr), Voice (vo),  Phrase Type (pt), Path (pa), Distance (di), Head Word (hw), Preposition in a PP (pp) }. Figure 5 characterizes the testing stage, where new instances are classified by matching their feature representation to all instances in memory in order to find the most similar instances. There are two tasks during the testing stage: Argument Identification (or Boundary recognition) performed by PARA, and Argument Classification (or Role Labeling) performed using either kNN or PML. This approach is thus a &amp;quot;lazy learning&amp;quot; strategy applied to SRL because no calculations occur during the building stage.</Paragraph>
  </Section>
  <Section position="8" start_page="182" end_page="183" type="metho">
    <SectionTitle>
4 Data, Evaluation, and Parsers
</SectionTitle>
    <Paragraph position="0"> The research outlined here uses the dataset released by the CoNLL-05 Shared Task (http://www.lsi.upc.edu/~srlconll/soft.html). It includes several Wall Street Journal sections with parse-trees from both Charniak's (2000) parser and Collins' (1999) parser. These sections are also part of the PropBank corpus (http://www.cis.upenn.edu/~treebank). WSJ sections 20 and 21 (with Charniak's parses) were used as test data. PARA operates directly on the parse tree. Evaluation is carried out using precision, recall and F1 measures of assignmentaccuracy of predicated arguments. Precision (p) is the proportion of arguments predicated by the system that are correct. Recall (r) is the proportion of correct arguments in the dataset that are predicated by the system.</Paragraph>
    <Paragraph position="1"> Finally, the F1 measure computes the harmonic mean of precision and recall, such that F1 =2*p*r / (p+r), and is the most commonly used primary measure when comparing different SRL systems.</Paragraph>
    <Paragraph position="2"> For consistency, the performance of PARA for boundary recognition is tested using the official evaluation script from CoNLL 2005, srl-eval.pl (http://www.lsi.upc.edu/~srlconll/soft.html) in all experiments presented in this paper. Related statistics of training data and testing data are outlined in Table 1. The average number of predicates in a sentence for WSJ02-21 is 2.27, and each predicate comes with an average of 2.64 arguments.</Paragraph>
    <Paragraph position="3"> Create_Boundary(predicate, tree) If the phrase type of the predicate == VP  - find the boundary area ( the closest S clause) - find NP before predicate - If there is no NP, then find the closest NP from Ancestors. - find if WHNP in it's siblings of the boundary area, if found // for what, which, that , who,...</Paragraph>
    <Paragraph position="4"> - if the word of the first WP's family is &amp;quot;what&amp;quot; then - add WHNP to boundary list else // not what, such as who which,...</Paragraph>
    <Paragraph position="5"> - find the closest NP from Ancestors - add the NP to the boundary list and add this WHNP to boundary list as reference of NP - add valid boundaries of the rest of constituents to boundary list. If phrase type of the predicate ==NP - find the boundary area ( the NP clause) - find RB(POS) before predicate and add to boundary list. - Add this predicate to boundary list.</Paragraph>
    <Paragraph position="6"> - Add the rest of word group after the predicate and before the end of the NP clause as a whole boundary to boundary list.</Paragraph>
    <Paragraph position="7"> If phrase type of the predicate ==PP - find the boundary area ( the PP clause) - find the closet NP from Ancestors if the lemma of the predicate is &amp;quot;include&amp;quot;, and add this NP to boundary list.(special for PropBank) - Add this predicate to boundary list.</Paragraph>
    <Paragraph position="8">  -Add the rest of children of this predicate to boundary list or add one closest NP outside the boundary area to boundary list if there is no child after this predicate.</Paragraph>
  </Section>
class="xml-element"></Paper>