<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0622">
  <Title>Semantic Role Labelling with Tree Conditional Random Fields</Title>
  <Section position="5" start_page="169" end_page="169" type="metho">
    <SectionTitle>
3 Model
</SectionTitle>
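    <Paragraph position="0"> We define a CRF over the labelling y given the observation tree x as:</Paragraph>
    <Paragraph position="1"> $$p(y \mid x) = \frac{1}{Z(x)} \exp \sum_{c \in C} \sum_k \lambda_k f_k(c, y_c, x)$$</Paragraph>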
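    <Paragraph position="2"> where $C$ is the set of cliques in the observation tree, $\lambda_k$ are the model's parameters and $f_k(\cdot)$ is the feature function, which maps a clique labelling to a vector of scalar values. The function $Z(\cdot)$ is the normalising function (partition function), which ensures that $p$ is a valid probability distribution. This can be restated as:</Paragraph>
    <Paragraph position="3"> $$p(y \mid x) = \frac{1}{Z(x)} \exp \left( \sum_{v \in C_1} \sum_k \lambda_k g_k(v, y_v, x) + \sum_{(u,v) \in C_2} \sum_k \lambda_k h_k(u, v, y_u, y_v, x) \right)$$</Paragraph>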
    <Paragraph position="4"> where $C_1$ is the set of vertices in the graph and $C_2$ is the set of maximal cliques in the graph, consisting of all (parent, child) pairs. The feature function has been split into $g$ and $h$, which deal with one-node and two-node cliques respectively.</Paragraph>
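The factorised form can be made concrete with a minimal sketch. It assumes a toy tree given as a parent array and precomputed clique scores that stand in for the weighted feature sums over g (one-node cliques) and h (parent-child cliques); the names `parent`, `node_score` and `edge_score` are hypothetical, not from the paper. Z(x) is computed here by brute-force enumeration, which is only feasible for tiny trees.

```python
import itertools
import math

# Hypothetical toy representation (not from the paper):
#   parent[v]                 index of v's parent, None for the root
#   node_score[v][a]          stands in for sum_k lambda_k g_k(v, a, x)        (clique in C1)
#   edge_score[(u, v)][a][b]  stands in for sum_k lambda_k h_k(u, v, a, b, x)  (clique in C2)

def joint_score(labels, parent, node_score, edge_score):
    """Unnormalised log-score of one complete labelling of the tree."""
    score = sum(node_score[v][a] for v, a in enumerate(labels))
    for v, u in enumerate(parent):
        if u is not None:  # (parent u, child v) is a two-node clique
            score += edge_score[(u, v)][labels[u]][labels[v]]
    return score

def brute_force_distribution(parent, n_labels, node_score, edge_score):
    """p(y|x) for every labelling y, with Z(x) computed by exhaustive enumeration."""
    labellings = list(itertools.product(range(n_labels), repeat=len(parent)))
    scores = [joint_score(y, parent, node_score, edge_score) for y in labellings]
    z = sum(math.exp(s) for s in scores)
    return {y: math.exp(s) / z for y, s in zip(labellings, scores)}

# Usage: a root (node 0) with two children and two possible labels per node.
parent = [None, 0, 0]
node_score = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
edge_score = {(0, 1): [[1.0, 0.0], [0.0, 1.0]], (0, 2): [[0.0, 0.3], [0.3, 0.0]]}
dist = brute_force_distribution(parent, 2, node_score, edge_score)
best = max(dist, key=dist.get)
print(best, round(sum(dist.values()), 6))  # (1, 1, 0) 1.0
```

The probabilities sum to one by construction, and the most probable labelling is the one with the highest combined node and edge score.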
    <Paragraph position="5"> In preliminary experiments we removed all pairwise features (h), which reduces the model to a simple maximum entropy classifier over individual nodes. This model performed considerably worse than the model with the pairwise features, indicating that modelling the parent-child interactions, despite the added complexity, yields a more accurate model of the data.</Paragraph>
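The reduction to a maximum entropy classifier is easy to check numerically. This minimal sketch assumes a hypothetical table `node_score[v][a]` standing in for the weighted g-feature sum of node v under label a. With every pairwise potential fixed at zero, the tree CRF's joint distribution equals the product of independent per-node softmax (maximum entropy) classifiers, so the tree structure plays no role.

```python
import itertools
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def crf_without_pairwise(node_score):
    """Tree CRF joint p(y|x) when every h-potential is zero (any tree shape)."""
    n_labels = len(node_score[0])
    labellings = list(itertools.product(range(n_labels), repeat=len(node_score)))
    scores = [sum(node_score[v][a] for v, a in enumerate(y)) for y in labellings]
    z = sum(math.exp(s) for s in scores)
    return {y: math.exp(s) / z for y, s in zip(labellings, scores)}

def independent_maxent(node_score):
    """Product of separate per-node maximum entropy (softmax) classifiers."""
    per_node = [softmax(row) for row in node_score]
    n_labels = len(node_score[0])
    return {
        y: math.prod(per_node[v][a] for v, a in enumerate(y))
        for y in itertools.product(range(n_labels), repeat=len(node_score))
    }

node_score = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
joint = crf_without_pairwise(node_score)
indep = independent_maxent(node_score)
print(all(math.isclose(joint[y], indep[y]) for y in joint))  # True
```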
    <Paragraph position="6"> The log-likelihood of the training sample was optimised using limited memory variable metric (LMVM), a gradient-based technique. This required repeatedly calculating the log-likelihood and its derivative, which in turn required dynamic programming to calculate the marginal probability of each possible labelling of every clique, using the sum-product algorithm (Pearl, 1988).</Paragraph>
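The dynamic programme can be sketched as two passes over the tree. This is a minimal sketch of sum-product on a tree. It assumes a hypothetical parent-array tree whose nodes are numbered so that every parent precedes its children, with precomputed node and edge scores. It multiplies raw (non-log) potentials, so it only suits small scores; a practical implementation would work in log space. The function returns Z(x) and the marginals of every one-node and two-node clique. These marginals supply the expected feature counts in the log-likelihood gradient, which is the empirical feature counts minus the expected counts.

```python
import math

def tree_sum_product(parent, node_score, edge_score):
    """Sum-product on a tree CRF.

    Assumes node 0 is the root and parent[v] precedes v for every other node.
    Returns Z(x), node marginals p(y_v = a) and edge marginals p(y_u = a, y_v = b).
    """
    n, k = len(parent), len(node_score[0])
    phi = [[math.exp(s) for s in row] for row in node_score]
    psi = {e: [[math.exp(s) for s in row] for row in m] for e, m in edge_score.items()}
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)

    # Upward pass: up[v][a] sums the potentials inside v's subtree with y_v = a.
    up = [None] * n
    for v in reversed(range(n)):
        up[v] = phi[v][:]
        for c in children[v]:
            for a in range(k):
                up[v][a] *= sum(psi[(v, c)][a][b] * up[c][b] for b in range(k))
    z = sum(up[0])

    # Downward pass: down[v][b] sums all potentials outside v's subtree
    # (including the edge to its parent) with y_v = b.
    down = [None] * n
    down[0] = [1.0] * k
    edge_marg = {}
    for v in range(1, n):
        u = parent[v]
        msg = [sum(psi[(u, v)][a][b] * up[v][b] for b in range(k)) for a in range(k)]
        # Everything except v's subtree and the (u, v) edge, given parent label a.
        rest = [down[u][a] * up[u][a] / msg[a] for a in range(k)]
        down[v] = [sum(rest[a] * psi[(u, v)][a][b] for a in range(k)) for b in range(k)]
        edge_marg[(u, v)] = [[rest[a] * psi[(u, v)][a][b] * up[v][b] / z
                              for b in range(k)] for a in range(k)]
    node_marg = [[up[v][a] * down[v][a] / z for a in range(k)] for v in range(n)]
    return z, node_marg, edge_marg

# Usage on a four-node tree: 0 has children 1 and 2, 1 has child 3; two labels.
parent = [None, 0, 0, 1]
node_score = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2], [0.3, -0.1]]
edge_score = {(0, 1): [[1.0, 0.0], [0.0, 1.0]],
              (0, 2): [[0.0, 0.3], [0.3, 0.0]],
              (1, 3): [[0.2, 0.0], [0.0, 0.5]]}
z, node_marg, edge_marg = tree_sum_product(parent, node_score, edge_score)
print(round(z, 4), [round(p, 3) for p in node_marg[0]])
```

Each pass visits every edge once, so the cost is linear in the number of nodes (times the square of the label-set size), instead of exponential as in exhaustive enumeration.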
  </Section>
  <Section position="6" start_page="169" end_page="170" type="metho">
    <SectionTitle>
4 Features
</SectionTitle>
    <Paragraph position="0"> Because the conditional random field is conditioned on the observation, feature functions can be defined over any part of the observation. The tree structure requires that each feature incorporate either a node labelling or the labelling of a parent and its child. We have defined node and pairwise clique features using data local to the corresponding syntactic node(s), as well as some features on the predicate itself.</Paragraph>
    [Figure: caption fragment "… role labels, and the dotted and dashed edges are those which are pruned from the tree."]
    <Paragraph position="1"> Each feature type has been made into binary feature functions g and h by combining its (feature type, value) pairs with a label or a label pair, respectively, keeping only the combinations seen at least once in the training data. The following feature types were employed, most of them inspired by previous work. Basic features: {head word, head PoS, phrase syntactic category, phrase path, position relative to the predicate, surface distance to the predicate, predicate lemma, predicate token, predicate voice, predicate sub-categorisation, syntactic frame}. These features are common to many SRL systems and are described in Xue and Palmer (2004).</Paragraph>
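The (feature type, value) plus label construction can be sketched as a feature index built from the training data. This minimal sketch assumes each training item is a dict of observation features plus its gold label. The label may be a single role label (for a g-feature) or a (parent label, child label) tuple (for an h-feature). The feature names and labels below are hypothetical.

```python
def build_feature_index(instances):
    """Assign a weight index to every (feature type, value, label) seen in training.

    `label` is a single role label for node (g) features, or a
    (parent label, child label) tuple for pairwise (h) features.
    """
    index = {}
    for feats, label in instances:
        for ftype, value in feats.items():
            key = (ftype, value, label)
            if key not in index:
                index[key] = len(index)
    return index

def active_features(feats, label, index):
    """Indices of the binary feature functions that fire for this labelling.

    Combinations never seen in training have no weight and are skipped.
    """
    keys = ((ftype, value, label) for ftype, value in feats.items())
    return sorted(index[key] for key in keys if key in index)

# Usage with hypothetical node features and role labels.
train = [
    ({"head": "company", "category": "NP", "position": "before"}, "ARG0"),
    ({"head": "shares", "category": "NP", "position": "after"}, "ARG1"),
]
index = build_feature_index(train)
print(len(index))  # 6
print(active_features({"head": "company", "category": "NP", "position": "after"}, "ARG0", index))  # [0, 1]
```

Restricting the index to observed combinations keeps the parameter vector small. An unseen pair such as (position=after, ARG0) simply contributes nothing.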
    <Paragraph position="2"> Context features: {head word of the first NP in a prepositional phrase, left and right sibling head words and syntactic categories, first and last word in the phrase yield and their PoS, parent syntactic category and head word}. These features are described in Pradhan et al. (2005).</Paragraph>
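A minimal sketch of how the context features might be read off a parse tree. It assumes a hypothetical `Node` class whose head words have already been assigned (for example by head-percolation rules, not shown) and whose preterminals carry the words. It also assumes that the head of a PP is its preposition, which is what makes the separate "first NP in a prepositional phrase" feature informative.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    category: str                  # syntactic category, or PoS tag for a preterminal
    head: str                      # head word, assumed precomputed
    word: Optional[str] = None     # set only on preterminals
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

def attach(parent, *children):
    for child in children:
        child.parent = parent
        parent.children.append(child)
    return parent

def preterminals(node):
    if node.word is not None:
        return [node]
    return [leaf for child in node.children for leaf in preterminals(child)]

def context_features(node):
    yield_ = preterminals(node)
    feats = {
        "first_word": yield_[0].word, "first_pos": yield_[0].category,
        "last_word": yield_[-1].word, "last_pos": yield_[-1].category,
    }
    if node.parent is not None:
        sibs = node.parent.children
        i = next(j for j, s in enumerate(sibs) if s is node)
        if i > 0:
            feats["left_head"], feats["left_cat"] = sibs[i - 1].head, sibs[i - 1].category
        if len(sibs) > i + 1:
            feats["right_head"], feats["right_cat"] = sibs[i + 1].head, sibs[i + 1].category
        feats["parent_cat"], feats["parent_head"] = node.parent.category, node.parent.head
    if node.category == "PP":
        first_np = next((c for c in node.children if c.category == "NP"), None)
        if first_np is not None:
            feats["pp_np_head"] = first_np.head
    return feats

# Usage: (VP (VBD sold) (NP (NNS shares)) (PP (TO to) (NP (NNS investors))))
pp = attach(Node("PP", "to"), Node("TO", "to", "to"),
            attach(Node("NP", "investors"), Node("NNS", "investors", "investors")))
vp = attach(Node("VP", "sold"), Node("VBD", "sold", "sold"),
            attach(Node("NP", "shares"), Node("NNS", "shares", "shares")), pp)
print(context_features(pp))
```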
    <Paragraph position="3"> Common ancestor of the verb: the syntactic category of the deepest ancestor shared by both the verb and the node.</Paragraph>
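Both this feature and the phrase path from the basic feature set come from walking up from the node and from the predicate to their deepest shared ancestor. This is a minimal sketch that assumes a hypothetical parent-array tree with a parallel list of categories. The ↑/↓ path notation follows a common convention in the SRL literature and is used here for illustration.

```python
def ancestors(v, parent):
    """Nodes from v up to the root, inclusive."""
    path = [v]
    while parent[v] is not None:
        v = parent[v]
        path.append(v)
    return path

def deepest_common_ancestor(u, v, parent):
    above_v = set(ancestors(v, parent))
    return next(a for a in ancestors(u, parent) if a in above_v)

def phrase_path(node, pred, parent, cat):
    """Categories from node up to the shared ancestor, then down to the predicate."""
    top = deepest_common_ancestor(node, pred, parent)
    up = ancestors(node, parent)
    up = up[: up.index(top) + 1]            # node ... top
    down = ancestors(pred, parent)
    down = down[: down.index(top)]          # pred ... child of top
    path = "↑".join(cat[a] for a in up)
    for a in reversed(down):
        path += "↓" + cat[a]
    return path

# Usage: (S (NP ...) (VP (VBD ...) (NP ...))), with the predicate at node 3.
parent = [None, 0, 0, 2, 2]
cat = ["S", "NP", "VP", "VBD", "NP"]
print(cat[deepest_common_ancestor(1, 3, parent)])  # S
print(phrase_path(1, 3, parent, cat))              # NP↑S↓VP↓VBD
print(phrase_path(4, 3, parent, cat))              # NP↑VP↓VBD
```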
    <Paragraph position="4"> Feature conjunctions: the following pairs of features were conjoined into single features: {predicate lemma + syntactic category, predicate lemma + relative position, syntactic category + first word of the phrase}.</Paragraph>
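Conjunctions matter because a log-linear model adds the weights of separate features independently. An interaction such as "this predicate lemma with an NP" can only receive its own weight if it exists as a single conjoined feature. This minimal sketch assumes node features stored as a dict of strings; the key names are hypothetical.

```python
# Hypothetical key names for the three conjoined pairs listed above.
CONJUNCTIONS = [
    ("pred_lemma", "category"),
    ("pred_lemma", "position"),
    ("category", "first_word"),
]

def add_conjunctions(feats, pairs=CONJUNCTIONS):
    """Return a copy of `feats` with one extra feature per conjoined pair."""
    out = dict(feats)
    for a, b in pairs:
        if a in feats and b in feats:
            out[a + "+" + b] = feats[a] + "+" + feats[b]
    return out

feats = {"pred_lemma": "sell", "category": "NP", "position": "before", "first_word": "the"}
conj = add_conjunctions(feats)
print(conj["pred_lemma+category"], conj["category+first_word"])  # sell+NP NP+the
```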
    <Paragraph position="5"> Default feature: this feature is always on, which allows the classifier to model the prior probability distribution over the possible argument labels.</Paragraph>
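To see why an always-on feature captures the label prior, consider a model that contains only that feature, combined with each label as described for the other feature types. For each label's weight, the log-likelihood gradient is the empirical label frequency minus the model probability. Training therefore stops only when the model's label distribution matches the empirical one. This minimal sketch assumes plain gradient ascent, not the LMVM optimiser used for the full model, and hypothetical labels.

```python
import math
from collections import Counter

def fit_default_feature(labels, steps=2000, lr=0.5):
    """Fit one weight per label for the always-on feature by gradient ascent.

    Returns the model's label distribution, which converges to the empirical one.
    """
    label_set = sorted(set(labels))
    counts = Counter(labels)
    n = len(labels)
    w = {y: 0.0 for y in label_set}
    for _ in range(steps):
        z = sum(math.exp(w[y]) for y in label_set)
        for y in label_set:
            # d(average log-likelihood)/dw_y = empirical frequency - model probability
            w[y] += lr * (counts[y] / n - math.exp(w[y]) / z)
    z = sum(math.exp(w[y]) for y in label_set)
    return {y: math.exp(w[y]) / z for y in label_set}

labels = ["NONE"] * 5 + ["ARG0"] * 3 + ["ARG1"] * 2
prior = fit_default_feature(labels)
print({y: round(p, 3) for y, p in prior.items()})
```

In the full model the other features compete with this one. There, the default feature's weights absorb whatever part of the label distribution the remaining features do not explain.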
    <Paragraph position="6"> Joint features: these are defined only over pairwise cliques: {whether the parent and child head words do not match, parent syntactic category + child syntactic category, parent relative position + child relative position, parent relative position + child relative position + predicate PoS + predicate lemma}.</Paragraph>
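The joint features can be sketched as a function of a (parent, child) pair and the predicate. As with the node features, each returned (feature type, value) pair would then be combined with a (parent label, child label) pair seen in training to form a binary h-feature. The dict keys and example values, including the "covers" position value for a node containing the predicate, are hypothetical.

```python
def joint_features(parent, child, pred):
    """Observation part of the pairwise-clique features for one (parent, child) pair."""
    return {
        "heads_differ": str(parent["head"] != child["head"]),
        "cat_pair": parent["category"] + "+" + child["category"],
        "pos_pair": parent["position"] + "+" + child["position"],
        "pos_pair_pred": "+".join(
            [parent["position"], child["position"], pred["pos"], pred["lemma"]]
        ),
    }

parent = {"head": "sold", "category": "VP", "position": "covers"}
child = {"head": "shares", "category": "NP", "position": "after"}
pred = {"lemma": "sell", "pos": "VBD"}
print(joint_features(parent, child, pred))
```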
  </Section>
</Paper>