<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1042">
  <Title>Building Deep Dependency Structures with a Wide-Coverage CCG Parser</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Grammar
</SectionTitle>
    <Paragraph position="0"> In CCG, most language-specific aspects of the grammar are specified in the lexicon, in the form of syntactic categories that identify a lexical item as either a functor or an argument. For the functors, the category specifies the type and directionality of the arguments and the type of the result. For example, the following category for the transitive verb bought specifies its first argument as a noun phrase (NP) to its right, its second argument as an NP to its left, and its result as a sentence: (1) bought := (S\NP)/NP</Paragraph>
    <Paragraph position="2"> For parsing purposes, we extend CCG categories to express category features, and head-word and dependency information directly, as follows: (2) bought := (S[dcl]_bought\NP_1)/NP_2 The feature [dcl] specifies the category's S result as a declarative sentence, bought identifies its head, and the numbers denote dependency relations. Heads and dependencies are always marked up on atomic categories (S, N, NP, PP, and conj in our implementation). The categories are combined using a small set of typed combinatory rules, such as functional application and composition (see Steedman (2000) for details). Derivations are written with underlines indicating combinatory reduction and arrows indicating the direction of the application. Formally, a dependency is defined as a 4-tuple ⟨h_f, f, s, h_a⟩, where h_f is the head word of the functor, f is the functor category (extended with head and dependency information), s is the argument slot, and h_a is the head word of the argument; for example, the first step of derivation (3) yields the object dependency, filling slot 2 of the category in (2). Variables are also used, via unification, to pass head information from one category to another. For example, the expanded category for the control verb persuade is as follows: (5) persuade := ((S[dcl]_persuade\NP_1)/(S[to]_2\NP_X))/NP_X,3 The head of the infinitival complement's subject is identified with the head of the object, using the variable X. Unification then "passes" the head of the object to the subject of the infinitival, as in standard unification-based accounts of control. The kinds of lexical items that use the head-passing mechanism are raising, auxiliary and control verbs, modifiers, and relative pronouns. Among the constructions that project unbounded dependencies are relativisation and right node raising. 
The following relative pronoun category (for words such as who, which, that) shows how heads are co-indexed for object extraction: (6) (NP_X\NP_X,1)/(S[dcl]_2/NP_X)</Paragraph>
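The 4-tuple definition above can be made concrete with a small sketch. The field names and the example argument word are illustrative, not taken from the paper's derivation; the category string follows the extended notation of (2).

```python
from collections import namedtuple

# A CCG dependency as the 4-tuple <h_f, f, s, h_a> defined in the text:
# the head word of the functor, the functor category (with head and
# dependency markup), the argument slot, and the head word of the argument.
Dependency = namedtuple(
    "Dependency", ["head_functor", "functor_category", "slot", "head_argument"]
)

# A hypothetical object dependency for a transitive verb: slot 2 of the
# category in (2) filled by the head of the object NP. The argument word
# "Brooks" is illustrative only.
dep = Dependency("bought", r"(S[dcl]_bought\NP_1)/NP_2", 2, "Brooks")
print(dep.slot, dep.head_argument)
```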
    <Paragraph position="4"> The derivation for the phrase the company that Marks wants to buy is given in Figure 1 (with the features on S categories removed to save space, and the constant heads reduced to their first letters). Note that the extension of lexical categories with head and dependency markup is automatic: for example, any word with the same category (5) as persuade gets the object-control extension. In certain rare cases (such as promise) this gives semantically incorrect dependencies in both the grammar and the data (promise Brooks to go is assigned a structure meaning promise Brooks that Brooks will go). [Figure 1: derivation for the company that Marks wants to buy]</Paragraph>
    <Paragraph position="6"> Type-raising (T) and functional composition (B), along with co-indexing of heads, mediate transmission of the head of the NP the company onto the object of buy. The corresponding dependencies are given in the following figure, with the convention that arcs point away from arguments. The relevant argument slot in the functor category labels the arcs.</Paragraph>
    <Paragraph position="7"> [Figure: dependency arcs for the company that Marks wants to buy] Note that we encode the subject argument of the to category as a dependency relation (Marks is a "subject" of to), since our philosophy at this stage is to encode every argument as a dependency, where possible. The number of dependency types may be reduced in future work.</Paragraph>
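The head-passing mechanism described above can be sketched as variable binding plus resolution. The data structures here are hypothetical and much simpler than the parser's actual unification machinery; they only illustrate how a co-indexed variable X lets the head of the company surface as the object of buy.

```python
# Minimal sketch of head passing through co-indexed variables.
# Unification binds the variable X shared between category positions; a
# dependency whose argument is still a variable is resolved by following
# the bindings to a concrete head word.
def resolve(head, bindings):
    """Follow variable bindings until a concrete head word is reached."""
    while head in bindings:
        head = bindings[head]
    return head

# Relativisation example from the text: the head of "the company" is
# unified with the variable standing for the extracted object of "buy".
bindings = {"X": "company"}
print(resolve("X", bindings))       # the object of buy is "company"
print(resolve("Marks", bindings))   # concrete heads pass through unchanged
```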
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The Probability Model
</SectionTitle>
    <Paragraph position="0"> The DAG-like nature of the dependency structures makes it difficult to apply generative modelling techniques (Abney, 1997; Johnson et al., 1999), so we have defined a conditional model, similar to the model of Collins (1996) (see also the conditional model in Eisner (1996b)). While the model of Collins (1996) is technically unsound (Collins, 1999), our aim at this stage is to demonstrate that accurate, efficient wide-coverage parsing is possible with CCG, even with an over-simplified statistical model. Future work will look at alternative models. (Footnote: the reentrancies creating the DAG-like structures are fairly limited, and moreover determined by the lexical categories. We conjecture that it is possible to define a generative model that includes the deep dependencies.)</Paragraph>
    <Paragraph position="1"> The parse selection component must choose the most probable dependency structure, given the sentence S. A sentence S = ⟨(w_1, t_1), ..., (w_n, t_n)⟩</Paragraph>
    <Paragraph position="3"> is assumed to be a sequence of word, pos-tag pairs. For our purposes, a dependency structure pi is a pair ⟨C, D⟩, where C = c_1, ..., c_n</Paragraph>
    <Paragraph position="5"> is the sequence of categories assigned to the words, and D = {d_1, ..., d_m}</Paragraph>
    <Paragraph position="7"> is the set of dependencies. The probability of a dependency structure can be written as follows: P(pi|S) = P(C, D|S) = P(C|S) P(D|C, S)</Paragraph>
    <Paragraph position="9"> The category sequence is modelled as P(C|S) = prod_{i=1..n} P(c_i|X_i), where X_i is the local context for the ith word. We have explained elsewhere (Clark, 2002) how suitable features can be defined in terms of the ⟨word, pos-tag⟩ pairs in the context, and how maximum entropy techniques can be used to estimate the probabilities, following Ratnaparkhi (1996). We assume that each argument slot in the category sequence is filled independently, and write P(D|C, S) = prod_{i=1..m} P(ha_i|C, S), where ha_i is the head word filling the argument slot of the ith dependency, and m is the number of dependencies entailed by the category sequence C.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Estimating the dependency probabilities
</SectionTitle>
      <Paragraph position="0"> The estimation method is based on Collins (1996).</Paragraph>
      <Paragraph position="1"> We assume that the probability of a dependency only depends on those words involved in the dependency, together with their categories. We follow Collins and base the estimate of a dependency probability on the following intuition: given a pair of words, with a pair of categories, which are in the same sentence, what is the probability that the words are in a particular dependency relationship? We again follow Collins in defining the following functions, where W is the set of words in the data and C is the set of lexical categories.</Paragraph>
      <Paragraph position="3"> C(⟨a, c⟩, ⟨b, d⟩), for a, b in W and c, d in C, is the number of times that the word-category pairs ⟨a, c⟩ and ⟨b, d⟩ occur in the same sentence, and F(R | ⟨a, c⟩, ⟨b, d⟩) is the number of times that ⟨a, c⟩ and ⟨b, d⟩ occur in dependency relation R. The dependency probability is estimated as the relative frequency F/C, normalised over all the word-category pairs in the sequence (equation 10), where ca_i is the lexical category of the argument head a_i. The normalising factor ensures that the probabilities for each argument slot sum to one over all the word-category pairs in the sequence. This factor is constant for the given category sequence, but not for different category sequences. However, the dependency structures with high enough P(C, D | S)</Paragraph>
      <Paragraph position="5"> to be among the highest probability structures are likely to have similar category sequences. Thus we ignore the normalisation factor, thereby simplifying the parsing process. (A similar argument is used by Collins (1996) in the context of his parsing model.) The estimate in equation 10 suffers from sparse data problems, and so a backing-off strategy is employed. We omit details here, but there are four levels of back-off: the first uses both words and both categories; the second uses only one of the words and both categories; the third uses the categories only; and a final level substitutes pos-tags for the categories.</Paragraph>
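The four-level back-off can be sketched as follows. The count tables and key layout are hypothetical: each level maps a progressively coarser context to a pair (count in the relation, co-occurrence count), and the first level with data supplies the relative-frequency estimate. For brevity, level 2 is shown keeping only the functor word, although the scheme described above allows either word to be kept.

```python
# Sketch of the four-level back-off for the dependency probability.
def backoff_estimate(counts, wf, cf, wa, ca, tf, ta):
    """Return a relative-frequency estimate from the first level with data."""
    levels = [
        (wf, cf, wa, ca),  # level 1: both words, both categories
        (wf, cf, ca),      # level 2: one word (functor side shown), both categories
        (cf, ca),          # level 3: the categories only
        (tf, ta),          # level 4: pos-tags substituted for the categories
    ]
    for key in levels:
        in_relation, together = counts.get(key, (0, 0))
        if together > 0:
            return in_relation / together
    return 0.0

# Toy counts: the fully lexicalised level has data for this pair...
counts = {("bought", "(S\\NP)/NP", "Brooks", "NP"): (3, 10)}
p = backoff_estimate(counts, "bought", "(S\\NP)/NP", "Brooks", "NP", "VBD", "NNP")

# ...but an unseen pair backs off to category-level counts.
counts2 = {("(S\\NP)/NP", "NP"): (1, 4)}
p_backed_off = backoff_estimate(counts2, "sold", "(S\\NP)/NP", "shares", "NP", "VBD", "NNS")
```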
      <Paragraph position="6"> One final point is that, in practice, the number of dependencies can vary for a given category sequence (because multiple arguments for the same slot can be introduced through coordination). (Footnote: one of the problems with the model is that it is deficient, assigning probability mass to dependency structures not licensed by the grammar.)</Paragraph>
      <Paragraph position="7"> Hence a geometric mean of p(pi) is used as the ranking function, averaged by the number of dependencies in D.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 The Parser
</SectionTitle>
    <Paragraph position="0"> The parser analyses a sentence in two stages. First, in order to limit the number of categories assigned to each word in the sentence, a "supertagger" (Bangalore and Joshi, 1999) assigns to each word a small number of possible lexical categories. The supertagger (described in Clark (2002)) assigns to each word all categories whose probabilities are within some constant factor, b, of the highest probability category for that word, given the surrounding context.</Paragraph>
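The multi-tagging rule just described can be sketched in a few lines. The probability table stands in for the real maximum-entropy supertagger; the categories and numbers are toy values.

```python
# Sketch of the supertagger's beam: keep every category whose probability
# is within a factor b of the best category for the word.
def assign_categories(cat_probs, b):
    """Return the set of categories within factor b of the best category."""
    best = max(cat_probs.values())
    return {c for c, p in cat_probs.items() if p >= b * best}

probs = {"N": 0.60, "N/N": 0.30, "NP": 0.05}
tight = assign_categories(probs, 0.1)    # cutoff 0.06: keeps N and N/N
loose = assign_categories(probs, 0.05)   # cutoff 0.03: keeps all three
```

Lowering b widens the beam, which is exactly the lever used in the experiments of Section 5 to trade speed against coverage.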
    <Paragraph position="1"> Note that the supertagger does not provide a single category sequence for each sentence, and the final sequence returned by the parser (along with the dependencies) is determined by the probability model described in the previous section. The supertagger performs two roles: cutting down the search space explored by the parser, and providing the category-sequence model in equation 8.</Paragraph>
    <Paragraph position="2"> The supertagger consults a "category dictionary" which contains, for each word, the set of categories the word was seen with in the data. If a word appears at least K times in the data, the supertagger only considers categories that appear in the word's category set, rather than all lexical categories.</Paragraph>
    <Paragraph position="3"> The second parsing stage applies a CKY bottom-up chart-parsing algorithm, as described in Steedman (2000). The combinatory rules currently used by the parser are as follows: functional application (forward and backward), generalised forward composition, backward composition, generalised backward-crossed composition, and type-raising. There is also a coordination rule which conjoins categories of the same type. (Footnote: restrictions are placed on some of the rules, such as that given by Steedman (2000) for backward-crossed composition (p. 62).) Type-raising is applied to the categories NP, PP, and S[adj]\NP (adjectival phrase); it is currently implemented by simply adding pre-defined sets of type-raised categories to the chart whenever an NP, PP or S[adj]\NP is present. The sets were chosen on the basis of the most frequent type-raising rule instantiations in sections 02-21 of the CCGbank, which resulted in 8 type-raised categories for NP,</Paragraph>
    <Paragraph position="4"> and 2 categories each for PP and S[adj]\NP.</Paragraph>
    <Paragraph position="5"> As well as combinatory rules, the parser also uses a number of lexical rules and rules involving punctuation. The set of rules consists of those occurring roughly more than 200 times in sections 02-21 of the CCGbank. For example, one rule used by the parser is the following: (12) S[ng]\NP => NP_X\NP_X This rule creates a nominal modifier from the ing-form of a verb phrase.</Paragraph>
    <Paragraph position="6"> A set of rules allows the parser to deal with commas (all other punctuation is removed after the supertagging phase). For example, one kind of rule treats a comma as a conjunct, which allows the NP object in John likes apples, bananas and pears to have three heads, which can all be direct objects of like. The search space explored by the parser is reduced by exploiting the statistical model. First, a constituent is only placed in a chart cell if there is not already a constituent with the same head word, same category, and some dependency structure with a higher or equal score (where the score is the geometric mean of the probability of the dependency structure). This tactic also has the effect of eliminating "spuriously ambiguous" entries from the chart; cf. Komagata (1997). Second, a constituent is only placed in a cell if the score for its dependency structure is within some factor, a, of the highest scoring dependency structure for that cell.</Paragraph>
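The two chart-pruning rules can be sketched as a cell-insertion test. The entry format is hypothetical, and unlike the real parser this simplified version does not re-prune existing entries when a better one arrives; `score` stands for the geometric mean of the dependency-structure probability.

```python
# Sketch of the two pruning rules applied when adding a constituent to a cell.
def add_to_cell(cell, head, category, score, a):
    """cell maps (head word, category) -> best score seen for that pair."""
    key = (head, category)
    # Rule 1: an existing entry with the same head word and category and an
    # equal-or-better score blocks the new constituent.
    if key in cell and cell[key] >= score:
        return False
    # Rule 2: reject entries scoring below a factor `a` of the cell's best.
    best = max(cell.values(), default=0.0)
    if best > 0.0 and score < a * best:
        return False
    cell[key] = score
    return True

cell = {}
ok1 = add_to_cell(cell, "buy", "S[dcl]", 0.50, a=0.1)   # accepted
ok2 = add_to_cell(cell, "buy", "S[dcl]", 0.40, a=0.1)   # blocked by rule 1
ok3 = add_to_cell(cell, "buy", "NP", 0.01, a=0.1)       # blocked by rule 2
```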
  </Section>
  <Section position="6" start_page="0" end_page="379" type="metho">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Sections 02-21 of the CCGbank were used for training.</Paragraph>
    <Paragraph position="2"> Sections 02-21 were also used to obtain the category set, by including all categories that appear at least 10 times, which resulted in a set of 398 category types.</Paragraph>
    <Paragraph position="3"> The word-category sequences needed for estimating the probabilities in equation 8 can be read directly from the CCGbank. To obtain dependencies</Paragraph>
    <Paragraph position="5"> for estimating P(D | C, S), we ran the parser over the trees, tracing out the combinatory rules applied during the derivation, and outputting the dependencies. This method was also applied to the trees in section 23 to provide the gold standard test set.</Paragraph>
    <Paragraph position="6"> Not all trees produced dependency structures, since not all categories and type-changing rules in the CCGbank are encoded in the parser. We obtained dependency structures for roughly 95% of the trees in the data. For evaluation purposes, we increased the coverage on section 23 to 99% (2,352 sentences)</Paragraph>
    <Paragraph position="8"> by identifying the cause of the parse failures and adding the additional rules and categories when creating the gold standard; so the final test set consisted of gold-standard dependency structures from 2,352 sentences. The coverage was increased to ensure the test set was representative of the full section. We emphasise that these additional rules and categories were not made available to the parser during testing, or used for training.</Paragraph>
    <Paragraph position="9"> Initially the parser was run with b = 0.01. A time-out was applied so that the parser was stopped if any sentence took longer than the time limit. Under these settings, 2,098 of the 2,352 sentences received an analysis, with 206 timing out and 48 failing to parse. To deal with the 48 no-analysis cases, the cut-off for the category dictionary, K, was increased to 100. Of the 48 cases, 23 sentences then received an analysis. To deal with the 206 time-out cases, b was increased to 0.05, which resulted in 181 of the 206 sentences then receiving an analysis, with 18 failing to parse, and 7 timing out. So overall, almost 98% of the 2,352 unseen sentences were given some analysis. To return a single dependency structure, we chose the most probable structure from the S[dcl] categories spanning the whole sentence. If there was no such category, all categories spanning the whole string were considered.</Paragraph>
  </Section>
</Paper>