File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/p06-2048_evalu.xml

Size: 7,589 bytes

Last Modified: 2025-10-06 13:59:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2048">
  <Title>Exploring the Potential of Intractable Parsers</Title>
  <Section position="8" start_page="373" end_page="375" type="evalu">
    <SectionTitle>
6 Experiments
</SectionTitle>
    <Paragraph position="0"> We employed a familiar experimental set-up. For training, we used sections 2 21 of the WSJ section of the Penn treebank. As a development set, we used the rst 20 les of section 22, and then saved section 23 for testing the nal model. One unconventional preprocessing step was taken. Namely, for the entire treebank, we compressed all unary chains into a single node, labeled with the label of the node furthest from the root. We did so in order to simplify our experiments, since the framework outlined in this paper allows only one label per labeling scheme per span. Thus by avoiding unary chains, we avoid the need for many labeling schemes or more complicated compound labels (labels like NP-NN ). Since our goal here was not to create a parsing tool but rather to explore the viability of this approach, this seemed a fair concession. It should be noted that it is indeed possible to create a fully general parser using our framework (for instance, by using the above idea of compound labels for unary chains).</Paragraph>
    <Paragraph position="1"> The main dif culty with this compromise is that it renders the familiar metrics of labeled precision and labeled recall incomparable with previous work (i.e. the LP of a set of candidate parses with respect to the unmodi ed test set differs from the LP with respect to the preprocessed test set).</Paragraph>
    <Paragraph position="2"> This would be a major problem, were it not for the existence of other metrics which measure only the quality of a parser's recursive decomposition of a sentence. Fortunately, such metrics do exist, thus we used cross-bracketing statistics as the basic measure of quality for our parser. The cross-bracketing score of a set of candidate parses with</Paragraph>
    <Paragraph position="4"> an arbitrary integer.</Paragraph>
    <Paragraph position="5"> respect to the unmodi ed test set is identical to the cross-bracketing score with respect to the preprocessed test set, hence our preprocessing causes no comparability problems as viewed by this metric.</Paragraph>
    <Paragraph position="6"> For our parsing model, we used an HLP H = &lt;L,&lt;,A,F,P&gt; with the following parameters. L consisted of three labeling schemes: the set Lwd of word labels, the set Lpt of preterminal labels, and the set Lnt of nonterminal labels. The order &lt; of the model variables was the unique order such that for all suitable integers i,j,k,l: (1) Sij &lt; Lwdij &lt; Lptij &lt; Lntij , (2) Lntij &lt; Skl iff span (i,j) is strictly shorter than span (k,l) or they have the same length and integer i is less than integer k. For auto-assignment function A, we essentially used the function in Figure 5, modi ed so that it automatically assigned null to model variables Lwdij and Lptij for i negationslash= j (i.e. no preterminal or word tagging of internal nodes), and to model variables Lntii (i.e. no nonterminal tagging of leaves, rendered unnecessary by our preprocessing step).</Paragraph>
    <Paragraph position="7"> Rather than incorporate part-of-speech tagging into the search process, we opted to pretag the sentences of our development and test sets with an off-the-shelf tagger, namely the Brill tagger (Brill, 1994). Thus the object of our computation was HLPDECODE(H, n, w), where n was the length of the sentence, and partial assignment w specied the word and PT labels of the leaves. Given this partial assignment, the job of HLPDECODE was to nd the most probable assignment of model variables Sij and Lntij for 1 [?] i &lt; j [?] n.</Paragraph>
    <Paragraph position="8"> The two probability models, P S and Pnt, were trained in the manner described in Section 4.</Paragraph>
    <Paragraph position="9"> Two decisions needed to be made: which features to use and which learning technique to employ. As for the learning technique, we used maximum entropy models, speci cally the implementation called MegaM provided by Hal Daume (Daum*e III, 2004). For P S, we needed features  of the Penn Treebank.</Paragraph>
    <Paragraph position="10"> that would be relevant to deciding whether a given span (i,j) should be considered a constituent. The basic building blocks we used are depicted in Figure 7. A few words of explanation are in order. By label(k), we mean the highest nonterminal label so far assigned that covers word k, or if such a label does not yet exist, then the preterminal label of k (recall that our model order was bottom-up). By category(k), we mean the category of the preterminal label of word k (given a coarser, hand-made categorization of preterminal labels that grouped all noun tags into one category, all verb tags into another, etc.). By signature(k,m), where k [?] m, we mean the sequence &lt;label(k),label(k + 1),...,label(m)&gt; , from which all consecutive sequences of identical labels are compressed into a single label. For instance, &lt;IN,NP,NP,V P,V P&gt; would become &lt;IN,NP,V P&gt; . Ad-hoc conjunctions of these basic binary features were used as features for our probability model P S. In total, approximately 800,000 such conjunctions were used.</Paragraph>
    <Paragraph position="11"> For Pnt, we needed features that would be relevant to deciding which nonterminal label to give to a given constituent span. For this somewhat simpler task, we used a subset of the basic features used for P S, shown in bold in Figure 7. Ad-hoc conjunctions of these boldface binary features were used as features for our probability model Pnt. In total, approximately 100,000 such conjunctions were used.</Paragraph>
    <Paragraph position="12"> As mentioned earlier, we used cross-bracketing statistics as our basis of comparision. These results as shown in Figure 8. CB denotes the average cross-bracketing, i.e. the overall percentage of candidate constituents that properly overlap with a constituent in the gold parse. 0CB denotes the percentage of sentences in the test set that exhibit no cross-bracketing. With a simple feature set, we manage to obtain performance comparable to the unlexicalized PCFG parser of (Klein and Manning, 2003) on the set of sentences of length  40 or less. On the subset of Section 23 consisting of sentences of length 100 or less, our parser slightly outperforms their results in terms of average cross-bracketing. Interestingly, our parser has a lower percentage of sentences exhibiting no cross bracketing. To reconcile this result with the superior overall cross-bracketing score, it would appear that when our parser does make bracketing errors, the errors tend to be less severe.</Paragraph>
    <Paragraph position="13"> The surprise was how quickly the parser performed. Despite its exponential worst-case time bounds, the search space turned out to be quite conducive to depth- rst branch-and-bound pruning. Using an unoptimized Java implementation on a 4x Opteron 848 with 16GB of RAM, the parser required (on average) less than 0.26 seconds per sentence to optimally parse the subset of Section 23 comprised of sentences of 40 words or less. It required an average of 0.48 seconds per sentence to optimally parse the sentences of 100 words or less (an average of less than 3.5 seconds per sentence for those sentences of length 41-100).</Paragraph>
    <Paragraph position="14"> As noted earlier, the parser requires space linear in the size of the sentence.</Paragraph>
  </Section>
class="xml-element"></Paper>