<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1014"> <Title>Converting Dependency Structures to Phrase Structures</Title> <Section position="4" start_page="6" end_page="8" type="evalu"> <SectionTitle> 4. EXPERIMENTS </SectionTitle>
<Paragraph position="0"> So far, we have described two existing algorithms and proposed a new algorithm for converting d-trees into phrase structures. As explained at the beginning of Section 3, we evaluated the performance of the algorithms by comparing their output with an existing Treebank. Because no English dependency Treebanks are available, we first ran the algorithm in Section 2 to produce d-trees from the PTB, then applied the three algorithms to these d-trees and compared their output with the original phrase structures in the PTB. (Punctuation marks are not part of the d-trees produced by LexTract; we wrote a simple program to attach them as high as possible in the phrase structures produced by the conversion algorithms.)</Paragraph>
<Paragraph position="1"> The process is shown in Figure 9.</Paragraph>
<Paragraph position="2"> The results, computed on Section 0 of the PTB, are shown in Table 1. [Table 1 columns: recall, prec, no-cross, ave, test/ ; results on Section 0 of the PTB.] The precision and recall rates are for unlabelled brackets. The last column shows the ratio of the number of brackets produced by the algorithms to the number of brackets in the original Treebank. From the table (especially the last column), it is clear that Algorithm 1 produces many more brackets than the original Treebank, resulting in a high recall rate but a low precision rate. Algorithm 2 produces very flat structures, resulting in a low recall rate and a high precision rate. Algorithm 3 produces roughly the same number of brackets as the Treebank and has the best recall rate, and its precision rate is almost as good as that of Algorithm 2.</Paragraph>
<Paragraph position="3"> The differences between the output of the algorithms and the phrase structures in the PTB come from four sources:
(S1) Annotation errors in the PTB;
(S2) Errors in the Treebank-specific tables used by the algorithms in Sections 2 and 3 (e.g., the head percolation table, the projection table, the argument table, and the modification table);
(S3) The imperfection of the conversion algorithm in Section 2 (which converts phrase structures to d-trees);
(S4) Mismatches between the heuristic rules used by the algorithms in Section 3 and the annotation schemata adopted by the PTB.
To estimate the contribution of (S1)-(S4) to the differences between the output of Algorithm 3 and the phrase structures in the PTB, we manually examined the first twenty sentences in Section 0. Out of thirty-one differences in bracketing, seven are due to (S1), three are due to (S2), seven are due to (S3), and the remaining fourteen are due to (S4).</Paragraph>
<Paragraph position="4"> While correcting annotation errors to eliminate (S1) requires more human effort, it is quite straightforward to correct the errors in the Treebank-specific tables and thereby eliminate the mismatches caused by (S2). For (S3), we mentioned in Section 2 that the algorithm chose the wrong heads for noun phrases with the appositive construction. As for (S4), we found several exceptions (as shown in Table 2) to the one-projection-chain-per-category assumption (i.e., the assumption that each POS tag has a unique projection chain), which was used by all three algorithms in Section 3.
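To make the unlabelled-bracket evaluation behind Table 1 concrete, the sketch below is illustrative only and not the code used in the paper; the function name bracket_scores and the (start, end) span representation are our assumptions. It compares the constituent spans produced by a conversion algorithm against the gold Treebank spans and reports precision, recall, and the test/gold bracket ratio described above.

    from collections import Counter

    def bracket_scores(test_brackets, gold_brackets):
        """Unlabelled bracket precision/recall between two lists of constituent spans.

        Each bracket is a (start, end) word-index pair; duplicate spans are kept,
        since nested unary projections can cover the same words.
        """
        test, gold = Counter(test_brackets), Counter(gold_brackets)
        matched = sum(min(count, gold[span]) for span, count in test.items())
        precision = matched / sum(test.values()) if test else 0.0
        recall = matched / sum(gold.values()) if gold else 0.0
        ratio = sum(test.values()) / sum(gold.values())  # bracket ratio, test over gold
        return precision, recall, ratio

    # A flat tree (few brackets) gets high precision but low recall, while an
    # over-articulated tree shows the opposite pattern, as discussed for Table 1.
    gold = [(0, 5), (0, 2), (3, 5)]
    flat = [(0, 5)]
    print(bracket_scores(flat, gold))  # (1.0, 0.333..., 0.333...)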
The performance of the conversion algorithms in Sections 2 and 3 could be improved by using additional heuristic rules or statistical information. For instance, Algorithm 3 in Section 3 could use a heuristic rule that says that an adjective (JJ) projects to an NP if the JJ follows the determiner "the" and is not followed by a noun (as in "the rich are getting richer"), and projects to an ADJP in all other cases. Notice that such heuristic rules are Treebank-dependent.

Table 2. Exceptions to the one-projection-chain-per-category assumption
  most likely projection chain    other projection chain(s)
  JJ → ADJP                       JJ → NP
  CD → NP                         CD → QP → NP
  VBN → VP → S                    VBN → VP → RRC
  NN → NP                         NN → NX → NP
  VBG → VP → S                    VBG → PP

Empty categories are often explicitly marked in phrase structures, but they are not always included in dependency structures. We believe that including empty categories in dependency structures has many benefits. First, empty categories are useful for NLP applications such as machine translation. To translate a sentence from one language to another, many machine translation systems first create the dependency structure for the sentence in the source language, then produce the dependency structure for the target language, and finally generate a sentence in the target language. If the source language (e.g., Chinese or Korean) allows argument deletion and the target language (e.g., English) does not, it is crucial that the dropped argument (which is a type of empty category) be explicitly marked in the source dependency structure, so that the machine translation systems are aware of the dropped argument and can handle it accordingly. The second benefit of including empty categories in dependency structures is that it can improve the performance of the conversion algorithms in Section 3, because the phrase structures produced by the algorithms would then have empty categories as well, just like the phrase structures in the PTB. Third, if a sentence includes a non-projective construction such as wh-movement in English, and the dependency tree does not include an empty category to show the movement, traversing the dependency tree would yield the wrong word order.</Paragraph> </Section> </Paper>
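As an illustration of the Treebank-dependent heuristics and the projection chains in Table 2, the following sketch is our own illustrative code, not the paper's implementation; DEFAULT_CHAIN and projection_chain are assumed names. It picks the most likely projection chain for a POS tag and applies the JJ rule described above.

    # Illustrative sketch of a Treebank-dependent projection heuristic (assumed
    # names; not the paper's implementation). Each POS tag maps to its most
    # likely projection chain from Table 2, and a special rule overrides the
    # JJ chain in the "the rich are getting richer" pattern.

    DEFAULT_CHAIN = {
        "JJ":  ["ADJP"],
        "CD":  ["NP"],
        "VBN": ["VP", "S"],
        "NN":  ["NP"],
        "VBG": ["VP", "S"],
    }

    def projection_chain(tag, prev_word=None, next_tag=None):
        """Return the nonterminals the word projects to, from lowest to highest."""
        if (tag == "JJ" and prev_word is not None and prev_word.lower() == "the"
                and (next_tag is None or not next_tag.startswith("NN"))):
            return ["NP"]  # e.g., "the rich (are getting richer)"
        return DEFAULT_CHAIN.get(tag, [])

    print(projection_chain("JJ", prev_word="the", next_tag="VBP"))  # ['NP']
    print(projection_chain("JJ", prev_word="a", next_tag="NN"))     # ['ADJP']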