<?xml version="1.0" standalone="yes"?> <Paper uid="P06-3014"> <Title>Parsing and Subcategorization Data</Title> <Section position="4" start_page="79" end_page="79" type="metho"> <SectionTitle> 2 Experiment Design </SectionTitle> <Paragraph position="0"> Three models will be investigated for parsing and extracting SCCs from the parser's output: 1. punc: leaving punctuation in both training and test data.</Paragraph> <Paragraph position="1"> 2. no-punc: removing punctuation from both training and test data.</Paragraph> <Paragraph position="2"> 3. punc-no-punc: removing punctuation from only test data.</Paragraph> <Paragraph position="3"> Following the convention in the parsing community, for written language, we selected sections 02-21 of WSJ as training data and section 23 as test data (Collins, 1999). For spoken language, we designated section 2 and 3 of Switchboard as training data and files of sw4004 to sw4135 of section 4 as test data (Roark, 2001). Since we are also interested in extracting SCCs from the parser's output, we eliminated from the two test corpora all sentences that do not contain verbs. Our experiments proceed in the following three steps: 1. Tag test data using the POS-tagger described in Ratnaparkhi (1996).</Paragraph> <Paragraph position="4"> 2. Parse the POS-tagged data using Bikel's parser.</Paragraph> <Paragraph position="5"> 1We use punctuation to refer to sentence-internal punctuation unless otherwise specified.</Paragraph> <Paragraph position="6"> label clause type desired SCCs 3. Extract SCCs from the parser's output. The extractor we built first locates each verb in the parser's output and then identifies the syntactic categories of all its sisters and combines them into an SCC. However, there are cases where the extractor has more work to do. * Finite and Infinite Clauses: In the Penn Treebank, S and SBAR are used to label different types of clauses, obscuring too much detail about the internal structure of each clause. Our extractor is designed to identify the internal structure of different types of clause, as shown in Table 1.</Paragraph> <Paragraph position="7"> * Passive Structures: As noted above, Roland and Jurafsky (Roland and Juraf null sky, 1998) have noticed that written language tends to have a much higher percentage of passive structures than spoken language. Our extractor is also designed to identify passive structures from the parser's output.</Paragraph> </Section> <Section position="5" start_page="79" end_page="82" type="metho"> <SectionTitle> 3 Experiment Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="79" end_page="80" type="sub_section"> <SectionTitle> 3.1 Parsing and SCCs </SectionTitle> <Paragraph position="0"> We used EVALB measures Labeled Recall (LR) and Labeled Precision (LP) to compare the parsing performance of different models. To compare the accuracy of SCCs proposed from the parser's output, we calculated SCC Recall (SR) and SCC Precision (SP). SR and SP are defined as follows:</Paragraph> <Paragraph position="2"> The results for parsing WSJ and Switchboard and extracting SCCs are summarized in Table 2.</Paragraph> <Paragraph position="3"> The LR/LP figures show the following trends: 1. Roark (2001) showed LR/LP of 86.4%/86.8% for punctuated written language, 83.4%/84.1% for unpunctuated written language. We achieve a higher accuracy in both punctuated and unpunctuated written language, and the decrease if punctuation is removed is less 2. 
</Section> <Section position="5" start_page="79" end_page="82" type="metho"> <SectionTitle> 3 Experiment Results </SectionTitle>
<Section position="1" start_page="79" end_page="80" type="sub_section"> <SectionTitle> 3.1 Parsing and SCCs </SectionTitle>
<Paragraph position="0"> We used the EVALB measures Labeled Recall (LR) and Labeled Precision (LP) to compare the parsing performance of the different models. To compare the accuracy of the SCCs proposed from the parser's output, we calculated SCC Recall (SR) and SCC Precision (SP): SR is the proportion of SCCs in the treebank test data that are correctly proposed from the parser's output, and SP is the proportion of SCCs proposed from the parser's output that are correct.</Paragraph>
<Paragraph position="1"> The results for parsing WSJ and Switchboard and for extracting SCCs are summarized in Table 2. The LR/LP figures show the following trends: 1. For written language, Roark (2001) reported LR/LP of 86.4%/86.8% with punctuation and 83.4%/84.1% without. We achieve higher accuracy on written language both with and without punctuation, and the decrease when punctuation is removed is smaller. 2. For spoken language, Roark (2001) reported LR/LP of 85.2%/85.6% with punctuation and 84.0%/84.6% without. We achieve lower accuracy on spoken language both with and without punctuation, and again the decrease when punctuation is removed is smaller. The trends in (1) and (2) may be due to parser differences, or to our removal of sentences that contain no verbs. 3. Unsurprisingly, if the test data is unpunctuated but the models have been trained on punctuated language, performance decreases sharply.</Paragraph>
<Paragraph position="2"> In terms of the accuracy of SCC extraction, the results follow a similar pattern. However, the utility of punctuation turns out to be even smaller: removing punctuation from both the training and the test data results in a drop of less than 0.3% in the accuracy of SCC extraction.</Paragraph>
<Paragraph position="3"> Figure 1 shows the relation between parsing accuracy and SCC-extraction accuracy. If we consider WSJ and Switchboard individually, there appears to be a positive correlation between the two: higher LR/LP indicates higher SR/SP. However, Figure 1 also shows that although the parser achieves a higher F-measure when parsing WSJ, it achieves a higher F-measure when generating SCCs from Switchboard.</Paragraph>
<Paragraph position="4"> The fact that the parser is more accurate in extracting SCCs from Switchboard than from WSJ merits further discussion. Intuitively, the shorter an SCC is, the more likely the parser is to get it right. This intuition is confirmed by Figure 2, which plots the accuracy of SCC extraction by SCC length: as SCCs get longer, the F-measure drops steadily for both WSJ and Switchboard. Roland and Jurafsky (1998) have suggested that one major subcategorization difference between written and spoken corpora is that spoken corpora have a much higher percentage of the zero-anaphora construction. We therefore examined the distribution of SCCs of different lengths in WSJ and Switchboard. Figure 3 shows that SCCs of length 0[2] account for a much higher percentage in Switchboard than in WSJ, whereas the reverse holds for SCCs of every non-zero length. This observation suggests that the parser's better performance in extracting SCCs from Switchboard may be attributed to two factors: 1. Switchboard has a much higher percentage of SCCs of length 0.</Paragraph>
<Paragraph position="5"> 2. The parser is very accurate in extracting shorter SCCs.</Paragraph>
<Paragraph position="6"> [2] Verbs have a length-0 SCC if they are intransitive and have no modifiers.</Paragraph> </Section>
<Section position="2" start_page="80" end_page="82" type="sub_section"> <SectionTitle> 3.2 Extraction of Dependents </SectionTitle>
<Paragraph position="0"> In order to estimate the effect of SCCs of length 0, we examined the parser's performance in retrieving the dependents of verbs. Every constituent (whether an argument or an adjunct) in an SCC generated by the parser is considered a dependent of that verb. SCCs of length 0 are discounted because verbs that take no arguments or adjuncts have no dependents.[3] In addition, this way of evaluating SCC extraction matches the practice in NLP tasks such as semantic role labeling (Xue and Palmer, 2004): for that task, the number of dependents correctly retrieved from the parser's output affects the accuracy of the task itself.</Paragraph>
<Paragraph position="1"> [3] We are aware that subjects are typically also considered dependents, but we did not include subjects in our experiments.</Paragraph>
<Paragraph position="2"> To do this, we calculated the number of dependents shared between each SCC proposed from the parser's output and its corresponding SCC proposed from the Penn Treebank. Our calculation is based on a modified version of the Minimum Edit Distance algorithm. The algorithm creates a shared-dependents matrix with one column for each constituent in the target sequence (the SCC proposed from the Penn Treebank) and one row for each constituent in the source sequence (the SCC proposed from the parser's output). Each cell shared-dependents[i, j] contains the number of constituents shared between the first i constituents of the target sequence and the first j constituents of the source sequence, and each cell can be computed as a simple function of the three possible paths through the matrix that arrive there. The algorithm is illustrated in Table 3, and a code sketch is given after the example below.</Paragraph>
<Paragraph position="3"> Table 4 shows an example of how the algorithm works, with NP-S-that-PP-in-INF as the target sequence and NP-NP-PP-in-ADVP-INF as the source sequence. The algorithm returns 3 as the number of dependents shared by the two SCCs.</Paragraph>
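<Paragraph position="4"> As a concrete illustration of the computation just described, the following is a minimal Python sketch (ours, not the authors' implementation). The recurrence amounts to a longest-common-subsequence count over the two constituent sequences, and the constituent segmentation of the Table 4 example is assumed.</Paragraph>

# Sketch of the shared-dependents matrix described above
# (in effect an LCS-length computation; illustrative only).

def shared_dependents(target, source):
    """Number of constituents shared between the target SCC (from the
    Treebank) and the source SCC (from the parser's output)."""
    n, m = len(target), len(source)
    # cell[i][j]: constituents shared between the first i target
    # constituents and the first j source constituents.
    cell = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if target[i - 1] == source[j - 1]:
                cell[i][j] = cell[i - 1][j - 1] + 1   # match: extend the diagonal path
            else:
                cell[i][j] = max(cell[i - 1][j], cell[i][j - 1])  # best of the other two paths
    return cell[n][m]

# The Table 4 example (constituent segmentation assumed):
target = ["NP", "S-that", "PP-in", "INF"]        # SCC proposed from the Treebank
source = ["NP", "NP", "PP-in", "ADVP", "INF"]    # SCC proposed from the parser
print(shared_dependents(target, source))         # prints 3

<Paragraph position="5"> Summing these shared counts over all verb tokens gives the overlap on which the dependency measures reported below are based.</Paragraph>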
<Paragraph position="6"> We compared the performance of Bikel's parser in retrieving dependents from written and spoken language over all three models, using Dependency Recall (DR) and Dependency Precision (DP): DR is the proportion of dependents in the treebank SCCs that are also retrieved from the parser's output, and DP is the proportion of dependents retrieved from the parser's output that also appear in the corresponding treebank SCCs.</Paragraph>
<Paragraph position="7"> The results of Bikel's parser in retrieving dependents are summarized in Figure 4. Overall, the parser performs better on WSJ than on Switchboard over all three models, just the opposite of what has been observed for SCC extraction. Interestingly, removing punctuation from both the training and the test data actually improves the F-measure slightly; this holds true for both WSJ and Switchboard.</Paragraph>
<Paragraph position="8"> This dependency F-measure differs in detail from similar measures in Xue and Palmer (2004); for present purposes, all that matters is the relative value for WSJ and Switchboard.</Paragraph> </Section> </Section> </Paper>