<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2034"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Discriminative Reranking for Semantic Parsing</Title> <Section position="7" start_page="267" end_page="268" type="evalu"> <SectionTitle> 4.2 Results </SectionTitle> <Paragraph position="0"> Table 2 compares the baseline learner SCISSOR using the back-off parameters of Ge and Mooney (2005) (SCISSOR) with the same learner using the revised parameters of Section 2.2 (SCISSOR+).</Paragraph> <Paragraph position="1"> As expected, SCISSOR+ has higher recall and lower precision than SCISSOR on both corpora because of the additional levels of back-off. SCISSOR+ is used as the baseline model for all reranking experiments described below.</Paragraph> <Paragraph position="2"> Table 3 gives oracle recalls for CLANG and GEOQUERY, where an oracle picks the correct parse from the n-best SAPTs whenever one of them is correct. Results are shown for increasing values of n. The two corpora show different trends: on CLANG, small values of n already yield significant improvements, whereas GEOQUERY needs a larger n before results improve.</Paragraph> <Paragraph position="3"> In this section, we describe experiments with reranking models that use different feature sets. All models include the score that the baseline model assigns to a SAPT as a special feature. Table 4 shows results for different feature sets derived directly from SAPTs. In general, reranking improves semantic parsing performance on CLANG but not on GEOQUERY, which can be explained by the different oracle recall trends of the two corpora. 
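The oracle recall reported in Table 3 can be computed from per-candidate correctness judgments. The following is a minimal sketch, assuming that recall is measured over all test sentences (so a sentence with no parse counts as a miss) and that every candidate in each n-best list has already been judged correct or incorrect; the function name oracle_recall and the toy data are hypothetical, not taken from the paper.

```python
from typing import Sequence


def oracle_recall(nbest_correct: Sequence[Sequence[bool]], n: int) -> float:
    """Fraction of sentences for which at least one of the top-n candidates is
    correct, i.e. the recall of an oracle that always picks a correct parse
    from the n-best list whenever one exists.

    nbest_correct[i][k] is True if the (k+1)-th ranked candidate for sentence
    i is a correct parse. Sentences with an empty list (no parse) count as misses.
    """
    if not nbest_correct:
        return 0.0
    hits = sum(1 for cands in nbest_correct if any(cands[:n]))
    return hits / len(nbest_correct)


# Hypothetical toy data: four test sentences, correctness flags in rank order.
toy = [
    [True, False, False],   # correct parse already ranked first
    [False, True, False],   # correct parse ranked second
    [False, False, False],  # no correct parse anywhere in the list
    [],                     # parser returned no candidates
]
for n in (1, 2, 3):
    print(n, oracle_recall(toy, n))  # prints "1 0.25", then "2 0.5", then "3 0.5"
```

Assuming the top-ranked candidate is the baseline's own output, oracle recall at n = 1 equals the baseline recall. A curve that rises steeply for small n, as on CLANG, means a reranker only has to promote correct parses that are already near the top of the list.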
As Table 3 shows, even a small n increases the oracle score on CLANG significantly, but not on GEOQUERY.</Paragraph> <Paragraph position="4"> Because the baseline score is included as a feature, a correct SAPT ranked near the top of the n-best list is more likely to be promoted to first place than one ranked further down; consequently, reranking is likely to make more sentences correct on CLANG than on GEOQUERY. On CLANG, using the semantic feature set (SEM1) alone achieves the best improvement over the baseline: 2.8% absolute in F-measure (a 15.8% relative error reduction), which is significant at the 95% confidence level under a paired Student's t-test. However, the difference between SEM1 and SYN+SEM1 is very small (a single example).</Paragraph> <Paragraph position="5"> Using syntactic features alone improves the results only slightly, because syntactic features do not directly discriminate between correct and incorrect meaning representations. To put this in perspective, Charniak and Johnson (2005) reported that reranking improves the F-measure of syntactic parsing from 89.7% to 91.0%, with a 50-best oracle F-measure of 96.8%.</Paragraph> <Paragraph position="6"> Table 5 compares reranking models that use semantic features derived directly from SAPTs (SEM1) and semantic features derived from trees with purely-syntactic nodes removed (SEM2), with the two semantic feature sets used alone and together, both with and without the syntactic feature set (SYN). SEM1 and SEM2 refer to the semantic feature sets in Sections 3.2.1 and 3.2.2, respectively, and SYN refers to the syntactic feature set in Section 3.1.</Paragraph> <Paragraph position="7"> Overall, SEM1 gives better results than SEM2 on CLANG and slightly worse results on GEOQUERY (a difference of a single sentence), regardless of whether syntactic features are included. 
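The configurations compared in Table 5 (SEM1, SEM2 and SYN, alone and together) differ only in which feature extractors feed the reranker, and every model also receives the baseline score as a special feature. The following is a minimal sketch of such a linear reranker, assuming sparse feature counts and a dot-product score; the extractors syn and sem1, the feature names, and the weights are hypothetical placeholders, not the paper's feature templates or a trained model.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical types: a candidate is (baseline_score, tree), where "tree" stands
# in for a SAPT; an extractor maps a tree to sparse feature counts.
Candidate = Tuple[float, object]
Extractor = Callable[[object], Counter]


def combine_features(tree: object, extractors: Dict[str, Extractor]) -> Counter:
    """Union of several feature sets; each feature name is prefixed with its
    set name so identical strings from different sets stay distinct."""
    feats: Counter = Counter()
    for set_name, extract in extractors.items():
        for name, value in extract(tree).items():
            feats[set_name + ":" + name] += value
    return feats


def rerank(candidates: Sequence[Candidate],
           extractors: Dict[str, Extractor],
           weights: Dict[str, float],
           baseline_weight: float) -> int:
    """Index of the best candidate under a linear model whose features are the
    baseline score plus the combined feature sets."""
    def score(cand: Candidate) -> float:
        base, tree = cand
        feats = combine_features(tree, extractors)
        return baseline_weight * base + sum(
            weights.get(name, 0.0) * value for name, value in feats.items())
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


# Toy usage: trees are tuples of labels, and both extractors are made up
# ("SYN" counts upper-case labels, "SEM1" counts lower-case ones).
def syn(tree):
    return Counter(label for label in tree if label.isupper())


def sem1(tree):
    return Counter(label for label in tree if label.islower())


cands = [(-1.0, ("NP", "player")), (-1.2, ("NP", "team"))]
weights = {"SEM1:team": 0.5}
print(rerank(cands, {"SYN": syn}, weights, baseline_weight=1.0))  # prints 0
print(rerank(cands, {"SYN": syn, "SEM1": sem1}, weights, baseline_weight=1.0))  # prints 1
```

Prefixing each feature with its set name keeps SEM1 and SEM2 features distinct even when the same substructure appears in both. In the toy run, adding the SEM1 extractor is what lets the second candidate overcome its lower baseline score.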
Using both semantic feature sets does not improve the results over using SEM1 alone. On the one hand, the better performance of SEM1 on CLANG contradicts the expectation motivated in Section 3.2.2, and the reason for this remains to be investigated. On the other hand, it suggests that semantic features derived directly from SAPTs can provide good evidence of semantic correctness, even though they include redundant, purely syntactically motivated features.</Paragraph> <Paragraph position="8"> We also experimented informally with smoothed semantic features that use the domain ontology provided by CLANG; they did not improve over reranking models without these features.</Paragraph> </Section></Paper>