<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1056">
<Title>Learning for Semantic Parsing with Statistical Machine Translation</Title>
<Section position="7" start_page="443" end_page="444" type="evalu">
<SectionTitle>6 Experiments</SectionTitle>
<Paragraph position="0">We evaluated WASP in the ROBOCUP and GEOQUERY domains (see Section 2). To build a corpus for ROBOCUP, 300 pieces of coach advice were randomly selected from the log files of the 2003 ROBOCUP Coach Competition and manually translated into English (Kuhlmann et al., 2004). The average sentence length is 22.52 words. To build a corpus for GEOQUERY, 880 English questions were gathered from various sources and manually translated into the functional GEOQUERY language (Tang and Mooney, 2001). The average sentence length is 7.48 words, much shorter than in ROBOCUP.</Paragraph>
<Paragraph position="1">250 of the queries were also translated into Spanish, Japanese and Turkish, resulting in a smaller, multilingual data set.</Paragraph>
<Paragraph position="2">For each domain, there was a minimal set of initial rules representing knowledge needed for translating basic domain entities. These rules were always included in the lexicon. For example, in GEOQUERY, the initial rules were: NUM → ⟨x, x⟩, for all x ∈ ℝ; CITY → ⟨c, cityid('c', _)⟩, for all city names c (e.g. new york); and similar rules for other types of names (e.g. rivers). Name translations were provided for the multilingual data set (e.g. CITY → ⟨nyuu yooku, cityid('new york', _)⟩ for Japanese). An illustrative encoding of such rules is sketched below.</Paragraph>
<Paragraph position="3">Standard 10-fold cross validation was used in our experiments. A semantic parser was learned from the training set, and the learned parser was then used to translate the test sentences into MRs. Translation failed when a sentence contained constructs that the parser did not cover. We counted the number of test sentences that were translated into an MR, and the number of translations that were correct. For ROBOCUP, a translation was correct if it exactly matched the correct MR. For GEOQUERY, a translation was correct if it retrieved the same answer as the correct query. Using these counts, we measured the performance of the parser in terms of precision (percentage of translations that were correct) and recall (percentage of test sentences that were correctly translated); this computation is also sketched below. For ROBOCUP, it took 47 minutes to learn a parser using IIS; for GEOQUERY, it took 83 minutes.</Paragraph>
<Paragraph position="4">Figure 6 shows the performance of WASP compared to four other algorithms: SILT (Kate et al., 2005), COCKTAIL (Tang and Mooney, 2001), SCISSOR (Ge and Mooney, 2005) and Zettlemoyer and Collins (2005). Experimental results clearly show the advantage of extra supervision in SCISSOR and in Zettlemoyer and Collins's parser (see Section 1).</Paragraph>
<Paragraph position="5">However, WASP performs quite favorably compared to SILT and COCKTAIL, which use the same training data. In particular, COCKTAIL, a deterministic shift-reduce parser based on inductive logic programming, fails to scale up to the ROBOCUP domain, where sentences are much longer, and crashes on larger training sets due to memory overflow.</Paragraph>
<Paragraph position="6">WASP also outperforms SILT in terms of recall. In SILT, lexical learning is done by a local bottom-up search, which is much less effective than the word-alignment-based algorithm in WASP.</Paragraph>
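As an illustration of the initial rules described in Paragraph 2, the following minimal Python sketch enumerates seed lexicon entries that pair an NL phrase with an MRL template for a basic domain entity. This is not the authors' implementation; the rule representation, function name, and sample inputs are our assumptions.

def initial_geoquery_rules(city_names, numbers):
    """Seed lexicon: NUM -> <x, x> and CITY -> <c, cityid('c', _)> (hypothetical encoding)."""
    rules = []
    for x in numbers:
        # A number translates to itself in the MRL.
        rules.append(("NUM", [str(x)], str(x)))
    for c in city_names:
        # A city name translates to a cityid(...) term; '_' leaves the state unspecified.
        rules.append(("CITY", c.split(), "cityid('%s', _)" % c))
    return rules

# Hypothetical usage:
seed = initial_geoquery_rules(["new york", "austin"], [2, 100])
# ('CITY', ['new', 'york'], "cityid('new york', _)") is among the entries.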
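The evaluation metrics defined in Paragraph 3 reduce to simple ratios over three counts. A minimal sketch, assuming counts are aggregated over the 10 cross-validation folds; the variable names and example numbers below are illustrative, not the paper's.

def precision_recall(num_correct, num_translated, num_test):
    # Precision: fraction of produced translations that are correct.
    precision = num_correct / float(num_translated) if num_translated else 0.0
    # Recall: fraction of all test sentences translated correctly.
    # Sentences the parser fails to translate lower recall but not precision.
    recall = num_correct / float(num_test) if num_test else 0.0
    return precision, recall

# Hypothetical example: 700 of 880 test questions parsed, 650 of them correct.
p, r = precision_recall(650, 700, 880)  # p = 0.93, r = 0.74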
<Paragraph position="7">Figure 7 shows the performance of WASP on the multilingual GEOQUERY data set. The languages being considered differ in terms of word order: Subject-Verb-Object for English and Spanish, and Subject-Object-Verb for Japanese and Turkish.</Paragraph>
<Paragraph position="8">WASP's performance is consistent across these languages, despite some slight differences, most probably due to factors other than word order (e.g. lower recall for Turkish due to a much larger vocabulary).</Paragraph>
<Paragraph position="9">Details can be found in a longer version of this paper (Wong, 2005).</Paragraph>
</Section>
</Paper>