<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3003">
<Title>Efficient solving and exploration of scope ambiguities</Title>
<Section position="3" start_page="0" end_page="9" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> One of the most exciting recent developments in computational linguistics is that large-scale grammars which compute semantic representations are becoming available. Examples of such grammars are the HPSG English Resource Grammar (ERG) (Copestake and Flickinger, 2000) and the LFG ParGram grammars (Butt et al., 2002); a similar resource is being developed for the XTAG grammar (Kallmeyer and Romero, 2004).</Paragraph>
<Paragraph position="1"> But with the advent of such grammars, a phenomenon that is sometimes considered a somewhat artificial toy problem of theoretical semanticists becomes a very practical challenge: the presence of scope ambiguities. Because grammars often uniformly treat noun phrases as quantifiers, even harmless-looking sentences can have surprisingly many readings. The median number of scope readings for the sentences in the Rondane Treebank (distributed with the ERG) is 55, but the treebank also contains extreme cases such as (1) below, which according to the ERG has about 2.4 trillion (10^12) readings:
(1) Myrdal is the mountain terminus of the Flåm rail line (or Flåmsbana) which makes its way down the lovely Flåm Valley (Flåmsdalen) to its sea-level terminus at Flåm. (Rondane 650)
In order to control such an explosion of readings (and also to simplify the grammar design process), the developers of large-scale grammars typically use methods of packing or underspecification to specify the syntax-semantics interface. The general idea is that the parser doesn't compute all the individual scope readings, but only a compact underspecified description, from which the individual readings can then be extracted at a later stage of processing. The underspecified description can also be used as a platform for integrating lexical and context information, so as to restrict the set of possible readings without enumerating the wrong ones.</Paragraph>
<Paragraph position="2"> Such an approach is only feasible if we have access to efficient tools that support the most important operations on underspecified descriptions. We present utool, the Swiss Army Knife of Underspecification, which sets out to do exactly this. It supports the following operations:
1. enumerate all scope readings represented by an underspecified description;
2. check whether a description has any readings, and compute how many readings it has without explicitly enumerating them;
3. convert underspecified descriptions between different underspecification formalisms (at this point, Minimal Recursion Semantics (Copestake et al., 2003), Hole Semantics (Bos, 1996), and dominance constraints/graphs (Egg et al., 2001; Althaus et al., 2003)).</Paragraph>
<Paragraph position="3"> Our system is the fastest solver for underspecified descriptions available today; that is, it is fastest at solving Task 1 above (about 100,000 readings per second on a modern PC). It achieves this by implementing an efficient algorithm for solving dominance graphs (Bodirsky et al., 2004) and caching intermediate results in a chart data structure. To our knowledge, it is the only system that can do Tasks 2 and 3. It is only because utool can compute the number of readings without enumerating them that we even know that (1) has trillions of readings; even utool would take about a year to enumerate and count the readings individually.</Paragraph>
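To make the contrast between Tasks 1 and 2 concrete, here is a deliberately simplified sketch. It is not utool's implementation: the class name ScopeToy and the entire data model are illustrative assumptions, and the paper's actual descriptions are dominance graphs solved with a chart-based algorithm (Bodirsky et al., 2004). In the toy model, an "underspecified description" is just a set of quantifiers plus pairwise "must outscope" constraints, and a reading is a linear scope order consistent with them; enumeration builds every reading, while counting memoizes the number of readings per remaining subset of quantifiers, which mirrors in spirit how a chart lets readings be counted without being listed.

```java
// Illustrative toy model of scope underspecification; NOT utool's data structures
// or its chart-based dominance-graph solver.
import java.util.*;

public class ScopeToy {
    private final String[] quants;            // e.g. "every student", "a book"
    private final boolean[][] mustOutscope;   // mustOutscope[i][j]: quantifier i must scope over j
    private final Map<Long, Long> chart = new HashMap<>();  // memo: remaining-set -> number of readings

    ScopeToy(String[] quants, int[][] constraints) {
        this.quants = quants;
        this.mustOutscope = new boolean[quants.length][quants.length];
        for (int[] c : constraints) mustOutscope[c[0]][c[1]] = true;
    }

    /** Task 1 (toy version): enumerate every linear scope order consistent with the constraints. */
    List<List<String>> enumerate() {
        List<List<String>> readings = new ArrayList<>();
        enumerate((1L << quants.length) - 1, new ArrayList<>(), readings);
        return readings;
    }

    private void enumerate(long remaining, List<String> prefix, List<List<String>> out) {
        if (remaining == 0) { out.add(new ArrayList<>(prefix)); return; }
        for (int i = 0; i < quants.length; i++) {
            if (canTakeWidestScope(i, remaining)) {
                prefix.add(quants[i]);
                enumerate(remaining & ~(1L << i), prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    /** Task 2 (toy version): count the readings without building them, memoizing per sub-problem. */
    long count(long remaining) {
        if (remaining == 0) return 1;
        Long cached = chart.get(remaining);
        if (cached != null) return cached;
        long total = 0;
        for (int i = 0; i < quants.length; i++)
            if (canTakeWidestScope(i, remaining)) total += count(remaining & ~(1L << i));
        chart.put(remaining, total);
        return total;
    }

    /** Quantifier i may take widest scope among 'remaining' iff no other remaining quantifier must outscope it. */
    private boolean canTakeWidestScope(int i, long remaining) {
        if ((remaining & (1L << i)) == 0) return false;
        for (int j = 0; j < quants.length; j++)
            if (j != i && (remaining & (1L << j)) != 0 && mustOutscope[j][i]) return false;
        return true;
    }

    public static void main(String[] args) {
        // "Every student reads a book": no constraint between the two quantifiers,
        // so both scope orders are readings.
        ScopeToy toy = new ScopeToy(new String[] {"every student", "a book"}, new int[][] {});
        System.out.println(toy.enumerate());           // [[every student, a book], [a book, every student]]
        System.out.println(toy.count((1L << 2) - 1));  // 2
    }
}
```

The point of the contrast is that the memo table is shared across sub-problems, so the count stays cheap even when the number of readings explodes; this is the same reason utool can report that (1) has trillions of readings, although enumerating them one by one would take about a year.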
<Paragraph position="4"> utool is implemented in Java; it is efficient and portable, open source, and freely downloadable from http://utool.sourceforge.net.</Paragraph>
</Section>
</Paper>