File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/p05-3003_metho.xml
Size: 7,488 bytes
Last Modified: 2025-10-06 14:09:49
<?xml version="1.0" standalone="yes"?> <Paper uid="P05-3003"> <Title>Efficient solving and exploration of scope ambiguities</Title> <Section position="4" start_page="9" end_page="11" type="metho"> <SectionTitle> 2 Technical Description </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="9" end_page="9" type="sub_section"> <SectionTitle> 2.1 Solving dominance graphs </SectionTitle> <Paragraph position="0"> At the core of utool is a solver for dominance graphs (Bodirsky et al., 2004), which are graph representations of weakly normal dominance constraints. Weakly normal dominance constraints constitute one of the main formalisms used in scope underspecification (Egg et al., 2001; Althaus et al., 2003). Dominance graphs are directed graphs with two kinds of edges: tree edges and dominance edges. A dominance graph describes the set of all trees into which its tree edges can be embedded in such a way that every dominance edge in the graph is realised as reachability in the tree. Dominance graphs are used as underspecified descriptions by describing sets of trees that encode the formulas of some language of semantic representations, such as predicate logic.</Paragraph> <Paragraph position="1"> Fig. 1 shows an example of a constraint graph for the sentence &quot;every student reads a book.&quot; It consists of five tree fragments (sets of nodes that are connected by solid tree edges), which are connected by dominance edges (dotted lines). Two of the fragments have two holes each, into which other fragments can be &quot;plugged&quot;. The graph can be embedded into the two trees shown in the middle of Fig. 1, which correspond to the two readings of the sentence. By contrast, the graph cannot be embedded into the tree shown on the right: a dominance edge stipulates that &quot;read_{x,y}&quot; must be reachable from &quot;some_y&quot;, but it is not reachable from &quot;some_y&quot; in that tree.
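The embedding condition can be made concrete with a minimal Python sketch of the Fig. 1 graph. It is a simplification under stated assumptions, not utool's data structure: each tree fragment is collapsed to a single labelled node with a number of holes, a tree is a nested tuple whose children fill the holes in order, and the names `HOLES`, `DOMINANCE`, `is_solution` and the fragment labels are hypothetical. The non-solution tree is made up for illustration and is not necessarily the one drawn in Fig. 1.

```python
# Simplified encoding (assumption): each fragment of Fig. 1 is collapsed to
# one labelled node plus a number of holes; all names are hypothetical.
HOLES = {"every_x": 2, "some_y": 2, "student_x": 0, "book_y": 0, "read_xy": 0}

# Dominance edges: ((fragment, hole index), fragment that must lie below that hole).
DOMINANCE = [
    (("every_x", 0), "student_x"),
    (("every_x", 1), "read_xy"),
    (("some_y", 0), "book_y"),
    (("some_y", 1), "read_xy"),
]


def labels(tree):
    """All fragment labels in a tree given as nested tuples (label, child, ...)."""
    result = [tree[0]]
    for child in tree[1:]:
        result.extend(labels(child))
    return result


def find(tree, label):
    """The subtree rooted at `label`, or None if the label does not occur."""
    if tree[0] == label:
        return tree
    for child in tree[1:]:
        found = find(child, label)
        if found is not None:
            return found
    return None


def is_solution(tree):
    """Check whether the graph can be embedded into `tree`."""
    # Every fragment must occur exactly once ...
    if sorted(labels(tree)) != sorted(HOLES):
        return False
    # ... with exactly one subtree plugged into each of its holes.
    if any(len(find(tree, f)) - 1 != n for f, n in HOLES.items()):
        return False
    # Every dominance edge must be realised as reachability in the tree.
    return all(
        target in labels(find(tree, src)[1 + hole])
        for (src, hole), target in DOMINANCE
    )


READING_1 = ("every_x", ("student_x",), ("some_y", ("book_y",), ("read_xy",)))
READING_2 = ("some_y", ("book_y",), ("every_x", ("student_x",), ("read_xy",)))
# A made-up non-solution: read_xy is not reachable from some_y.
NOT_A_SOLUTION = ("every_x", ("some_y", ("book_y",), ("student_x",)), ("read_xy",))

print(is_solution(READING_1), is_solution(READING_2), is_solution(NOT_A_SOLUTION))
# True True False
```

The check mirrors the definition directly: a tree is accepted only if it uses every fragment once, fills every hole, and places the target of each dominance edge somewhere below the corresponding hole.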
We call the two trees into which the graph can be embedded its solutions.</Paragraph> <Paragraph position="2"> The algorithm of Bodirsky et al. enumerates the solutions of a dominance graph (technically, its solved forms) by first computing the set of its free fragments, i.e. the fragments that can occur at the root of some solution. It then picks one of these fragments as the root and removes it from the graph. This splits the graph into several connected subgraphs, which are then solved recursively.</Paragraph> <Paragraph position="3"> This algorithm can call itself on the same subgraph several times, which can waste a lot of time: the set of all solutions of that subgraph was already computed on the first recursive call.</Paragraph> <Paragraph position="4"> For this reason, our implementation caches intermediate results in a chart-like data structure. This data structure maps each subgraph G to a set of splits, each of which records which fragment of G should be placed at the root of the solution, what the subgraphs are after this fragment has been removed, and how their solutions should be plugged into the holes of the fragment. In the worst case, the chart can have exponential size; in practice, however, it is much smaller than the set of all solutions. For example, the chart for (1) contains 74,960 splits, which is a tiny number compared to the 2.4 trillion readings, and it can be computed in a few seconds.</Paragraph> <Paragraph position="5"> Solving thus becomes a two-phase process. In the first phase, the chart data structure is filled by a run of the algorithm. In the second phase, the complete solutions are extracted from the chart.
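The two phases, together with the solution count discussed further on in this section, can be sketched in Python on the Fig. 1 example. This is a toy reconstruction under stated assumptions, not utool's code: fragments are collapsed to labelled nodes with holes (all names hypothetical), a subgraph is a frozenset of fragment names, each chart entry is a list of `(root, plugs)` splits with one subgraph per hole, and `split_for` uses a simplified freeness test that suffices for small graphs like Fig. 1 but is not the exact criterion of Bodirsky et al.

```python
from itertools import product

# Same hypothetical encoding of the Fig. 1 graph: holes per fragment and
# dominance edges ((fragment, hole index), fragment below that hole).
HOLES = {"every_x": 2, "some_y": 2, "student_x": 0, "book_y": 0, "read_xy": 0}
DOMINANCE = [
    (("every_x", 0), "student_x"),
    (("every_x", 1), "read_xy"),
    (("some_y", 0), "book_y"),
    (("some_y", 1), "read_xy"),
]

def components(frags):
    """Weakly connected components of the subgraph induced by `frags`."""
    adj = {f: set() for f in frags}
    for (src, _), tgt in DOMINANCE:
        if src in adj and tgt in adj:
            adj[src].add(tgt)
            adj[tgt].add(src)
    seen, comps = set(), []
    for start in sorted(frags):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            f = stack.pop()
            if f not in comp:
                comp.add(f)
                stack.extend(adj[f])
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def split_for(root, subgraph):
    """Try `root` as the root of `subgraph`; return one component per hole or None.

    Simplified freeness test (assumption, not the exact criterion of
    Bodirsky et al.): no dominance edge inside `subgraph` may point to `root`,
    and removing `root` must leave exactly one component below each hole.
    """
    if any(tgt == root and src in subgraph for (src, _), tgt in DOMINANCE):
        return None
    plugs = [None] * HOLES[root]
    for comp in components(subgraph - {root}):
        holes = {h for (src, h), tgt in DOMINANCE if src == root and tgt in comp}
        if len(holes) != 1:
            return None
        (h,) = holes
        if plugs[h] is not None:
            return None
        plugs[h] = comp
    return None if None in plugs else plugs

def fill_chart(subgraph, chart):
    """Phase 1: record all splits of `subgraph`; each subgraph is solved only once."""
    if subgraph in chart:
        return  # cached: this is the saving over the plain recursive algorithm
    chart[subgraph] = []
    for root in sorted(subgraph):
        plugs = split_for(root, subgraph)
        if plugs is not None:
            chart[subgraph].append((root, plugs))
            for child in plugs:
                fill_chart(child, chart)

def extract(subgraph, chart):
    """Phase 2: enumerate the trees represented by the chart."""
    for root, plugs in chart[subgraph]:
        for children in product(*[list(extract(p, chart)) for p in plugs]):
            yield (root, *children)

def count(subgraph, chart, memo=None):
    """Number of solutions: sum over splits of the product of the children's counts."""
    memo = {} if memo is None else memo
    if subgraph not in memo:
        total = 0
        for _, plugs in chart[subgraph]:
            n = 1
            for p in plugs:
                n *= count(p, chart, memo)
            total += n
        memo[subgraph] = total
    return memo[subgraph]

ALL = frozenset(HOLES)
chart = {}
fill_chart(ALL, chart)
print(len(chart), count(ALL, chart))  # 6 2
for tree in extract(ALL, chart):
    print(tree)
```

In this tiny example, single-fragment subgraphs such as `{book_y}` are reached from both splits of the full graph but are solved only once, which is the effect the chart is meant to achieve; `count` likewise visits each chart entry once, so its cost grows with the size of the chart rather than with the number of readings.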
Although the first phase is conceptually much more complex than the second one, because it involves sophisticated graph algorithms whose correctness is not trivial to prove, it takes only a small fraction of the total runtime in practice.</Paragraph> <Paragraph position="6"> Instead of enumerating all readings from the chart, we can also compute the number of solutions represented by the chart. For each split, we compute the numbers of solutions of the fragment sets in the split and multiply them, since the choices for the children can be combined freely. The number of solutions of a subgraph is then the sum of these products over all of its splits. Because the count for each subgraph is computed only once, this computation takes linear time in the size of the chart.</Paragraph> </Section> <Section position="2" start_page="9" end_page="9" type="sub_section"> <SectionTitle> 2.2 Translating between formalisms </SectionTitle> <Paragraph position="0"> One of the most significant obstacles in the development of tools and resources for scope underspecification is that different resources (such as grammars and solvers) are built for different underspecification formalisms. To help alleviate this problem, utool can read and write underspecified descriptions and write out solutions in a variety of formats. The input and output functionality is provided by codecs, which translate between descriptions in one of these formalisms and the internal dominance graph format. The codecs for MRS and Hole Semantics are based on the (non-trivial) translations in (Koller et al., 2003; Niehren and Thater, 2003) and are only defined on nets, i.e. constraints whose graphs satisfy certain structural restrictions. This restriction is not very limiting in practice (Flickinger et al., 2005). utool also allows the user to test efficiently whether a description is a net.</Paragraph> <Paragraph position="1"> In practice, utool can be used to convert descriptions between the three underspecification formalisms.
Because the codecs work with concrete syntaxes that are used in existing systems, utool can be used as a drop-in replacement, for example in the LKB grammar development system (Copestake and Flickinger, 2000).</Paragraph> </Section> <Section position="3" start_page="9" end_page="11" type="sub_section"> <SectionTitle> 2.3 Runtime comparison </SectionTitle> <Paragraph position="0"> To illustrate utool's performance, we compare its runtimes for the enumeration task with those of the (already quite efficient) MRS constraint solver of the LKB system (Copestake and Flickinger, 2000). Our data set consists of the 850 MRS-nets extracted from the Rondane treebank which have fewer than one million solutions (see Fig. 2). [Fig. 2, caption partially lost: ... in the data set; the dashed line shows the constraints that the LKB solver could solve.]</Paragraph> <Paragraph position="1"> Fig. 3 displays the runtimes for enumerating all solutions, divided by the number of solutions, for both solvers. The horizontal axis shows the description sizes (number of tree fragments), and the vertical axis (on a logarithmic scale) shows the average runtime per solution for descriptions of this size.</Paragraph> <Paragraph position="2"> Due to memory limitations, the LKB solver could only solve descriptions with up to 21 tree fragments, which account for 80% of the test data. utool solved all descriptions in the test set. The evaluation was done on a 1.2 GHz PC with 2 GB of memory.</Paragraph> <Paragraph position="3"> The figure shows that utool is generally faster than the LKB solver, by up to a factor of approximately 1000. We should note that the LKB solver displays a dramatically higher variation in runtimes for constraints of the same size. Note also that for small constraints, the runtimes tend to be too small to be measured accurately. [Fig. 3, caption partially lost: ... nets in the Rondane treebank for LKB and utool.]</Paragraph> </Section> </Section> </Paper>