File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-2704_intro.xml

Size: 5,101 bytes

Last Modified: 2025-10-06 14:04:04

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2704">
  <Title>Towards an Alternative Implementation of NXT's Query Language via XQuery</Title>
  <Section position="3" start_page="28" end_page="29" type="intro">
    <SectionTitle>
3 Requirements
</SectionTitle>
    <Paragraph position="0"> We already have a successful NQL implementation as part of NXT, namely NXT Search. However, as with any implementation, there are a number of things that could be improved about it. We are considering a re-implementation with the following aims in mind: Faster query execution. Although many queries run quite quickly in NXT Search, more complicated queries can take long enough to execute on a large corpus that they have to be scheduled overnight. This is partly due to the approach of checking every possible combination of the variables declared in the query, which results in a large search space for some queries. Our aim is for the vast majority of queries that exploit NXT's multi-rooted tree structure to run quickly enough on single observations that users will be happy to run them in an interactive GUI environment. The ability to load more data. NXT loads data into a structure that is 5-7 times the size of the data on disk. A smaller memory representation would allow larger data sets to be loaded for querying. Because NXT has a &quot;lazy&quot; implementation that only loads annotations when they are required, the current performance is sufficient for many purposes: it allows all of the annotations relating to a single observation to be loaded unless the observation is both long and very heavily annotated. However, it is too limited (a) when the user requires a query to relate annotations drawn from different observations, for instance as a convenience when working on sparse phenomena, or when working on multiple-document applications such as the extraction of named entities from newspaper articles; (b) for queries that draw on very many kinds of annotation at the same time on longer observations; and (c) when the user is in an interactive environment such as a GUI, using a wide range of queries on different phenomena. 
In the last case, our goal could be achieved by memory management that throws loaded data away instead of by increasing the loading capacity.</Paragraph>
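The combination of lazy loading and memory management that discards loaded data can be illustrated with a minimal Python sketch. It assumes a hypothetical loader function that reads one annotation layer of one observation; the class name `LazyAnnotationCache`, the `capacity` parameter and the least-recently-used eviction policy are illustrative choices, not part of NXT.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

Key = Tuple[str, str]  # (observation, annotation layer)


class LazyAnnotationCache:
    """Load annotations on first use and throw away the least recently
    used ones once more than `capacity` (observation, layer) pairs are held."""

    def __init__(self, loader: Callable[[str, str], List[str]], capacity: int) -> None:
        self._loader = loader      # hypothetical: reads one layer from disk
        self._capacity = capacity
        self._cache: "OrderedDict[Key, List[str]]" = OrderedDict()
        self.loads = 0             # how often the loader actually ran

    def get(self, observation: str, layer: str) -> List[str]:
        key = (observation, layer)
        if key in self._cache:
            self._cache.move_to_end(key)           # mark as recently used
            return self._cache[key]
        data = self._loader(observation, layer)    # lazy: load on demand only
        self.loads += 1
        self._cache[key] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)        # evict the oldest entry
        return data

    def resident(self) -> List[Key]:
        """Keys currently held in memory, least recently used first."""
        return list(self._cache)


def fake_loader(observation: str, layer: str) -> List[str]:
    """Stand-in for parsing an annotation file; returns made-up element ids."""
    return [f"{observation}.{layer}.{i}" for i in range(3)]


cache = LazyAnnotationCache(fake_loader, capacity=2)
cache.get("obs1", "words")
cache.get("obs1", "syntax")
cache.get("obs1", "words")    # already resident: no reload
cache.get("obs2", "words")    # evicts ("obs1", "syntax")
print(cache.loads)            # 3
```

Bounding the number of resident layers trades repeated loading for a fixed memory ceiling, which suits the interactive case (c) better than raising the loading capacity.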
    <Paragraph position="1">4 XQuery as a Basis for Re-implementation
XQuery (Boag et al., 2005), currently a W3C Candidate Recommendation, is a Turing-complete functional query/programming language designed for querying (sets of) XML documents. It subsumes XPath, which is &quot;a language for addressing parts of an XML document&quot; (Clark and DeRose, 1999). XPath supports the navigation, selection and extraction of fragments of XML documents by specifying 'paths' through the XML hierarchy. XQuery queries can include a mixture of XML, XPath expressions and function calls, as well as FLWOR expressions, which provide programming constructs for looping and variable assignment through the keywords for, let, where, order by and return. XQuery is designed to make efficient use of the inherent structure of XML to calculate the results of a query.</Paragraph>
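To make the FLWOR constructs concrete, the following minimal sketch pairs a small XQuery expression (shown only in comments, not executed) with a Python equivalent that uses the limited XPath subset supported by the standard `xml.etree.ElementTree` module. The word layer, its `w` elements and its `pos` attribute are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_sample() -> ET.Element:
    """Build a tiny, made-up word layer without parsing any XML text."""
    root = ET.Element("words")
    sample = [("the", "DT"), ("meeting", "NN"), ("starts", "VBZ"),
              ("agenda", "NN"), ("cat", "NN")]
    for i, (text, pos) in enumerate(sample):
        w = ET.SubElement(root, "w", {"id": f"w{i}", "pos": pos})
        w.text = text
    return root


def flwor_nouns(root: ET.Element) -> List[str]:
    # Roughly equivalent XQuery (shown for comparison, not executed):
    #   for $w in //w[@pos = "NN"]
    #   let $t := string($w)
    #   where string-length($t) > 3
    #   order by $t
    #   return $t
    result = []
    for w in root.findall(".//w[@pos='NN']"):  # for + XPath path expression
        t = w.text or ""                        # let
        if len(t) > 3:                          # where
            result.append(t)
    return sorted(result)                       # order by ... return


print(flwor_nouns(build_sample()))  # ['agenda', 'meeting']
```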
    <Paragraph position="2"> XQuery thus appears to be a natural choice for querying XML of the sort over which NQL operates. Although the axes exposed in XPath allow comprehensive navigation within a tree, NXT's object model allows an individual node to have multiple parents drawn from the different trees that make up the data, and expressing navigation over this multi-rooted structure can be cumbersome in XPath alone. XQuery allows us to combine fragments of XML selected by XPath in meaningful ways to construct the results of a given query.</Paragraph>
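The multi-parent structure can be sketched in Python under simplifying assumptions: two made-up hierarchies (syntax and dialogue acts) point to shared word identifiers through hypothetical `ref` elements with an `idref` attribute, which only loosely models NXT's own child pointers. Finding all parents of a word then requires a join across trees rather than a single XPath parent step; the XQuery in the comment is a hedged illustration of how a FLWOR expression could state that join.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def add_node(parent: ET.Element, tag: str, node_id: str, word_refs: List[str]) -> ET.Element:
    """Create a node whose children are hypothetical 'ref' pointers to shared words."""
    node = ET.SubElement(parent, tag, {"id": node_id})
    for wid in word_refs:
        ET.SubElement(node, "ref", {"idref": wid})
    return node


def parents_by_word(*trees: ET.Element) -> Dict[str, List[str]]:
    """Map each word id to the ids of every node, in any tree, that points to it.

    Roughly the XQuery join (illustrative, not executed):
      for $p in ($syntax//*, $acts//*)
      where $p/ref/@idref = $id
      return $p
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for tree in trees:
        for node in tree.iter():
            for ref in node.findall("ref"):     # direct pointer children only
                index[ref.get("idref")].append(node.get("id"))
    return dict(index)


# Two independent hierarchies over the same words w1..w4 (made-up data).
syntax = ET.Element("syntax", {"id": "s0"})
add_node(syntax, "np", "np1", ["w1", "w2"])
add_node(syntax, "vp", "vp1", ["w3", "w4"])

acts = ET.Element("acts", {"id": "a0"})
add_node(acts, "act", "da1", ["w1", "w2", "w3"])
add_node(acts, "act", "da2", ["w4"])

index = parents_by_word(syntax, acts)
print(index["w3"])  # ['vp1', 'da1']: one word, two parents from different trees
```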
    <Paragraph position="3"> There are other possible implementation options that would not use XQuery. The first would use extensions to the standard XPath axes to query concurrent markup, as demonstrated by Iacob and Dekhtyar (2005). We have not yet investigated this option.</Paragraph>
    <Paragraph position="4"> The second is to devise an indexing scheme that allows us to recast the data as a relational database, the approach taken in LPath (Bird et al., 2006). We chose not to explore this option. It is not difficult to design a relational database to match a particular NXT corpus as long as editing is not enabled. However, a key part of NXT's data model permits annotations to descend recursively through different layers of the same set of data types, in order to make it easy to represent structures such as syntax trees. This makes it difficult to build a generic transform to a relational database: such a transform would need to inspect the entire data set to determine the maximum recursion depth. It also makes it impossible to allow editing, at least without placing a hard limit on the recursion. Admittedly, any strategy based on XQuery will also be limited to static data sets for the present, but update mechanisms for XQuery are already beginning to appear and are likely to become part of a future standard.</Paragraph>
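The recursion argument can be made concrete with a small Python sketch. It assumes a hypothetical recursive non-terminal element `nt` and a naive relational layout with one ancestor column per nesting level (an illustrative design, not the LPath encoding). Such a layout needs the maximum depth over the whole data set, which the code obtains only by scanning every tree.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def nesting_depth(node: ET.Element, tag: str) -> int:
    """Largest number of `tag` elements on any path from `node` down to a leaf."""
    below = max((nesting_depth(child, tag) for child in node), default=0)
    return below + 1 if node.tag == tag else below


def max_depth(trees: Iterable[ET.Element], tag: str) -> int:
    """Every tree has to be inspected: any one of them may be the deepest."""
    return max((nesting_depth(t, tag) for t in trees), default=0)


def make_chain(depth: int) -> ET.Element:
    """Made-up syntax tree with `depth` nested 'nt' nodes above one word."""
    root = ET.Element("nt")
    current = root
    for _ in range(depth - 1):
        current = ET.SubElement(current, "nt")
    ET.SubElement(current, "word")
    return root


def level_columns(depth: int) -> List[str]:
    """Hypothetical fixed schema: one ancestor column per nesting level."""
    return [f"level_{i}" for i in range(1, depth + 1)]


corpus = [make_chain(2), make_chain(5), make_chain(3)]
depth = max_depth(corpus, "nt")
print(depth)                  # 5
print(level_columns(depth))   # ['level_1', 'level_2', 'level_3', 'level_4', 'level_5']
```

Adding a single annotation one level deeper than any existing tree would invalidate such a schema, which is why editing is hard to support without a hard limit on the recursion.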
  </Section>
</Paper>