<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1041">
  <Title>The Treegram Index: An Efficient Technique for Retrieval in Linguistic Treebanks</Title>
  <Section position="2" start_page="267" end_page="268" type="abstr">
    <SectionTitle>
2 Efficient query evaluation
</SectionTitle>
    <Paragraph position="0"> The treegram-index retrieval method given above encounters the following problems: (A) A single treegram may be very complex because neither its degree nor its label strings are bounded; this leads to costly look-up operations.</Paragraph>
    <Paragraph position="1"> (B) There are many treegrams rooted at a given node in a database tree: to accommodate queries with subtree variables, the index has to contain all matching treegrams for that subtree. (C) It is quite expensive to intersect the tree sets T(DB, g) for all treegrams g contained in the query q.</Paragraph>
    <Paragraph position="2"> VENONA addresses these problems by the following approach. Problem A: (1) Node labels hash to an integer of a few bytes. We do not treat labels as structured; to model the structure of word forms, feature terms should be used [1]. (2) VENONA deals only with treegrams of a maximal degree d; a tree of greater degree is automatically transformed into a d-ary tree [2]. (3) To describe a single treegram g, VENONA takes each of g's hashed labels and combines it with the position of its corresponding node in a complete d-ary tree; an integer encoding g's structure completes this representation, since structure is at least as essential for tree retrieval as label information.</Paragraph>
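    <Paragraph> A minimal Python sketch of the representation described above for problem (A), under stated assumptions: CRC32 stands in for the label hash, slots of the complete d-ary tree are numbered in level order (root at slot 0, child i of slot p at slot p*d + i + 1), and the structure integer is a bitmask of occupied slots. The names hash_label and encode_treegram are hypothetical, and none of these choices is confirmed as VENONA's actual encoding.</Paragraph>
    <Paragraph><![CDATA[
```python
import zlib


def hash_label(label: str) -> int:
    """Hash a node label to a 4-byte integer (CRC32 is an assumed choice)."""
    return zlib.crc32(label.encode("utf-8"))


def encode_treegram(tree, d: int):
    """Encode a tree given as (label, [children]) with degree at most d.

    Returns (parts, structure):
      parts     -- sorted list of (slot, hashed_label) pairs, where slot is the
                   node's level-order position in a complete d-ary tree
      structure -- integer bitmask with bit `slot` set for every occupied slot
    """
    parts = []
    structure = 0
    stack = [(tree, 0)]
    while stack:
        (label, children), slot = stack.pop()
        if len(children) > d:
            # The text says such trees are first transformed to d-ary trees;
            # that transformation is not sketched here.
            raise ValueError("degree exceeds d; transform the tree first")
        parts.append((slot, hash_label(label)))
        structure |= 1 << slot
        for i, child in enumerate(children):
            stack.append((child, slot * d + i + 1))
    return sorted(parts), structure


if __name__ == "__main__":
    example = ("S", [("NP", []), ("VP", [("V", [])])])
    print(encode_treegram(example, d=2))
```
]]></Paragraph>
    <Paragraph> For the tree S(NP, VP(V)) with d = 2, the occupied slots are 0, 1, 2 and 5, so the structure bitmask in this sketch is 39.</Paragraph>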
    <Paragraph position="3"> [Footnote 2: ... well-known transformation of trees to binary trees. The value of d is a configurable parameter of the index generation.]
Problem B: VENONA uses only one treegram per node v: the treegram that includes every node found on the first h levels of the subtree rooted in v. This keeps the index small but introduces another problem: a query treegram may not appear in the treegram index as it is. Therefore, VENONA expands all query treegram structures at runtime; for a given query treegram g, this expansion yields all database treegrams whose structure is compatible with g. In this way the index stays small without sacrificing efficiency.
Problem C: The evaluation of a given query q proceeds in the following steps: (1) According to q's degree and height, VENONA chooses a treegram index among those available for the tree database. (2) VENONA collects q's treegrams and represents them by sets of treegram parts. For a given query treegram, VENONA expands the structure number to a set of index treegram structures and removes those labels that consist of a variable: variables and the constraints that they impose belong to the matching phase.</Paragraph>
    <Paragraph position="4"> (3) VENONA sorts q's treegrams by selectivity, estimating each treegram's selectivity from the selectivity of its treegram parts. (4) VENONA estimates how many query treegrams it has to evaluate to obtain a candidate set small enough for the tree matcher; only for those does it determine the corresponding index treegrams. (5) VENONA processes these selected treegrams until the candidate set has the desired size, falling back if necessary on some of the treegrams put aside.</Paragraph>
    <Paragraph position="5"> (6) Finally, the tree matcher selects the answer trees from these candidates.</Paragraph>
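    <Paragraph> Steps (3) to (6) can be sketched in Python as follows. This is an illustration under assumptions, not VENONA's implementation: the index is modelled as a dictionary from treegram keys to sets of tree identifiers, the size of a posting set stands in for the selectivity estimate that VENONA derives from treegram parts, and step (4) is folded into an early exit once the candidate set is small enough. The names candidate_trees, answer_trees and matches are hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
def candidate_trees(query_treegrams, index, target_size):
    """Intersect posting sets, most selective treegram first.

    query_treegrams -- list of treegram keys taken from the query
    index           -- dict mapping treegram key -> set of tree ids (assumed layout)
    target_size     -- stop once the candidate set is at most this large
    """
    # Assumed selectivity estimate: a smaller posting set is more selective.
    ordered = sorted(query_treegrams, key=lambda g: len(index.get(g, ())))
    candidates = None
    for g in ordered:
        posting = index.get(g, set())
        if candidates is None:
            candidates = set(posting)
        else:
            candidates = candidates.intersection(posting)
        if len(candidates) <= target_size:
            break  # remaining treegrams are left to the tree matcher
    return candidates if candidates is not None else set()


def answer_trees(candidates, matches):
    """Final step: keep only candidates accepted by the tree matcher.

    `matches` is a caller-supplied predicate standing in for the matcher.
    """
    return sorted(t for t in candidates if matches(t))


if __name__ == "__main__":
    toy_index = {"a": {1, 2, 3, 4}, "b": {2, 3}, "c": {3, 9}}
    cands = candidate_trees(["a", "b", "c"], toy_index, target_size=1)
    print(answer_trees(cands, matches=lambda t: True))
```
]]></Paragraph>
    <Paragraph> Because the intersection stops early, the candidate set may still contain trees that do not match the query; only the final matching pass decides the answer set.</Paragraph>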
  </Section>
</Paper>