<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1038">
  <Title>PAT-Trees with the Deletion Function as the Learning Device for Linguistic Patterns</Title>
  <Section position="3" start_page="244" end_page="248" type="metho">
    <SectionTitle>
2. The Original PAT-tree
</SectionTitle>
    <Paragraph position="0"> In this section, we review the original version of the PAT-tree and provide enough background information for the following discussion.</Paragraph>
    <Section position="1" start_page="244" end_page="246" type="sub_section">
      <SectionTitle>
2.1 Definition of Pat-tree
2.1.1 PATRICIA
</SectionTitle>
      <Paragraph position="0"> Before defining the PAT-tree, we first show how PATRICIA works.</Paragraph>
      <Paragraph position="1"> PATRICIA is a special kind of trie \[Fredkin 60\]. In a trie, there are two different kinds of nodes: branch decision nodes and element nodes. Branch decision nodes are the search decision-makers, and the element nodes contain the real data. To process strings, if a branch decision is made on each bit, a complete binary tree is formed whose depth is equal to the number of bits of the longest string. For example, suppose there are 6 strings in the data set, each 4 bits long. Then the complete binary search tree is the one shown in Fig. 2.1.</Paragraph>
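The waste described above can be seen in a minimal Python sketch (the function names are our own, for illustration only): a plain bit-level trie allocates one decision node per bit, so the number of nodes greatly exceeds the number of stored keys.

```python
def build_bit_trie(keys):
    """Build a plain (uncompressed) binary trie over fixed-length bit
    strings: one branch decision per bit, so every key ends up at a
    depth equal to its bit length."""
    root = {}
    for key in keys:
        node = root
        for bit in key:                 # each bit ('0' or '1') costs a decision node
            node = node.setdefault(bit, {})
        node["$"] = key                 # element node holding the real data
    return root

def count_nodes(node):
    """Count every allocated trie node (element markers excluded)."""
    return 1 + sum(count_nodes(child) for bit, child in node.items() if bit != "$")
```

With 6 four-bit keys, far more than 6 nodes are allocated, which is exactly the overhead the compressed digital search trie and PATRICIA remove.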
      <Paragraph position="2"> Fig 2.1 The complete binary tree of the 6 data. Apparently, this is very wasteful: many element nodes and branch nodes are null. If those nodes are removed, a tree called a &amp;quot;compressed digital search trie&amp;quot; \[Flajolet 86\], as shown in Fig. 2.2, is formed. It is more efficient, but an additional field denoting the comparing bit for the branching decision must be included in each decision node. In addition, a searched result may not exactly match the input key, since only some of the bits are compared during the search process. Therefore, a match between the searched results and their search keys is required. Morrison \[Morrison, 68\] improved the trie structure further. Instead of classifying nodes into branch nodes and element nodes, Morrison combined the two kinds of nodes into a uniform representation, called an augmented branch node. The structure of an augmented branch node is the same as that of a decision node of the trie, except that an additional field for storing elements is included. Whenever an element is inserted, it is inserted &amp;quot;up&amp;quot; into a branch node instead of creating a new element node as a leaf node. For example, the compressed digital search trie shown in Fig 2.2 has the equivalent PATRICIA shown in Fig 2.3. Notice that each element is stored in an upper node or in its own node. How the data elements are inserted is discussed in the next section. Another difference here is the additional root node. This is because in a binary tree, the number of leaf nodes is always greater than that of internal nodes by one.</Paragraph>
      <Paragraph position="3"> Whether a leaf node is reached is determined by the upward links.</Paragraph>
      <Paragraph position="4">  Gonnet \[Gonnet, 87\] extended PATRICIA to handle semi-infinite strings. The data structure is called a PAT-tree. It is exactly like PATRICIA except that storage for the finite strings is replaced by the starting position of the semi-infinite strings in the text.</Paragraph>
      <Paragraph position="5"> Suppose there is a text T with n basic units.</Paragraph>
      <Paragraph position="6"> T = u1u2...un. Consider the sub-strings of T which start from certain positions and go on to the right as far as necessary, such as u1u2...un,</Paragraph>
      <Paragraph position="7"> u2u3...un, u3u4...un, and so on. Since each of these strings has an end on the left but none on the right, they are so-called semi-infinite strings. Note here that whenever a semi-infinite string extends beyond the end of the text, null characters are appended. These null characters are different from any basic unit in the text. Then all the semi-infinite strings starting from different positions are different. Owing to the additional field for the comparing bit in each decision node of PATRICIA, PATRICIA can handle branch decisions for the semi-infinite strings (after all, only a finite number of decisions is needed to separate all the semi-infinite strings in any input set). A PAT-tree is constructed by storing all the starting positions of semi-infinite strings in a text using PATRICIA.</Paragraph>
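The null-padding construction above can be sketched in a few lines of Python (function names are ours, for illustration): because of the padding, the semi-infinite strings starting at different positions are guaranteed to be pairwise distinct.

```python
def semi_infinite(text, start, length):
    """Return the semi-infinite string beginning at `start`, padded with
    null characters ('\0') once it extends past the end of the text."""
    s = text[start:start + length]
    return s + "\0" * (length - len(s))

def all_semi_infinite(text):
    """All semi-infinite strings of `text`, one per starting position.
    The null padding makes them pairwise distinct."""
    n = len(text)
    return [semi_infinite(text, i, n) for i in range(n)]
```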
      <Paragraph position="8"> There are many useful functions which can easily be implemented on PAT-trees, such as prefix searching, range searching, longest repetition searching and so on.</Paragraph>
      <Paragraph position="9"> Insert(to-end substring Sub, PAT tree rooted at R) { // Search downward from R, branching on the decision bits, until an upward link leads to a data node n: at each decision node p, if the same bit as CompareBit ( p ) at Sub is 0, follow the left pointer; otherwise, follow the right pointer.</Paragraph>
      <Paragraph position="11"> // Find the appropriate position to insert Sub into the PAT tree (Sub will be inserted between p and n): b &lt;--- the first bit where Data ( n ) and Sub differ; search down from R again until reaching the last decision node p whose comparing bit is smaller than b.</Paragraph>
      <Paragraph position="13"> // Create a new decision node with comparing bit b, store Sub in it, and link it in below p: if the same bit as CompareBit ( p ) at Sub is 0, attach the new node through the left pointer of p; otherwise, through the right pointer. }</Paragraph>
      <Paragraph position="15"> Hung \[Hung, 96\] took advantage of prefix searching in Chinese processing and revised the PAT-tree. All the different basic unit positions were exhaustively visited as in a PAT-tree, but the strings did not go right to the end of the text; they stopped at the ends of the sentences. We call these finite strings &amp;quot;to-end sub-strings&amp;quot;. In this way, the saved strings are not necessarily unique, so the frequency counts of the strings must be accumulated. A field denoting the frequency of a prefix was also added to the tree node. With these changes, the PAT-tree is more than a tool for searching prefixes; it also provides their frequencies.</Paragraph>
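Setting the bit-level PATRICIA machinery aside, the behaviour of Hung's revised PAT-tree, indexing to-end sub-strings with frequency counts and answering prefix-frequency queries, can be sketched functionally in Python (a dictionary-based sketch with hypothetical names, not the tree-structured implementation):

```python
from collections import Counter

def index_to_end_substrings(sentences):
    """Count every 'to-end sub-string': for each position in a sentence,
    the string running from that position to the end of the sentence."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent)):
            counts[sent[i:]] += 1
    return counts

def prefix_frequency(counts, prefix):
    """Frequency of `prefix`: the number of indexed positions whose
    to-end sub-string starts with `prefix`."""
    return sum(c for s, c in counts.items() if s.startswith(prefix))
```

A real PAT-tree answers the same prefix-frequency query in time proportional to the prefix length rather than by scanning all stored strings.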
      <Paragraph position="16"> The data structure of a complete node of a PAT-tree is as follows.</Paragraph>
      <Paragraph position="17"> Node: a record of Decision bit: an integer denoting the decision bit. Frequency: the frequency count of the prefix sub-string.</Paragraph>
      <Paragraph position="18"> Data element: a data string or a pointer to a semi-infinite string.</Paragraph>
      <Paragraph position="19"> Data count: the frequency count of the data string. Left: the left pointer, pointing downward to the left sub-tree or upward to a data node.</Paragraph>
      <Paragraph position="20"> Right: the right pointer, pointing downward to the right sub-tree or upward to a data node. End of the record.</Paragraph>
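The record above maps directly onto a Python dataclass (a sketch; the field names simply mirror the record, and the class name is ours):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatNode:
    decision_bit: int                  # bit position compared at this node
    frequency: int                     # frequency count of the prefix sub-string
    data: Optional[str]                # data string, or position of a semi-infinite string
    data_count: int                    # frequency count of the data string
    left: Optional["PatNode"] = None   # downward to left sub-tree, or upward to a data node
    right: Optional["PatNode"] = None  # likewise for the right side
```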
      <Paragraph position="21"> The construction process for a PAT-tree is nothing more than a consecutive insertion process for input strings. The detailed insertion procedure is given in Algorithm 2.1 and the searching procedure in Algorithm 2.2.</Paragraph>
      <Paragraph position="23"> PAT-trees have the following advantages: (1) They are easy to construct and maintain. (2) Any prefix sub-string and its frequency count can be found very quickly using a PAT-tree. (3) The space requirement for a PAT-tree is linear in the size of the input text.</Paragraph>
      <Paragraph position="24"> 3. PAT-tree with the Deletion Function  The block diagram of the PAT-tree with the deletion function is shown in Fig. 3.1.</Paragraph>
      <Paragraph position="25">  Implementing the deletion function requires two components. The first is an evaluation function, which examines the data elements to find the least important one. The second releases the least important element from the PAT-tree and returns the freed node.</Paragraph>
    </Section>
    <Section position="2" start_page="246" end_page="247" type="sub_section">
      <SectionTitle>
3.1 The Evaluation function
</SectionTitle>
      <Paragraph position="0"> Due to the limited memory capacity of a PAT-tree, old and unimportant elements have to be identified and then deleted from the tree in order to accommodate new elements. Evaluation is based on the following two criteria: a) the oldness of the elements, and b) the importance of the elements. Evaluation of an element has to be balanced between these criteria. The oldness of an element is judged by how long the element has resided in the PAT-tree. It seems that a new field in each node of a PAT-tree is needed to store the time when the element was inserted. When the n-th element was inserted, the time was n. The resident element will become old when new elements are gradually inserted into the tree.</Paragraph>
      <Paragraph position="1"> However, old elements might become more and more important if they reoccur in the input text.</Paragraph>
      <Paragraph position="2"> The frequency count of an element is a simple criterion for measuring the importance of an  element. Of course, different importance measures can be employed, such as mutual information or conditional probability between a prefix and suffix. Nonetheless, the frequency count is a very simple and useful measurement.</Paragraph>
      <Paragraph position="3"> To simplify the matter, a unified criterion is adopted. Under this criterion, no additional storage is needed to record time. Instead, a time lapse is enforced before a node is revisited and evaluated, and hopefully the frequency counts of important elements will increase during that lapse. It is implemented by way of a circular-like array of tree nodes. A PAT-tree is constructed by inserting new elements; the insertion process takes a free node for each element from the array in increasing order of index until the array is exhausted. The deletion process is then triggered. The evaluation process scans the elements according to the array index sequence, which is different from the tree order, and deletes the least important element among the first k elements scanned. The freed node is used to store the newly arriving element. The position after the deleted node becomes the starting index of the next k nodes for evaluation. In this way, it is guaranteed that the minimal time lapse before the same node is revisited is at least the size of the PAT-tree divided by k.</Paragraph>
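The circular-array eviction step can be sketched as follows (a list of (element, frequency) pairs stands in for the node array; function and variable names are ours, and the tree-pointer maintenance of Section 3.2 is omitted):

```python
def evict_and_insert(nodes, start, k, new_elem):
    """Scan the k slots beginning at `start` (circularly), free the slot
    whose frequency count is smallest, store `new_elem` there with an
    initial count of 1, and return the index just past the deleted slot,
    which becomes the start of the next evaluation window."""
    n = len(nodes)
    window = [(start + j) % n for j in range(k)]
    victim = min(window, key=lambda i: nodes[i][1])   # least important in window
    nodes[victim] = (new_elem, 1)
    return (victim + 1) % n
```

Because the scan pointer only moves forward through the array, any given slot cannot be re-examined until the pointer wraps around, which yields the guaranteed minimal time lapse of (array size) / k evictions.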
      <Paragraph position="4"> In section 4, we describe experiments carried out on the learning of high frequency word bigrams. The above mentioned time lapse and the frequency measurement for importance were used as the evaluation criteria to determine the learning performance under different memory constraints.</Paragraph>
    </Section>
    <Section position="3" start_page="247" end_page="248" type="sub_section">
      <SectionTitle>
3.2 The Deletion function
</SectionTitle>
      <Paragraph position="0"> Deleting a node from a PAT-tree is a bit complicated since the proper structure of the PAT-tree has to be maintained after the deletion process. The pointers and the last decision node have to be modified. The deletion procedure is illustrated step by step by the example in Fig. 3.2. Suppose that the element in the node x has to be deleted, i.e.</Paragraph>
      <Paragraph position="1"> the node x has to be returned free. Hence, the last decision node y is no longer necessary, since it holds the last decision bit that makes the branch decision between DATA(x) and the strings in the left subtree of y. Therefore, DATA(x) and DECISION(y) can be removed, and the pointers have to be reset properly. In step 1, a) DATA(x) is replaced by DATA(y), b) the backward pointer in z pointing to y is replaced by one pointing to x, and c) the pointer of the parent node of y which points to y is replaced by the left pointer of y. After step 1, the PAT-tree structure is properly reset. However, the node y has been freed instead of x. This does not affect searching of the PAT-tree, but it would damage the evaluation function's algorithm for keeping the time lapse. Therefore, in step 2, the whole record of the data in x is copied to y, and the left pointer of the parent node of x is reset to point to y. Of course, it is not necessary to divide the deletion process into the above two steps; this is just for the sake of clear illustration. In the actual implementation, management of these pointers has to be handled carefully. Since there is no backward pointer to a parent decision node, the relevant nodes and their ancestor relations have to be accessed and retained while searching for DATA(x) and DATA(y).</Paragraph>
      <Paragraph position="2">  The following simple experiments were carried out in order to determine the learning performance of the PAT-tree under different memory constraints. We wanted to find out how well the high frequency word bi-grams were retained when the total number of different word bi-grams was much greater than the size of the PAT-tree.</Paragraph>
    </Section>
    <Section position="4" start_page="248" end_page="248" type="sub_section">
      <SectionTitle>
4.1 The testing environment
</SectionTitle>
      <Paragraph position="0"> We used the Sinica corpus as our testing data. The Sinica corpus is a 3,500,000-word Chinese corpus in which the words are delimited by blanks and tagged with their parts-of-speech \[Chen 96\]. To simplify the experimental process, the word length was limited to 4 characters; words that had more than four characters were truncated. A preprocessor, called the reader, read the word bi-grams from the corpus sequentially and performed the truncation. The reader then fed the bi-grams to the PAT-tree construction process. There were 2,172,634 bi-grams in total and 1,180,399 different bi-grams.</Paragraph>
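The reader's preprocessing step can be sketched in Python (the function name is ours; the actual reader streamed from the corpus rather than taking a word list):

```python
def read_bigrams(words, max_len=4):
    """Truncate each word to at most `max_len` characters, then return
    the consecutive word bi-grams, as the reader preprocessor does."""
    trunc = [w[:max_len] for w in words]
    return list(zip(trunc, trunc[1:]))
```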
      <Paragraph position="1"> Since the number of nodes in the PAT-tree was much smaller than the number of input bi-grams, the deletion process was carried out and some bi-grams were removed from the PAT-tree. The recall rates of bi-grams at each frequency level under the different memory constraints were examined to determine how well the PAT-tree retained important information.</Paragraph>
    </Section>
    <Section position="5" start_page="248" end_page="248" type="sub_section">
      <SectionTitle>
4.2 Experimental Results
</SectionTitle>
      <Paragraph position="0"> Different time lapses and PAT-tree sizes were tested to see how they performed by comparing the results with the ideal cases.</Paragraph>
      <Paragraph position="1"> The ideal cases were obtained using a procedure in which the input bi-grams were pre-sorted according to their frequency counts. The bi-grams were inserted in descending order of their frequencies. Each bi-gram was inserted n times, where n was its frequency.</Paragraph>
      <Paragraph position="2"> According to the deletion criterion, under such an ideal case, the PAT-tree will retain as many high frequency bi-grams as it can.</Paragraph>
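The ideal case and its recall measurement can be sketched as follows (names are ours; since the bi-grams arrive pre-sorted in descending frequency, a capacity-C tree simply ends up holding the C most frequent ones):

```python
def ideal_retention(freqs, capacity):
    """Under the ideal pre-sorted insertion order, the PAT-tree retains
    the `capacity` most frequent bi-grams. `freqs` maps each bi-gram to
    its frequency count."""
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return set(ranked[:capacity])

def recall(retained, freqs, threshold):
    """Recall rate of the bi-grams whose frequency exceeds `threshold`."""
    relevant = [g for g, f in freqs.items() if f > threshold]
    if not relevant:
        return 1.0
    return sum(g in retained for g in relevant) / len(relevant)
```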
      <Paragraph position="3">  The deletion process worked as follows. A fixed number of nodes were checked, starting from the last modified node, and the one with the minimal frequency was chosen for deletion. Since the pointer moved forward along the index of the array, a time lapse before revisiting a node was guaranteed; hopefully, the high frequency bi-grams would reoccur during that lapse. Different forward step sizes, such as 100, 150, 200, 250, and 300, were tested, and the results show that deleting the least important element within 200 nodes led to the best result. However, the performance of the different step sizes did not differ greatly. Table 4.1 shows the testing results for step size 200 with different PAT-tree sizes, and Table 4.2 shows the results for the ideal cases. Comparing Table 4.1 with Table 4.2, it can be seen that the recall rates of the important bi-grams under the normal learning process were satisfactory. Each row gives the recall rates, under different PAT-tree sizes, of the bi-grams whose frequencies are greater than the stated threshold. For instance, row 10 in Table 4.1 shows that the bi-grams with frequency greater than 20 were retained at rates of 85.46%, 97.02%, 99.63%, 99.95%, 100%, 100%, 100%, and 100% when the size of the PAT-tree was 1/64, 2/64, ..., 8/64 of the total number of different bi-grams, respectively.</Paragraph>
    </Section>
  </Section>
</Paper>