<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1209">
  <Title>Decomposition for ISO/IEC 10646 Ideographic Characters</Title>
  <Section position="4" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
3. Performance Evaluation
</SectionTitle>
    <Paragraph position="0"> Because the algorithms must search recursively through many combinations at many levels, performance becomes a very important issue, especially if the system is to be made publicly accessible over the Internet.</Paragraph>
    <Paragraph position="1"> However, since the decomposition is static, the search does not need to be done online in real time. In other words, searching the same data will always give the same result unless the decomposition rules or algorithms are changed. Consequently, we built two pre-searched tables to store the results of both the &quot;Compnt-to-Char&quot; algorithm and the &quot;Char-to-Compnt&quot; algorithm. Once we have the pre-searched tables, we can avoid the recursive search entirely. Instead, the search result can be retrieved directly as a single tuple. This gives much better performance in terms of both CPU time and I/O.</Paragraph>
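    <Paragraph>
The pre-searched tables described above can be sketched roughly as follows. This is a minimal illustration in Python, not the implementation used in the paper: the rule set `RULES`, the function names `char_to_compnt` and `build_tables`, and the table names `DOWN_TABLE` and `UP_TABLE` are all assumptions made for the example, and the romanized character names stand in for the actual ideographs. The downward table plays the role of the stored &quot;Char-to-Compnt&quot; results and the upward table that of the stored &quot;Compnt-to-Char&quot; results.

```python
# Toy direct-decomposition rules (illustrative only, not the paper's data).
RULES = {
    "Zhang": ["Li", "Zao"],
    "Zao": ["Ri", "Shi"],
}


def char_to_compnt(char, rules):
    """Recursive downward search: every component found below `char`."""
    found = set()
    for part in rules.get(char, []):
        found.add(part)
        found |= char_to_compnt(part, rules)
    return found


def build_tables(rules):
    """Run the recursive search once per character and store the results."""
    chars = set(rules)
    for parts in rules.values():
        chars.update(parts)
    # Downward table: character -> all of its components.
    down = {c: frozenset(char_to_compnt(c, rules)) for c in chars}
    # Upward table: component -> all characters that contain it.
    up = {c: set() for c in chars}
    for c, comps in down.items():
        for comp in comps:
            up[comp].add(c)
    return down, {c: frozenset(s) for c, s in up.items()}


# Built offline; rebuilt only when the rules or algorithms change.
DOWN_TABLE, UP_TABLE = build_tables(RULES)

# At query time each search is a single lookup with no recursion.
print(sorted(DOWN_TABLE["Zhang"]))
print(sorted(UP_TABLE["Ri"]))
```

Because the tables depend only on the rules, they can be regenerated in a batch job whenever the rules change, and the online service needs nothing more than a keyed read.
    </Paragraph>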
    <Paragraph position="2">  pre-searched tables for the downward search and the upward search, respectively.</Paragraph>
    <Paragraph position="3"> Although the advanced control algorithms can retrieve most Chinese characters correctly, they also return some components that do not make much sense. For example, the character &quot;Zhang&quot; has a structure of IDC{B} and the components &quot;Li&quot; and &quot;Zao&quot;. However, when it is eventually decomposed into &quot;Li&quot;, &quot;Ri&quot; and &quot;Shi&quot;, the &quot;Char-to-Compnt&quot; algorithm will also return the component &quot;Xin&quot;, even though &quot;Xin&quot; has no cognate relationship with the character &quot;Zhang&quot;. To avoid this, we can consider only the subset of characters that can actually be split in character formation, such as &quot;Xing&quot; and &quot;Yi&quot;. This way, insertion components will be considered only for these characters.</Paragraph>
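    <Paragraph>
One possible reading of this restriction is sketched below in Python. It is an assumption-laden illustration, not the paper's algorithm: the tables `SPLIT_PARTS`, `SPLITTABLE` and `LEAVES`, the function `insertion_components`, and the matching rule (a candidate matches when its two outer parts enclose the remaining components) are all hypothetical, and the entries for &quot;Yan&quot;, &quot;Chi&quot;, &quot;Shui&quot; and &quot;Chu&quot; are toy data added only to show a case where the restriction still returns a result.

```python
# Hypothetical split table: candidate -> (first outer part, last outer part).
SPLIT_PARTS = {
    "Xin": ("Li", "Shi"),
    "Xing": ("Chi", "Chu"),
}

# Assumed whitelist of characters that can be split in character formation.
SPLITTABLE = {"Xing", "Yi"}

# Fully decomposed component sequences (toy data for illustration).
LEAVES = {
    "Zhang": ["Li", "Ri", "Shi"],
    "Yan": ["Chi", "Shui", "Chu"],
}


def insertion_components(char, restrict=True):
    """Return candidates whose outer parts enclose the rest of `char`."""
    leaves = LEAVES[char]
    found = set()
    for candidate, (first, last) in SPLIT_PARTS.items():
        # With the restriction on, skip anything outside the whitelist.
        if restrict and candidate not in SPLITTABLE:
            continue
        # At least one component must sit between the two outer parts.
        if len(leaves) >= 3 and leaves[0] == first and leaves[-1] == last:
            found.add(candidate)
    return found


# Unrestricted matching reports the spurious "Xin" for "Zhang".
print(insertion_components("Zhang", restrict=False))
# Restricted matching drops it but still finds "Xing" inside "Yan".
print(insertion_components("Zhang"))
print(insertion_components("Yan"))
```

Keeping the whitelist as data rather than hard-coding it in the matching logic would let the set of splittable characters be revised without touching the search itself.
    </Paragraph>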
  </Section>
</Paper>