<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1070">
  <Title>Towards an Intelligent Multilingual Keyboard System</Title>
  <Section position="5" start_page="22" end_page="22" type="metho">
    <SectionTitle>
3 THE APPROACH
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
3.1 Overview
</SectionTitle>
      <Paragraph position="0"> In the traditional Thai keyboard input system, a single key button, combined with the language-switching key and the shift key, can output four different characters. For example, on a Thai keyboard the 'a' key button represents four different characters in different modes, as shown in Table 1.</Paragraph>
      <Paragraph position="1">  However, with NLP techniques it should be possible to implement efficiently a Thai-English keyboard system that predicts the key the user intends to type without the language-selection key or the shift key. We propose such an intelligent keyboard system and have implemented it with successful results.</Paragraph>
      <Paragraph position="2"> The problem is solved in two basic steps: language identification (Section 3.2) and Thai key prediction without the shift key (Section 3.3).</Paragraph>
    </Section>
    <Section position="2" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
3.2 Language Identification
</SectionTitle>
      <Paragraph position="0"> The following example illustrates the disruption caused by language switching. In the Thai input mode, typing the word &quot;language&quot; results in &quot;สฟืเีฟเำ&quot;. The user then has to delete the sequence &quot;สฟืเีฟเำ&quot;, switch to the English mode, and retype the key sequence to get the correct result, &quot;language&quot;.</Paragraph>
      <Paragraph position="1">  An intelligent system that switches languages automatically would therefore eliminate this annoyance.</Paragraph>
      <Paragraph position="2"> In general, text in different languages is separated by spaces rather than typed contiguously. The language-identification process therefore starts when a non-space character is typed after a space. Several studies of language identification, such as [3] and [5], report that n-gram models give high accuracy. After trying both trigrams and bigrams, we found that bigrams were superior. We then compare the following bigram probabilities for each language.</Paragraph>
      <Paragraph position="4"> p_T is the probability of the bigram of key buttons considered in Thai texts.</Paragraph>
      <Paragraph position="5"> K is the key button considered.</Paragraph>
      <Paragraph position="7"> p_E is the probability of the bigram of key buttons considered in English texts.</Paragraph>
      <Paragraph position="8"> Tprob is the probability that the considered key-button sequence is Thai.</Paragraph>
      <Paragraph position="9"> Eprob is the probability that the considered key-button sequence is English.</Paragraph>
      <Paragraph position="10"> m is the number of leftmost characters of the token considered (see the experiment for details). The language being typed is identified by comparing the two key-sequence probabilities: Thai if Tprob &gt; Eprob, and English otherwise.</Paragraph>
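      <Paragraph> As a rough illustration of the comparison described above, the following Python sketch scores the first m keys of a token under two bigram tables and picks the higher-scoring language. It assumes that each sequence probability is the product of the bigram probabilities of adjacent key buttons (computed here as a sum of logarithms). The function names, the toy probability tables, the smoothing floor and the value of m are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
import math

# Toy bigram probabilities of adjacent key buttons (hypothetical values).
THAI_BIGRAMS = {"gl": 0.02, "li": 0.01}
ENGLISH_BIGRAMS = {"la": 0.03, "an": 0.04, "ng": 0.03}


def sequence_log_prob(keys, bigram_prob, m, floor=1e-6):
    """Log-probability of the first m keys as a product of bigram probabilities.

    Unseen bigrams get a small floor value (an assumption of this sketch).
    """
    prefix = keys[:m]
    return sum(
        math.log(bigram_prob.get(left + right, floor))
        for left, right in zip(prefix, prefix[1:])
    )


def identify_language(keys, thai_bigrams, english_bigrams, m):
    """Return "thai" if Tprob is greater than Eprob, otherwise "english"."""
    tprob = sequence_log_prob(keys, thai_bigrams, m)
    eprob = sequence_log_prob(keys, english_bigrams, m)
    return "thai" if tprob > eprob else "english"


if __name__ == "__main__":
    for token in ("language", "glik[lkl9in"):
        print(token, identify_language(token, THAI_BIGRAMS, ENGLISH_BIGRAMS, m=6))
```
]]></Paragraph>
      <Paragraph> Working with log-probabilities avoids numerical underflow when many small probabilities are multiplied, and it does not change which language wins the comparison.</Paragraph>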
    </Section>
    <Section position="3" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
3.3 Key Prediction without Using Shift Key for Thai Input
3.3.1 Trigram Key Prediction
</SectionTitle>
      <Paragraph position="0"> A trigram model is used for Thai key prediction. The problem of Thai key prediction can be defined as:</Paragraph>
      <Paragraph position="2"> where t is the sequence of characters that maximizes the character-string probability, c is a possible input character for the key button K, K is the key button, and n is the length of the token considered.
For some Thai character sequences, the trigram model fails to predict the correct key. To correct these errors, the error-correction rules proposed by [1] and [2] are employed.
3.3.2.1 Error-correction Rule Extraction
After trigram prediction is applied to the training corpus, the prediction errors are used to prepare the error-correction rules. The three input keys to the left and to the right of each error character, together with the correct pattern corresponding to the error, are collected as an error-correction pattern. For example, suppose the input key sequence &quot;glik[lkl9in&quot; is predicted as &quot;เศรษฐสาสตร์&quot;, whereas the correct prediction is &quot;เศรษฐศาสตร์&quot;. The string &quot;ik[lkl9&quot; is then collected as the error sequence, and &quot;ษฐศาสต&quot; is collected as the correct pattern that amends the error.</Paragraph>
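      <Paragraph> The two steps described above, trigram prediction and the collection of error patterns, can be sketched in Python as follows. The sketch assumes that a character string is scored by the product of its trigram probabilities p(c_i | c_{i-2}, c_{i-1}) and that every shifted and unshifted candidate combination is scored exhaustively. The function names, the toy tables and the use of shifted ASCII keys as stand-ins for Thai characters are all hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
import itertools
import math


def predict_characters(keys, candidates, trigram_prob, floor=1e-6):
    """Pick the character string with the highest trigram score for a key sequence.

    candidates maps a key button to the characters it can produce with and
    without shift. Every combination is scored, which is only practical for
    short tokens; a dynamic-programming search would replace it in practice.
    """
    best, best_score = "", -math.inf
    for chars in itertools.product(*(candidates[k] for k in keys)):
        padded = ("^", "^") + chars  # two start symbols give the first characters a context
        score = sum(
            math.log(trigram_prob.get(padded[i:i + 3], floor))
            for i in range(len(chars))
        )
        if score > best_score:
            best, best_score = "".join(chars), score
    return best


def extract_error_patterns(keys, predicted, correct, width=3):
    """Collect (key context, correct characters) around each mispredicted position."""
    patterns = []
    for i, (p, c) in enumerate(zip(predicted, correct)):
        if p != c:
            lo, hi = max(0, i - width), i + width + 1
            patterns.append((keys[lo:hi], correct[lo:hi]))
    return patterns


# Toy model: each key yields its lowercase (unshifted) or uppercase (shifted) form.
CANDIDATES = {"l": "lL", "k": "kK"}
TRIGRAMS = {("^", "^", "L"): 0.5, ("^", "L", "k"): 0.5, ("^", "^", "l"): 0.2}

if __name__ == "__main__":
    print(predict_characters("lk", CANDIDATES, TRIGRAMS))
    # Shifted ASCII keys stand in for the Thai characters of the example above.
    print(extract_error_patterns("glik[lkl9in", "gLiK{lkl9iN", "gLiK{Lkl9iN"))
```
]]></Paragraph>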
      <Paragraph position="3">  Many of the collected patterns are redundant. For example, patterns 1-3 in Table 2 should be reduced to pattern 4. To reduce the number of rules, left mutual information and right mutual information ([7]) are employed. After all patterns are shortened, duplicate patterns are eliminated.</Paragraph>
      <Paragraph position="5"> where xyz is the pattern being considered, x is the leftmost character of xyz, y is the middle substring of xyz, z is the rightmost character of xyz, and p( ) is the probability function.  The pattern-shortening rules are as follows.  1) If Rm(xyz) is less than 1.20, then pattern xyz is reduced to xy.</Paragraph>
      <Paragraph position="6"> 2) Similarly, if Lm(xyz) is less than 1.20, then pattern xyz is reduced to yz.</Paragraph>
      <Paragraph position="7"> 3) Rules 1 and 2 are applied recursively until the pattern cannot be shortened any further. After all patterns are shortened, the following rules are applied to eliminate redundant patterns.</Paragraph>
      <Paragraph position="8"> 1) All duplicate rules are unified.</Paragraph>
      <Paragraph position="9"> 2) Rules that contribute less than 0.2 percent of the error corrections are eliminated.</Paragraph>
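      <Paragraph> The shortening and elimination rules above can be sketched in Python as follows. Because the formulas for Lm and Rm are not reproduced here, the sketch assumes the ratio forms Lm(xyz) = p(xyz) / (p(x) p(yz)) and Rm(xyz) = p(xyz) / (p(xy) p(z)). It also assumes that a rule's contribution is its share of all collected occurrences, and it does not track where the error character sits inside a pattern. All names and probability values are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import Counter


def right_mi(pattern, p):
    """Rm(xyz) = p(xyz) / (p(xy) * p(z)); this form is an assumption of the sketch."""
    return p[pattern] / (p[pattern[:-1]] * p[pattern[-1]])


def left_mi(pattern, p):
    """Lm(xyz) = p(xyz) / (p(x) * p(yz)); this form is an assumption of the sketch."""
    return p[pattern] / (p[pattern[0]] * p[pattern[1:]])


def shorten(pattern, p, threshold=1.20):
    """Apply shortening rules 1 and 2 until neither changes the pattern."""
    changed = True
    while changed and len(pattern) > 1:
        changed = False
        if right_mi(pattern, p) < threshold:  # rule 1: xyz becomes xy
            pattern, changed = pattern[:-1], True
        if len(pattern) > 1 and left_mi(pattern, p) < threshold:  # rule 2: xyz becomes yz
            pattern, changed = pattern[1:], True
    return pattern


def prune_rules(rule_occurrences, min_share=0.002):
    """Unify duplicate rules and drop those below 0.2 percent of all occurrences."""
    counts = Counter(rule_occurrences)
    total = sum(counts.values())
    return {rule for rule, n in counts.items() if n / total >= min_share}


# Hypothetical probabilities of key substrings in the training corpus.
P = {"abc": 0.01, "ab": 0.02, "a": 0.1, "b": 0.1, "c": 0.5, "bc": 0.05}

if __name__ == "__main__":
    print(shorten("abc", P))
    print(prune_rules([("ab", "AB")] * 999 + [("cd", "CD")]))
```
]]></Paragraph>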
      <Paragraph position="10"> 3.3.3 Applying Error-correction Rules
There are three steps in applying the error-correction rules: 1) Search for the error patterns in the text being typed. 2) Replace the error patterns with the correct patterns. 3) If more than one pattern matches, select the longest pattern.</Paragraph>
      <Paragraph position="11"> To speed up error correction and correct errors in real time, finite-automata pattern matching ([4] and [6]) is applied to search for error sequences. We construct an automaton for each pattern and then merge these automata into one, as illustrated in Figure 3.</Paragraph>
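      <Paragraph> A minimal Python sketch of the matching step follows. It merges all error patterns into a single trie and applies the longest pattern that matches at each position. This is a simplification and not necessarily the automaton of Figure 3. It assumes that each correction has the same length as its error pattern; the rule table and the shifted ASCII stand-ins for Thai characters are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
_END = object()  # trie key marking the end of a pattern; never equal to a key character


def build_automaton(rules):
    """Merge all error key patterns into one trie keyed by key button."""
    root = {}
    for error_keys, correction in rules.items():
        node = root
        for key in error_keys:
            node = node.setdefault(key, {})
        node[_END] = correction
    return root


def apply_rules(keys, predicted, automaton):
    """Scan left to right; at each position apply the longest matching pattern."""
    out = list(predicted)
    i = 0
    while i < len(keys):
        node, end, fix = automaton, None, None
        for j in range(i, len(keys)):
            node = node.get(keys[j])
            if node is None:
                break
            if _END in node:
                end, fix = j + 1, node[_END]  # remember the longest match so far
        if end is None:
            i += 1
        else:
            out[i:end] = fix  # corrections are assumed to be as long as their patterns
            i = end
    return "".join(out)


# Hypothetical rules: error key pattern mapped to the corrected characters.
RULES = {"ik[lkl9": "iK{Lkl9", "lk": "Lk", "lkl": "LKL"}
AUTOMATON = build_automaton(RULES)

if __name__ == "__main__":
    print(apply_rules("glik[lkl9in", "gLiK{lkl9iN", AUTOMATON))
    print(apply_rules("lkl9", "lkl9", AUTOMATON))
```
]]></Paragraph>
      <Paragraph> Sharing one trie across all patterns means each typed key is compared against every rule at once, which is the property that makes merged automata attractive for real-time correction.</Paragraph>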
    </Section>
  </Section>
</Paper>