<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1078"> <Title>The State of the Art in Thai Language Processing</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Machine Translation </SectionTitle> <Paragraph position="0"> Currently, only one machine translation system is available to the public: ParSit (http://www.links.nectec.or.th/services/parsit).</Paragraph> <Paragraph position="1"> It is an English-to-Thai webpage translation service. ParSit is a collaborative work of NECTEC, Thailand and NEC, Japan. The system is based on an interlingual approach to MT, and its translation accuracy is about 80%. Other approaches, such as generate-and-repair [7] and sentence pattern mapping, have also been studied [8].</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Language Resources </SectionTitle> <Paragraph position="0"> The only Thai text corpus available for research use is the ORCHID corpus. ORCHID is a 9-MB Thai part-of-speech tagged corpus initiated by NECTEC, Thailand and Communications Research Laboratory, Japan. ORCHID is available at http://www.links.nectec.or.th/orchid.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Research in Thai OCR </SectionTitle> <Paragraph position="0"> There are about 80 frequently used Thai characters, including alphabetic characters, vowels, tone marks, special marks, and numerals. Thai is written on 4 levels, without spaces between words, and the similarity among many character patterns makes recognition research challenging.</Paragraph> <Paragraph position="1"> Moreover, the mixing of English and Thai in general Thai text creates many more patterns that OCR must recognize.</Paragraph> <Paragraph position="2"> For more than 10 years, there has been considerable growth in Thai OCR research, especially for the &quot;printed character&quot; task. 
Early approaches focused on structural matching and then tended towards neural-network-based algorithms that take as input special characteristics of Thai characters, e.g., curves, heads of characters, and placements. At least 3 commercial products have been launched, including &quot;ArnThai&quot; by NECTEC, which claims to achieve 95% recognition performance on clean input. Recent technical improvements to ArnThai are reported in [9]. Recently, the focus has shifted to developing systems that are more robust to unclean scanned input.</Paragraph> <Paragraph position="3"> This step requires more efficient features, fuzzy algorithms, and document analysis.</Paragraph> <Paragraph position="4"> At the same time, the &quot;Offline Thai handwritten character recognition&quot; task has been investigated, but it is still in the research phase and limited to isolated characters. Almost all proposed engines are neural network-based, with several styles of input features [10], [11]. There has been a small amount of research on &quot;Online handwritten character recognition&quot;. One attempt was proposed in [12], which is also neural network-based and uses chain code input.</Paragraph> </Section> </Paper>