<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1175">
<Title>Combining Prediction by Partial Matching and Logistic Regression for Thai Word Segmentation</Title>
<Section position="3" start_page="0" end_page="0" type="relat">
<SectionTitle> 2 Related Work </SectionTitle>
<Paragraph position="0"> In addition to the longest matching algorithm discussed earlier, the maximum matching algorithm (Sornlertlamvanich, 1993) was proposed to overcome the greedy behavior of longest matching. It generates all possible segmentations of a sentence and then selects the one that contains the fewest dictionary entries.</Paragraph>
<Paragraph position="1"> Pornprasertkul (1994) proposed applying statistical techniques, using a Viterbi-based approach to exploit statistical information derived from grammatical tags. Later, Kawtrakul and Chalathip (1995) and Meknawin et al. (1997) used variants of the trigram model to compute the most likely segmentation.</Paragraph>
<Paragraph position="2"> Theeramunkong and Sornlertlamvanich (2000) observed that, in the Thai language, some contiguous characters tend to form inseparable units, called Thai character clusters (TCCs), and proposed a set of rules for grouping characters into TCCs for the purpose of text retrieval.</Paragraph>
</Section>
</Paper>