<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2035"> <Title>A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Preliminary Experiment </SectionTitle> <Paragraph position="0"> To test the performance of the different approaches, we select sentences containing Thai homographs and strings with boundary ambiguity from our 25K-word corpus for use in benchmark tests. Every sentence is manually segmented into words, and linguists manually tag the part of speech and pronunciation of each word. The resulting corpus is divided into two parts: about 80% of the corpus is used for training and the remainder is used for testing.</Paragraph> <Paragraph position="1"> In the experiment, we classify the data into three groups according to the types of text ambiguity described in Section 2 (CDSA, CISA and Homograph) and compare the results of three approaches: Winnow, the Bayesian hybrid [3] and POS trigram. The results are shown in Table 1.</Paragraph> </Section> </Paper>
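The evaluation protocol above (an approximately 80/20 train/test split, followed by accuracy reported separately for each ambiguity group) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The split is assumed to be random at the sentence level, because the text does not say how the 80% was chosen. The helper names `split_corpus` and `accuracy_by_group`, and the `(group, gold, predicted)` triple format, are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(items: Sequence[T], train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the corpus and cut it into train/test parts.

    Assumption (not stated in the source): the split is random at the
    sentence level. The source only says that about 80% is used for
    training and the rest for testing.
    """
    shuffled = list(items)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy_by_group(
        results: Iterable[Tuple[str, Hashable, Hashable]]) -> Dict[str, float]:
    """Compute accuracy separately for each ambiguity group.

    Each result is a hypothetical (group, gold_label, predicted_label)
    triple, where group is e.g. "CDSA", "CISA" or "Homograph".
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, gold, predicted in results:
        total[group] += 1
        if gold == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    train, test = split_corpus(list(range(10)))
    print(len(train), len(test))  # prints: 8 2
    scores = accuracy_by_group([
        ("Homograph", "a", "a"),
        ("Homograph", "b", "a"),
        ("CISA", "x", "x"),
    ])
    print(scores)  # prints: {'Homograph': 0.5, 'CISA': 1.0}
```

The split and the scoring are kept separate so that the same held-out set could be scored for each approach (Winnow, the Bayesian hybrid, POS trigram). Scoring each group separately keeps the results for one ambiguity type from being masked by the others.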