<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2025"> <Title>Sydney, July 2006. © 2006 Association for Computational Linguistics A Modified Joint Source-Channel Model for Transliteration Asif Ekbal</Title> <Section position="8" start_page="196" end_page="196" type="evalu"> <SectionTitle> 5 Results of the Proposed Models </SectionTitle> <Paragraph position="0"> Approximately 6000 Indian person names have been collected and their English transliterations prepared manually. This set serves as the training corpus on which the system is trained to generate the collocational statistics. These statistics act as a decision-list classifier that identifies the target-language TU given the source-language TU and its context. The system also incorporates linguistic knowledge in the form of valid conjuncts and diphthongs in Bengali and their English representations.</Paragraph> <Paragraph position="1"> All the models have been tested on an open test corpus of about 1200 Bengali names along with their English transliterations. The total number of transliteration units (TUs) in these 1200 Bengali names (the sample size, S) is 4755 (the value of L); i.e., on average a Bengali name contains 4 TUs. The test set was collected from users and verified to contain no names present in the training set. The total number of transliteration-unit errors (Err) in the system-generated transliterations and the total number of erroneously generated words (Err′) are shown in Table 1 for each individual model.</Paragraph> <Paragraph position="2"> The models are evaluated on the basis of two evaluation metrics, Word Agreement Ratio (WAR) and Transliteration Unit Agreement Ratio (TUAR). 
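The two agreement ratios can be sketched as follows. This excerpt does not spell out the formulas, so the standard definitions consistent with the reported counts are assumed: errors are subtracted from the totals and normalized by the word count S (for WAR) and the TU count L (for TUAR). The function name and the example error counts are hypothetical.

```python
def agreement_ratios(S, L, word_errors, tu_errors):
    """Return (WAR, TUAR) for a test corpus.

    Assumed definitions (not stated in this excerpt):
      WAR  = (S - Err') / S   -- fraction of whole names transliterated correctly
      TUAR = (L - Err)  / L   -- fraction of transliteration units matched correctly
    """
    war = (S - word_errors) / S
    tuar = (L - tu_errors) / L
    return war, tuar

# Usage with the corpus sizes from the text; the error counts here are
# placeholders, not figures from the paper's Table 1.
war, tuar = agreement_ratios(S=1200, L=4755, word_errors=300, tu_errors=500)
```

Because each name averages about 4 TUs, TUAR is typically much higher than WAR: a single wrong TU is enough to count the whole word as an error.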
The results of the tests in terms of these evaluation metrics are shown in Table 2.</Paragraph> <Paragraph position="3"> The modified joint source-channel model (Model F), which incorporates linguistic knowledge, performs best among all the models, with a Word Agreement Ratio (WAR) of 69.3% and a Transliteration Unit Agreement Ratio (TUAR) of 89.8%. The joint source-channel model with linguistic knowledge (Model D) has not performed well in Bengali-English machine transliteration, whereas the trigram model (Model E) merits further attention, as its results are comparable to those of the modified joint source-channel model (Model F). All the models were also tested for back-transliteration, i.e., English-to-Bengali transliteration, on an open test corpus of 1000 English names along with their Bengali transliterations. The results of these tests in terms of WAR and TUAR are shown in Table 3. The modified joint source-channel model again performs best in back-transliteration, with a WAR of 67.9% and a TUAR of 89%.</Paragraph> <Paragraph position="5"/> </Section> </Paper>