<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1614">
  <Title>Urdu Localization Project: Lexicon, MT and TTS (ULP)</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Pakistan has a population of 140 million people who speak more than 56 different languages.</Paragraph>
    <Paragraph position="1"> Urdu, the national language of Pakistan, is the lingua franca of these people, since many speak it as a second language. As a developing population, Pakistanis need access to information. However, most of the information available over the ICT infrastructure is in English, and only 5-10% of the population is familiar with English. The Government of Pakistan has therefore embarked on a project to develop software that automatically translates information available in English into Urdu. The project will also convert Urdu text to speech, extending this information to the illiterate population. This paper gives an overview of the project's architecture and briefly describes its three components: the Urdu Lexicon, the English to Urdu Machine Translation System, and the Urdu Text to Speech System.</Paragraph>
  </Section>
</Paper>