File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-2005_intro.xml

Size: 1,272 bytes

Last Modified: 2025-10-06 14:03:44

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2005">
  <Title>phrasing Based on Parallel Corpus for Normaliza-</Title>
  <Section position="4" start_page="1" end_page="33" type="intro">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Little work has been reported on SMS normalization and translation. Bangalore et al. (2002) used a consensus translation technique, built on off-the-shelf translation systems, to bootstrap parallel data for training a hierarchical statistical translation model for general-domain instant messaging in Internet chat rooms. Their method handles the special phenomena of instant messaging language (rather than SMS language) in each individual MT system. Clark (2003) proposed unifying tokenization, segmentation and spelling correction for the normalization of general noisy text (rather than SMS or instant messaging text) using a character-level noisy channel model. However, normalization results are not reported. Aw et al. (2005) briefly described their input pre-processing work for an English-to-Chinese SMS translation system using a word-group model. In addition, in most of the commercial SMS translation applications</Paragraph>
  </Section>
</Paper>