<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1020"> <Title>tRuEcasIng</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Truecasing is the process of restoring case information to badly cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser that achieves an accuracy of 98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing.</Paragraph> <Paragraph position="1"> In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.</Paragraph> </Section> </Paper>