File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/w06-1631_abstr.xml

Size: 1,024 bytes

Last Modified: 2025-10-06 13:45:29

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1631">
  <Title>Capturing Out-of-Vocabulary Words in Arabic Text</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval.</Paragraph>
    <Paragraph position="1"> For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming is performed. In this paper, we investigate three approaches for the identification of foreign words in Arabic text: lexicons, language patterns, and n-grams. Our results show that lexicon-based approaches outperform the other techniques.</Paragraph>
  </Section>
</Paper>