File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/w06-1631_abstr.xml
Size: 1,024 bytes
Last Modified: 2025-10-06 13:45:29
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1631"> <Title>Capturing Out-of-Vocabulary Words in Arabic Text</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval.</Paragraph> <Paragraph position="1"> For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming takes place. In this paper, we investigate three approaches for the identification of foreign words in Arabic text: lexicons, language patterns, and n-grams. Our results show that lexicon-based approaches outperform the other techniques.</Paragraph> </Section> </Paper>