File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-0409_abstr.xml
Size: 1,020 bytes
Last Modified: 2025-10-06 13:43:43
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0409"> <Title>Integrating Morphology with Multi-word Expression Processing in Turkish</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications. In addition to the fairly standard set of lexicalized collocations and multi-word expressions such as named-entities, Turkish uses a quite wide range of semi-lexicalized and non-lexicalized collocations. After an overview of relevant aspects of Turkish, we present a description of the multi-word expressions we handle. We then summarize the computational setting in which we employ a series of components for tokenization, morphological analysis, and multi-word expression extraction. We finally present results from runs over a large corpus and a small gold-standard corpus.</Paragraph> </Section> </Paper>