<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0409">
  <Title>Integrating Morphology with Multi-word Expression Processing in Turkish</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications. In addition to the fairly standard set of lexicalized collocations and multi-word expressions such as named entities, Turkish uses quite a wide range of semi-lexicalized and non-lexicalized collocations. After an overview of relevant aspects of Turkish, we present a description of the multi-word expressions we handle. We then summarize the computational setting in which we employ a series of components for tokenization, morphological analysis, and multi-word expression extraction. We finally present results from runs over a large corpus and a small gold-standard corpus.</Paragraph>
  </Section>
</Paper>