<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1030">
  <Title>Terminology Finite-State Preprocessing for Computational LFG</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The general issue we address here is whether there is an advantage to treating multiword expressions as single tokens by recognizing them before parsing. Possible advantages are reduced ambiguity in the parse results, greater perspicuity in the structure of analyses, and reduced parsing time. The possible disadvantage is the loss of valid analyses. There is probably no single answer to this question, as there are many different kinds of multiword expressions. This work follows the integration (footnote 1) of (French) fixed multiword expressions, like a priori, and time expressions, like le 12 janvier 1988, in the preprocessing stage.</Paragraph>
    <Paragraph position="1"> Terminology is an interesting kind of multiword expression because such expressions are almost but not completely fixed, and there is an intuition that you won't lose many good analyses by treating them as single tokens. [Footnote 1: This integration has been done by Frédérique Segond.]</Paragraph>
    <Paragraph position="2"> Moreover, terminology can be extracted semi- or fully automatically. Our goal in the present paper is to compare the efficiency and syntactic coverage of a French LFG grammar on a technical text, with and without terminology recognition in the preprocessing stage. The preprocessing consists mainly of two stages: tokenization and morphological analysis. Both stages are performed using finite-state lexical transducers (Karttunen, 1994). In the following, we describe the insertion of terminology into these finite-state transducers, as well as the consequences of this insertion for the syntactic analysis, in terms of the number of valid analyses produced, parsing time, and the nature of the results. We are part of a project that aims at developing LFG grammars (Bresnan and Kaplan, 1982) in parallel for French, English and German (Butt et al., To appear). The grammar is developed in a computational environment called XLE (Xerox Linguistic Environment) (Maxwell and Kaplan, 1996), which provides automatic parsing and generation, as well as an interface to the preprocessing tools we are describing.</Paragraph>
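    The idea of recognizing multiword expressions before parsing can be illustrated with a minimal sketch. It is not the finite-state transducer implementation described in this paper: it swaps in a plain set lookup with greedy longest match, and the function name merge_multiword_terms, the underscore joining convention, and the sample sentence are all hypothetical choices made for illustration only.

```python
def merge_multiword_terms(tokens, terms):
    """Merge known multiword terms into single tokens (greedy longest match).

    Illustrative stand-in for transducer-based preprocessing: a set lookup,
    not a finite-state transducer.
    """
    # Store each term as a tuple of its words; remember the longest term length.
    term_set = {tuple(term.split()) for term in terms}
    max_len = max((len(term) for term in term_set), default=1)

    merged = []
    i = 0
    while i != len(tokens):
        match_len = 1
        # Try the longest candidate span first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in term_set:
                match_len = n
                break
        # A matched term becomes one token; otherwise the word is kept as is.
        merged.append("_".join(tokens[i:i + match_len]))
        i += match_len
    return merged


sentence = "il est a priori valide".split()
print(merge_multiword_terms(sentence, ["a priori"]))
```

    With the term list containing only a priori, the two words are passed to the parser as one token, which is the situation whose effect on ambiguity and parsing time is under study here.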
  </Section>
</Paper>