<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0611">
  <Title>Tools for locating noun phrases with finite state transducers.</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> To be able to retrieve information in a text, one must be able to describe in detail the forms under which this information is expressed. For example, if we do not know that the chancellor of the Exchequer is the British finance minister, that the shadow minister for Trade and Industry is not a minister but a member of the British opposition, or that Christ's ministry has nothing to do with politics, we will be misled in our search. We are thus talking about linguistic &quot;information&quot;. We have shown in (1998a) how descriptions of occupations and proper noun phrases can be processed by finite state transducers (FSTs). In a French daily newspaper, one sentence out of two contains such a noun phrase; one therefore cannot attempt to parse texts without recognizing such sequences. Recognition at this stage should be accurate: minister of finance and the minister, Mr Jones, for Scottish affairs are well-formed nominals, but minister for finance is not. The syntactic and semantic descriptions rely on the same basis: we must be able to fully enumerate such nominals.</Paragraph>
    <Paragraph position="1"> Many formalisms have been shown to be powerful enough to describe one natural language phenomenon or another, but the real problem is the linguistic description. Generally, few examples are given, and it is assumed that the formalisms will accommodate a completed database. There are several reasons for this situation. First, the description stage is considered trivial; however, when it is seriously attempted, many new theoretical problems appear. Second, although a dozen rules can be handled without difficulty, when their number increases (to several thousand), the processes quickly become difficult to check and to understand, because interactions between rules are not treated.</Paragraph>
    <Paragraph position="2"> We present here practical methods for dynamically creating, maintaining, and debugging a large database of finite state transducers. We develop our example and show how, starting from scratch, one can construct a large and precise database. The text we use is one year of the International Herald Tribune (about 10 million words).</Paragraph>
  </Section>
</Paper>