<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1308">
  <Title>A Multilingual Natural-Language Interface to Regular Expressions</Title>
  <Section position="2" start_page="0" end_page="79" type="abstr">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Regular expressions are a mathematical formalism widely used for many computational tasks, ranging from simple search-and-replace procedures to full-scale implementations of grammars of natural languages. While the basic system of regular expressions is simple and widely known, there are many variants designed for specific purposes. In this report, we shall consider one particular system of regular expressions, the formalism of XFST (Xerox Finite State Tool), a system mainly used for various tasks of linguistic processing. We shall use XFST as the object of a case study of programming and program documentation in natural language. XFST is simple enough to make a fairly complete interface easily manageable. At the same time, it is a system that has many occasional and non-programmer users, who might profit from a natural-language interface.</Paragraph>
    <Paragraph position="1"> The users of XFST typically write scripts, which are programs consisting of regular expressions mixed with some instructions about what to do with them. In some cases, an XFST script is accompanied by some informal description of its content. For instance, the XFST programmer first writes an English text consisting of a sequence of grammatical rules, and then encodes this text rule by rule into the XFST formalism. The text then remains as a perspicuous and reliable document of the script, particularly useful for those who have not themselves written the program but just use it or want to modify it. It is partly for the very purpose of making various intuitive ways of expression possible that the rich formalism of XFST has been developed.</Paragraph>
    <Paragraph position="2"> However, there is no systematic guarantee that a script is accompanied by a corresponding informal text. Scripts are often created from scratch, without a full document being provided even afterwards. And even if a document is written in parallel with the script, the two can differ because of human error. They can, of course, also be intentionally different, because the author wants to hide some details of the script.</Paragraph>
    <Paragraph position="3"> But let us take it for granted that it is interesting to have natural language texts exactly corresponding to XFST scripts. This is at least an interesting theoretical problem, since it requires a very precise grammar of natural language, so precise that it would be possible to use natural language as a programming language instead of the formalism of XFST. Once such a grammar has been given for one language (for a small but sufficient fragment of it, of course), it is relatively easy to do it for other languages as well. This will immediately lead to the simultaneous multilingual documentation of XFST scripts.</Paragraph>
    <Paragraph position="4"> Thus we are going to build up a system with the following functionalities: translation of English and French texts into XFST scripts; translation of XFST scripts into English and French texts; translation between English and French texts (via XFST) so that equivalence of meaning is guaranteed; and stepwise editing of XFST scripts, of the corresponding English and French texts, and of specialized technical lexica.</Paragraph>
    <Paragraph position="5"> We shall argue that the last functionality, stepwise editing, is the most useful one. It produces more comprehensible results than direct translation, and it encourages a structured style of programming. Translating from English and French to XFST is probably not useful as such, because it is difficult for a human writer to stay within the recognized fragment. But it can be used for checking that a text once produced is unambiguous.</Paragraph>
  </Section>
</Paper>