<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1094">
  <Title>A User Friendly ATN Programming Environment (APE) Hans Haugeneder, Manfred Gehrke Siemens AG, ZT ZTI INF</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Design Considerations for an ATN Environment
</SectionTitle>
    <Paragraph position="0"> Having examined various ATN environments, for example [KEH 80], [GNE 82] and [CHR 83], we developed our ATN programming environment (APE) along the following design principles.</Paragraph>
    <Paragraph position="1"> 1) The various tools the environment offers must be integrated allowing simultaneous grammar editing and testing.</Paragraph>
    <Paragraph position="2"> 2) The grammar editor has to represent the network structures graphically, allowing the user to access the grammar via the context-free skeleton of the various networks.</Paragraph>
    <Paragraph position="3"> 3) The design of the system should make use of techniques like multi-windowing and menu- and mouse-based interaction facilities, in order to make the system easy to use.</Paragraph>
    <Paragraph position="4"> These desiderata concerning the design of such a system impose certain requirements on the hardware and software used for the implementation. We have chosen Interlisp-D (Trademark of XEROX) as the basis of APE, which, due to its comprehensive display and interaction facilities, proved to be an adequate starting point for the realisation of our ideas.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. Active Chart Parsing as a Framework for an
ATN Environment
</SectionTitle>
    <Paragraph position="0"> Active chart parsing ([KAP 73]) is a highly general framework for implementing parsers. The two main ideas of this approach are to represent the parser's control structure explicitly, allowing high flexibility in scheduling the various paths to be followed, and to prevent the parser from doing the same thing twice by means of a comprehensive bookkeeping mechanism. The interaction of these components is shown schematically in figure 1.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="399" type="metho">
    <SectionTitle>
PARSER
</SectionTitle>
    <Paragraph position="0"> Figure 1. Flexible scheduling is achieved by means of an agenda, which at any state of the parser contains all the tasks that have been induced by the grammar and not yet processed. The ordering of the agenda thereby determines the way the search space is traversed. With this agenda-based scheduling facility the parser can apply various control structures like depth-first, breadth-first or heuristic scheduling, and can even change the control structure during one parse.</Paragraph>
    <Paragraph position="1"> Such facilities are of interest for &quot;tuning&quot; the parser's behaviour in an intended way. Agenda-based task scheduling also offers the operational facilities for pruning parts of the search space, which amounts to switching off certain parts of a grammar during a parse.</Paragraph>
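    <Paragraph> The agenda mechanism described above can be sketched as follows. This is a minimal illustration in Python, not APE's Interlisp-D implementation; the class name Agenda, the three policy names, the prune method and the helper drain are all assumptions made for the example.

```python
class Agenda:
    """Pending parser tasks; the current policy decides which task runs next."""

    def __init__(self, policy="depth-first"):
        self.policy = policy      # may be reassigned between two pops
        self._entries = []        # list of (sequence number, weight, task)
        self._next_seq = 0

    def add(self, task, weight=0.0):
        self._entries.append((self._next_seq, weight, task))
        self._next_seq += 1

    def pop(self):
        if self.policy == "depth-first":
            index = len(self._entries) - 1      # newest task first
        elif self.policy == "breadth-first":
            index = 0                           # oldest task first
        else:
            # "weighted": highest weight first, oldest first among equal weights
            index = max(
                range(len(self._entries)),
                key=lambda i: (self._entries[i][1], -self._entries[i][0]),
            )
        return self._entries.pop(index)[2]

    def prune(self, keep):
        """Drop every task for which keep(task) is false."""
        self._entries = [entry for entry in self._entries if keep(entry[2])]

    def __len__(self):
        return len(self._entries)


def drain(agenda):
    """Pop tasks until the agenda is empty and return them in processing order."""
    order = []
    while len(agenda) > 0:
        order.append(agenda.pop())
    return order
```

Because the policy is consulted on every pop, assigning a new value to the policy attribute between two pops changes the control structure in the middle of a parse, and prune corresponds to switching off parts of the grammar.</Paragraph>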
    <Paragraph position="2"> The second central concept in active chart parsing, the chart, is a graph structure which does more than keep the books on the parsed constituents (the inactive edges). It also records each of the partial intermediate steps (the active edges), thus logically representing all the paths in work and all constituents parsed so far, and offering the possibility of inspecting the parsing process up to that point.</Paragraph>
    <Paragraph position="3"> More importantly, e.g. for perspicuity, the chart (i.e. its graphical representation) can also be seen as a descriptive representation of the parser's state from a naive grammar writer's point of view. It is a conceptually simple representation whose atomic constructs, the graph's nodes and the active and inactive edges, have clear counterparts in the conceptual entities a grammar writer has a naive understanding of, namely the positions in the sentence to parse (i.e. the nodes), the partial parses spanning two nodes (i.e. the active edges) and already analysed constituents (i.e. the inactive edges spanning the sequence of words between two nodes).</Paragraph>
    <Paragraph position="4"> Thus a graphical representation of the chart, growing as the parser proceeds, makes the parsing process easy for the user to follow.</Paragraph>
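    <Paragraph> The chart described in this section can be pictured as a small data structure. The following Python sketch is only an illustration under assumed names (Edge, Chart and their fields are hypothetical, and a real ATN chart edge would also carry registers, weight and history); it shows nodes as positions between words, active and inactive edges, and the bookkeeping that prevents the same edge from being added twice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    start: int     # node before the first spanned word
    end: int       # node after the last spanned word
    label: str     # category being built or already built
    active: bool   # True: partial parse, False: finished constituent


@dataclass
class Chart:
    words: list
    edges: set = field(default_factory=set)

    def add(self, edge):
        """Record an edge; return False if it is already in the chart."""
        if edge in self.edges:
            return False
        self.edges.add(edge)
        return True

    def nodes(self):
        """Nodes are the positions between the words of the sentence."""
        return list(range(len(self.words) + 1))

    def constituents(self):
        """Already analysed constituents, i.e. the inactive edges."""
        return sorted((e.start, e.end, e.label) for e in self.edges if not e.active)

    def spanned(self, edge):
        """The sequence of words between the two nodes of an edge."""
        return self.words[edge.start:edge.end]
```

Keeping edges in a set is one simple way to obtain the bookkeeping property; a graphical display would draw each node and edge of this structure as the parser adds it.</Paragraph>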
  </Section>
  <Section position="5" start_page="399" end_page="400" type="metho">
    <SectionTitle>
4. Description of the Environment
4.1. The Grammar-Editor
</SectionTitle>
    <Paragraph position="0"> The user interface to the ATN grammar is built on top of an active graph-like representation of the single networks, which is initiated by the user in a menu-based manner. This bird's eye view gives the user an overall first impression of the global structure of the whole grammar, with the type of each arc (PUSH, POP, CAT, JUMP) and the specification of categorial information for CAT- and PUSH-arcs.</Paragraph>
    <Paragraph position="1"> Thus the user is not burdened with an unnatural, artificially linearized (for example lispish) way of representing the basic graph-like concepts of ATNs. The benefits of such network-based grammar specification facilities have been pointed out by Grimes [GRI 75].</Paragraph>
    <Paragraph position="2"> The networks, displayed in the way described above, additionally offer the user a number of operational facilities, such as getting more specific information on a certain arc, for example its actions or additional tests. The user can activate the displayed network's arcs and nodes by clicking the mouse.</Paragraph>
    <Paragraph position="3"> Activating an arc pops up a menu with the following possibilities: - info: Gives a detailed printout of the arc, including its status (broken vs. unbroken).</Paragraph>
    <Paragraph position="4"> - delete: Deletes the arc from the network, causing a new graphical layout of the network.</Paragraph>
    <Paragraph position="5"> - edit: Edits the complete arc in a mouse- and menu-oriented editor with all necessary facilities to modify various parts of the arc, such as tests, actions and forms as well as its weight. On leaving the editor several checks are performed, putting the user back into the edit mode if the modified arc structure is incorrect (e.g. if it contains too many items or items of an incorrect type at the wrong place).</Paragraph>
    <Paragraph position="6"> - break: Puts a break on the arc taking the user into the break mode with interactive facilities (as described below) after the broken arc's actions are performed.</Paragraph>
    <Paragraph position="7">  - unbreak: Removes a break from the arc.</Paragraph>
    <Paragraph position="8"> Activating a node in the network offers the following facilities: - info: Gives a detailed printout of all the arcs starting at that node.</Paragraph>
    <Paragraph position="9"> - insert: Allows the user to insert an arc starting at the activated node, the arc's ending node (except for POP-arcs) being determined via the mouse. To introduce additional new nodes the user is prompted by the system for subsequent arcs until he specifies a POP-arc or an already existing node as the ending node of the last prompted arc.</Paragraph>
    <Paragraph position="10"> - merge: A new node N1 is inserted after node N, with the leaving arcs of N now beginning at N1 and a new arc between N and N1.</Paragraph>
    <Paragraph position="11"> 4.2. Grammar-Debugger The user can specify in advance certain constructions he wants to be parsed, thus having the possibility to test certain NP-constructions, for example, without the overhead of parsing a whole sentence.</Paragraph>
    <Paragraph position="12"> These debugging facilities can be invoked in three ways: firstly, while the parser is working in stepper-mode, by means of a user interaction; secondly, during the parser's run, by means of a break put on a grammar arc; and thirdly, system-initiated at the end of the parse, giving the user the possibility to restart.</Paragraph>
    <Paragraph position="13"> In the stepper-mode the user can cause a break while watching the chart grow as the parser processes one task after another, in the following way. During the single steps of drawing the chart, the system is interruptible, giving the user the opportunity to put APE's stepper into the break-mode (by mouse-clicking the relevant menu item).</Paragraph>
    <Paragraph position="14"> In the break-mode the user is offered a number of operational facilities which can be accessed by activating the chart nodes and edges with the mouse. When selecting an edge the user can get more detailed information, for example its weight, its register environment and its history. The history consists of the path through the grammar, each arc being augmented with additional information such as its current input word, its register environment and the number of the task responsible for processing that arc, which directly reflects the way the scheduling is performed. More importantly, the grammar tester can also modify the edges in various dimensions, including the following options: - registers: The user can change registers by employing the same language he is used to as a grammar writer, i.e. in terms of actions defined in the ATN formalism, for example SETRs and ADDRs, or forms to be evaluated such as BUILDQs.</Paragraph>
    <Paragraph position="15"> - weight: Allows the user to change the weight of an edge, affecting the order of further processing.</Paragraph>
    <Paragraph position="16"> - ending edge: With this option an edge can be modified with respect to the part of the input being spanned by it.</Paragraph>
    <Paragraph position="17"> This last option, together with the possibility of modifying registers, allows for example a simple simulation of the parser's behaviour under the assumption of a grammar arc (one that is actually missing, or blocked because its tests do not match) by enlarging the span of an edge. Another, more powerful possibility in testing a grammar is the introduction of additional active or inactive edges connecting two arbitrary nodes, which can be achieved via an activation of the starting arc. This allows the specification of partial parses or parsed constituents which, though missing due to some defect in the grammar, the user wants to make use of in the further parsing process. In parallel with all the options presented so far the user can edit the grammar on the fly, and is thus able to modify the grammar as soon as he recognises certain bugs. Additionally APE gives the user the possibility to manipulate the agenda, offering him various actions to be performed on the single tasks, like freezing and killing a task or changing its weight. This facility provides an advanced grammar writer with very effective means to focus the parser on things that are interesting for him in a certain situation, abandoning irrelevant paths or postponing them.</Paragraph>
    <Paragraph position="18"> Finally, when the user has done all the things that seemed useful to him at this break point he can continue the parsing process leaving the stepper options as they are or changing them appropriately.</Paragraph>
    <Paragraph position="19"> At the end of the parsing process the user again enters a break mode, giving him the opportunity to insert new edges and to restart the parsing process with this new information. Thus adding a new inactive edge and restarting, for example, amounts to asking the parser &quot;what would your results have been with an additional constituent c_i from word w_j to word w_k?&quot;. With the facilities described above the user can also easily analyse a configuration in which the parser did not succeed in parsing a certain construction. This description, though sketchy, should give an impression of the facilities of APE and the ideas behind it. An illustration of APE's environment is shown in the appendix.</Paragraph>
  </Section>
  <Section position="6" start_page="400" end_page="400" type="metho">
    <SectionTitle>
5. Outlook
</SectionTitle>
    <Paragraph position="0"> The described ATN programming environment gives substantial support to the user in building up a working grammar, but some of APE's aspects are not completely satisfying. So a lot of polishing of the user interface, as well as improvement of the functionality, is still to be done.</Paragraph>
    <Paragraph position="1"> Appendix Snapshot of the system in the break-mode.</Paragraph>
    <Paragraph position="2"> Primarily we are currently working on user-friendly lexicon handling. Another augmentation will be the easier global specification of very fine-grained breaks. We'd like to thank U. Hochgesand, C. Maienborn and L. Simon for implementing parts of the environment and the colleagues of our lab for many fruitful discussions.</Paragraph>
  </Section>
  <Section position="7" start_page="400" end_page="400" type="metho">
    <SectionTitle>
7. Literature
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="8" start_page="400" end_page="400" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> Dealing with specific alphabets is a necessity in natural language processing. In Grenoble, this problem is solved with the help of transcriptions. Here we present a language (LT) designed for the rapid writing of programs that pass from one transcription to another (transducers), and give some examples of its use.</Paragraph>
    <Paragraph position="1"> KEY-WORDS Transcriptions, transducers, multi-alphabet text processing, logical and physical processing of texts.</Paragraph>
  </Section>
  <Section position="9" start_page="400" end_page="400" type="metho">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In the general framework of natural language processing, the interface possibilities provided by current devices are rather poor when considering, for example, the number of alphabets to be used. The problem of uppercase/lowercase letters and that of non-Latin alphabets, not to mention ideograms, is usually solved by the use of transcriptions in computer science circles dealing with natural languages &lt;BOITET83&gt;.</Paragraph>
    <Paragraph position="1"> Our idea is to provide a rather simple device allowing the rapid writing of programs that perform the passage from one transcription to another (transducers, &lt;KAIN72&gt;), with the help of a language (LT, or Language for Transcriptions) based on an abstract automaton. The definition and the implementation of this language were initiated during an engineering school project &lt;MENGA84&gt;. The work on this Specialised Language for Linguistic Programming (SLLP) has led to a first version &lt;LT85&gt; in the context of a GETA/USMG project. It has then been extended in the frame of EUROTRA contract ETS-5 &lt;ETS5&gt;.</Paragraph>
    <Paragraph position="2"> This paper presents: the semantics of LT in automata theory; a brief description of the syntax of LT; indications on the implementation; some applications.</Paragraph>
    <Paragraph position="3">  I. SEMANTICS OF LT IN AUTOMATA THEORY [Figure: a transducer with an input tape under a reading head, a transition from state q to state q', and a writing head over an output tape.] Given a state and a character read, the transducer goes into another state and determines which character to write onto the output tape (transition).</Paragraph>
    <Paragraph position="4"> The simplest transducer is deterministic and regular: it has only one input tape and only one output tape; there is only one direction for reading and writing (rightwards); it reads only one symbol at a time; it writes only one symbol for each symbol read; there are no other objects such as stacks or balloons.</Paragraph>
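    <Paragraph> The simplest transducer described above can be sketched in a few lines of Python. This is an illustration only, not LT itself: the function names transduce and capitalising_table, the state names and the capitalising example are assumptions chosen for the sketch. The transition table maps a pair (state, symbol read) to a pair (next state, symbol written), so exactly one symbol is written for each symbol read.

```python
def transduce(transitions, initial_state, text):
    """Run a deterministic one-symbol-in, one-symbol-out transducer over text."""
    state = initial_state
    output = []
    for symbol in text:                 # single rightward pass over the input tape
        state, written = transitions[(state, symbol)]
        output.append(written)          # one symbol written per symbol read
    return "".join(output), state


def capitalising_table(letters="ab"):
    """Hypothetical example table: capitalise the first letter of every word."""
    table = {}
    for letter in letters:
        table[("start", letter)] = ("inside", letter.upper())
        table[("inside", letter)] = ("inside", letter)
    for state in ("start", "inside"):
        table[(state, " ")] = ("start", " ")
    return table
```

A pair (state, symbol) missing from the table raises a KeyError, which plays the role of an undefined transition in this sketch.</Paragraph>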
  </Section>
  <Section position="10" start_page="400" end_page="402" type="metho">
    <SectionTitle>
2. THE ABSTRACT AUTOMATON OF LT
</SectionTitle>
    <Paragraph position="0"> The &quot;basic&quot; automaton is extended in LT in three directions: 1. availability of the right context by means of two reading heads. The transition is a function of the head (&quot;forward&quot; or &quot;current&quot;) used in reading the input tape. A special transition returns the &quot;forward&quot; head to the position of the &quot;current&quot; one. This makes it possible to simulate the reading of the empty string and places the abstract automaton of LT in the class of the &quot;sequential transducers&quot; as defined in &lt;KAIN72&gt;;  2. use of the notion of attributes in the states. A state is an etiquette (label) with attributes. The values of some attributes are tested before a transition (condition) and the values of some attributes are changed after it (actions). This theoretically increases the non-determinism of the automaton; 3. work on strings and not only on characters, which sets the automaton definitively in the class of &quot;sequential transducers&quot;.</Paragraph>
    <Paragraph position="1"> 1. &quot;BASIC&quot; TRANSDUCER Transduction may be regarded as a simultaneous operation of reading and writing, writing being a function of reading &lt;AHO,UL72&gt;, &lt;CHAUCHE74&gt;. A transducer is a machine with an input tape and an output tape.</Paragraph>
    <Paragraph position="3"> The power of the LT automaton is restricted to that of a transducer with the following characteristics: one input tape and one output tape; determinism; states defined by etiquettes and attributes; two reading heads.</Paragraph>
    <Paragraph position="4"> The abstract LT transducer may be under-used as a deterministic finite-state machine. So the class of languages which can be analysed by LT includes the class of regular languages.</Paragraph>
    <Paragraph position="5"> Contrary to what we wrote in &lt;ETS5&gt;, LT can be used to define an acceptor of the famous context-dependent language a^n b^n c^n. It is the semi-regularity which makes it possible to simulate stacks. This means that the class of languages analysed by the abstract LT transducer includes some of the context-dependent languages.</Paragraph>
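    <Paragraph> To make the claim about a^n b^n c^n concrete, here is a small Python sketch of a deterministic acceptor that reads its input once from left to right and keeps three counters alongside its state, loosely in the spirit of states carrying arithmetic attributes. It is not the authors' LT program and it does not model the forward head or the semi-regularity mechanism; the function name, the state names and the use of unbounded Python integers (LT's arithmetic attributes are bounded) are assumptions of the sketch.

```python
def accepts_anbncn(text):
    """Return True if text is a^n b^n c^n for some n >= 1."""
    state = "A"
    counts = {"a": 0, "b": 0, "c": 0}         # counters attached to the state
    expected = {"A": "a", "B": "b", "C": "c"}   # symbol read in each state
    following = {"A": "B", "B": "C"}            # allowed state changes
    for symbol in text:
        if symbol == expected[state]:
            counts[symbol] += 1
        elif (
            state in following
            and symbol == expected[following[state]]
            and counts[expected[state]] > 0
        ):
            # first symbol of the next block: change state and count it
            state = following[state]
            counts[symbol] += 1
        else:
            return False
    return state == "C" and counts["a"] == counts["b"] == counts["c"]
```

The final comparison of the three counters is the step that a plain finite-state machine without attributes could not perform.</Paragraph>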
    <Paragraph position="6"> Using the Chomsky hierarchy, we say that LT can analyse: all the languages of class L3; some of the languages of class L2 (whether all languages in L2 can be analysed by LT is an open problem); some of the languages of class L1. II. SYNTAX OF LT</Paragraph>
  </Section>
  <Section position="11" start_page="402" end_page="402" type="metho">
    <SectionTitle>
SUMMARY
</SectionTitle>
    <Paragraph position="0"> After a presentation of the syntax of the strings, we introduce the definition of conditions and actions based on the attributes.</Paragraph>
    <Paragraph position="1"> With these three objects (strings, conditions and actions) we define the rules which serve to write the bundles.</Paragraph>
    <Paragraph position="2"> Finally, we sketch the general structure of a LT program. Incidentally, the concrete syntax of LT has taken its inspiration from that of Ariane-78 &lt;DSEI&gt;. 1. THE STRINGS A string is a concatenation of simple strings. A simple string may be a string of characters, hexadecimal codes, or special symbols for the end of the line and the end of the file.</Paragraph>
    <Paragraph position="3"> Any string of a certain length may be read with the help of a special designator.</Paragraph>
    <Paragraph position="4"> There exist three other conventions for the output tape, to designate the same string as read in input, or the same string with letters only in upper-case or in lower-case.</Paragraph>
  </Section>
  <Section position="12" start_page="402" end_page="402" type="metho">
    <SectionTitle>
2. THE CONDITIONS AND ACTIONS
</SectionTitle>
    <Paragraph position="0"> A condition is a first order predicate on the attributes, expressed in the usual syntax (logical connectors: no, and, or; parentheses allowed). The attributes belong to one of three classes: scalar, set or arithmetic (inferior to an upper bound).</Paragraph>
    <Paragraph position="1"> An action can be an assignment of a value to a variable, a list of actions carried out conditionally, or a block containing a list of actions.</Paragraph>
    <Paragraph position="2"> This notion is extended to three predefined actions.</Paragraph>
    <Paragraph position="3"> The first has no impact at all on the semantics of the transduction (displaying a message on an auxiliary file); the two others, on the contrary, are significant for the transduction (displaying a return code on the error file and stopping the transduction; moving the &quot;forward&quot; head back to the position of the &quot;current&quot; head (semi-regularity)).</Paragraph>
    <Paragraph position="4"> A rule describes a class of transitions of the shape: input string / condition == output string / actions.</Paragraph>
    <Paragraph position="5"> The symbol ? at the head of the rule signifies that the input string is to be read under the &quot;forward&quot; head. The philosophy of LT is to put together the possible passages from one etiquette to another into a bundle of the shape: de &lt;etiquette1&gt; a &lt;etiquette2&gt; via rule1 rule2 ... ruleN</Paragraph>
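    <Paragraph> The rule and bundle shapes above can be mirrored by a small data model. The Python sketch below is an assumption-laden illustration, not LT syntax or the authors' interpreter: the names Rule, Bundle, step and run are hypothetical, only the current head is modelled (the ? marker and the forward head are left out), and rules with an empty input string are not supported.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    input_string: str                  # string to read on the input tape
    condition: Callable[[dict], bool]  # predicate on the attributes
    output_string: str                 # string to write on the output tape
    action: Callable[[dict], None]     # update of the attributes


@dataclass
class Bundle:
    source: str    # etiquette the transitions start from
    target: str    # etiquette the transitions lead to
    rules: list


def step(bundles, etiquette, attributes, tape, position):
    """Apply the first matching rule; return (etiquette, output, position) or None."""
    for bundle in bundles:
        if bundle.source != etiquette:
            continue
        for rule in bundle.rules:
            if tape.startswith(rule.input_string, position) and rule.condition(attributes):
                rule.action(attributes)
                return bundle.target, rule.output_string, position + len(rule.input_string)
    return None


def run(bundles, initial, attributes, tape):
    """Transduce the whole tape; raise ValueError when no rule applies."""
    etiquette, position, output = initial, 0, []
    while position != len(tape):
        result = step(bundles, etiquette, attributes, tape, position)
        if result is None:
            raise ValueError("no rule applies at position " + str(position))
        etiquette, written, position = result
        output.append(written)
    return "".join(output), etiquette
```

Trying the rules of a bundle in order and taking the first match is one way to keep the sketch deterministic; whether LT resolves competing rules this way is not stated in the text.</Paragraph>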
  </Section>
  <Section position="13" start_page="402" end_page="402" type="metho">
    <SectionTitle>
4. GENERAL STRUCTURE OF A LT PROGRAM
</SectionTitle>
    <Paragraph position="0"> A LT program is divided into sections.</Paragraph>
    <Paragraph position="1"> One must give the initial state of the automaton. Others give the definition of attributes and their initialisation.</Paragraph>
    <Paragraph position="2"> Other optional sections define conditions, actions and rules which can be referred to by their names in the bundles.</Paragraph>
    <Paragraph position="3"> The other sections give the bundles explicitly. III. IMPLEMENTATION In order to facilitate programming in LT, an environment for this language was written in Prolog-Criss &lt;PROLOG85&gt;: The manager allows the manipulation of LT programs. The usual functions of an interactive environment (PROLOG, APL) are defined: loading, saving, editing, listing ....</Paragraph>
    <Paragraph position="4"> The compiler was implemented using a generator of analysers inspired by METAL &lt;METAL82&gt;, but less powerful.</Paragraph>
    <Paragraph position="5"> The interpreter is a mock-up in Prolog which works on the abstract trees resulting from compilation.</Paragraph>
    <Paragraph position="6"> The user must specify the files which will be the input and output tapes, and the LT program to be interpreted. Interactive traces are possible. The design of a Pascal version of the interpreter, intended to increase the speed of execution, is currently in progress.</Paragraph>
  </Section>
  <Section position="14" start_page="402" end_page="403" type="metho">
    <SectionTitle>
IV. APPLICATIONS
1. EXAMPLE OF PROGRAM
</SectionTitle>
    <Paragraph position="0"> To illustrate the syntax of LT, we give a piece of the program for the analysis of AnBnCn on the next page.</Paragraph>
  </Section>
  <Section position="15" start_page="403" end_page="403" type="metho">
    <SectionTitle>
2. TRANSCRIPTIONS FOR DIACRITICS LETTERS
</SectionTitle>
    <Paragraph position="0"> French has many diacritics and accents. In the frame of Eurotra, a transcription for the diacritics was proposed. Here is a text in the Eurotra Short Transcription and its corresponding form in actual French orthography. The passage between the two forms was performed by a LT program.</Paragraph>
  </Section>
  <Section position="16" start_page="403" end_page="404" type="metho">
    <SectionTitle>
3. PHYSICAL AND LOGICAL PROCESSING OF TEXTS
</SectionTitle>
    <Paragraph position="0"> The use of the LT language is not limited to transcriptions; one of its interesting features, and not the least one, is that physical and logical processing of texts can be carried out with its help.  In the previous text, the first two lines correspond to formatting commands of SCRIBERE (a text formatting software developed at GETA and based on SCRIPT, an IBM text formatting software, &lt;SCRIBERE85&gt;). Transducers have been written which reflect tables of information about punctuation, formatting commands and structural separators. Here is the result of the application of the sequence of those transducers written in LT on the following text.</Paragraph>
  </Section>
</Paper>