STRUCTURAL CORRESPONDENCE SPECIFICATION ENVIRONMENT
Yongfeng YAN 
Groupe d'Etudes pour la Traduction Automatique
(GETA) 
B.P. 68 
University of Grenoble
38402 Saint Martin d'Hères
FRANCE 
ABSTRACT 
This article presents the Structural Correspondence Specification Environment (SCSE) being implemented at GETA.
The SCSE is designed to help linguists develop, consult and verify the SCS Grammars (SCSG) which specify linguistic models. It integrates the techniques of databases, structured editors and language interpreters. We argue that formalisms and tools of specification are as important as the specification itself.
INTRODUCTION
It has long been recognized that specification is very important in the development of large computer systems, including linguistic computer systems. But it is very difficult to make good use of specification without a well-defined formalism and convenient tools.
The Structural Correspondence Specification Grammar (SCSG) is a powerful linguistic specification formalism. SCSGs were first studied in S. Chappuy's thesis [1], under the supervision of Professor B. Vauquois. In their paper presented at Colgate University in 1985 [6], the SCSG was called Static Grammar, as opposed to dynamic grammars, which are executable programs, because the SCSG aims at specifying WHAT the linguistic models are rather than HOW they are calculated.
An SCSG describes a linguistic model by specifying the correspondence between the valid surface strings of words and the multi-level structures of a language. Thus, from an SCSG, one can obtain at the same time the valid strings, the valid structures and the relation between them. An SCSG can be used for the synthesis of dynamic grammars (analysers and generators) and as a reference for large linguistic systems. An SCS Language (SCSL), in which SCSGs can be written in linear form, has been designed at GETA.
The SCS Environment (SCSE) presented here is a computer-aided SCSG design system. It will allow linguists to create, modify, consult and verify their grammars conveniently and thereby increase their productivity.
Section I gives an outline of the system: its architecture, principle, data structure and command syntax. Section II describes the main functions of the system. We conclude by giving a perspective on further developments of the system.
I. AN OVERVIEW OF THE SYSTEM
1. ARCHITECTURE
The SCSE can be logically divided into five parts: 1. the SCSG base, 2. the monitor, 3. the input, 4. the output, 5. the procedures.
The SCSG base consists of a set of files containing the grammars. The base has a hierarchical structure: a tree-form directory describes the relationship between the data of the base.
The monitor is the interface between the system and the user. It reads and analyses commands from the input and then calls the procedures that execute them.
The input is the medium containing the commands to be executed and the data used to update the base. There is a standard input (usually the keyboard) from which data and commands are read unless another input is explicitly specified by a command.
The output is the medium receiving the system's dialogue messages and execution results. There is a standard output (usually the screen) to which messages and results are sent unless another output is explicitly specified by a command.
The procedures are the most important part of the system: it is the execution of procedures that carries out a command. The procedures can communicate directly with the user and with other procedures.
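The monitor's role can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than SCSE's implementation (which is written in PROLOG): the `Monitor` class, the `ECHO` procedure and the `/OUT=name` option convention are hypothetical. It only shows the monitor analysing a command line into operator, operand and options, calling the registered procedure, and sending results to the standard output unless the command names another output.

```python
import io
import sys
from typing import Callable, Dict, TextIO

# A procedure receives the operand, the options and the output stream to write to.
Procedure = Callable[[str, Dict[str, str], TextIO], None]


class Monitor:
    """Interface between the user and the procedures (illustrative sketch)."""

    def __init__(self, standard_output: TextIO = sys.stdout) -> None:
        self.standard_output = standard_output
        self.named_outputs: Dict[str, TextIO] = {}  # outputs a command may name
        self.procedures: Dict[str, Procedure] = {}  # operator -> procedure

    def register(self, operator: str, procedure: Procedure) -> None:
        self.procedures[operator.upper()] = procedure

    def execute(self, line: str) -> None:
        """Analyse '<operator> <operand> <options>' and call the procedure."""
        words = line.split()
        if not words:
            return
        operator = words[0].upper()
        # Hypothetical convention: options are words of the form /KEY=VALUE.
        operand = " ".join(w for w in words[1:] if not w.startswith("/"))
        options = dict(
            w[1:].split("=", 1) for w in words[1:] if w.startswith("/") and "=" in w
        )
        # Results go to the standard output unless the command names another output.
        out = self.named_outputs.get(options.get("OUT", ""), self.standard_output)
        procedure = self.procedures.get(operator)
        if procedure is None:
            self.standard_output.write(f"unknown command: {operator}\n")
            return
        procedure(operand, options, out)


# Usage: a trivial, hypothetical ECHO procedure written to two different outputs.
screen, listing = io.StringIO(), io.StringIO()
monitor = Monitor(standard_output=screen)
monitor.named_outputs["listing"] = listing
monitor.register("ECHO", lambda operand, options, out: out.write(operand + "\n"))
monitor.execute("ECHO hello")
monitor.execute("ECHO world /OUT=listing")
```

Keeping the output stream as a parameter of each procedure is one simple way to let a command override the standard output without the procedures knowing where their results end up.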
2. THE PRINCIPLE
An SCSE session begins by loading either the original SCSG base or the one saved at the end of the last session. The monitor then reads lines from the command input and calls the corresponding procedures to execute the commands found. When an SCSE session is ended by the command "QUIT", the current state of the base is saved. The SCSG base can only be updated by the execution of commands.
The original SCSG base contains two SCSGs: one describes the syntax of the SCSL and the other gives the correspondence between the directory's nodes and the syntactic units of the SCSL. The first grammar is read-only, but the second one can be modified by the user. This allows users to have their preferred logical view over the base's physical data. These two grammars also serve as an on-line reference for the system.
Several interactive levels can be chosen, either by the user or by the system according to the number of errors in the command lines. The system sends a prompt message only when a "RETURN" is met in the command lines, so one can avoid prompt messages by entering several commands at a time.
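A session as described above can be sketched in Python under several assumptions: the base is a flat dictionary saved as JSON, `PUT key=value` is a hypothetical update command, and `;` is assumed to separate several commands typed on one line. The sketch shows the base being loaded from the last saved state (or from the original base), modified only through commands, the read-only first grammar being protected, a prompt being issued only once a whole line has been consumed, and the state being saved on QUIT.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List

# The original base holds the two SCSGs described above (contents elided here).
ORIGINAL_BASE: Dict[str, str] = {
    "scsl_syntax": "<SCSG describing the syntax of the SCSL>",
    "directory_map": "<SCSG mapping directory nodes to SCSL units>",
}
READ_ONLY = {"scsl_syntax"}  # the first grammar cannot be modified


def run_session(lines: Iterable[str], save_path: str, prompts: List[str]) -> Dict[str, str]:
    """Run one session; the base changes only through commands."""
    # Start from the base saved by the last session, else from the original base.
    if os.path.exists(save_path):
        with open(save_path, encoding="utf-8") as f:
            base: Dict[str, str] = json.load(f)
    else:
        base = dict(ORIGINAL_BASE)
    for line in lines:
        # Several commands may share one line; ";" as separator is an assumption.
        for command in (c.strip() for c in line.split(";")):
            operator, _, operand = command.partition(" ")
            if operator.upper() == "QUIT":
                with open(save_path, "w", encoding="utf-8") as f:
                    json.dump(base, f)  # the current state is saved on QUIT
                return base
            if operator.upper() == "PUT":  # hypothetical update command
                key, _, value = operand.partition("=")
                if key.strip() not in READ_ONLY:
                    base[key.strip()] = value.strip()
        # A prompt is sent only when the whole line (up to RETURN) has been consumed.
        prompts.append("SCSE>")
    return base


# Usage: two sessions sharing one save file in a temporary directory.
save_file = os.path.join(tempfile.mkdtemp(), "scsg_base.json")
prompts: List[str] = []
first = run_session(["PUT a=1; PUT b=2", "PUT scsl_syntax=x", "QUIT"], save_file, prompts)
second = run_session(["QUIT"], save_file, [])
```

Two commands on the first line produce a single prompt, and the second session starts from the state saved by the first.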
3. DATA STRUCTURE
There are two data structure levels. 
The lower level is linear and is supported by the host system. The base is a set of files, each containing a list of character strings. The base can be seen as a single character string, the concatenation of all lines in the files of the base, which is why this structure is said to be linear. This is the physical structure.
The higher level is hierarchical and is defined by the directory of the base. The base is composed of a number of SCSGs; each grammar contains a declaration section, a rule (chart) section, etc., and the components of a grammar (declarations, rules, etc.) have their own structure. The hierarchical structure is the logical structure of the base.
The directory has a tree form. A node in the tree represents a logical data unit, which is its content (for instance a grammar). Every node has a type and a list of attributes characterising the node's content. An internal node's content is the composition of those of its descendants. A leaf's content is directly associated with a physical data unit (a character string).
The following figure shows the relation between the two 
structures. 
[Figure: relation between the logical structure and the physical structure of the base. Legend: node, type, attributes; e.g. a node of type Grammar with the attributes language (English) and date.]
The directory is similar to a UNIX directory. In our directory, however, the leaves correspond not to files but to logical data units, and an attribute list is attached to each node. The correspondence between the two structures is maintained by the SCSE. We shall see later that this organisation allows more efficient information retrieval.
Users can access the data through either structure. The logical one is more convenient, but the physical one may be more efficient in some cases.
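The two levels and their correspondence can be illustrated with a small Python sketch. It assumes, for illustration only, that each leaf stores the start and end offsets of its physical data unit in the linear string; the names `Node`, `linearise` and `leaf`, the file name and the placeholder texts are hypothetical. An internal node's content is computed as the concatenation of its descendants' contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def linearise(files: Dict[str, List[str]]) -> str:
    """Physical structure: the concatenation of all lines of all files of the base."""
    return "".join(line + "\n" for name in sorted(files) for line in files[name])


@dataclass
class Node:
    """Logical structure: a directory node with a type and an attribute list."""
    node_type: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    span: Optional[Tuple[int, int]] = None  # leaves only: offsets in the linear base

    def content(self, physical: str) -> str:
        if self.span is not None:  # a leaf points directly at a physical data unit
            start, end = self.span
            return physical[start:end]
        # An internal node's content is the composition of its descendants' contents.
        return "".join(child.content(physical) for child in self.children)


# Usage with a hypothetical one-file base (texts are placeholders, not SCSL).
physical = linearise({"english.scs": ["<declarations>", "<chart NP>", "<chart VP>"]})


def leaf(node_type: str, text: str, **attributes: str) -> Node:
    start = physical.index(text)
    return Node(node_type, dict(attributes), span=(start, start + len(text) + 1))


grammar = Node("GRAMMAR", {"LANGUAGE": "ENGLISH"}, [
    leaf("DECLARATION", "<declarations>"),
    Node("RULES", {}, [leaf("CHART", "<chart NP>", TYPE="NP"),
                       leaf("CHART", "<chart VP>", TYPE="VP")]),
])
```

Here the whole grammar node yields the entire linear text, while a single chart leaf yields only its own line.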
4. COMMAND SYNTAX
The general command format is:
<operator> <operand> <options>
- The "operator" is a word or an abbreviation recalling the operation of the command.
- The "operand" is a pattern giving the range of the operation.
- The "options" field is a list of optional parameters of the command.
For example, the command:
V GRAMMAR(LANGUAGE=ENGLISH)
visualizes, at the standard output, all the English grammars in the base. Here V is the operator, GRAMMAR(LANGUAGE=ENGLISH) is the operand pattern, and no option is given.
Since the operand is usually a node in the directory tree, the pattern is usually a tree pattern. When the pattern matches a subtree of the directory, the part that matches a specially marked node is the effective operand.
The pattern is expressed by a geometric structure and a constraint condition. The structure is a tree written in parenthesized form, possibly containing variables, each representing a tree or a forest. The condition is a first-order logic predicate over the attributes of the nodes occurring in the geometric structure. More sophisticated conditions may be expressed by combining a predicate with the geometric structure, so as to select information from the base efficiently.
Pattern writing should be kept to a minimum. In the above example, the geometric structure is simply a grammar-type node, and the constraint is that the node's language attribute has the value ENGLISH.
Using a current node in the directory both simplifies pattern writing and reduces the pattern matching range. After the execution of a command, the effective operand becomes the new current node.
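A much simplified matcher for operands such as GRAMMAR(LANGUAGE=ENGLISH) is sketched below in Python. It handles only a single node type with a conjunction of attribute equalities (no variables, no nested geometric structure, no general first-order predicates), searches only below the current node, and, as an assumption, takes the first match as the new current node. `parse_command`, `find` and the case-insensitive comparison are illustrative choices, not SCSE's syntax or algorithm.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_type: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# Simplified operand pattern: TYPE or TYPE(ATTR=VALUE, ...); no variables, no nesting.
PATTERN = re.compile(r"^(\w+)(?:\((.*)\))?$")


def parse_command(line: str) -> Tuple[str, str, Dict[str, str], List[str]]:
    """Split '<operator> <operand> <options>' into its parts."""
    operator, operand, *options = line.split()
    match = PATTERN.match(operand)
    if match is None:
        raise ValueError(f"bad operand pattern: {operand}")
    constraints: Dict[str, str] = {}
    for condition in filter(None, (c.strip() for c in (match.group(2) or "").split(","))):
        attribute, _, value = condition.partition("=")
        constraints[attribute.strip().upper()] = value.strip().upper()
    return operator.upper(), match.group(1).upper(), constraints, options


def find(current: Node, node_type: str, constraints: Dict[str, str]) -> List[Node]:
    """Depth-first search below the current node (the matching range)."""
    found, stack = [], [current]
    while stack:
        node = stack.pop()
        if node.node_type == node_type and all(
            node.attributes.get(a, "").upper() == v for a, v in constraints.items()
        ):
            found.append(node)
        stack.extend(reversed(node.children))  # keep left-to-right order
    return found


# Usage: the example command from the text on a two-grammar directory.
base = Node("BASE", {}, [Node("GRAMMAR", {"LANGUAGE": "English"}),
                         Node("GRAMMAR", {"LANGUAGE": "French"})])
operator, node_type, constraints, options = parse_command("V GRAMMAR(LANGUAGE=ENGLISH)")
matches = find(base, node_type, constraints)
# Assumption: the first effective operand becomes the new current node.
current = matches[0] if matches else base
```

Starting later searches from `current` rather than from the root is what narrows the matching range.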
II. THE MAIN FUNCTIONS 
We shall describe only the functions that seem essential. They may be divided into four groups: 1. general, 2. SCSG base updating, 3. SCSG base inquiry, 4. SCSG verification.
1. GENERAL FUNCTIONS
These functions include: setting the SCSE session options, inquiring about miscellaneous system information, and accessing the host system's commands.
The following options can be set by user commands: 1. interactivity, 2. dialogue language, 3. auto-verification, 4. session trace, 5. standard input/output.
One of the following four interactive modes may be chosen: 1. non-interactive, 2. brief, 3. detailed, 4. system-controlled.
In non-interactive mode, the system asks no questions: an erroneous command is ignored and a message is sent, but processing continues. In brief mode, the names of the currently accessible commands are displayed when a command is completed and a RETURN is found in the command lines. In detailed mode, the function and parameters of the accessible commands are displayed, and if an error is found in the user's input data, the system diagnoses it and helps the user complete the command; a prompt message is sent every time a RETURN is found in the command lines. In system-controlled mode, the interactivity is chosen dynamically by the system according to the system-user dialogue.
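The behaviour of the four modes can be summarised by a Python sketch. The messages, the table of accessible commands and the rule used in system-controlled mode (switching to detailed help once the number of recent errors reaches a hypothetical `ERROR_THRESHOLD`) are assumptions; the text only states that the system chooses the level from the dialogue.

```python
from enum import Enum
from typing import Dict, List

ERROR_THRESHOLD = 2  # hypothetical: errors before the system switches to detailed help


class Mode(Enum):
    NON_INTERACTIVE = 1
    BRIEF = 2
    DETAILED = 3
    SYSTEM_CONTROLLED = 4


def effective_mode(mode: Mode, recent_errors: int) -> Mode:
    """In system-controlled mode the level follows the dialogue (here: error count)."""
    if mode is not Mode.SYSTEM_CONTROLLED:
        return mode
    return Mode.DETAILED if recent_errors >= ERROR_THRESHOLD else Mode.BRIEF


def respond(level: Mode, command_ok: bool, accessible: Dict[str, str]) -> List[str]:
    """Messages sent when a command is completed and a RETURN is found."""
    messages: List[str] = []
    if not command_ok:
        if level is Mode.DETAILED:
            messages.append("error diagnosed: please complete the command")
        else:  # non-interactive behaviour, assumed here for brief mode as well
            messages.append("error: command ignored")
    if level is Mode.BRIEF:
        messages.append("commands: " + ", ".join(accessible))
    elif level is Mode.DETAILED:
        messages.extend(f"{name}: {purpose}" for name, purpose in accessible.items())
        messages.append("SCSE>")  # a prompt at every RETURN
    return messages


# Usage with a hypothetical table of accessible commands.
ACCESSIBLE = {"V": "visualize", "QUIT": "end the session"}
```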
For the time being, only French is used as the dialogue language, but multi-language dialogue has been taken into account in the design. Adding a new dialogue language is relatively simple in PROLOG.
The auto-verification option indicates whether the static coherence (see 4. SCSG verification) of a grammar will be verified each time the grammar is modified.
The trace option is a switch that turns on or off 
the trace of the session. 
The standard Input/output option changes the 
standard input/output. 
Some inquiries about general system information, such as the current options and the directory content, are also included in this group of functions.
Access to the host system's commands without leaving the SCSE can increase efficiency. However, any object modified outside the SCSE is no longer considered coherent.
2. SCSG BASE UPDATING 
This group of functions comprises CREATE, COPY, CHANGE, LOCATE, DESTROY and MODIFY. These may be found in all classic editors or file management systems. The advantage of our system is that the operand of a command can be specified according to the logical structure of the base.
For example, the command:
DESTROY CHARTS(TYPE=NP)
destroys all the charts which describe a Noun Phrase.
The SCSE has a syntactic editor that knows the logical structure of the texts being edited. This editor is used by the commands MODIF and CREATE.
The command:
CREATE <operand> <options>
calls the editor, creating the logical data unit specified by the operand. If the interactive option is requested, the editor guides the user to write correctly according to the nature of the data.
Following the same idea of different interactive levels, we try to improve on the classical structural editor, for instance that of Cornell University [5], so that one can enter a piece of text longer than the one prompted by the system. If the interactive option is not requested, one simply enters the editor with an empty workspace.
The command "MODIF <logical unit>" calls the system's editor with the logical data unit as the workspace. The data in the workspace may be displayed in a legible form which reflects its logical structure. The multi-window facility of the editor makes it possible to see the source text and the text in structured form simultaneously on the screen.
The SCSE editor inherits the usual editing commands from the host editor. Thus one can change all the occurrences of a rule's name in a grammar, without changing other strings containing the same characters, by using a logical structure change:
C NAME(type=rule) old_name new_name
whereas the physical structure command:
C/old_name/new_name/* *
changes every string "old_name" in the workspace to "new_name".
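The difference between the two kinds of change can be shown with a Python sketch on a made-up grammar syntax in which `RULE <name> :` defines a rule and `CALL <name>` refers to it; this syntax is hypothetical and is not SCSL. The logical change rewrites the name only in those positions, so an attribute value spelled the same or a longer name such as np2 is left alone, whereas the physical change rewrites every occurrence of the characters.

```python
import re

# Hypothetical grammar text: "RULE <name> :" defines a rule, "CALL <name>" refers to it.
GRAMMAR_TEXT = (
    "RULE np : cat = np ; CALL adj ;\n"
    "RULE s : CALL np ; CALL vp ;\n"
    "RULE np2 : CALL np ;\n"
)


def logical_change(text: str, old: str, new: str) -> str:
    """Rename a rule: only tokens in rule-name positions (definition or call) change."""
    pattern = re.compile(rf"(\b(?:RULE|CALL)\s+){re.escape(old)}\b")
    return pattern.sub(lambda m: m.group(1) + new, text)


def physical_change(text: str, old: str, new: str) -> str:
    """Plain substitution on the linear text: every occurrence of the characters changes."""
    return text.replace(old, new)


logical = logical_change(GRAMMAR_TEXT, "np", "noun_phrase")
physical = physical_change(GRAMMAR_TEXT, "np", "noun_phrase")
```

After the logical change the attribute value `cat = np` and the rule `np2` are untouched; after the physical change both have been altered, which is exactly the hazard the structural command avoids.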
When an object's definition is modified, all its occurrences may need to be revised, and vice versa, even if the modification does not cause a syntactic error. In this case a structure location command, which finds the definition and all the occurrences of an object, can be used.
Only the logical units defined in the directory and in the SCSL syntax can be manipulated by the structural commands.
3. SCSG BASE INQUIRY
These functions allow users to express what they are interested in and to obtain the inquiry results in a legible form. Part of the on-line user manual, written in the form of an SCSG, may also be consulted.
The operand patterns discussed above are used to select the relevant data. The operator and options of a command choose the output device and the corresponding parameters. A parameterised output form has been defined for each logical data unit, and the data matching the operand pattern are shaped according to their output form. The data may of course also be obtained in their source form.
One may wish to examine an object at different levels (e.g. just the abstract or some comments); the options of the command can specify this. If one just wants to change the current node in the directory to facilitate subsequent retrieval, the same locating command as before may be used.
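A parameterised output form might look as follows in Python. The `Unit` fields, the level names `abstract`, `comments` and `full`, and the layout produced by `chart_form` are hypothetical; the sketch only shows one output form per type of logical data unit, selected by the unit's type, with an option falling back to the source form.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Unit:
    """A logical data unit as retrieved by an operand pattern (hypothetical fields)."""
    node_type: str
    name: str
    abstract: str = ""
    comments: str = ""
    source: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)


def chart_form(unit: Unit, level: str) -> str:
    """Output form for charts, parameterised by the requested level of detail."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(unit.attributes.items()))
    header = f"CHART {unit.name} {attrs}".rstrip()
    if level == "abstract":
        return f"{header}\n  {unit.abstract}"
    if level == "comments":
        return f"{header}\n  {unit.comments}"
    return f"{header}\n{unit.source}"


OUTPUT_FORMS: Dict[str, Callable[[Unit, str], str]] = {"CHART": chart_form}


def show(unit: Unit, level: str = "full", source_form: bool = False) -> str:
    """Shape a unit by its type's output form, or return it in source form."""
    form = OUTPUT_FORMS.get(unit.node_type)
    if source_form or form is None:
        return unit.source
    return form(unit, level)


np_chart = Unit("CHART", "NP", abstract="noun phrases", comments="simple NPs only",
                source="<source text of chart NP>", attributes={"TYPE": "NP"})
```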
4. SCSG VERIFICATIONS
Two kinds of verification may be distinguished: static and dynamic. Static verification checks whether a grammar, or part of a grammar, respects the syntax and semantics of the formalism. Dynamic verification tests whether a given grammar specifies what we want it to.
Static verification
An internal representation of the analysed text is produced and used by the system for structural manipulation. The analyser may also produce a list of cross-references of nameable objects and a list of the syntactico-semantic errors found in the text. Examples of nameable objects are the charts, the macros and the attributes. The list of cross-references reveals the objects which are used but never defined, and those defined but never used.
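The two lists can be derived from the occurrences reported by the analyser, as in this Python sketch. The `Occurrence` record (kind, name, line, definition flag) is a hypothetical format for the analyser's output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple

Key = Tuple[str, str]  # (kind of object, name), e.g. ("CHART", "NP")


class Occurrence(NamedTuple):
    """One occurrence of a nameable object reported by the analyser (hypothetical)."""
    kind: str
    name: str
    line: int
    is_definition: bool


def cross_reference(occurrences: List[Occurrence]) -> Tuple[Dict[Key, List[int]], Set[Key], Set[Key]]:
    """Return the cross-reference table, used-but-undefined and defined-but-unused objects."""
    table: Dict[Key, List[int]] = defaultdict(list)
    defined: Set[Key] = set()
    used: Set[Key] = set()
    for occ in occurrences:
        key = (occ.kind, occ.name)
        table[key].append(occ.line)
        (defined if occ.is_definition else used).add(key)
    return dict(table), used - defined, defined - used


occurrences = [
    Occurrence("CHART", "NP", 1, True),
    Occurrence("MACRO", "AGREE", 2, True),
    Occurrence("CHART", "VP", 5, False),
    Occurrence("CHART", "NP", 7, False),
]
table, undefined, unused = cross_reference(occurrences)
```

In this example the chart VP is reported as used but undefined, and the macro AGREE as defined but unused.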
A chart may refer to other charts. This reference relation can be represented by a directed graph whose nodes stand for the charts. A hierarchical reference graph is often drawn up before the charts are written. A program can calculate the effective graph of a grammar from the result of the analysis and compare it with the given one.
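Comparing the given and the effective reference graphs reduces to a set difference on their edges, as this Python sketch assumes; the chart names and the dictionary representation of the graphs are hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # chart -> charts it refers to
Edge = Tuple[str, str]


def edges(graph: Graph) -> Set[Edge]:
    return {(src, dst) for src, targets in graph.items() for dst in targets}


def compare(given: Graph, effective: Graph) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (references not foreseen in the given graph, foreseen but never made)."""
    given_edges, effective_edges = edges(given), edges(effective)
    return effective_edges - given_edges, given_edges - effective_edges


# Hypothetical example: the reference graph drawn up before writing the charts,
given = {"S": {"NP", "VP", "ADV"}, "VP": {"NP"}}
# and the one calculated from the analysis of the charts actually written.
effective = {"S": {"NP", "VP"}, "VP": {"NP", "PP"}, "NP": {"PP"}}
unexpected, missing = compare(given, effective)
```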
The command options may suppress the output of these two lists and the graph calculation. The graph calculation may also be executed alone. One of the options indicates whether the analysis will be interactive.
Dynamic verification
Dynamic verification is the calculation of a subset of the string-tree relation defined by a grammar. A member of the relation is a pair <string,tree>. The command gives the grammar and the subset to be calculated. The subset may take one of the following four forms:
1. a pair with a given string and a given tree (to see whether it belongs to the relation)
2. pairs with a given string and an arbitrary tree 
3. pairs with an arbitrary string and a given tree 
4. all possible pairs 
The calculation is carried out by an interpreter. The user may give interpretation parameters indicating the interactive and trace modes, the size of the subset to be calculated, and other constraints such as a list of passive (or active) charts during the interpretation, the depth and width of the trees, the length of the string, etc.
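The interpreter's job can be sketched in Python as naive generate-and-filter over a toy grammar, in keeping with the absence of heuristics noted below. The grammar is a hypothetical context-free stand-in for an SCSG, and only two of the interpretation parameters are modelled: `depth` (a bound on tree depth) and `max_pairs` (the size of the subset). Passing a string, a tree, both, or neither gives the four forms listed above; form 1 returns either the pair or an empty list.

```python
from itertools import islice, product
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Toy context-free grammar standing in for an SCSG (hypothetical, not SCSL).
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["John"], ["Mary"]],
    "VP": [["sleeps"], ["sees", "NP"]],
}


def derive(symbol: str, depth: int) -> Iterator[Tuple[List[str], Any]]:
    """Exhaustively yield (words, tree) pairs; trees are (label, *children) tuples."""
    if symbol not in GRAMMAR:  # a word is its own (leaf) tree
        yield [symbol], symbol
        return
    if depth == 0:  # depth bound: one of the interpretation parameters
        return
    for rhs in GRAMMAR[symbol]:
        for parts in product(*(list(derive(s, depth - 1)) for s in rhs)):
            words = [w for part_words, _ in parts for w in part_words]
            yield words, (symbol, *(t for _, t in parts))


def relation_subset(string: Optional[str] = None, tree: Any = None,
                    depth: int = 4, max_pairs: Optional[int] = None) -> List[Tuple[str, Any]]:
    """Forms 1-4: fix both, the string only, the tree only, or neither."""
    pairs = (
        (" ".join(words), t) for words, t in derive("S", depth)
        if (string is None or " ".join(words) == string) and (tree is None or t == tree)
    )
    return list(islice(pairs, max_pairs))  # max_pairs: size of the subset
```

For example, `relation_subset("John sleeps")` yields the single pair whose tree is `("S", ("NP", "John"), ("VP", "sleeps"))`.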
As SCSGs are static grammars, no heuristic strategy is used in the interpreter's algorithm, so the interpretation will not be efficient. Since the goal is to test grammars rather than to apply them on a real scale, the efficiency of the interpreter is of no importance.
CONCLUSION
The system presented here is being implemented at GETA. In this article, we have emphasised the system's design principles and specification rather than the details of its implementation.
We have followed three widely recommended design principles: a) early focus on users and tasks, b) empirical measurement and c) iterative design [2].
The specifications of the functions are checked by the system's future users before implementation, and the users' advice is taken into account. This dialogue continues during the implementation. Top-down and modular programming approaches are followed so that, even if the implementation is not complete, the implemented part can still be used.
The system is designed to be implemented rapidly and modified easily, thanks to its modularity and especially to a high-level logic programming language, PROLOG [3]. We have tried our best to make the system as user-friendly as possible. The system's most remarkable characteristic is that users manage their data according to a logical structure adapted to human beings.
What is interesting in our system is not that it shows some very original ideas or the most recent state-of-the-art techniques, but that it shows how a combination of well-known techniques originally used in different fields may find application in other fields.
The long-term perspectives of the system are numerous. As the SCSG is evaluated, some strategic and heuristic meta-rules may be added to a grammar. Equipped with an expert system for SCSG, the SCSE could interpret a static grammar efficiently and synthesise efficient dynamic grammars from it.
It would also be interesting to integrate into the SCSE an expert system which could compare the SCSGs of two languages and produce a transfer grammar, or at least give some advice for constructing it.
Using its logical structure manipulation mechanism, the SCSE can be extended to deal with other types of structured texts. Thanks to its efficient interpreter, or in cooperation with a powerful machine translation system such as ARIANE, the SCSE could offer multilingual editing facilities [4].

BIBLIOGRAPHY

1. S. Chappuy,
"Formalisation de la Description des Niveaux d'Interprétation des Langues Naturelles. Étude Menée en Vue de l'Analyse et de la Génération au Moyen de Transducteur",
Thèse de troisième cycle, USMG-INPG, July 1983.

2. J. D. Gould and C. Lewis,
"Designing for Usability: Key Principles and What Designers Think",
Communications of the ACM, Volume 28, March 1985.

3. Ph. Donz,
"PROLOGCRISS, une extension du langage PROLOG",
CRISS, Université II de Grenoble, Version 4.0, July 1985.

4. G. E. Heidorn, K. Jensen, L. A. Miller, R. J. Byrd and M. S. Chodorow,
"The EPISTLE text-critiquing system",
IBM Systems Journal, 21(3), 1982.

5. T. Teitelbaum et al.,
"The Cornell Program Synthesizer: a syntax-directed programming environment",
Communications of the ACM, 24(9), September 1981.

6. B. Vauquois and S. Chappuy,
"Static Grammars: a formalism for the description of linguistic models",
Proceedings of the Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, Colgate University, Hamilton, N.Y., USA, August 14-16, 1985.
