PeriPhrase: Lingware for Parsing and Structural Transfer
Kenneth R. Beesley David Hefner 
A.L.P. Systems 
190 West 800 North 
Provo, Utah 84604 USA 
Abstract 
PeriPhrase is a high-level computer language 
developed by A.L.P. Systems to facilitate parsing and 
structural transfer. It is designed to speed the 
development of computer-assisted translation systems
and grammar checkers. We describe the syntax and 
semantics of this tool, its integrated development 
environment, and some of our experience with it. 
I. Introduction
Up to 80% of the time needed to develop a new 
language pair for computer translation is spent in
writing source-language analysis and transfer 
programs. The PeriPhrase language and development 
environment were created to allow a computational 
linguist to write such programs more quickly, using 
high-level rules that are easily written, read, and 
debugged. 
The syntax of PeriPhrase was heavily influenced 
by its predecessor "PHRASE," which in turn borrowed 
from BNF, rule-based programming languages like 
PROLOG, and expert systems. There are obvious 
similarities to PARSIFAL, Marcus' Deterministic
Parser, and many other projects. It is perhaps true
that few of the individual features of the language 
originated with us. However, we believe the synthesis 
of these features together with a very powerful 
debugging environment to be unique and significant, 
reflecting the practical needs of computational
linguists building large commercial systems. 
II. PeriPhrase Syntax
A PeriPhrase program consists of a declarations 
section followed by one or more rule packets. Each 
packet contains one or more rules. All the category 
names, variable names, attribute names, and action 
names used in the program must be declared, and the 
possible values for each attribute must be 
enumerated. As applying rules is a time-consuming 
process, packets of rules can be activated only as 
they are needed, either when a program starts or 
during execution. 
Simple PeriPhrase rules are composed of a 
pattern on the left side and a rewrite on the right 
side, separated by a rewrite operator. 
pattern => rewrite. 
PeriPhrase tries to match the pattern on the data 
being parsed. If the pattern matches, then the data 
is restructured or recoded according to the rewrite. 
One way of looking at rules is to see the pattern as 
a "before" snapshot and the rewrite as an "after" 
snapshot. 
The pattern is composed of one or more pattern 
elements, the simplest being a declared category 
name. The following are valid patterns: 
DET ADJ N
V NP 
NP VP 
The most common operation performed by PeriPhrase 
rules is simple conflation, where all the data items
matched by the pattern are made immediate sons under
a new father node. The following simple rule forms a 
noun phrase (NP). 
! 1   2   3
DET ADJ N => NP[1, 2, 3].
A comment line, preceded by an exclamation mark, is
included in this example to highlight the match
units, which are always counted in strict
left-to-right order. In the rewrite, the presence of
the category name NP indicates the insertion of a
node of that category. The square brackets following
the NP indicate that it is to be a new father node.
The numbers appearing in the rewrite are formal
pronouns referring back to the match units of the
pattern. This rewrite indicates that the first, the
second and the third match units (i.e. all the match
units) are to be made sons under a new NP node, in
the order indicated. When the rule fires, a tree like
the following will be built.
      NP
    / | \
 DET ADJ N
Many other rules are constructed on the same pattern.
V NP => VP[1, 2].
NP VP => S[1, 2].
Because simple conflation is so common, the
abbreviation [...], which references all the match
units, is provided. The abbreviated rules below are
completely equivalent to the rules just described. 
V NP => VP[...].
NP VP => S[...].
When explicit formal pronouns, rather than [...],
are used, the omission of any formal pronoun causes
the corresponding match unit to be deleted. The
presence of a category name in the rewrite always
causes an insertion, either of a new father node or a
new terminal node. 
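
The conflation behavior described above can be sketched in Python. This is only an illustration, not the PeriPhrase implementation: the Node class, the conflate and show functions, and the representation of formal pronouns as a list of 1-based integers are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cat: str                                   # declared category name
    sons: list = field(default_factory=list)   # immediate sons, if any


def conflate(items, pattern, father, pronouns=None):
    """Fire a rule like `pattern => father[pronouns]` on the first match.

    `pronouns` is a list of 1-based match-unit numbers; None stands for
    the [...] abbreviation (all match units, in order). Omitted pronouns
    delete the corresponding match unit.
    """
    n = len(pattern)
    for i in range(len(items) - n + 1):        # strict left-to-right scan
        matched = items[i:i + n]
        if [m.cat for m in matched] == pattern:
            if pronouns is None:
                sons = matched
            else:
                sons = [matched[p - 1] for p in pronouns]
            return items[:i] + [Node(father, sons)] + items[i + n:]
    return items                               # the rule did not fire


def show(node):
    """Render a tree in the bracket notation used by the rewrites."""
    if not node.sons:
        return node.cat
    return f"{node.cat}[{', '.join(show(s) for s in node.sons)}]"


data = [Node("DET"), Node("ADJ"), Node("N"), Node("V")]
result = conflate(data, ["DET", "ADJ", "N"], "NP")
print([show(n) for n in result])
```

Passing pronouns=[1, 3] instead would build NP[DET, N], dropping the second match unit, which mirrors the deletion behavior described for omitted formal pronouns.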
Simple conflation rules are much like the
context-free phrase-structure rules familiar to 
formal linguists, but PeriPhrase rules can also be 
context-sensitive. Suppose that we declared a 
category N_V for marking noun-verb homographs. (It
should be emphasized that all category names, and the
significance given to them, are determined by the
programmer.) When a noun-verb homograph like "walk"
occurs in the context "the walk," the following 
PeriPhrase rule will disambiguate it. 
! 1   2
DET N_V => 1 2:=N.
That is, if an N_V is found immediately preceded by a
DET, that N_V (the second match unit) is
recategorized as a noun (an N).
Pattern elements can be preceded by a prefix,
like the Kleene star (*), indicating that zero or more of
the indicated items can appear in the data.
Other prefixes available are 1+, which indicates that
one or more of the matching items must appear, and
0-1, which indicates optionality.
Similar to the simple pattern elements based on
a category name are WILD pattern elements, which will
match a data item of any category. The following rule
matches whatever is left and forms it into a
sentence.
*WILD => S[...].
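
As a rough illustration of how the prefixes and WILD might be interpreted, the following Python sketch matches a pattern against a list of category names. The function names, the tuple encoding of a pattern element, and the choice of a greedy strategy with backtracking are assumptions made for this example; the source does not describe the actual matching algorithm.

```python
# Prefix -> (minimum count, maximum count); None means "no upper limit".
PREFIXES = {"*": (0, None), "1+": (1, None), "0-1": (0, 1)}


def parse_element(text):
    """Turn e.g. '*ADJ' into (0, None, 'ADJ'); a bare name means exactly one."""
    for prefix in ("0-1", "1+", "*"):
        if text.startswith(prefix):
            return PREFIXES[prefix] + (text[len(prefix):],)
    return (1, 1, text)


def compile_pattern(source):
    return [parse_element(e) for e in source.split()]


def match_at(cats, start, elements):
    """Return the index just past a match starting at `start`, or None."""
    if not elements:
        return start
    lo, hi, cat = elements[0]
    run = 0
    # Count how many consecutive items this element could absorb.
    while (start + run < len(cats)
           and (hi is None or run < hi)
           and (cat == "WILD" or cats[start + run] == cat)):
        run += 1
    # Try the longest run first, backing off if the rest fails to match.
    for count in range(run, lo - 1, -1):
        end = match_at(cats, start + count, elements[1:])
        if end is not None:
            return end
    return None


np_pattern = compile_pattern("DET *ADJ N")
print(match_at(["DET", "ADJ", "ADJ", "N"], 0, np_pattern))
print(match_at(["DET", "N"], 0, np_pattern))
```

With this sketch, the pattern *WILD absorbs every remaining item regardless of category, which corresponds to the catch-all sentence rule.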
It is often convenient to constrain categories by
specifying attributes or "features" which must also
match or not match.
DET(number#plural) *ADJ N(number=singular) =>
NP[...](number:=singular).
The pattern element N(number=singular) will match
only if PeriPhrase finds an item of category N whose
number attribute is equal to 'singular.' The pattern
element DET(number#plural) will match only if the
item is of category DET and the number attribute of
the item is NOT equal to 'plural.' The := or
assignment operator in the rewrite indicates that an
attribute is to be set to a particular value. The
rewrite NP[...](number:=singular) indicates that an
NP is to be built up in the way already described,
and the number feature of the overall NP is to be set
to 'singular.' Attribute restrictions can be set for
any simple pattern element in the pattern, and
attribute settings can be specified for any inserted
or pronoun-referenced item in the rewrite. The = and
# signs can be iterated.
DET(number=plural=both) ! either 'plural' or 'both'
DET(number#plural#both) ! neither 'plural' nor 'both'
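
The iterated = and # tests can be read as set membership and set exclusion. The Python sketch below illustrates that reading; parse_test and passes are hypothetical helper names, mixing = and # in one test is not handled, and treating an unset attribute as passing a # test is an assumption that the source does not address.

```python
def parse_test(text):
    """Split 'number=plural=both' into ('number', '=', ['plural', 'both'])."""
    op = "=" if "=" in text else "#"
    name, *values = text.split(op)
    return name, op, values


def passes(attributes, text):
    """Check one attribute restriction against an item's attribute values."""
    name, op, values = parse_test(text)
    actual = attributes.get(name)      # None if the attribute is unset
    if op == "=":
        return actual in values        # equal to any listed value
    return actual not in values        # equal to none of the listed values


det = {"number": "both"}
print(passes(det, "number=plural=both"))
print(passes(det, "number#plural#both"))
```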
W~en a pattern is being matched, variables can 
be "loaded" with the attribute values of items being 
matched. For example, the following pattern would
cause variable X to be loaded with the value of the
number attribute for the N and the variable Y to be
loaded with the case value.
DET *ADJ N(X:=number, Y:=case) =>
NP[...](number:=X, case:=Y).
Inside a rewrite, attributes can also be set from
loaded variables, as in the example above, where the
number and case of the head noun of a noun phrase are
effectively passed up to the noun phrase itself.
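
The passing of head-noun attributes up to the noun phrase can be pictured with a short Python sketch. The dictionary representation of data items and the build_np function are assumptions for illustration only; the pattern check is hard-coded for DET *ADJ N rather than being a general matcher.

```python
def build_np(items):
    """Sketch of: DET *ADJ N(X:=number, Y:=case) => NP[...](number:=X, case:=Y)."""
    cats = [item["cat"] for item in items]
    is_match = (len(items) >= 2
                and cats[0] == "DET"
                and cats[-1] == "N"
                and all(c == "ADJ" for c in cats[1:-1]))
    if not is_match:
        return None
    head = items[-1]
    # "Loading": copy attribute values of the matched N into variables.
    variables = {"X": head["number"], "Y": head["case"]}
    # In the rewrite, the new father node's attributes are set from them.
    return {"cat": "NP", "sons": items,
            "number": variables["X"], "case": variables["Y"]}


np = build_np([
    {"cat": "DET"},
    {"cat": "ADJ"},
    {"cat": "N", "number": "plural", "case": "dative"},
])
print(np["number"], np["case"])
```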
PeriPhrase also provides pattern elements more
exotic than category names and WILD. An OR pattern
element, enclosed in curly brackets, matches when one
of an enumerated set of possibilities is found. An
exclusion pattern element, enclosed in angle
brackets, matches when none of an enumerated set of
possibilities is found.
! OR pattern element
<N & ADJ> ! exclusion pattern element
As it is sometimes convenient to specify
sequences of patterns within patterns, PeriPhrase
provides the subpattern, whose elements are bounded
by parentheses. The following example assumes that we
have declared a category COMMA, which would be
assigned to the punctuation mark of the same name.
The second pattern element will match zero or more
instances of the subpattern (ADJ 0-1COMMA).
! 1   2                3
DET *(ADJ 0-1COMMA) N => NP[1, 2, 3].
Most powerful of all are the hierarchical
pattern elements, which allow rules to match whole
trees and subtrees that have been built up previously
during the analysis. The following example will match
an NP which consists of a DET, an ADJ, and an N.
! 1  2    3    4
NP[DET, ADJ, N]
The transfer operations of insertion and
deletion have already been mentioned. Transfer can
also involve reordering and restructuring. The
following rule is a simplified example of reordering
for transferring from English, where adjectives
generally precede the noun they modify, to French,
where the adjectives generally follow the noun. Note
the reordering of the third and fourth match units.
! 1  2    3     4
NP[DET, *ADJ, N] => 1[2, 4, 3].
The following trees show sample data before and after
this rule has fired.
        NP                          NP
      / |  \                      / |  \
   DET ADJ   N                 DET  N   ADJ
 (the) (brown) (dog)         (the) (dog) (brown)
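
The reordering rule can be mimicked in Python on a small tuple-based tree. This is a sketch under stated assumptions: trees are (category, sons) pairs, leaves are (category, word) pairs, and reorder_np is a hypothetical function that hard-codes the single rule NP[DET, *ADJ, N] => 1[2, 4, 3] rather than interpreting rules in general.

```python
def reorder_np(node):
    """Apply NP[DET, *ADJ, N] => 1[2, 4, 3] to one node, if it matches."""
    cat, sons = node
    if cat != "NP" or isinstance(sons, str) or len(sons) < 2:
        return node
    cats = [son_cat for son_cat, _ in sons]
    if (cats[0] == "DET" and cats[-1] == "N"
            and all(c == "ADJ" for c in cats[1:-1])):
        # Match units: 1 = the NP itself, 2 = DET, 3 = *ADJ (all of the
        # adjectives together), 4 = N.
        det, adjs, noun = sons[0], sons[1:-1], sons[-1]
        return (cat, [det, noun] + adjs)
    return node                         # pattern did not match


english = ("NP", [("DET", "the"), ("ADJ", "brown"), ("N", "dog")])
french_order = reorder_np(english)
print([word for _, word in french_order[1]])
```

Because *ADJ counts as one match unit, a run of several adjectives moves as a block behind the noun in this sketch.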
It was recognized from the beginning that
PeriPhrase itself could not do everything and that it
should not try to do everything necessary for
analysis and transfer. To accommodate the need to
integrate lower-level code, PeriPhrase allows the
user to call actions, arbitrarily complex C programs,
during PeriPhrase processing. Actions appear
optionally in rules, both after a pattern and after a
rewrite. The following is a sample rule containing
action calls to check_flag, print_message, and
set_flag.
DET *ADJ N; check_flag(X) =>
NP[...]; print_message, set_flag(Y, Z).
Actions can also be called as packets are entered and
exited. Constants and variables (X, Y and Z in the
example above) can optionally be included in action
calls as parameters. Because parameters are passed by
address, action routines can change the value of
variables in the calling PeriPhrase program.
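
The practical effect of passing parameters by address can be illustrated in Python with a mutable box. The Var class is an assumption standing in for a PeriPhrase variable, and the behavior given to set_flag here (copying one variable into another) is purely hypothetical, since the source names the action but does not define what it does.

```python
class Var:
    """A mutable box standing in for a variable passed by address."""

    def __init__(self, value):
        self.value = value


def set_flag(target, source):
    # Hypothetical action body: the write goes through the shared box,
    # so the calling program sees the new value afterwards.
    target.value = source.value


def set_flag_by_value(target, source):
    # Contrast case: rebinding a local name changes nothing for the caller.
    target = source
    return target


Y, Z = Var(0), Var(7)
set_flag(Y, Z)
print(Y.value)

plain_y = 0
set_flag_by_value(plain_y, 7)
print(plain_y)
```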
V. Search Order 
The rules for each packet are matched
left-to-right or right-to-left at the discretion of
the programmer. In addition, programmers can
optionally specify a traversal order for each packet,
either preorder or postorder. If a traversal order is
specified, the packet is "free," and PeriPhrase will
search down inside tree structures already built up 
when trying to match rule patterns. Otherwise, a 
packet is "fixed," and the pattern-matching search is 
limited to the topmost visible roots of the trees 
already formed. 
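
The difference between fixed and free packets can be sketched as the order in which nodes become candidates for pattern matching. The Python below is an illustration only: visit_order is a hypothetical function, trees are (category, sons) pairs, and applying the left-to-right or right-to-left direction inside trees as well as across roots is an assumption not stated in the source.

```python
def visit_order(roots, traversal=None, right_to_left=False):
    """List node categories in the order a packet would consider them.

    traversal=None models a "fixed" packet (topmost roots only);
    "preorder" or "postorder" models a "free" packet.
    """
    def ordered(nodes):
        return list(reversed(nodes)) if right_to_left else list(nodes)

    if traversal is None:
        return [cat for cat, _ in ordered(roots)]

    visited = []

    def walk(node):
        cat, sons = node
        if traversal == "preorder":
            visited.append(cat)         # father before its sons
        for son in ordered(sons):
            walk(son)
        if traversal == "postorder":
            visited.append(cat)         # father after its sons

    for root in ordered(roots):
        walk(root)
    return visited


roots = [("NP", [("DET", []), ("N", [])]), ("V", [])]
print(visit_order(roots))
print(visit_order(roots, "preorder"))
print(visit_order(roots, "postorder"))
```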
VI. Ambiguity and Complex Rules
In most natural languages, especially written 
English, there are many genuinely ambiguous 
constructions where an analysis could go two or more 
ways. For example, the noun phrase "the small car
factory" is ambiguous as to whether the writer means a
small factory that makes cars or a factory that makes
small cars. The analysis chosen will make a big
difference if the goal is translation into French or 
a similar language. 
Assuming that small is categorized as an ADJ and
that car and factory are categorized as Ns, either of 
the following trees could be built. 
        NP
       /  \
    DET    N
   (the)  / \
       ADJ   N
    (small) / \
           N   N
        (car) (factory)

        NP
       /  \
    DET    N
   (the)  / \
         N   N
        / \  (factory)
     ADJ   N
  (small) (car)
A PeriPhrase programmer might determine that one
reading is statistically more common than the other
and simply default every time to that one reading.
Only one of the following two rules would appear in
the grammar, depending on the reading desired.
DET ADJ N N => NP[1, N[2, N[3, 4]]].
! small (car factory)
DET ADJ N N => NP[1, N[N[2, 3], 4]].
! (small car) factory
In a similar vein, the analysis could diverge into 
two parallel paths, and each structure and each 
analysis would be given a confidence rating. At the 
end, the analysis with the highest overall confidence 
rating would win. Another possibility is human 
interaction. 
These three possibilities, statistical 
defaulting, parallel processing, and interaction, are 
all responses to the same kind of problem: deciding 
how to make a genuine choice during analysis. In 
PeriPhrase, all three possibilities are accommodated
by a single specialization, the complex rule, which
is perhaps the most novel feature of the language. A 
complex rule lists a set of possible rewrites, one 
for each alternate path. A rule to handle the small 
car factory structure is the following. 
DET ADJ N N; action(X) =>
choose(X) { NP[1, N[2, N[3, 4]]] |
NP[1, N[N[2, 3], 4]] }.
The rewrite section begins with the reserved word 
choose, which takes a "discriminator" variable, here
X, as an argument. Following choose(X) is an OR list 
of possible rewrites, enclosed in curly brackets and 
separated by vertical lines. Where n is the value of 
X at the time of execution of the rewrite, the nth 
rewrite rule in the list is performed. 
Usually the variable used to choose a rewrite is 
set by an action routine in the same rule, but this 
is not required. Any variable can control the choice, 
and it could even be a reserved variable set to a 
desired default reading. 
The most straightforward way for an action to 
set the discriminator variable is to interact with
the user. The choices would be presented in some menu 
form to the screen, and the user's answer would 
directly choose the rewrite. Actions could also be 
written to set the discriminator after performing 
complex syntactic and semantic checks. 
Setting the discriminator variable to 0 (zero)
causes PeriPhrase to pursue both paths in a 
pseudo-parallel fashion. 
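
The selection logic of a complex rule can be sketched in Python. The choose function below, the tuple-based trees, and the two rewrite functions are assumptions for illustration; in particular, returning a list of every alternative when the discriminator is 0 only stands in for the pseudo-parallel processing, whose actual mechanism the source does not describe.

```python
def small_then_car_factory(m):
    """NP[1, N[2, N[3, 4]]]: small (car factory)."""
    return ("NP", [m[0], ("N", [m[1], ("N", [m[2], m[3]])])])


def small_car_then_factory(m):
    """NP[1, N[N[2, 3], 4]]: (small car) factory."""
    return ("NP", [m[0], ("N", [("N", [m[1], m[2]]), m[3]])])


def choose(x, rewrites, matched):
    """Return the analyses selected by discriminator value `x`.

    x = n (1-based) picks the nth rewrite; x = 0 keeps every path alive.
    """
    if x == 0:
        return [rewrite(matched) for rewrite in rewrites]
    return [rewrites[x - 1](matched)]


rewrites = [small_then_car_factory, small_car_then_factory]
matched = ["DET", "ADJ", "N", "N"]      # the four match units

print(choose(2, rewrites, matched))
print(len(choose(0, rewrites, matched)))
```

An action routine that asks the user, or that runs syntactic and semantic checks, would simply compute the value passed as x before the rewrite is selected.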
VII. The Development Environment
The PeriPhrase user is provided a development 
environment which is designed to enhance productivity 
and shelter the user from irrelevant system-level 
details. The development environment consists of an 
editor, an incremental compiler, a source-level 
debugger, and a user-interface menu.
From the menu, the user can edit any packet, 
which will be incrementally compiled when execution 
is restarted. The debugger allows the user to set 
virtually unlimited numbers of breakpoints on 
individual rules, packets, and actions. In addition 
to breakpointing, the user may examine the working 
memory (database), the production memory (the
PeriPhrase source code), PeriPhrase variables, action 
parameters, and other data relevant to the state of 
the PeriPhrase program execution. PeriPhrase programs 
can be "animated." The debugger itself is 
command-driven and user-customizable, with full macro
capabilities. 
VIII. Conclusion
We at A.L.P. Systems are finding PeriPhrase to 
be a valuable software tool for building practical 
natural-language systems. An earlier version of the 
language, called PHRASE, is already being used in our 
translation products as part of the front-end 
routines that divide a text into sentences. An 
English analysis program has been started, and we 
already have a German analysis program with about 600 
rules in 80 packets. PeriPhrase is also being used in 
our Writing Aids division to build a grammar checker 
for English. We anticipate that PeriPhrase will be 
used increasingly over the coming years as A.L.P. 
Systems develops new products and expands its 
translation line to cover more language pairs. We 
also expect that PeriPhrase and its development 
environment will continue to evolve within the
established framework. 
