Generating Hebrew verb morphology by default inheritance hierarchies
Raphael Finkel
Department of Computer Science
University of Kentucky
raphael@cs.uky.edu
Gregory Stump
Department of English
University of Kentucky
gstump@uky.edu
Abstract
We apply default inheritance hierarchies
to generating the morphology of Hebrew
verbs. Instead of lexically listing each
of a word form’s various parts, this strat-
egy represents inflectional exponents as
markings associated with the application
of rules by which complex word forms
are deduced from simpler roots or stems.
The high degree of similarity among verbs
of different binyanim allows us to for-
mulate general rules; these general rules
are, however, sometimes overridden by
binyan-specific rules. Similarly, a verb’s
form within a particular binyan is deter-
mined both by default rules and by over-
riding rules specific to individual verbs.
Our result is a concise set of rules defin-
ing the morphology of all strong verbs in
all binyanim. We express these rules in
KATR, both a formalism for default in-
heritance hierarchies and associated soft-
ware for computing the forms specified by
those rules. As we describe the rules, we
point out general strategies for express-
ing morphology in KATR and we discuss
KATR’s advantages over ordinary DATR
for the representation of morphological
systems.
1 Introduction
Recent research into the nature of morphology sug-
gests that the best definitions of a natural lan-
guage’s inflectional system are inferential and real-
izational (Stump, 2001). A definition is inferential
if it represents inflectional exponents as markings
associated with the application of rules by which
complex word forms are deduced from simpler roots
and stems; an inferential definition of this sort con-
trasts with a lexical definition, according to which
an inflectional exponent’s association with a par-
ticular set of morphosyntactic properties is simply
stated in the lexicon, in exactly the way that the as-
sociation between a lexeme’s formal and contentive
properties is stipulated. In addition, a definition
of a language’s inflectional system is realizational
if it deduces a word’s inflectional exponents from
its grammatical properties; a realizational definition
contrasts with an incremental definition, according
to which words acquire morphosyntactic properties
only by acquiring the morphology which expresses
those properties.
The conclusion that inflectional systems should
be defined realizationally rather than incrementally
is favored by a range of evidence, such as the
widespread incidence of extended exponence in in-
flectional morphology and the fact that a word’s in-
flectional exponents often under-determine its mor-
phosyntactic content (Stump, 2001). Moreover,
inferential-realizational definitions can avoid certain
theoretically unmotivated distinctions upon which
lexical or incremental definitions often depend. For
instance, inferential-realizational definitions do not
entail that concatenative and nonconcatenative mor-
phology are fundamentally different in their gram-
matical status; they do not necessitate the postula-
tion of any relation between inflectional markings
and morphosyntactic properties other than the rela-
tion of simple exponence; and they are compatible
with the assumption that a word form’s morpholog-
ical representation is not distinct from its phonolog-
ical representation.
Various means of defining a language’s inflec-
tional morphology in inferential-realizational terms
are imaginable. In an important series of arti-
cles (Corbett and Fraser, 1993; Fraser and Cor-
bett, 1995; Fraser and Corbett, 1997), Greville Cor-
bett and Norman Fraser proposed Network Mor-
phology, an inferential-realizational morphological
framework that makes extensive use of nonmono-
tonic inheritance hierarchies to represent the infor-
mation constituting a language’s inflectional system.
Analyses in Network Morphology are implemented
in DATR, a formal language for representing lexi-
cal knowledge designed and implemented by Roger
Evans and Gerald Gazdar (Evans and Gazdar, 1989).
In recent work, we have extended DATR, creating
KATR, which is both a formal language and a com-
puter program that generates desired forms by inter-
preting that language.
In this paper, we show how KATR can be used to
provide an inferential-realizational definition of He-
brew verb morphology. Our objectives are twofold.
First, we propose some general strategies for ex-
ploiting the capabilities of nonmonotonic inheri-
tance hierarchies in accounting for the properties of
“root-and-pattern” verb inflection in Hebrew; sec-
ond, we discuss some specific capabilities that dis-
tinguish KATR from DATR and show why these
added capabilities are helpful to account for the He-
brew facts.
2 The pi‘el verb a0a2a1a4a3
The purpose of the KATR theory described here is to
generate perfect and imperfect forms of strong verbs
belonging to various binyanim in Hebrew. In par-
ticular, given a verbal lexeme L and a sequence a5
of morphosyntactic properties appropriate for verbs,
the theory evaluates the pairing of L with a5 as an
inflected verb form. For instance, it evaluates the
pairing of the lexeme a6a8a7a10a9 “speak” with the property
sequence <perfect 3 sg masc> as the verb
form a6a11a7a12a13 a9a15a14 “he spoke”.
A theory in KATR is a network of nodes; the
network of nodes constituting our verb morphology
theory is represented in Figure 1. The overarching
organizational principle in this network is hierarchi-
cal: The tree structure’s terminal nodes represent in-
dividual verbal lexemes, and each of the nontermi-
nal nodes in the tree defines default properties shared
by the lexemes that it dominates. The status of the
boxed nodes is taken up below.
Each of the nodes in a theory houses a set of rules.
We represent the verb a6a11a7a16a9 by a node:
Speak:
<root> = a9a17a7a18a6 % 1
<> = PIEL % 2
.
The node is named Speak, and it has two rules, ter-
minated by a single dot. Our convention is to name
the node for a verb by a capitalized English word
representing its meaning. We use KATR-style com-
ments (starting with % and continuing to the end of
the line) to number the rules so we can refer to them
easily.
Rule 1 says that a query asking for the root of this
verb should produce a three-atom result containing
a9 , a7 , and a6 . Our rules assemble Hebrew words in
logical order, which appears in this document as left-
to-right. We accomplish reversal by rules in a RE-
VERSE node, not shown in this paper.
Rule 2 says that all other queries are to be referred
to the PIEL node, which we introduce below.
A query is a list of atoms, such as <root>
or <vowel2 perfect 3 sg masc>; in our
theory, the atoms generally represent form cate-
gories (such as root, binyanprefix, vowel1,
cons2), morphosyntactic properties (such as per-
fect, sg, fem) or specific Hebrew characters.
Queries are directed to a particular node. The query
directed to a given node is matched against all the
rules housed at that node. A rule matches if all the
atoms on its left-hand side match the atoms in the
query. A rule can match even if its atoms do not
exhaust the entire query. In the case of Speak, a
query <root perfect> would match both rules,
but not a rule beginning with <spelling>. When
several rules match, KATR picks the best match, that
is, the one whose left-hand side “uses up” the most
of the query. This algorithm means that Rule 2 of
Speak is only used when Rule 1 does not apply,
because Rule 1 is always a better match if it applies
at all. Rule 2 is called a default rule, because it
applies by default if no other rule applies. Default
rules define a hierarchical relation among some of
the nodes in a KATR theory; thus, in the tree structure
depicted in Figure 1, node X dominates node Y
iff Y houses a default rule that refers queries to X.
[Figure 1: A network of nodes for generating forms of strong verbs in seven binyanim.
Node labels in the figure: VERB; PIEL, HIPHIL, NIPHAL, PUAL, HOPHAL, HITHPAEL, QAL
(with QAL1, QAL2, QAL3); the verb nodes Separate, Finish, Slaughter, Doom, Slip, Speak,
Guard, BeHeavy, BeSmall; and VERBPREFIX, VERBSUFFIX, STEM, ROOT1, ROOT2, ROOT3, ACCENT.]
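The best-match policy for list rules can be illustrated with a short Python sketch. This is not KATR code, and it rests on an assumption that the text does not spell out: a list rule is taken to match when its left-hand side is a prefix of the query. The names matches_prefix and best_rule, and the rule descriptions, are hypothetical.

```python
def matches_prefix(lhs, query):
    """Assumed matching test: the left-hand side is a prefix of the query."""
    return len(lhs) <= len(query) and tuple(query[:len(lhs)]) == tuple(lhs)


def best_rule(rules, query):
    """Return the matching rule whose left-hand side uses up the most atoms."""
    candidates = [rule for rule in rules if matches_prefix(rule[0], query)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: len(rule[0]))


# The Speak node: Rule 1 supplies the root; Rule 2 has an empty left-hand
# side, so it matches every query and acts as the default.
SPEAK = [
    (("root",), "rule 1: root letters"),
    ((), "rule 2: refer to PIEL"),
]

print(best_rule(SPEAK, ("root", "perfect"))[1])
print(best_rule(SPEAK, ("perfect", "sg", "3", "masc"))[1])
```

Because the empty left-hand side has length zero, the default rule wins only when no other rule matches, which is the behavior described for Rule 2 of Speak.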
KATR generates output based on queries directed
to nodes representing individual words. Since these
nodes, such as Speak, are not referred to by other
nodes, they are called leaves, as opposed to nodes
like PIEL, which are called internal nodes.
Here is the output that KATR generates for the
Speak node and various queries.
Speak:<perfect sg 3 masc> a19a20a6a11a7a12a13 a9a21a14
Speak:<perfect sg 3 fem> a19a23a22a24a6a25a26a7a12a27a28 a9a21a14
Speak:<perfect sg 2 masc> a19a23a29a30a12a25a8a6a27 a7a12a31a32a9a21a14
Speak:<perfect sg 2 fem> a19a23a29a30a12a27 a6a27 a7a12a31 a9a15a14
Speak:<perfect sg 1 masc> a19a34a33a35a29a12a14a36a6a27 a7a12a31a32a9a15a14
Speak:<perfect sg 1 fem> a19a34a33a35a29a12a14a36a6a27 a7a12a31a32a9a15a14
Speak:<perfect pl 3 masc> a19a34a37a12a38a6a11a7a12a27a28a39a9a21a14
Speak:<perfect pl 3 fem> a19a34a37a12 a6a11a7a12a27a28 a9a21a14
Speak:<perfect pl 2 masc> a19a34a40a10a29a12a41 a6a27 a7a12a31 a9a21a14
Speak:<perfect pl 2 fem> a19a34a42a38a29a12a41 a6a27 a7a12a31 a9a21a14
Speak:<perfect pl 1 masc> a19a34a37a12a15a43a39a6a27 a7a12a31a32a9a21a14
Speak:<perfect pl 1 fem> a19a34a37a12a15a43a39a6a27 a7a12a31a32a9a21a14
Speak:<imperfect sg 3 masc> a19a23a6a11a7a12a13 a9a31 a33a27a28
Speak:<imperfect sg 3 fem> a19a23a6a11a7a12a13 a9a31 a29a12a27a28
Speak:<imperfect sg 2 masc> a19a23a6a11a7a12a13 a9a31 a29a12a27a28
Speak:<imperfect sg 2 fem> a19a20a33a38a6a14a38a7a12a27a28a39a9a31 a29a30a12a27a28
Speak:<imperfect sg 1 masc> a19a23a6a11a7a12a13 a9a31a45a44a27a31
Speak:<imperfect sg 1 fem> a19a23a6a11a7a12a13 a9a31a45a44a27a31
Speak:<imperfect pl 3 masc> a19a20a37a12 a6a11a7a12a27a28a39a9a31 a33a27a28
Speak:<imperfect pl 3 fem> a19a46a22a10a43a25a36a6a27 a7a12a13a32a9a31 a29a12a27a28
Speak:<imperfect pl 2 masc> a19a20a37a12 a6a11a7a12a27a28 a9a31 a29a12a27a28
Speak:<imperfect pl 2 fem> a19a46a22a10a43a25a36a6a27 a7a12a13a32a9a31 a29a12a27a28
Speak:<imperfect pl 1 masc> a19a23a6a11a7a12a13 a9a31 a43a27a28
Speak:<imperfect pl 1 fem> a19a23a6a11a7a12a13 a9a31 a43a27a28
Our theory represents Hebrew characters and
vowels in Unicode characters (Daniels, 1993). We
use ´ to indicate the accented syllable if it is not the
ultima, and we mark shewa na by a47 .
The rule for Speak illustrates one of the strate-
gies upon which we build KATR theories: A node
representing a category (here, a particular verb) may
provide information (here, the letters of the verb’s
root) needed by more general nodes (here, PIEL and
the nodes to which it, in turn, refers). We refer to this
strategy as priming. As we see below, rules in the
more general nodes refer to primed information by
means of quoted queries.
3 The PIEL node
We now turn to the PIEL node, to which the Speak
node refers.
PIEL:
<> = VERB % 1
<cons2> = ROOT2:<"<root>"> a12 % 2
{binyanprefix perfect} = % 3
{binyanprefix imperfect} = a27 % 4
{binyanprefix imperfect 1 sg} = a27a31 % 5
{vowel1 perfect} = a14 % 6
{vowel1 imperfect} = a31 % 7
{vowel2 perfect 3 sg masc} = a13 % 8
{vowel2 perfect} = a31 % 9
{vowel2 imperfect} = a13 % 10
.
As with the Speak node, PIEL defers most
queries to its parent, in this case the node called
VERB, as Rule 1 indicates.
Rule 2 modifies a default that VERB will use,
namely, the nature of the second consonant of the
root. Pi‘el verbs double their second consonant by
applying a dagesh. This rule exemplifies a second
strategy of KATR theories: A node representing a
specific category (here, pi‘el verbs) may override in-
formation (here, the nature of the second consonant)
that is assumed by more general nodes (here, VERB
and the nodes to which it, in turn, refers). We refer
to this strategy as overriding. Rule 2 is an overrid-
ing rule because the value it assigns to the sequence
<cons2> is distinct from the value assigned at the
VERB node to which PIEL refers queries by default.
We momentarily defer discussing the strange right-
hand side of this rule.
The other rules in PIEL are all priming rules.
Instead of using angle brackets (“<” and “>”) to
match queries, they use braces (“{” and “}”). This
syntax causes the left-hand side of a rule to be
treated as a set instead of an ordered list. The rule
whose left-hand side is {binyanprefix perfect}
matches any query containing both the atom
binyanprefix and the atom perfect, in any
order. As before, more than one rule can match a
given query, and the rule with the most comprehen-
sive match is chosen. If there are equally good best
rules, the KATR theory is considered malformed.
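Set matching can be sketched in Python as follows. This is an illustration under stated assumptions, not the KATR implementation: the function best_set_rule and the rule labels are hypothetical, and the sketch reports equally good best rules when a query is evaluated, whereas the text only says that such a theory is malformed.

```python
def best_set_rule(rules, query):
    """Pick the set rule that covers the most atoms of the query."""
    atoms = set(query)
    # A set rule matches when every atom on its left-hand side occurs
    # somewhere in the query, in any order.
    matching = [(lhs, rhs) for lhs, rhs in rules if lhs <= atoms]
    if not matching:
        return None
    top = max(len(lhs) for lhs, _ in matching)
    winners = [rhs for lhs, rhs in matching if len(lhs) == top]
    if len(winners) > 1:
        raise ValueError("malformed theory: equally good best rules")
    return winners[0]


# Left-hand sides of the three vowel2 rules of PIEL; the right-hand sides
# are replaced by rule labels for readability.
PIEL_VOWEL2 = [
    (frozenset({"vowel2", "perfect", "3", "sg", "masc"}), "rule 8"),
    (frozenset({"vowel2", "perfect"}), "rule 9"),
    (frozenset({"vowel2", "imperfect"}), "rule 10"),
]

print(best_set_rule(PIEL_VOWEL2, ["vowel2", "perfect", "sg", "3", "masc"]))
print(best_set_rule(PIEL_VOWEL2, ["perfect", "pl", "1", "fem", "vowel2"]))
```

The first query selects Rule 8 over Rule 9 because Rule 8 covers five atoms rather than two; the second shows that the order of atoms in the query does not matter.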
In formulating Rules 3–5, we assume a distinction
between binyan prefixes (specific to particular
binyanim) and the personal prefixes (which cross-
cut the various binyanim); thus, the form a6a8a7a12a13 a9a31 a43a27 “we
will speak” contains the binyan prefix a27 and the per-
sonal prefix a43 .
An empty right-hand side in a rule means that the
result of a matching query is the empty string. In
particular, Rule 3,
{binyanprefix perfect} =
indicates that there is no binyan prefix for pi‘el verbs
in the perfect form, in contrast to, for instance, hif‘il
verbs. The next two rules indicate the binyan prefix
for a pi‘el verb’s imperfect forms. By Rule 4, this
prefix is generally shewa ( a27); but because the per-
sonal prefix a44 cannot co-occur with the binyan prefix
shewa, Rule 5 specifies a different binyan prefix for
a pi‘el verb’s first-person singular imperfect form.
(We can adjust the combination a44a27 to a44a27a31 as a postpro-
cessing step instead, as we show later when we treat
guttural letters.)
Every form of a verb separates the three letters
of the root by two vowels, which we call vowel1
and vowel2. The pi‘el is characterized by the fact
that in the imperfect, these vowels are the patah. (by
Rule 7) and the tseyre (by Rule 10), as in a6a11a7a12a13 a9a31 a43a27 “we
will speak”; in the perfect, they are instead generally
the h. iriq (by Rule 6) and the patah. (by Rule 9), as
in a37a12 a43a39a6a27 a7a12a31a53a9a15a14 “we spoke”. There is an exception in the
perfect third singular masculine (a6a11a7a12a13 a9a21a14 ), as specified
in Rule 8.
Rules 5 and 8 are examples of a third strategy for
building KATR theories: A rule may show an excep-
tion to a more general pattern introduced by another
rule housed at the same node. For instance, Rule 8
establishes a special value for vowel2 for one combination
of person, number, and gender, supplanting
the more typical value for vowel2 established for
perfect forms by Rule 9. We refer to this strategy
as specializing.
We now revisit the strange right-hand side of
Rule 2. The term on its right-hand side is a node
name (ROOT2), a colon, and new query to present
to that node. The new query involves a quoted path,
"<root>". KATR treats quoted paths in this con-
text as queries on the node from which we started,
that is, Speak. In our case, the right-hand side of
this rule is equivalent to ROOT2:<a9a54a7a55a6 >, because
of the first rule in the Speak node.
ROOT2 is one of a family of three nodes each
of which isolates a particular consonant in a verb’s
triliteral root.
#vars $consonant: a44 a7a57a56a58a9a59a22a57a37a54a60a62a61a57a63a64a33a66a65a68a67
a69a71a70
a40a59a43a72a42a4a73a75a74a17a76a75a77a68a78a57a79a59a80a71a6a75a81a82a81
a12
a29a83a29a30a12 .
ROOT1: <$consonant#1 $consonant#2
$consonant#3> = $consonant#1 .
ROOT2: <$consonant#1 $consonant#2
$consonant#3> = $consonant#2 .
ROOT3: <$consonant#1 $consonant#2
$consonant#3> = $consonant#3 .
The #vars declaration introduces a class of
atoms: Hebrew consonant characters. Each of the
three ROOT nodes has a single rule that matches a
three-consonant sequence, assigning each member
of the sequence a local number. The rule selects one
of those consonants as the result.
These three nodes follow a fourth strategy for
writing KATR theories: A node may be invoked
solely to provide information (here, a particular con-
sonant in a verb’s root) needed by other rules. We
refer to this strategy as lookup. Lookup nodes (such
as the boxed nodes in Figure 1) do not participate
in the hierarchical relationships defined by the net-
work’s default rules.
To demonstrate that the PIEL node character-
izes its binyan, we present the somewhat simpler
HOPHAL node as a point of comparison.
HOPHAL:
<> = VERB % 1
{binyanprefix perfect} = a22a71a25 % 2
{binyanprefix imperfect} = a25 % 3
{vowel1} = a27 % 4
{vowel2} = a31 % 5
.
4 The VERB node
Queries on Speak are generally reflected to its
parent, PIEL, which then reflects them further to
VERB.
VERB:
<cons1> = ROOT1:<"<root>"> % 1
<cons2> = ROOT2:<"<root>"> % 2
<cons3> = ROOT3:<"<root>"> % 3
{shortvowel2} = a27 % 4
<> = ACCENT:<VERBPREFIX STEM
VERBSUFFIX endofword> % 5
.
Rules 1–3 of VERB determine the three consonants
of the root if they have not already been determined
by earlier processing. In the case of pi‘el
verbs, <cons2> has been determined (by Rule 2
at the pi‘el node), but the other consonants have not.
That is, if we pose the query Speak:<cons2>, the
Speak node reflects it to the PIEL node, which re-
solves it. But the query Speak:<cons3> is not
resolved by PIEL; it is reflected to VERB, which re-
solves it now by means of lookup.
Rule 4 introduces a priming that is needed by the
lookup node STEM: Usually, the shortened version
of <vowel2> is the shewa. In one binyan, namely
hif‘il, the shortened version of <vowel2> is special
and overrides this priming.
Rule 5 is the most complicated. It exemplifies
two more strategies of programming KATR theo-
ries: (1) Combining: It combines various pieces of
morphology, namely those represented by the nodes
VERBPREFIX, STEM, and VERBSUFFIX, each of
which is referred to by VERB, and (2) Postprocess-
ing: It presents the entire result of that combina-
tion to a postprocessing step represented by the node
ACCENT.
Combining works by invoking each of the
nodes VERBPREFIX, STEM, and VERBSUF-
FIX with the query presented originally to
Speak; such a query might be, for example,
Speak:<imperfect sg 3 masc>. (The fact
that no query list is explicitly presented to those
nodes implies that KATR should use the original
query.)
5 Nodes for stems and affixes
Verbs in the imperfect take personal prefixes.
VERBPREFIX:
{perfect} = % 1
{imperfect 1 sg} = a44 % 2
{imperfect 2 sg} = a29a12 % 3
{imperfect 3 sg masc} = a33 % 4
{imperfect 3 sg fem} = a29a12 % 5
{imperfect 1 pl} = a43 % 6
{imperfect 2 pl} = a29a12 % 7
{imperfect 3 pl masc} = a33 % 8
{imperfect 3 pl fem} = a29a12 % 9
.
We choose not to include the vowel following the
prefix as part of this node, but rather as part of STEM.
Such decisions are common in cases of combining;
it often makes little difference whether such “bound-
ary” markers are placed at the end of one combining
formative or the start of the next one.
Rule 1 indicates that for all queries containing the
atom perfect, there is no verb prefix. This single
rule concisely covers many cases, which are implic-
itly included because the atoms pertaining to num-
ber, person, and gender are omitted. The other rules
all apply to the imperfect tense. In the first and sec-
ond person, the prefix is independent of gender, so
the rules there are shorter, again concisely covering
multiple cases with only a few rules.
Suffixes have a similar node; here we choose to
include the vowel that separates the suffix from the
stem.
VERBSUFFIX:
{perfect 1 sg} = a27 a29a12 a14a33 @ % 1
{perfect 2 sg masc} = a27 a29a12 a25 @ % 2
{perfect 2 sg fem} = a27 a29a12a84a27 % 3
{perfect 3 sg masc} = a27 % 4
{perfect 3 sg fem} = a25a36a22 % 5
{perfect 1 pl} = a27 a43a54a37a12 @ % 6
{perfect 2 pl masc} = a27 a29a30a12 a41 a40 % 8
{perfect 2 pl fem} = a27 a29a12 a41 a42 % 9
{perfect 3 pl} = a37a12 % 10
{imperfect sg} = a27 % 11
{imperfect 2 sg fem} = a14a33 % 12
{imperfect 1 pl ++} = a27 % 13
{imperfect pl masc} = a37a12 % 14
{imperfect pl fem} = a27 a43a66a25a85a22 @ % 15
.
Rules 1, 2, 6, and 15 include the @ character,
which we use to indicate that the given syllable
should not be accented. Hebrew words are gener-
ally accented on the ultima; we place @ on the ul-
tima to force the accent to the penultima. Placing of
accents is one of the jobs relegated to the postpro-
cessing step.
The left-hand side of rule 13 includes the symbol
++. This symbol tells KATR that even if another,
seemingly better rule matches a query, this rule
should take precedence if it matches. The situation
arises for the query <imperfect pl 1 masc>,
for instance. Both rules 13 and 14 match, but the for-
mer is preferred. The other way we could have rep-
resented this situation is by restricting rule 14 to 2nd
or 3rd person, either by explicitly indicating these
morphosyntactic properties or by adding the atom
!1, which means “not first person”. We choose to
use the disambiguator ++ in Rule 13 instead; in the
terminology of (Stump, 2001), the ++ symbol iden-
tifies rules that apply in “expanded mode”.
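The effect of the ++ marker on rule competition can be sketched in Python. The sketch assumes that ++ adds one to the effective length of a left-hand side; the text says only that the marked rule takes precedence, so the size of the boost is an assumption, as are the names pick_suffix_rule and IMPERFECT_SUFFIX_RULES. Rules are identified by their numbers in VERBSUFFIX rather than by their right-hand sides.

```python
def pick_suffix_rule(rules, query):
    """Return the number of the winning rule; ++ rules get a length bonus."""
    atoms = set(query)
    scored = sorted(
        ((len(lhs) + (1 if boosted else 0), number)
         for number, lhs, boosted in rules if lhs <= atoms),
        reverse=True,
    )
    if not scored:
        raise KeyError("no rule matches")
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        raise ValueError("equally good best rules")
    return scored[0][1]


# The imperfect rules of VERBSUFFIX: (rule number, left-hand side, marked ++).
IMPERFECT_SUFFIX_RULES = [
    (11, frozenset({"imperfect", "sg"}), False),
    (12, frozenset({"imperfect", "2", "sg", "fem"}), False),
    (13, frozenset({"imperfect", "1", "pl"}), True),
    (14, frozenset({"imperfect", "pl", "masc"}), False),
    (15, frozenset({"imperfect", "pl", "fem"}), False),
]

print(pick_suffix_rule(IMPERFECT_SUFFIX_RULES, ["imperfect", "pl", "1", "masc"]))
print(pick_suffix_rule(IMPERFECT_SUFFIX_RULES, ["imperfect", "pl", "3", "masc"]))
```

Without the bonus, Rules 13 and 14 both cover three atoms of <imperfect pl 1 masc>, and the sketch would report a tie.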
The most complex node defines the stem part of a
verb.
STEM:
<> = "<binyanprefix>" "<cons1>"
"<vowel1>" "<cons2>" <anyvowel2>
"<cons3>" % 1
<anyvowel2> = "<vowel2>" % 2
{anyvowel2 perfect 3 sg fem} = "<shortvowel2>" % 3
{anyvowel2 perfect 3 pl} = "<shortvowel2>" % 4
{anyvowel2 imperfect 2 sg fem} = "<shortvowel2>" % 5
{anyvowel2 imperfect !1 pl masc} = "<shortvowel2>" % 6
.
Rule 1 uses combining to assemble the parts of
the stem, starting with the binyan prefix, then alter-
nating all the consonants and vowels. Most of these
parts are surrounded in quote marks, meaning that
these elements are queries to be reflected back to the
starting node, in our case, Speak. These queries
percolate through Speak, PIEL, and VERB until a
priming rule satisfies them.
The only exception is that instead of <vowel2>,
this rule queries <anyvowel2> without quote
marks. The absence of quote marks directs this
query to the current node, that is, STEM; the remain-
ing rules determine what vowel is appropriate.
Rule 2 indicates that unless another rule is bet-
ter, anyvowel2 is just vowel2. However, in
four cases, vowel2 must be replaced by short-
vowel2, typically shewa (primed by the VERB
node), but occasionally something else (overridden
by hif‘il verbs).
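The interplay of priming, overriding, lookup, and combining in stem formation can be sketched in Python. Everything here is a simplified illustration and not the KATR evaluation algorithm. Several parts are assumptions of the sketch: the Latin transliteration (an apostrophe for shewa, a doubled letter for dagesh, "a" standing in for the first-person binyan prefix), the NODES table, and the functions lookup, any_vowel2, and stem. Quoted queries are modeled by restarting each lookup at the leaf node.

```python
def root_letter(index):
    """Lookup in the manner of ROOT1-ROOT3: select one letter of the root."""
    return lambda start, props: lookup(start, "root", props)[index]


NODES = {
    "Speak": {"parent": "PIEL", "rules": {frozenset({"root"}): "dbr"}},
    "PIEL": {"parent": "VERB", "rules": {
        # Overriding: pi'el doubles the second root consonant.
        frozenset({"cons2"}): lambda start, props: 2 * lookup(start, "root", props)[1],
        frozenset({"binyanprefix", "perfect"}): "",
        frozenset({"binyanprefix", "imperfect"}): "'",
        frozenset({"binyanprefix", "imperfect", "1", "sg"}): "a",
        frozenset({"vowel1", "perfect"}): "i",
        frozenset({"vowel1", "imperfect"}): "a",
        frozenset({"vowel2", "perfect", "3", "sg", "masc"}): "e",
        frozenset({"vowel2", "perfect"}): "a",
        frozenset({"vowel2", "imperfect"}): "e",
    }},
    "VERB": {"parent": None, "rules": {
        frozenset({"cons1"}): root_letter(0),
        frozenset({"cons2"}): root_letter(1),
        frozenset({"cons3"}): root_letter(2),
        frozenset({"shortvowel2"}): "'",
    }},
}


def lookup(start, feature, props):
    """Resolve feature at leaf start, following default links upward."""
    query = set(props) | {feature}
    node = start
    while node is not None:
        rules = NODES[node]["rules"]
        matching = [lhs for lhs in rules if feature in lhs and lhs <= query]
        if matching:
            value = rules[max(matching, key=len)]
            return value(start, props) if callable(value) else value
        node = NODES[node]["parent"]
    raise KeyError(feature)


SHORT_CASES = [{"perfect", "3", "sg", "fem"}, {"perfect", "3", "pl"},
               {"imperfect", "2", "sg", "fem"}]


def any_vowel2(start, props):
    """Rules 2-6 of STEM: use shortvowel2 in four cases, vowel2 otherwise."""
    short = any(case <= props for case in SHORT_CASES) or (
        {"imperfect", "pl", "masc"} <= props and "1" not in props)
    return lookup(start, "shortvowel2" if short else "vowel2", props)


def stem(start, props):
    """Rule 1 of STEM: concatenate prefix, consonants, and vowels."""
    props = frozenset(props)
    parts = [lookup(start, f, props)
             for f in ("binyanprefix", "cons1", "vowel1", "cons2")]
    parts.append(any_vowel2(start, props))
    parts.append(lookup(start, "cons3", props))
    return "".join(parts)


print(stem("Speak", {"perfect", "3", "sg", "masc"}))
print(stem("Speak", {"perfect", "3", "sg", "fem"}))
```

Personal prefixes, suffixes, and accent placement are left out; the sketch covers only the stem.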
6 Postprocessing
Many languages have rules of euphony. These
rules are often called sandhi operations, based on
a term used in Sanskrit morphology. We use the
node ACCENT to introduce sandhi operations. Its
name comes from the fact that the first operation we
needed was to place the accent on the penultima, but
we use it for other purposes as well.
We begin by defining character classes similar to
the $consonant class introduced earlier.
#vars $vowel: a31 a25a66a25a85a22 a27 a37a12a4a86a87a14a88a14a33 a12 a41 a13 a27a31 .
#vars $accent: ´ .
#vars $unaccentableVowel: a27 .
#vars $accentableVowel: $vowel -
$unaccentableVowel .
#vars $letter: $vowel + $consonant +
$accent .
#vars $noAccent: $letter -
$accentableVowel .
Each class contains a subset of the Hebrew char-
acters. We treat some combinations as single charac-
ters for this purpose, in particular, the vowels a25a39a22 and
a14a33 . The first three classes are defined by enumeration.
The fourth class, $accentableVowel, is defined
in terms of previously defined classes, specifically,
all vowels except those that are unaccentable. Similarly,
the $letter class includes all vowels, consonants,
and accents, and the $noAccent class contains
all letters except for accentable vowels. These
classes are used in the ACCENT node.
ACCENT:
<$letter> = $letter <> % 1
<endofword> = % 2
<$accentableVowel#1 $noAccent*
$accentableVowel#2 @> =
$accentableVowel#1 ´ $noAccent*
$accentableVowel#2 <> % 3
<a27 endofword> = % 4
<a67 a27 endofword> = a67 a27 % 5
<a27 $consonant a27 endofword> = a27
$consonant a27 % 6
.
A query to ACCENT is a fully formed Hebrew
word ready for postprocessing, with the endof-
word tag placed at the end. The first rule is a de-
fault that often is overridden by later rules; it says
that whatever letter the query starts with, that let-
ter can be removed from the query, and placed as a
result. Furthermore, the unmatched portion of the
query, indicated by <> on the right-hand side, is to
be directed to the ACCENT node for further process-
ing. Rule 2 says that if a resulting query has only
endofword, that tag should be removed, and no
further processing is needed.
Rule 3 places accents in words that contain the @
sign, which we use to indicate “do not accent this
syllable.” The left-hand side matches queries that
contain an accentable vowel, followed by any num-
ber (zero or more, indicated by the Kleene star *)
of letters that cannot be accented, followed by a second
accentable vowel, followed by the @ mark. Such
words must have the @ removed and an accent mark
placed after the first accentable vowel matched, as
indicated in the right-hand side. The empty <> at the
end of the right-hand side directs unused portions of
the query to ACCENT for further processing.
Rules 4, 5, and 6 deal with shewa near the end
of a word. Generally, shewa is deleted at the very
end (rule 4), but not if it follows a67 (rule 5) or if the
previous vowel is also a shewa (rule 6).
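The effect of Rules 3, 4, and 6 can be approximated in Python with regular expressions. This is a sketch under stated assumptions and does not reproduce the atom-by-atom evaluation that KATR performs. It works on a Latin transliteration in which "aeiou" are the accentable vowels, an apostrophe is shewa, and a doubled letter stands for dagesh. Rule 5 is omitted because it depends on a specific Hebrew letter. The names place_accent and drop_final_shewa are hypothetical.

```python
import re

ACCENT_MARK = "\u00b4"  # the acute accent used in the text

# Rule 3: accentable vowel, letters that cannot be accented, a second
# accentable vowel, then the @ mark.
ACCENT_PATTERN = re.compile(r"([aeiou])([^aeiou@]*)([aeiou])@")

# Rule 6: a final shewa is kept when the previous vowel is also a shewa.
DOUBLE_SHEWA = re.compile(r"'[^aeiou']+'$")


def place_accent(word):
    """Remove @ and put the accent mark after the preceding accentable vowel."""
    return ACCENT_PATTERN.sub(
        lambda m: m.group(1) + ACCENT_MARK + m.group(2) + m.group(3), word)


def drop_final_shewa(word):
    """Rule 4, with the Rule 6 exception: delete a word-final shewa."""
    if word.endswith("'") and not DOUBLE_SHEWA.search(word):
        return word[:-1]
    return word


print(place_accent("dibbar'ti@"))
print(drop_final_shewa("dibber'"))
print(drop_final_shewa("dibbar't'"))
```

In the first call the accent lands on the penultimate syllable, since the @ on the ultima blocks the default accent there.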
7 Accommodating guttural letters
Our current efforts involve accommodating verb
roots containing guttural letters. We have found that
new rules in the postprocessing step, that is, the
ACCENT node, cover many of the cases.
We first introduce postprocessing rules that con-
vert shewa nah. (which we continue to represent as a27)
to shewa na (which we represent as a47 ).
#vars $longVowel: a25a89a37a12 a14a33
a12
a13
a12
a44 .
ACCENT:
... % other rules as before
<startofword $consonant $dagesh? a27/a28 >
=+= <> % 8
<a27/a28 $consonant#1 $dagesh? a27
$consonant#2> =+= <> % 9
<$longVowel $consonant a27/a28 > =+= <>
% 10
<$consonant $dagesh a27/a28 > =+= <> % 11
<$consonant#1 $dagesh?
a27/a28 $consonant#1> =+= <> % 12
Rule 8 converts shewa nah. to shewa na on the
first consonant of the word. We introduce the atom
startofword in order to detect this situation, and
we modify the reference to the ACCENT node in the
VERB node to include this new atom. This rule uses
=+= instead of = to separate the two sides. This
notation indicates a non-subtractive rule; the right-
hand side path encompasses the entire query, includ-
ing that part matched by the left-hand side, except
that the shewa nah. has been replaced by shewa na.
After this replacement, KATR continues to process
the new query at the same node. The left-hand side
uses the ? operator, which means “zero or one in-
stances.” This notation allows a single rule to match
situations both with and without a dagesh.
The other rules use similar notation. Rule 9 con-
verts the first of two shewas in a row to a shewa na,
except at the end of the word. Rule 10 converts a
shewa nah. following a long vowel. Rule 11 converts
a shewa nah. on a consonant with a dagesh. Rule 12
converts the shewa nah. on the first of two identical
consonants.
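The idea behind a non-subtractive rule can be sketched in Python, using Rule 8 as the model. The sketch is only an analogy: it represents a query as a list of atom names, and the atom names shewa_nah, shewa_na, and dagesh, the sample consonant set, and the function rewrite_initial_shewa are all assumptions made for illustration.

```python
CONSONANTS = {"d", "b", "r", "k", "t"}  # small sample set for the sketch


def rewrite_initial_shewa(query):
    """Return the whole query with the initial shewa nah replaced.

    Unlike a subtractive rule, nothing is consumed: the result has the same
    length as the input, and only the matched shewa atom changes.
    """
    if len(query) >= 3 and query[0] == "startofword" and query[1] in CONSONANTS:
        # The optional dagesh corresponds to the ? operator in Rule 8.
        i = 3 if query[2] == "dagesh" else 2
        if i < len(query) and query[i] == "shewa_nah":
            return query[:i] + ["shewa_na"] + query[i + 1:]
    return query


word = ["startofword", "d", "shewa_nah", "b", "a", "r", "endofword"]
print(rewrite_initial_shewa(word))
```

A full treatment would resubmit the rewritten query to the same node so that other rules can apply; the sketch shows only the single rewriting step.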
Given the distinction between the two shewas, we
now add postprocessing rules that convert a guttural
with a shewa na to an appropriate alternative.
#vars $guttural: a74a18a22a18a61 a44 .
ACCENT:
... % other rules as before
<$guttural a28 > = $guttural a27a31 <> % 13
<a14 $guttural a28 > = a31 $guttural a27a31 <> % 14
<a41 $guttural a28 > = a41 $guttural a27a41 <> % 15
<a14 a44 a27 $letter> = a12 a44 <$letter> % 16
<a44 a41 a44 a27> = a44 a12 <> % 17
Rule 13 corrects, for example, a37a12a21a63a90a61a27 a81a25 a12 to a37a12a21a63a90a61a27a31 a81a25 a12;
Rule 14 corrects a9 a70a12 a74a27 a29a12a14 to a9 a70a12 a74a27a31 a29a12a31 , and Rule 15 corrects
a9
a70
a12
a74a27 a44a41 to a9
a70
a12
a74a27a41 a44a41 . Rules 16 and 17 correct the initial a44
in a44 ”a76 verbs in the qal.
We add other rules, such as the following Rule 18,
to correct situations where a guttural letter would
otherwise acquire a dagesh.
< a14 $guttural a12> = <a13 $guttural> % 18
We have not begun work on weak verbs contain-
ing a22 , a37 , and a33 , which might require different ap-
proaches.
8 Further work
We continue to develop our Hebrew KATR theory.
Our goal is to cover all forms, including the waw
consecutive, infinitive, makor, and predicate suf-
fixes, for both strong and weak verbs. We will then
turn to nouns, including personal suffixes. Our suc-
cess so far indicates that KATR is capable of repre-
senting Hebrew morphology in a concise yet read-
able form.
Our larger goal is to host a library of KATR theo-
ries for various languages as a resource for linguists.
Such a library will provide interested researchers
with morphological descriptions that can be directly
converted into actual word forms and will serve as
a substitute, to some extent, for voluminous natural-
language and table-based descriptions. In the case of
endangered languages, it will act as a repository for
linguistic data that may be essential for preservation.
9 DATR and KATR
We discuss KATR and its relation to DATR exten-
sively elsewhere (Finkel et al., 2002); here we only
summarize the differences. The DATR formalism is
quite powerful; we have demonstrated that it is capa-
ble of emulating a Turing machine. The KATR en-
hancements are therefore aimed at usability, not the-
oretical power. The principal innovations of KATR
are:
• Set notation. The left-hand sides of DATR rules
may only use list notation. KATR allows set
notation as well, which allows us to deal with
morphosyntactic properties in any order.
Hebrew verb morphology provides abundant
motivation for this enhancement. In the VERBSUFFIX
node, Rule 15 identifies a22a10a43a25 a27 as an exponent
of number and gender but not of person;
Rule 10 identifies a37a12 as an exponent of person
and number but not of gender. Both rules are
indifferent to the order in which properties of
person, number, and gender are listed in any
matching query. If a rule’s left-hand side were
required to be a list (as in ordinary DATR), then
one of these two rules would have to be compli-
cated by the inclusion of either a variable over
properties of person (Rule 15) or a variable
over properties of gender (Rule 10); moreover,
all queries would have to adhere to a fixed (but
otherwise unmotivated) ordering among prop-
erties of person, number, and gender.
• Regular expressions. KATR allows limited regular
expressions in lists in left-hand sides of
rules; DATR has no such expressions. We use
this facility in the ACCENT node in the Hebrew
theory, both for the Kleene star * and for the ?
operator. More generally, we often find regular
expressions valuable in representing non-local
sandhi phenomena, such as the Sanskrit rule of
n-retroflexion.
• Non-subtractive rules. DATR rules have a
subtractive quality: The atoms of the query
matched by the left-hand side are removed
from the query used for subsequent evaluation
in the right-hand side. The KATR =+= opera-
tor allows us to represent rules that preserve the
atoms matched by the left-hand side, substitut-
ing new atoms where necessary. We generally
use this facility for rules of referral. For exam-
ple, Latin neuter nouns share the same nomina-
tive and accusative plural; we capture this fact
by a rule that converts accusative to nominative
in the context of neuter plural. In the Hebrew
theory, we use non-subtractive rules to convert
shewa nah. to shewa na.
• Enhanced matching length. In some cases,
competing rules have left-hand sides of the
same length, but one of the rules should always
be chosen when both apply. KATR includes the
++ syntax for explicitly enhancing the effective
length of the preferred left-hand side; we use
this facility in the VERBSUFFIX node. DATR
does not have this syntax.
• Syntax. KATR has several minor syntax
enhancements. It allows special characters to be
used as atoms if escaped by the \ character.
The atom $$ can be used to match the end
of the query. Variables can be computed in-
stead of being enumerated; we use this facility
in defining the $letter variable. KATR al-
lows greater control over which nodes are to be
displayed under default queries. The interac-
tive KATR program has new facilities for rapid
testing and debugging of theories.
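The matching behavior described in the list above (order-independent left-hand sides, enhanced matching length, and non-subtractive rules) can be illustrated with a minimal Python sketch. This is not KATR syntax and not the KATR implementation; the `Rule` class, the function names, the atom spellings, and the result labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: frozenset   # atoms that must all occur in the query, in any order
    result: str      # placeholder label standing in for a right-hand side
    boost: int = 0   # stand-in for "++": extra effective length (assumption)


def select(rules, query):
    """Return the matching rule with the greatest effective length, or None."""
    atoms = frozenset(query)
    matching = [r for r in rules if r.lhs <= atoms]
    return max(matching, key=lambda r: len(r.lhs) + r.boost, default=None)


def subtract(query, lhs):
    """DATR-style step: matched atoms are removed from the query."""
    return frozenset(query) - lhs


def refer(query, lhs, substitutions):
    """Non-subtractive step: if lhs matches, keep every atom but substitute
    those named in `substitutions`; otherwise leave the query unchanged."""
    atoms = frozenset(query)
    if not lhs <= atoms:
        return atoms
    return frozenset(substitutions.get(atom, atom) for atom in atoms)


# Hypothetical rules: one keyed on person and number, one on number and gender.
RULES = [
    Rule(frozenset({"1", "pl"}), "person-number exponent"),
    Rule(frozenset({"pl", "fem"}), "number-gender exponent"),
]

# The order of atoms in the query does not affect which rule is chosen.
print(select(RULES, ["pl", "3", "fem"]).result)
print(select(RULES, ["fem", "3", "pl"]).result)

# Rule of referral in the spirit of the Latin example: accusative is
# converted to nominative in the context of neuter plural.
NEUT_ACC_PL = frozenset({"neut", "acc", "pl"})
print(sorted(refer(["neut", "acc", "pl"], NEUT_ACC_PL, {"acc": "nom"})))
```

Representing left-hand sides as sets means that neither rule needs a variable over the property it ignores, and the `boost` field lets a preferred rule win when two matching left-hand sides have the same size.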
KATR is written entirely in Java, which makes it
portable to a variety of platforms. It runs as an
interactive program, with commands for compiling
theories, executing queries, and performing various
debugging functions. The KATR algorithm evaluates
a query at a node within a context. First, KATR
identifies the rule within the node whose left-hand
side best matches the query. It then evaluates the
associated right-hand side to obtain the result; this
may require evaluating new queries at a variety of
nodes and contexts, which KATR undertakes
recursively. The algorithm is completely
deterministic and reasonably fast: compiling the
entire Hebrew theory and evaluating all the forms of
a verb takes about 2 seconds on an 863 MHz Linux
machine.
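The evaluation procedure just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the KATR implementation: the theory is a hypothetical dictionary of nodes, every rule is subtractive, ties between equally long left-hand sides go to the rule listed first, and the node names, atoms, and output strings are placeholders rather than Hebrew forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    node: str          # pass the remaining query on to this node


@dataclass(frozen=True)
class Quoted:
    atoms: frozenset   # evaluate these atoms at the context node


def evaluate(theory, node, query, context=None):
    """Evaluate `query` (a collection of atoms) at `node` within `context`."""
    context = node if context is None else context
    atoms = frozenset(query)
    # Step 1: find the rule whose left-hand side best matches the query.
    matching = [rule for rule in theory[node] if rule[0] <= atoms]
    if not matching:
        raise LookupError(f"no rule at {node} matches {sorted(atoms)}")
    lhs, rhs = max(matching, key=lambda rule: len(rule[0]))
    rest = atoms - lhs  # subtractive: matched atoms are consumed
    # Step 2: evaluate the right-hand side, recursing where it refers elsewhere.
    parts = []
    for item in rhs:
        if isinstance(item, Ref):
            parts.append(evaluate(theory, item.node, rest, context))
        elif isinstance(item, Quoted):
            parts.append(evaluate(theory, context, item.atoms, context))
        else:
            parts.append(item)  # literal string
    return "".join(parts)


# Hypothetical three-node theory; each rule is (left-hand side, right-hand side).
THEORY = {
    "LEXEME": [
        (frozenset({"stem"}), ["STEM"]),
        (frozenset(), [Ref("VERB")]),          # default: defer to VERB
    ],
    "VERB": [
        (frozenset(), [Quoted(frozenset({"stem"})), Ref("SUFFIX")]),
    ],
    "SUFFIX": [
        (frozenset(), [""]),                   # default: no suffix
        (frozenset({"pl"}), ["-PL"]),
        (frozenset({"pl", "fem"}), ["-FPL"]),  # most specific rule wins
    ],
}

print(evaluate(THEORY, "LEXEME", {"3", "pl", "fem"}))
```

In this toy theory the empty left-hand side acts as the default, and longer left-hand sides override it whenever they match.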
The interested reader can acquire KATR and our
Hebrew morphology theory from the authors (under
the GNU General Public License).
10 Strategies for building KATR theories
We have been applying KATR to the generation of
natural-language morphology for several years. In
addition to Hebrew, we have built a complete mor-
phology of Latin verbs and nouns, large parts of
Sanskrit (and other related languages), and smaller
studies of Bulgarian, Swahili, Georgian, and Turk-
ish. We have found that KATR allows us to rep-
resent morphological rules for these languages with
great elegance. It is especially well-suited to cases
like Hebrew verbs, where a similar structure applies
across the entire spectrum of words, and where that
spectrum is partitioned into binyanim with distin-
guishable rules, but where euphony introduces stan-
dard vowel shifts based on accent, guttural letters,
and weak letters.
As we have gained experience with KATR, we
have noted encoding strategies that apply across lan-
guage families; we used each of these in our Hebrew
verb specification.
• Priming. A node representing a specific category
provides information needed by more general
nodes to which it refers queries. Rules in
the more general nodes refer to primed infor-
mation by means of quoted queries.
• Lookup. A node is invoked solely to provide
information needed by other rules.
• Overriding. A node representing a specific
category itself answers a query that is otherwise
answered (with different results) by a more general
node to which queries are normally referred.
• Specializing. A rule introduces a specific
exception to a more general pattern speci-
fied by another rule housed at the same node.
The strategies of overriding and specializing
both exploit the nonmonotonicity inherent in
KATR’s semantics.
• Combining. A rule concatenates various
morphological units by referring queries to
multiple nodes.
• Postprocessing. The result of combining
morphological units is referred to a node that makes
local adjustments to account for euphony and
other sandhi principles.
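Several of the strategies listed above (priming, overriding, combining, and postprocessing) can be mimicked in ordinary Python to show how they fit together. The sketch below is only an analogy, not KATR code: it uses `collections.ChainMap` as a stand-in for default inheritance, and the tables, the made-up strings, and the vowel-merging adjustment are all hypothetical and do not represent Hebrew data or rules.

```python
import re
from collections import ChainMap

# General node: defaults that hold unless a more specific table says otherwise.
GENERAL = {"prefix": "", "suffix": "a"}

# Overriding: a more specific class replaces the default prefix.
CLASS_X = {"prefix": "ta"}

# Priming: the most specific table supplies the stem that the general
# combining step needs but cannot know in advance.
LEXEME = {"stem": "amika"}


def combine(node):
    """Combining: concatenate units found by separate lookups,
    marking morpheme boundaries with '+'."""
    units = (node["prefix"], node["stem"], node["suffix"])
    return "+".join(unit for unit in units if unit)


def postprocess(form):
    """Postprocessing: apply a made-up local adjustment (identical vowels
    meeting at a boundary merge), then erase the boundaries."""
    form = re.sub(r"([aeiou])\+\1", r"\1", form)
    return form.replace("+", "")


def inflect(*specific):
    """Look up each unit in the most specific table that defines it."""
    node = ChainMap(*specific, GENERAL)
    return postprocess(combine(node))


print(inflect(LEXEME, CLASS_X))  # prefix overridden by CLASS_X
print(inflect(LEXEME))           # default (empty) prefix from GENERAL
```

Keeping the euphonic adjustment in a separate final step means the combining step can concatenate units without knowing anything about their phonological interaction.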
We do not want to leave the impression that
writing specifications in KATR is easy. The tool is
capable of expressing elegant specifications, but
arriving at them requires considerable effort. Early
choices color the entire structure of the resulting
KATR specification, and the author of a
specification must frequently discard code and
rethink how to represent the morphological
structures being specified. Perhaps our experience
will eventually lead to a second-generation KATR
that better facilitates the linguist’s task.
The definition of Hebrew verb inflection that we
have sketched here rests on the hypothesis that an in-
flected word’s morphological form is determined by
a system of realization rules organized in a default
inheritance hierarchy. There are other approaches to
defining Hebrew verb inflection; one could, for ex-
ample, assume that an inflected word’s form is de-
termined by a ranked system of violable constraints
on morphological structure, as in Optimality The-
ory (Prince and Smolensky, 1993), or by a finite-
state machine (Karttunen, 1993). The facts of He-
brew verb inflection are apparently compatible with
any of these approaches. Even so, there are strong
theoretical grounds for preferring our approach. It
provides a uniform, well-defined architecture for the
representation of both morphological rules and lexi-
cal information. Moreover, it embodies the assump-
tion that inflectional morphology is inferential and
realizational, readily accommodating such phenom-
ena as extended exponence and the frequent under-
determination of morphosyntactic content by inflec-
tional form; in this sense, it effectively excludes a
morpheme-based conception of word structure, un-
like both the optimality-theoretic and the finite-state
approaches.
Acknowledgements
This work was partially supported by the National
Science Foundation under Grant 0097278 and by
the University of Kentucky Center for Computa-
tional Science. Any opinions, findings, conclusions
or recommendations expressed in this material are
those of the authors and do not necessarily reflect
the views of the funding agencies.

References
Greville G. Corbett and Norman M. Fraser. 1993. Net-
work Morphology: A DATR account of Russian nom-
inal inflection. Journal of Linguistics, 29:113–142.
P. T. Daniels. 1993. The Unicode Consortium: The Uni-
code standard. Language: journal of the Linguistic
Society of America, 69(1):225–225, March.
Roger Evans and Gerald Gazdar. 1989. Inference in
DATR. In Proceedings of the Fourth Conference of
the European Chapter of the Association for Compu-
tational Linguistics, pages 66–71, Manchester.
Raphael Finkel, Lei Shen, Gregory Stump, and Suresh
Thesayi. 2002. KATR: A set-based extension of
DATR. Under review.
Norman M. Fraser and Greville G. Corbett. 1995. Gen-
der, animacy, and declensional class assignment: a
unified account for Russian. In G. Booij and J. van
Marle, editors, Yearbook of Morphology 1994, pages
123–150. Kluwer, Dordrecht.
Norman M. Fraser and Greville G. Corbett. 1997. De-
faults in Arapesh. Lingua, 103:25–57.
Lauri Karttunen. 1993. Finite-state constraints. In John
Goldsmith, editor, The Last Phonological Rule. Uni-
versity of Chicago Press, Chicago.
Alan S. Prince and Paul Smolensky. 1993. Optimality
theory: Constraint interaction in generative grammar.
RuCCS Technical Report #2, Rutgers
University Center for Cognitive Science, Piscataway,
NJ.
Gregory T. Stump. 2001. Inflectional morphology. Cam-
bridge University Press, Cambridge, England.
