A Sign Expansion Approach to Dynamic, Multi-Purpose 
Lexicons 
Jon Atle Gulla 
GMD - IPSI 
Dolivostratle 15 
D-64293 Darmstadt, Germany 
gulla0gmd, de. 
Sjur NCrsteb¢ Moshagen 
Computing Centre for the Humanities 
Harald H£rfagres gt. 31, 
N-5007 Bergen, Norway 
s jut. mo shagen0hd, uib. no 
Abstract 
Two problematic issues in most lexi- 
con systems today are their size and re- 
stricted domain of use. In this paper, we 
introduce a new approach to lexical or- 
ganization that leads to more compact 
and flexible lexicons. The lexical en- 
tries are conceptual/phonological frames 
rather than word entries, and a num- 
ber of expansion rules are used to gen- 
erate entries of actual words from these 
frames. A single frame supports not only 
all forms of a word, but also words of dif- 
ferent categories that are derived from 
the same semantic basis. The whole 
theory is now being implemented in the 
TROLL lexicon project. 
1 Introduction 
Due to the complexity and wide coverage of lex- 
ical information, full-fledged lexicon systems eas- 
ily grow undesirably big and must cope with in- 
tricate ~ nets of dependencies among lexical items. 
For keeping the speed of access at a satisfactory 
level, lexical information is often repeated in dif- 
ferent entries to reduce the number of consulta- 
tions needed for a single user query. This sim- 
plifies and speeds up the access of lexical infor- 
mation, but also blows up the size of the lexi- 
con and leads to huge maintenance problems. In 
many cases, it also clutters the lexicon structure, 
so that important lexical relationships and gener- 
alizations are lost. 
Structuring the lexicon in inheritance hierar- 
chies opens for more compact lexicon represen- 
tations. So far, lexicons have been structured in 
syntactic inheritance hierarchies, in which more 
or less abstract syntactic classes form the upper 
nodes and actual words are associated with the 
leaf nodes (Flickinger and Nerbonne, 1992; Rus- 
sell et al., 1992). However, the nature and num- 
ber of these abstract syntactic classes are not very 
clear, and it seems difficult to come up with a 
sound method for how to decide on such classes. 
At the same time, there are also good reasons 
for assuming a similar hierarchy based on seman- 
tic properties (Hellan and Dimitrova-Vulchanova, 
1994). Representing many competing hierarchies 
in the lexicon is a problem in itself and is here 
even more problematic as there are many com- 
plex relationships between semantic and syntac- 
tic properties (Gropen et al., 1992; Hellan and 
Dimitrova-Vulchanova, 1996). 
Another problem is related to the notions and 
structures adopted in the lexicon systems. Most 
lexicons today are constructed within the frame- 
work of some syntactic theory. This theory guides 
the structuring of lexical information and also de- 
cides what information should be available to the 
user (Andry et al., 1992; Flickinger and Nerbonne, 
1992; Mel'Suk and Polgu~re, 1987; Russell et ~l., 
1992; Krieger and Nerbonne, 1991). Some lexicon 
systems try to be reasonably theory-independent, 
though they still have to adopt some basic syn- 
tactic notions that locate them into a family of 
theories (Gofii and GonzAlez, 1995; Grimshaw and 
Jackendoff, 1985; Grishman et al., 1994). 
The Sign Expansion Approach forms a basis for 
creating non-redundant lexicon systems that are 
structured along semantic lines. The stored lexical 
entries are sign frames rather than actual words, 
and a whole system of expansion rules and consis- 
tency rules are used to generate dynamic entries 
of words that contain all the necessary semantic, 
syntactic, and morphological information. 
In Section 2, we give a brief introduction to a 
sign expansion theory called the Sign Model. Sec- 
tion 3 explains the use of lexical expansion rules, 
whereas some concluding remarks and directions 
for further work are found in Section 4. 
478 
/ 
paintN 
paint(($ SUB.I)) 
paint((~ SUBJ)(\]" OBJ)} -~ paint((t SUBJ)(J" OBJ)(J" XCOMP)> 
paint((~ SuB J)(\]" OBL)} 
Figure 1: The stored frame PAINT is expanded into actual words with syntactic properties. 
2 The Sign Model 
In the sign expansion approach, the lexicon is 
viewed as a dynamic rule system with lexical 
frames and various kinds of expansion rules. 
The Sign Model (SM) by Hcllan and Dimitrova- 
Vulchanova (Hellan and Dimitrova-Vnlchanova, 
1994) is a semantically based sign expansion the- 
ory and is used as the lexical basis of our lexicon. 
It posits an abstract level of sign representation 
that is not associated with any word classes and 
establishes a framework, within which word rela- 
tionships as well as relationships between different 
kinds of linguistic properties can be described in 
a systematic way. At the abstract level of rep- 
resentation, one defines conceptual/phonological 
fi'ames that underly the actual words found in a 
language. The fi'ames combine with lexical ex- 
pansion rules to create dynamic entries of actual 
words with morphological and syntactic proper- 
ties, as illustrated by the LFG representations in 
Figure 1. No particular syntactic terminology is 
assumed, since the theory is intended to fit into 
any syntactic theory. 
2.1 Minimal Signs 
The conceptual/phonological frame, which is re- 
ferred to as a minimal sign, is made up of a se- 
mantic (conceptual) part and a realizational part. 
As we do not have very much to say about phono- 
logical representations here, we assmne in the 
following that tim realizational part is a simple 
graphemic representation. The semantic part is a 
conceptual structure of the sign, which is to cap- 
ture all grammar-relevant aspects of its meaning. 
The meaning of a sign is analyzed as a situation 
involving a number of participants (also called ar- 
guments), and these participants as well as the sit- 
uation as a whole are modeled in terms of aspec- 
tual values, semantic roles, criterial factors, and 
realizational and selectional properties. 
Consider the minimal sign PAINT in Figure 2, 
which is the lexical entry underlying the re- 
lated words paintv , paintN , paintingN , paintableA, 
etc. The realizational part is the string "t)aint", 
whereas the semantic part denotes a situation 
with two arguments, indexed as 1 and 2. The 
Real : 
Sem : 
"paint" 
- Junctual 
SOURCE 
CONTROLLER 
DIM 
LIMIT 
GOAL 
MONOTONIC 
coloring 
noncriterial 
2-dim 
coloring 
noncriterial 
coloring 
Figure 2: Stored entry for minimal sign PAINT. 
aspectual value (-punctuaO describes the situa- 
tion as durative, whereas the selectional restric- 
tion DIM states that argument 2 is to serve as 
some two-dimensional surface. Argument 1, the 
painter, possesses the semantic roles SOURCE and 
CONTROLLER. SOURCE means that this argument 
is the source of energy for the force involved in 
a painting process, whereas CONTROLLER indi- 
cates that the argument is in control of the pro- 
cess. Correspondingly, argument 2 is the entity 
on which the force is used (LIMIT) and the entity 
being controlled by argument 1 (GOAL). Argu- 
ment 2 is also given the MONOTONIC role, which 
means that it undergoes some monotonic change 
in the course of painting. The change, of course, 
is that the surface is gradually covered by some 
paint. Each semantic role is further characterized 
by means of a criterial factor that imposes cer- 
tain role-related observational properties on the 
argument. Specifying SOURCE and LIMIT as col- 
oring means that the painter's use of force in- 
volves some observable actions that identifies him 
as painting, and that the surface being painted 
is recognizable from the same force. The gradual 
covering of the surface with paint, which is mod- 
eled by MONOTONIC, is also of the coloring type, 
since we can verify the covering by looking at the 
surface. CONTROLLER's and GOAL'S factor non- 
criterial means that no particular observable be- 
havior is required for an argument to play these 
479 
Real : 
Sem : 
"walk" 
- punctual 
CONTROLLER noncriterial 
MONOTONIC 1-dim t 
Figure 3: Stored entry for minimal sign WALK. 
particular roles. In general, the criterial factors 
affect, the implicitation of arguments in syntac- 
tic expressions (e.g. argument 2 in ,Ion painted) 
and the introduction of new ones (e.g. red in Jon 
painted the house red). 
As shown by the lexical entry of WALK in 
Figure 3, naturally intransitive verbs are rooted 
in minimal signs with only one conceptual argu- 
ment. The argument of WALK is a SOURCE and 
a CONTROLLER, and it undergoes a monotonic de- 
velopment with respect to some one-dimensional 
path. In a sentence like Jon walked to the school, 
the phrase to the school describes this mono- 
tonic development of argument 1. Away in gon 
walked away is another optional constituent, that 
can describe argument l's nmvement along a one- 
dimensional path. 
2.2 Lexical Rules 
The general format of the expansion rules is as 
follows: 
(1) X IF Y COMPOSITION S 
X contains the information to be added and Y 
the requirement for using the rule. S concerns the 
structure on which the rule is used and specifies 
which parts of this structure should be considered 
by the rule. Interpretationally, the rule in (1) can 
be applied on a structure Z if Y is a substructure 
of Z and X unifies with the selection of Z specified 
in S. The result of the operation is exactly this uni- 
fied structure, and the operation itself is referred 
to as a derivation. If the whole lexical entry is to 
be addressed by the rule, the COMPOSITION part is 
omitted in the rule specification. Similarly, if the 
IF Y part is not present, it means that there is 
no requirement for using the rule. The expansion 
rules fall into five categories, depending on what 
kind of information they insert into the lexical 
representations: (1) Morpho-syntactic augmenta- 
tions, (2) inflections, (3) conceptual expansions, 
(4) syntactic mappings, and (5) compositions. 
Morpho-syntactic augmentation rules add a 
word category and an inflectional paradigm to a 
minimal sign. The morpho-syntactic augmenta- 
tion rule shown in Figure 4(a), for example, de- 
rives the basic entry for the verb paintv from tile 
minimal sign PAINT. 
Assuming that tile lexical entry has already 
been given a word class and a paradigm, the inflec- 
tional rule expands the graphemic representation 
into a particular inflected word form. The rule 
in Figure 4(b) expands the basic entry for paintv 
into the more specialized entry for the past form 
paintedv. The inflectional rules m'e grouped to- 
gether into paradigms that are associated with the 
appropriate words (e.g. vl is linked to paintv). 
Conceptual e,r, pansion rules are rules that ex- 
tend the semantic part of the signs without com- 
bining them with other sign structures. These 
rules are semantically conditioned and typically 
explain how a particular sign can support a vari- 
ety of subcategorization frames. The rule in Fig- 
ure 4(c) shows how a resultative construction like 
Jon painted the wall red is supported by a mini- 
mal sign like PAINT. If the conceptual structure 
contains an argument that undergoes some mono- 
tonic development, the conceptual structure can 
be expanded with a new argument that serves 
as the medium for this development and has a 
dimension matching the criterial property of the 
MONOTONIC role. When an argument is a medium 
for some other argument, it means that its mono- 
toni(: development is manifested or materialized 
through this other argument. Hence, as argument 
2 of PAINThas a MONOTONIC role, the rule is able 
to add an argument that describes the resul}ing 
monotonic change of the surface being painted. 
The realization of this argument as an adjective 
(like red) comes from the fact that the new argu- 
ment, is of dimension coloring. For a minimal sign 
like WALK (see Figure 3), which contains an argu- 
ment (the walker) that monotonically moves along 
some one-dimensional path, the rule adds a new 
argument of dimensionality 1-dim. The medium 
must then describe a one-dimensional path, as for 
example to the school in Jon walked to the school. 
Syntactic mapping rules are rules that derive 
syntactic properties from conceptual structures. 
Since no special syntactic notions are assumed, 
we must here decide on an existing syntactic the- 
ory before the mapping rules can be defined. The 
rule shown in Figure 4(d) is based on Gulla's rules 
(Gulla, 1994) for mapping from SM conceptual 
structures to LFG grammatical functions (Kaplan 
and Bresnan, 1982). It states that if a verb is used 
480 
Cat: V 
lIlft: \[ paradigm: vl \] 
..... (a) 
lnfl: \[ form: past, \] 
R,eah insert "ed" at end 
..... (b) 
~elIlt: I)\]M (~ 
MH)IUM - } i j 
1F 
....... (,9 .... 
Syn: \[ XCOMP.i \[\] \] 
W 
Sere: 
4. completed 
I)\[M 
MEI)IUM 
coloring 
()l{ 
existence 
>j 
..... (,0 
\[\]k 
IF 
I punctual 
SeIll-" \[ CONTROI,LFAI, 
COMPOSITION main Suttix 
wh, ere (t ¢ no'ucriterial 
......... ....... 
Figure 4: (a) Morpho-syntactic augmel~tation. (b) 
Inflectional rule. (c) Conceptual expansion. (d) 
Mapping rule. (e) Compositiolml rule. 
in a completed seIIse 1, MEDIUM arguments of (ti- 
mensionality coloring or existence can be mapped 
onto the XCOMP flmction. Used together with rule 
4(c) on PA\[N~I; it introduces an XCOMP element 
that des(:ribes the resulting state of the surface 
being painted. A similar al)proaeh to the assign- 
meat of synt;u:tic flmct, ions in LFG can be found 
in (Alsina, 1993). 
The compositional rules combine two sign stru(> 
I;ures attd create a new compound structure that 
includes parts of l)oth of them. The rule in Fig- 
ul'c 4(e) uses a suffix to create a noun \[;hat re\[ers 
to some controlled, durative activity. Except tbr 
l;hc control and duration requirement, l;he conc:ep~ 
tua.1 structure must also contain a criterially an- 
chored argument, i.e. mt argument that includes 
at least one semantic role that is not noneritc- 
rial. The (\]OMI'OSITION part says that there are 
two structures involved, a main stru(:ture and a 
s'u,J.l~x strucl,urc, whei'cas the cxpansioll i)art turns 
l;he whole conceptual structure into an &rgulilent 
k. ()n the basis ot" the minimal signs PAINT and 
WALK, l;he rule (:an create I;he notms paintingN 
and 'walkingN . 
3 The Expanding Lexicon 
In a sign extmnsion le, xi(:on system, we must dis~ 
tinguish between stored lexical entries and gen- 
erated lexical entries. The stored entries are all 
minimal signs, and I;hey are usually not very in- 
I;eresdng to the lexicon user. The generated en- 
tries are produ(:ed by combining stored entries 
with one or more ext)ansion rules, and these cn- 
t;ri(;s at'(; more or less elaborate spe(:ifica~,ions of 
actual wor(ls. A simple generated entry is the 
result of combining th(; minimal sign PAINT in 
Figure 2 with the morpho-syntactic auginen~ation 
rule in Figure 4(a). This yMds dm basic verb 
entry paintv, which (loes not contain any infor- 
mation abou|, syntactic realization. More elabo.- 
rat(; entries are then generated by expanding the 
paiutv entry with the different subcategorization 
frames that are possible for paintv. For a user re- 
questing information fl'om the lexicon, l;he stored 
entries m W be completely hidden and only the 
elaborate generated ones may be made available. 
Consider the rather elaborate entry in Figure 5, 
which rel)resents the past form painted used in the 
following resultative constru(:tion: 
lt, bllowing the ideas of felicity in (Depraetere, 
1995), we define a clause to 1)e completed if it reaches 
a natural or intt;nded endpoint. A non-repetitive re- 
sultative (:ons~ruction is always completed, whereas 
constructions like ,Ion is painting and Jon paints ev- 
e.ry day are incompleted. 
481 
Cat : 
Infl : 
Real : 
Sem : 
Syn : 
V 
paradigm: vl 
form: past 
"painted" 
- punctual 
+ completed 
SOURCE coloring \] 
CONTROLLER noncriterial 1 
DIM 2-dim 1 
LIMIT coloring 
GOAL noncriterial 
MONOTONIC4 coloring 2 
DIM coloring \] 
MEDIUM --~ 4 3 suRJl: \[\] \] 
o,J : \[ \] XCOMP : \[\] 
Figure 5: Generated entry for resultative use of 
paintedv . 
(3) Jon painted the house red. 
The entry specifies a particular word form, con- 
tains a conceptual structure with three arguments, 
and lists the syntactic functions realizing these ar- 
guments. Indexing SUBJ with 1 means that argu- 
ment 1 of the conceptual structure is to be real- 
ized as the subject. The whole entry is generated 
by a series of derivations, where each derivation 
adds a piece of information to the final lexical en- 
try. Starting with the minimal sign PAINT, we 
use the rules in Figure 4(a) and 4(b) to generate 
a simple entry for paintedy. Then we expand the 
conceptual structure into a completed description 
(+ completed) using a rule called Completed and 
apply the rule in Figure 4(c) to add a third argu- 
ment. The syntactic functions are added by the 
rule in Figure 4(d) plus two rules that we here can 
call Subjl and Objl. Subjl assigns the SUBJ func- 
tion to arguments that contain SOURCE or CON- 
TROLLER roles, whereas Objl requires a + com- 
pleted description and assigns the OBJ fimction to 
arguments that have a MONOTONIC role. The gen- 
eration of the lexical entry in Figure 5, thus, can 
be written as the following derivational sequence: 
(4) PAINT ++ 4(a) ++ 4(b) +-t- Completed ++ 
4(c) ++ Subjl ++ Objl ++ 4(d) 
When the system is to create a derivational se- 
Cat: 
Infl: 
Real: 
Sem: 
N 
\[ paradigm: nl \] 
"paint(ing) " 
- ~unctual 
SOURCE 
CONTROLLER 
DIM 
LIMIT 
GOAL 
MONOTONIC 
coloring 
noncriterial 
2-dim 
coloring 
noncriterial 
coloring 2 3 
Figure 6: Lexical entry for suffix ingN and gener- 
ated entry for paintingN. 
quence like that, we first have to indicate which 
morpho-syntactic rule to use. The system then 
chooses the correct inflectional paradigm, and it 
can start trying out the different expansion rules 
to generate complete lexical entries. The search 
space for this is restricted, since the rules are se- 
mantically conditioned and monotonic, and well- 
formedness conditions decide when to stop ex- 
panding the structure. 
In a similar vein, the noun paintingN (referring 
to a painting process) is derived from the minimal 
sign PAINT and the suffix ingN. The composi- 
tional rule from Figure 4(e) combines these two 
structures and produces the lexical entry shown 
in Figure 6. Category and Infection stem from 
ingN, Realization is a combination of the values 
in PAINT and ingN, and Semantics is the min- 
imal sign's conceptual structure expanded into a 
complex argument indexed as 3. Instead of stor= 
ing two entries for paintv and paintingN -- that 
partly contain the same information -- we derive 
the entries dynamically from a single PAINT en- 
try. 
4 Conclusions 
The Sign Model (SM) gives a theoretical founda- 
tion for structuring lexical information along se- 
mantic lines• It prescribes a strong semantic basis 
and suggests various kinds of expansion rules for 
generating complete word entries. The sign ex- 
pansion approach is now used as a basis for the 
TROLL lexicon project in Trondheim. In this 
project, a formalism for lexical representation as 
well as mechanisms for executmg lexical rules are 
implemented in LPA Prolog (Gulla and Mosha- 
gen, 1995). A lexicon of Norwegian verbs is un- 
der construction, and SM-based analyses of En- 
482 
glish, German, and Bulgarian have been used in 
the design of the lexicon (Hellan and Dimitrova- 
Vulchanova, 1996; Pitz, 1994). Due to speed con- 
cerns, the stored entries and the expansion rules 
are in the TROLL lexicon supplemented with in- 
dexes that refer to well-defined derivational se- 
quences for complete word entries. The work in 
the TROLL project is now concentrated on the 
construction of a complete lexicon for Norwegian, 
and this work is also to serve as an evaluation of 
both the lexicon structures and the Sign Model. 
The theory is still at a development stage when it 
comes to psychological and perceptional matters, 
even though some suggestions have been made 
(Gulla, 1994). The filture work also includes es- 
tablishing proper interfaces to various syntactic 
theories, so that the system can be integrated with 
existing parsers and generators. 
References 
Alsina, A. (1993). Predicate Composition: A The- 
ory of Syntactic Function Alternations. Ph. D. 
thesis, Stanford University, San Fransisco. 
Andry, F., N. M. Fraser, S. McGlashan, S. Thorn- 
ton, and N. J. Youd (1992). Making DATR 
Work for Speech: Lexicon Compilation in SUN- 
DIAL. Computational Linguistics 18(3), 245- 
268. 
Coopmans, Everaert, and Grimshaw (Eds.) 
(1996). Lexical Specification and Insertion. 
Lawrence Erlbaum Ass., Inc. 
Depraetere, I. (1995). (Un)boundedness and 
(A)telicity. Linguistics and Philosophy 18, 1- 
19. 
Flickinger, D. and J. Nerbonne (1992). Inheri- 
tance and Complementation: A Case Study of 
E~y Adjectives and Related Nouns. Computa- 
tional Linguistics 18 (3), 269-310. 
Gofii, J. M. and J. C. GonzAlez (1995). A frame- 
work for lexical representation. In AI95: 15th 
International Conference. Language Engineer- 
ing 95, Montpellier, pp. 243-252. 
Crimshaw, J. and R. Jackendoff (1985). Report 
to the NSF on grant IST-81-20403. Technical 
report, Waltham, Department of Linguistics, 
Brandeis University. 
Grishman, ll., C. Macleod, and A. Meyers 
(1994). Comlex Syntax: Building a Computa- 
tional Lexicon. In Proceedings of the Interna- 
tional Conference on Computational Linguistics 
(COLING-94), Kyoto. 
Gropen, J., S. Pinker, M. Hollander, and R. Gold- 
berg (1992). Affectedness and Direct Objects: 
The role of lexical semantics in the acquisition 
of verb argument structure. In B. Levin and 
S. Pinker (Eds.), Lexical fJ Conceptual Seman- 
tics, Cognition Special Issues, Chapter 6, pp. 
153-196. Elsevier Science Publishers. 
Gulla, J. A. (1994). A Proposal for Linking 
LFG F-structures to a Conceptual Semantics. 
Master's thesis, Department of Linguistics, The 
University of Trondheim, Trondheim. 
Gulla, J. A. and S. N. Moshagen (1995, January). 
Representations and Derivations in the TROLL 
Lcxicon. In H. Ledrup, I. Moen, and H. G. Si- 
monsen (Eds.), Proceedings of The XVth Scan- 
dinavian Conference of Linguistics, Oslo. 
Ilellan, L. and M. Dimitrova-Vulchanova (1994, 
July). Preliminary Notes on a Framework for 
'Lexically Dependent Grammar'. Lecture se- 
ries at International Summer Institute in Syn- 
tax, Central Institutue of English and Foreign 
Languages, Hyderabad, India. 
Hellan, L. and M. Dimi.trova-Vulchanova (1996). 
Criteriality and Grammatical Realization. lib 
appear in (Coopmans et al., 1996). 
Kaplan, R. M. and J. Bresnan (1982). Lexical- 
Functional Grammar: A Formal System for 
Grammatical Representation. In J. Brcsnan 
(Ed.), The Mental Representation of Grammat- 
ical Relations, Chapter 4, pp. 173-281. MIT 
Press. 
Krieger, H. U. and J. Nerbonne (1991). Feature- 
Based Inheritence Networks for Computational 
Lexicons. Technical Report DFKI-P~R-91-31, 
German Research Center for Artificial Intelli- 
gence (DFKI), Saarbrucken. 
Mel'~uk, I. and A. PolguSre (1987). A Formal 
Lexicon in Meaning-Text Theory (Or How to 
Do Lexica with Words. Computational Linguis- 
tics 13(3-4), 261-275. 
Pitz, A. (1994). Nominal Signs in German. Ph. 
D. thesis, Department of Linguistics, University 
of Trondheim, Trondheim. 
Russell, G., A. Ballim, J. Carroll, and S. Warwick- 
Armstrong (1992). A Practical Approach to 
Multiple Default Inheritance for Unification- 
Based Lexicons. Computational Linguis- 
tics 18(3), 311-337. 
483 
