Proceedings of the 43rd Annual Meeting of the ACL, pages 58–65,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
Empirically-based Control of Natural Language Generation 
 
Daniel S. Paiva
Department of Informatics
University of Sussex
Brighton, UK
danielpa@sussex.ac.uk

Roger Evans
Information Technology Research Institute
University of Brighton
Brighton, UK
Roger.Evans@itri.brighton.ac.uk
 
Abstract 
In this paper we present a new approach to 
controlling the behaviour of a natural lan-
guage generation system by correlating in-
ternal decisions taken during free generation 
of a wide range of texts with the surface sty-
listic characteristics of the resulting outputs, 
and using the correlation to control the gen-
erator. This contrasts with the generate-and-
test architecture adopted by most previous 
empirically-based generation approaches, 
offering a more efficient, generic and holis-
tic method of generator control. We illus-
trate the approach by describing a system in 
which stylistic variation (in the sense of 
Biber (1988)) can be effectively controlled 
during the generation of short medical in-
formation texts.  
1 Introduction 
This paper (see footnote 1) is concerned with the problem of controlling the output of natural language generation (NLG) systems. In many application scenarios the generator’s task is underspecified, resulting in multiple possible solutions (texts expressing the desired content), all equally good to the generator, but not equally appropriate for the application. Customising the generator directly to overcome this generally leads to ad-hoc, non-reusable solutions. A more modular approach is a generate-and-test architecture, in which all solutions are generated, and then ranked or otherwise selected according to their appropriateness in a separate post-process. Such architectures have been particularly prominent in the recent development of empirically-based approaches to NLG, where generator outputs can be selected according to application requirements acquired directly from human subjects (e.g. Walker et al. (2002)) or statistically from a corpus (e.g. Langkilde-Geary (2002)). However, this approach suffers from a number of drawbacks:
1. It requires generation of all, or at least many solutions (often hundreds of thousands), expensive both in time and space, and liable to lead to unnecessary interactions with other components (e.g. knowledge bases) in complex systems. Recent advances in the use of packed representations ameliorate some of these issues, but the basic need to compare a large number of solutions in order to rank them remains.
2. The ‘test’ component generally does not give fine-grained control: for example, in a statistically-based system it typically measures how close a text is to some single notion of ideal (actually, statistically average) output.
3. Use of an external filter does not combine well with any control mechanisms within the generator: e.g. controlling combinatorial explosion of modifier attachment or adjective order.

1  Paiva and Evans (2004) provides an overview of our framework and detailed comparison with previous approaches to stylistic control (like Hovy (1988), Green and DiMarco (1993) and Langkilde-Geary (2002)). This paper provides a more detailed account of the system and reports additional experimental results.
In this paper we present an empirically-based 
method for controlling a generator which over-
comes these deficiencies. It controls the generator 
internally, so that it can produce just one (locally) 
optimal solution; it employs a model of language 
variation, so that the generator can be controlled 
within a multidimensional space of possible vari-
ants; its view of the generator is completely holis-
tic, so that it can accommodate any other control 
mechanisms intrinsic to the generation task.  
To illustrate our approach we describe a system 
for controlling ‘style’ in the sense of Biber (1988) 
during the generation of short texts giving instruc-
tions about doses of medicine. The paper continues 
as follows. In §2 we describe our overall approach. 
We then present the implemented system (§3) and 
report on our experimental evaluation (§4). We end 
with a discussion of conclusions and future direc-
tions (§5). 
2 Overview of the Approach 
Our overall approach has two phases: (1) offline 
calculation of the control parameters, and 
(2) online application to generation. In the first 
phase we determine a set of correlation equations, 
which capture the relationship between surface 
linguistic features of generated texts and the inter-
nal generator decisions that gave rise to those texts 
(see figure 1). In the second phase, these correla-
tions are used to guide the generator to produce 
texts with particular surface feature characteristics 
(see figure 2).  
 
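[Figure 1: Offline processing. A corpus yields linguistic features; factor analysis over these gives the variation dimensions and the variation model. The NLG system turns an input into a text while its decisions at the choice points CP1, CP2, … CPn are recorded; the variation model assigns variation scores to the text, and correlation analysis between scores and decisions gives the correlation equations.]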
The starting point is a corpus of texts which 
represents all the variability that we wish to cap-
ture. Counts for (surface) linguistic features from 
the texts in the corpus are obtained, and a factor 
analysis is used to establish dimensions of varia-
tion in terms of these counts: each dimension is 
defined by a weighted sum of scores for particular 
features, and factor analysis determines the combi-
nation that best accounts for the variability across 
the whole corpus. This provides a language varia-
tion model which can be used to score a new text 
along each of the identified dimensions, that is, to 
locate the text in the variation space determined by 
the corpus. 
The next step is to take a generator which can 
generate across the range of variation in the cor-
pus, and identify within it the key choice points 
(CP1, CP2, … CPn) in its generation of a text. We 
then allow the generator to freely generate all pos-
sible texts from one or more inputs. For each text 
so generated we record (a) the text’s score accord-
ing to the variation model and (b) the set of deci-
sions made at each of the selected choice points in 
the generator. Finally, for a random sample of the 
generated texts, a statistical correlation analysis is 
undertaken between the scores and the correspond-
ing generator decisions, resulting in correlation 
equations which predict likely variation scores 
from generator decisions. 
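As an illustration of the variation model described above, the following minimal sketch scores a text on one dimension as a weighted sum of feature scores. The feature names and weights are hypothetical placeholders; the real features and weights come from the factor analysis and are not listed in this paper.

```python
from typing import Dict

# Hypothetical weights for one dimension of variation. In the approach
# described here the real weights are produced by the factor analysis.
DIMENSION_WEIGHTS: Dict[str, float] = {
    "second_person_pronouns": 0.8,
    "imperatives": 0.6,
    "agentless_passives": -0.7,
}


def dimension_score(feature_scores: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores; features missing from the text count as 0."""
    return sum(w * feature_scores.get(name, 0.0) for name, w in weights.items())


# Hypothetical feature scores for one text.
example_features = {
    "second_person_pronouns": 2.0,
    "imperatives": 1.0,
    "agentless_passives": 1.0,
}
example_score = dimension_score(example_features, DIMENSION_WEIGHTS)
print(example_score)
```

Applying one such function per dimension locates a text in the variation space.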
 
[Figure 2: Online processing. The NLG system receives the input and a target variation score; the correlation equations guide its decisions at the choice points CP1, CP2, … CPn, producing a text in the specified style.]
In the second phase, the generator is adapted to 
use the correlation equations to conduct a best-first 
search of the generation space. As well as the usual 
input, the generator is supplied with target scores 
for each dimension of variation. At each choice 
point, the correlation equations are used to predict 
which choice is most likely to move closer to the 
target score for the final text. 
This basic architecture makes no commitment to  
what is meant by ‘variation’, ‘linguistic features’, 
‘generator choice points’, or even ‘NLG system’. 
The key ideas are that a statistical analysis of sur-
face features of a corpus of texts can be used to 
define a model of variation; this model can then be 
used to control a generator; and the model can also 
be used to evaluate the generator’s performance. In 
the next section we describe a concrete instantia-
tion of this architecture, in which ‘variation’ is sty-
listic variation as characterised by a collection of 
shallow lexical and syntactic features. 
3 An Implemented System 
In order to evaluate the effectiveness of this general approach, we implemented a system which attempts to control the style, as defined by Biber (1988), of short texts (typically 2-3 sentences) describing medicine dosage instructions.
3.1 Factor Analysis 
Biber characterised style in terms of very shallow linguistic features, such as presence of pronouns, auxiliaries, passives etc. By using factor analysis techniques he was able to determine complex correlations between the occurrence and non-occurrence of such features in text, which he used to characterise different styles of text (see footnote 2).
We adopted the same basic methodology, applied to a smaller, more consistent corpus of just over 300 texts taken from proprietary patient information leaflets. Starting with around 70 surface linguistic features as variables, our factor analysis yielded two main factors (each containing linguistic features grouped in positively and negatively correlated subgroups) which we used as our dimensions of variation. We interpreted these dimensions as follows (this is a subjective process: factor analysis does not itself provide any interpretation of factors): dimension 1 ranges from texts that try to involve the reader (high positive score) to texts that try to be distant from the reader (high negative score); dimension 2 ranges from texts with more pronominal reference and a higher proportion of certain verbal forms (high positive score) to texts that use full nominal reference (high negative score) (see footnote 3).
3.2 Generator Architecture 
The generator was constructed from a mixture of 
existing components and new implementation, us-
ing a fairly standard overall architecture as shown 
in figure 3. Here, dotted lines show the control 
flow and the straight lines show data flow — the 
choice point annotations are described below. 
The input constructor takes an input specification and, using a background database of medicine information, creates a network of concepts and relations (see figure 4) using a schema-based approach (McKeown, 1985).

2 Some authors (e.g. Lee (1999)) have criticised Biber for making assumptions about the validity and generalisability of his approach to English language as a whole. Here, however, we use his methodology to characterise whatever variation exists without needing to make any broader claims.
3  Full details of the factor analysis can be found in (Paiva 2000).
[Figure 3: Generator architecture with choice points. Modules, in order: input constructor, split network, network ordering, referring expression, NP pruning, realiser. Data passed along: input specification, initial input networks, sentence-size networks, subnetwork chosen, referring expression net, pruned network, sentence. Choice point 1: number of sentences; choice point 2: type of referring expression; choice point 3: choice of mapping rule.]
Each network is then split into subnetworks by the split network module. This partitions the network by locating ‘proposition’ objects (marked with a double-lined box in figure 4) which have no parent and tracing the subnetwork reachable from each one. We call these subnetworks propnets. In figure 4, there are two propnets, rooted in [1:take] and [9:state]; proposition [15:state] is not a root as it can be reached from [1:take]. A list of all possible groupings of these propnets is obtained (see footnote 4), and one of the possible combinations is passed to the network ordering module. This is the first source of non-determinism in our system, marked as choice point one in figure 3. A combination of subnetworks will be material for the realisation of one paragraph and each subnetwork will be realised as one sentence.
                                                          
4  For instance, with three propnets (A, B and C) the list 
of combinations would be [(A,B,C), (A,BC), (AB, C), 
(AC,B), (ABC)]. 
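The groupings of propnets listed in the footnote are the set partitions of the propnets. A minimal sketch of how such a list could be enumerated is given below. The function name is hypothetical and this is not the code of the split network module; groups are treated as unordered, since ordering is left to the network ordering module.

```python
from typing import Iterator, List, Sequence


def groupings(items: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every way of partitioning items into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in groupings(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or give it a new group of its own.
        yield [[first]] + partial


all_groupings = list(groupings(["A", "B", "C"]))
for grouping in all_groupings:
    print(grouping)
```

For three propnets this yields the five combinations given in the footnote. The number of groupings grows quickly with the number of propnets (15 for four).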
 
[Figure 4: Example of semantic network produced by the input constructor (see footnote 5). Nodes: 1:take, 2:patient, 3:medicine, 4:pres, 5:patient, 6:of, 7:dose, 8:value(2gram), 9:state, 10:pres, 11:of, 12:freq, 13:value(2xday), 14:pres, 15:state. Arc labels: arg0, arg1, tense, freq, proxy.]
The network ordering module receives a combi-
nation of subnetworks and orders them based on 
the number of common elements between each 
subnetwork. The strategy is to try to maximise the 
possibility of having a smooth transition from one 
sentence to the next in accordance with Centering 
Theory (Grosz et al., 1995), and so increase the 
possibility of having a pronoun generated. 
The referring expression module receives one 
subnetwork at a time and decides, for each object 
that is of type [thing], which type of referring ex-
pression will be generated. The module is re-used 
from the Riches system (Cahill et al., 2001) and it 
generates either a definite description or a pronoun. 
This is the second source of non-determinism in 
our system, marked as choice point two in figure 3. 
Referring expression decisions are recorded by 
introducing additional nodes into the network, as 
shown for example in figure 5 (a fragment of the 
network in figure 4, with the additional nodes). 
NP pruning is responsible for erasing from a referring expression subnetwork all the nodes that can be transitively reached from a node marked to be pronominalised. This prevents the realiser from trying to express the information twice. In figure 5, [7:dose] is marked to be pronominalised, so the concepts [11:of] and [3:medicine] do not need to be realised and are pruned.
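The pruning step can be sketched as a reachability computation over the network. This is an illustrative sketch only: the adjacency-list representation, the arc directions and the extra `root` node are assumptions, not details of the implemented module.

```python
from typing import Dict, List, Set


def prune(edges: Dict[str, List[str]], pronominalised: str) -> Dict[str, List[str]]:
    """Remove every node transitively reachable from `pronominalised`.

    The pronominalised node itself is kept, since it still has to be realised.
    """
    doomed: Set[str] = set()
    stack = list(edges.get(pronominalised, []))
    while stack:
        node = stack.pop()
        if node not in doomed and node != pronominalised:
            doomed.add(node)
            stack.extend(edges.get(node, []))
    return {
        src: [dst for dst in dsts if dst not in doomed]
        for src, dsts in edges.items()
        if src not in doomed
    }


# Fragment modelled on figure 5; "root" is a hypothetical parent node and the
# arc directions are assumed for illustration.
network = {
    "root": ["7:dose"],
    "7:dose": ["11:of"],
    "11:of": ["3:medicine"],
    "3:medicine": [],
}
pruned = prune(network, "7:dose")
print(pruned)
```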
                                                          
5 Although some of the labels in this figure look like 
words, they bear no direct relation to words in the 
surface text — for example, ‘of’ may be realised as a 
genitive construction or a possessive.  
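[Figure 5: Referring expressions and pruning. Fragment of the network in figure 4 containing 7:dose, 11:of and 3:medicine (linked by arg0 arcs), with the additional nodes 21:pronoun and 22:definite attached by refexp arcs.]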
The realiser is a re-implementation of Nicolov’s 
(1999) generator, extended to use the wide-
coverage lexicalised grammar developed in the 
LEXSYS project (Carroll et al., 2000), with further 
semantic extensions for the present system. It se-
lects grammar rules by matching their semantic 
patterns to subnetworks of the input, and tries to 
generate a sentence consuming the whole input. In 
general there are several rules linking each piece of 
semantics to its possible realisation, so this is our 
third, and most prolific, source of non-determinism 
in the architecture, marked as choice point three in 
figure 3. 
A few examples of outputs for the input repre-
sented in figure 4 are: 
the dose of the patient 's medicine is taken twice a 
day. it is two grams. 
the two-gram dose of the patient 's medicine is 
taken twice a day. 
the patient takes the two-gram dose of the patient 's 
medicine twice a day. 
From a typical input corresponding to 2-3 sentences, this generator will generate over 1000 different texts.
3.3 Tracing Generator Behaviour 
In order to control the generator’s behaviour we 
first allow it to run freely, recording a ‘trace’ of the 
decisions it makes at each choice point during the 
production of each text. Although there are only 
three choice points in figure 3, the control structure 
included two loops: an outer loop which ranges 
over the sequence of propnets, generating a sen-
tence for each one, and an inner loop which ranges 
over subnetworks of a propnet as realisation rules 
are chosen. So the decision structure for even a 
small text may be quite complex.  
In the experiments reported here, the trace of the 
generation process is simply a record of the num-
ber of times each decision (choice point, and what 
choice was made) occurred. Paiva (2004) discusses 
more complex tracing models, where the context of 
each decision (for example, what the preceding 
decision was) is recorded and used in the correla-
tion. However the best results were obtained using 
just the simple decision-counting model (perhaps 
in part due to data sparseness for more complex 
models). 
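The simple decision-counting trace can be pictured as a multiset of (choice point, choice) pairs. The sketch below assumes decisions are logged as such pairs. The decision labels are borrowed from the regression variables used later in the paper, but the sequence itself is invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

Decision = Tuple[str, str]  # (choice point, choice made)


def trace_counts(decisions: Iterable[Decision]) -> Counter:
    """Collapse a sequence of decisions into per-decision counts."""
    return Counter(decisions)


# Hypothetical decision sequence for one generated text.
decisions = [
    ("CP1", "SPLITNET"),
    ("CP2", "RE_PRON"),
    ("CP3", "PAS_VNP"),
    ("CP3", "VAUX"),
    ("CP2", "RE_PRON"),
]
counts = trace_counts(decisions)
print(counts)
```

Note that the order of decisions is discarded, which is what distinguishes this model from the context-sensitive tracing models mentioned above.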
3.4 Correlating Decisions with Text Features 
By allowing the generator to freely generate all 
possible output from a single input, we recorded a 
set of <trace, text> pairs ranging across the full 
variation space. From these pairs we derived corre-
sponding <decision-count, factor-score> pairs, to 
which we applied a very simple correlational tech-
nique, multivariate linear regression analysis, 
which is used to find an estimator function for a 
linear relationship (i.e., one that can be approxi-
mated by a straight line) from the data available for 
several variables (Weisberg, 1985).  In our case we 
want to predict the value for a score in a stylistic 
dimension (SSi) based on a configuration of gen-
erator decisions (GDj) as seen in equation 1.  
(eq. 1) $SS_i = x_0 + x_1 GD_1 + \dots + x_n GD_n + \varepsilon$ (see footnote 6)
We used three randomly sampled data sets of 
1400, 1400 and 5000 observations obtained from a 
potential base of about 1,400,000 different texts 
that could be produced by our generator from a 
single input. With each sample, we obtained a re-
gression equation for each stylistic dimension 
separately. In the next subsections we will present 
the final results for each of the dimensions sepa-
rately. 
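For readers unfamiliar with the technique, the sketch below fits the weights of an equation of the form of equation 1 by ordinary least squares, using the normal equations. It is an illustration under assumptions: the paper's regressions were obtained with a statistical package that also eliminates unhelpful variables, which this sketch does not do, and the data shown are invented.

```python
from typing import List, Sequence


def fit_linear(rows: Sequence[Sequence[float]],
               targets: Sequence[float]) -> List[float]:
    """Return [x0, x1, ..., xn] minimising the squared error of
    x0 + x1*GD1 + ... + xn*GDn against the target scores."""
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    # Normal equations (G^T G) x = G^T s, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * s for r, s in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Invented decision counts (two predictors) and scores generated from
# score = 2 + 3*GD1 - 1*GD2, so the fit should recover those weights.
example_rows = [[0, 1], [1, 0], [1, 1], [2, 1], [0, 2]]
example_scores = [1, 5, 4, 7, 0]
example_weights = fit_linear(example_rows, example_scores)
print(example_weights)
```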
Regression on Stylistic Dimension 1 
For the regression model on the first stylistic dimension (SS1), the generator decisions that were used in the regression analysis (see footnote 7) are: imperative sentences with one object (IMP_VNP), V_NP_PP agentless passive sentences (PAS_VNPP), V_NP by-passives (BYPAS_VN), and N_PP clauses (NPP). These are all decisions that happen in the realiser, i.e., at the third choice point in the architecture. This resulted in the regression equation shown in equation 2.
                                                          
6 $SS_i$ represents a stylistic score and is the dependent variable or criterion in the regression analysis; the $GD_j$ represent generator decisions and are called the independent variables or predictors; the $x_j$ are weights, and $\varepsilon$ is the error.
7 The process of determining the regression takes care 
of eliminating the variables (i.e. generator decisions) 
that are not useful to estimate the stylistic dimensions. 
(eq. 2) $SS_1 = 6.459 - (1.460 \times \mathit{NPP}) - (1.273 \times \mathit{BYPAS\_VN}) - (1.826 \times \mathit{PAS\_VNPP}) + (1.200 \times \mathit{IMP\_VNP})$ (see footnote 8)
The coefficients for the regression on SS1 are 
unstandardised coefficients, i.e. the ones that are 
used when dealing with raw counts for the genera-
tor decisions.  
The coefficient of determination ($R^2$), which measures the proportion of the variance of the dependent variable about its mean that is explained by the independent variables, had a reasonably high value (.895; see footnote 9) and the analysis of variance obtained an F test of 1701.495.
One of the assumptions of this technique is the linearity of the relation between the dependent and the independent variables (i.e., in our case, between the stylistic scores in a dimension and the generator decisions). The analysis of the residuals resulted in a graph that had some problems but that resembled a normal graph (see (Paiva, 2004) for more details).
Regression on Stylistic Dimension 2 
For the regression model on the second stylistic dimension (SS2) the variables that we used were: the number of times a network was split (SPLITNET), generation of a pronoun (RE_PRON), auxiliary verb (VAUX), noun with determiner (NOUN), transitive verb (VNP), and agentless passive (PAS_VNP). The first type of decision happens in the split network module (our first choice point); the second, in the referring expression module (second choice point); and the rest in the realiser (third choice point).
The main results for this model are as follows: the coefficient of determination ($R^2$) was .959 and the analysis of variance obtained an F test of 2298.519. The unstandardised regression coefficients for this model can be seen in eq. 3.
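(eq. 3) $SS_2 = -27.208 - (1.530 \times \mathit{VNP}) + (2.002 \times \mathit{RE\_PRON}) - (0.547 \times \mathit{NOUN}) + (0.356 \times \mathit{VAUX}) + (0.860 \times \mathit{SPLITNET}) + (0.213 \times \mathit{PAS\_VNP})$ (see footnote 10)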
                                                          
8  This specific equation came from the sample with 
5,000 observations — the equations obtained from 
the other samples are very similar to this one. 
9  All the statistical results presented in this paper are 
significant at the 0.01 level (two-tailed). 
10 This specific equation comes from one of the samples 
of 1,400 observations. 
With this second model we did not find any prob-
lems with the linearity assumptions as the analysis 
of the residuals gave a normal graph. 
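To make equations 2 and 3 concrete, the sketch below evaluates them for one set of decision counts. The coefficients are copied from the two equations; the counts and the function name are invented for illustration.

```python
from typing import Dict, Mapping

# Coefficients copied from equation 2 (SS1) and equation 3 (SS2).
SS1_INTERCEPT = 6.459
SS1_WEIGHTS: Dict[str, float] = {
    "NPP": -1.460, "BYPAS_VN": -1.273, "PAS_VNPP": -1.826, "IMP_VNP": 1.200,
}
SS2_INTERCEPT = -27.208
SS2_WEIGHTS: Dict[str, float] = {
    "VNP": -1.530, "RE_PRON": 2.002, "NOUN": -0.547,
    "VAUX": 0.356, "SPLITNET": 0.860, "PAS_VNP": 0.213,
}


def predict(intercept: float, weights: Mapping[str, float],
            counts: Mapping[str, int]) -> float:
    """Estimate a stylistic score from raw generator decision counts."""
    return intercept + sum(w * counts.get(name, 0) for name, w in weights.items())


# Hypothetical decision counts for one generated text.
counts = {"IMP_VNP": 1, "NPP": 2, "RE_PRON": 1, "NOUN": 3, "VNP": 1, "SPLITNET": 1}
ss1 = predict(SS1_INTERCEPT, SS1_WEIGHTS, counts)
ss2 = predict(SS2_INTERCEPT, SS2_WEIGHTS, counts)
print(round(ss1, 3), round(ss2, 3))
```

Decisions that do not appear in an equation simply do not affect that dimension's estimate.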
4 Controlling the Generator 
These regression equations characterise the way in 
which generator decisions influence the final style 
of the text (as measured by the stylistic factors). In 
order to control the generator, the user specifies a 
target stylistic score for each dimension of the text 
to be generated. At each choice point during gen-
eration, all possible decisions are collected in a list 
and the regression equations are used to order 
them. The equations allow us to estimate the sub-
sequent values of SS1 and SS2 for each of the pos-
sible decisions, and the decisions are ordered 
according to the distance of the resulting scores 
from the target scores — the closer the score, the 
better the decision.  
Hence the search algorithm that we are using here is best-first search, i.e., the best local solution according to an evaluation function (which in this case is the Euclidean distance between the target and the value obtained by using the regression equations) is tried first, but all the other local solutions are kept in order so backtracking is possible.
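The ordering of decisions at a single choice point can be sketched as follows. The coefficients are those of equations 2 and 3. Everything else is an assumption made for illustration, in particular that the estimate is computed from the decision counts so far plus the candidate decision, and that candidates are compared with a plain sort. The full search also keeps the lower-ranked alternatives for backtracking, which this sketch only hints at by returning the whole ordered list.

```python
import math
from typing import Dict, List, Mapping, Tuple

SS1 = (6.459, {"NPP": -1.460, "BYPAS_VN": -1.273,
               "PAS_VNPP": -1.826, "IMP_VNP": 1.200})
SS2 = (-27.208, {"VNP": -1.530, "RE_PRON": 2.002, "NOUN": -0.547,
                 "VAUX": 0.356, "SPLITNET": 0.860, "PAS_VNP": 0.213})


def estimate(counts: Mapping[str, int]) -> Tuple[float, float]:
    """Estimated (SS1, SS2) for a set of decision counts."""
    def one(model: Tuple[float, Dict[str, float]]) -> float:
        intercept, weights = model
        return intercept + sum(w * counts.get(d, 0) for d, w in weights.items())
    return one(SS1), one(SS2)


def order_decisions(counts: Mapping[str, int], candidates: List[str],
                    target: Tuple[float, float]) -> List[str]:
    """Order candidate decisions by Euclidean distance to the target, best first."""
    def distance(decision: str) -> float:
        trial = dict(counts)
        trial[decision] = trial.get(decision, 0) + 1
        s1, s2 = estimate(trial)
        return math.hypot(s1 - target[0], s2 - target[1])
    return sorted(candidates, key=distance)


# Hypothetical choice point: no decisions taken yet, target (SS1, SS2) = (7, -28).
ranked = order_decisions({}, ["IMP_VNP", "PAS_VNPP", "RE_PRON", "VNP"], (7.0, -28.0))
print(ranked)
```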
In this paper we report on tests of two internal
aspects of the system (see footnote 11). First we wish to know how
good the generator is at hitting a user-specified 
target — i.e., how close are the scores given by the 
regression equations for the first text generated to 
the user’s input target scores. Second, we wish to 
know how good the regression equation scores are 
at modelling the original stylistic factors — i.e., we 
want to compare the regression scores of an output 
text with the factor analysis scores. We address 
these questions across the whole of the two-
dimensional stylistic space, by specifying a rectan-
gular grid of scores spanning the whole space, and 
asking the generator to produce texts for each grid 
point from the same semantic input specification. 
                                                          
11  We are not dealing with external (user) evaluation of 
the system and of the stylistic dimensions we ob-
tained — this was left for future work. Nonetheless, 
Sigley (1997) showed that the dimensions obtained 
with factor analysis and people’s perception have a 
high correlation. 
[Figure 6: Target scores for the texts. Plot of the 80 numbered target points laid out as a regular grid, in rows of ten (1-10, 11-20, … 71-80); one axis runs from −45 to −25 and the other from −10 to 10.]
In this case we divided the scoring space with an 8 by 10 grid pattern as shown in figure 6 (see footnote 12). Each point specifies the target scores for each text that should be generated (the number next to each point is an identifier of each text). For instance, text number 1 was targeted at coordinate (−7, −44), whereas text number 79 was targeted at coordinate (+7, −28).
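The grid of targets can be reconstructed with a short sketch. The spacing of 2 units on both dimensions and the row-by-row numbering are assumptions: they are consistent with the two coordinates quoted above and with the 8 by 10 layout, but are not stated explicitly.

```python
from typing import Dict, Tuple


def target_grid(ss1_start: int = -7, ss2_start: int = -44, step: int = 2,
                rows: int = 8, cols: int = 10) -> Dict[int, Tuple[int, int]]:
    """Map text identifiers 1..rows*cols to (SS1, SS2) target coordinates.

    Identifiers are assumed to run along SS2 within a row, with each new row
    moving one step up SS1.
    """
    grid = {}
    for row in range(rows):
        for col in range(cols):
            text_id = row * cols + col + 1
            grid[text_id] = (ss1_start + step * row, ss2_start + step * col)
    return grid


targets = target_grid()
print(targets[1], targets[79])
```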
4.1 Comparing Target Points and Regression Scores
In the first part of this experiment we wanted to know how close to the user-specified target coordinates the resulting regression scores of the first generated text were. This can be done in two different ways. The first is to plot the resulting regression scores (see figure 7) and visually check whether they mirror the grid-shaped pattern of the target points (figure 6); this can be done by inspecting the text identifiers (see footnote 13). This can be a bit misleading because there will always be variation around the target point that was supposed to be achieved (i.e., there is a margin for error) and this can blur the comparison unfavourably.
                                                          
12 The range for each scale comes from the maximum 
and minimum values for the factors obtained in the 
samples of generated texts. 
13 Note that some texts obtained the same regression 
score and, in the statistical package, only one was 
numbered. Those instances are: 1 and 7; 18 and 24; 
22 and 28. 
[Figure 7: Texts scored by using the regression equation. Plot of the numbered texts on the same axes as figure 6 (−45 to −25 and −10 to 10).]
A more formal comparison can be made by plotting the target points versus the regression results for each dimension separately and obtaining a correlation measure between these values. These correlations are shown in figure 8 for SS1 (left) and SS2 (right). The degree of correlation ($R^2$) between the values of target and regression points is 0.9574 for SS1 and 0.942 for SS2, which means that the search mechanism is working very satisfactorily on both dimensions (see footnote 14).
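A correlation measure of this kind can be computed as sketched below, assuming that $R^2$ here is the squared Pearson correlation between the target values and the achieved values on one dimension (the formula is not given in the paper). The data in the example are invented.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return (sxy * sxy) / (sxx * syy)


# Invented target values and achieved regression scores on one dimension.
target_values = [-7, -5, -3, -1, 1, 3, 5, 7]
achieved_values = [-6.5, -5.2, -2.1, -1.4, 0.8, 3.9, 4.6, 7.3]
print(r_squared(target_values, achieved_values))
```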
[Figure 8: Plotting target points versus regression results on SS1 (left) and SS2 (right). The SS1 panel's axes run from −10 to 8 and the SS2 panel's from −45 to −25.]
4.2 Comparing Target Points and Stylistic Scores
In the second part of this experiment we wanted to 
know whether the regression equations were doing 
the job they were supposed to do by comparing the 
regression scores with stylistic scores obtained 
(from the factor analysis) for each of the generated 
texts. In figure 9 we plotted the texts in a graph in 
accordance with their stylistic scores (once again, 
some texts occupy the same point so they do not 
appear).  
                                                          
14  All the correlational figures (R2) presented for this 
experiment are significant at the 0.01 level (two-
tailed). 
[Figure 9: Texts scored using the two stylistic dimensions obtained in our factor analysis. Plot of the numbered texts on the same axes as figure 6 (−45 to −25 and −10 to 10).]
In the ideal situation, the generator would have produced texts with the perfect regression scores and they would be identical to the stylistic scores, so the graph in figure 9 would be grid-shaped like the one in figure 6. However, we have already seen in figure 7 that this is not the case for the relation between the target coordinates and the regression scores. So we did not expect the plot of stylistic scores 1 (SS1) against stylistic scores 2 (SS2) to be a perfect grid.
Figure 10 (left-hand side) shows the relation between the target points and the scores obtained from the original factor equation of SS1. The value of $R^2$, which represents their correlation, is high (0.9458), considering that this represents the possible accumulation of errors of two stages: from the target to the regression scores, and then from the regression to the actual factor scores. On the right of figure 10 we can see the plotting of the target points and their respective factor scores on SS2. The correlation obtained is also reasonably high ($R^2$ = 0.9109).
[Figure 10: Plotting target points versus factor scores on SS1 (left) and SS2 (right). The SS1 panel's axes run from −10 to 10 and the SS2 panel's from −45 to −25.]
5 Discussion and Future Work 
These results demonstrate that it is possible to provide effective control of a generator by correlating internal generator behaviour with characteristics of the resulting texts. It is important to note that these two sets of variables (generator decisions and surface features) are in principle quite independent of each other. Although in some cases there are strong correlations (for example, the generator’s use of a ‘passive’ rule correlates with the occurrence of passive participles in the text), in others the relationship is much less direct (for example, the choice of how many subnetworks to split a network into, i.e., SPLITNET, does not correspond to any feature in the factor analysis), and the way individual features combine into significant factors may be quite different.
Another feature of our approach is that we do 
not assume some pre-defined notion of parameters 
of variation – variation is characterised completely 
by a corpus (in contrast to approaches which use a 
corpus to characterise a single style). The disad-
vantage of this is that variation is not grounded in 
some ‘intuitive’ notion of style: the interpretation 
of the stylistic dimensions is subjective and tenta-
tive. However, as no comprehensive computation-
ally realisable theory of style yet exists, we believe 
that this approach has considerable promise for 
practical, empirically-based stylistic control. 
The results reported here also suggest a possible avenue for future work: exploring what types of problems the generalisation induced by our framework (discussed below) can be applied to. This paper dealt with an application to stylistic variation but, in theory, the approach can be applied to any kind of process for which there is a sorting function that can impose an order, using a measurable scale (e.g., ranking), onto the outputs of another process.
Schematically the approach can be abstracted to 
any sort of problem of the form shown in fig-
ure 11. Here there is a producer process outputting 
a large number of solutions. There is also a sorter 
process which will classify those solutions in a cer-
tain order. The numerical value associated with the 
output by the sorter can be correlated with the de-
cisions the producer took to generate the output. 
The same correlation and control mechanism used 
in this paper can be introduced in the producer 
process, making it controllable with respect to the 
sorting dimension. 
 
[Figure 11: The producer-sorter scheme. A producer emits outputs 1, 2, 3, 4, … m; a sorter arranges them along a sorting dimension (shown in the order output 3, output 1, output 14, output 10, … output m).]
References 
Biber, Douglas (1988) Variation across speech and writing. Cambridge University Press.
Cahill, Lynne; J. Carroll; R. Evans; D. Paiva; R. Power; D. Scott; and K. van Deemter (2001) From RAGS to RICHES: exploiting the potential of a flexible generation architecture. Proceedings of ACL/EACL 2001, pp. 98-105.
Carroll, John; N. Nicolov; O. Shaumyan; M. Smets; and D. Weir (2000) Engineering a wide-coverage lexicalized grammar. Proceedings of the Fifth International Workshop on Tree Adjoining Grammars and Related Frameworks.
Green, Stephen J.; and C. DiMarco (1993) Stylistic decision-making in NLG. In Proceedings of the 4th European Workshop on Natural Language Generation. Pisa, Italy.
Grosz, Barbara J.; A.K. Joshi; and S. Weinstein (1995) Centering: A Framework for Modelling the Local Coherence of Discourse. Institute for Research in Cognitive Science, IRCS-95-01, University of Pennsylvania.
Hovy, Eduard H. (1988) Generating natural language under pragmatic constraints. Lawrence Erlbaum Associates.
Langkilde-Geary, Irene (2002) An empirical verification of coverage and correctness for a general-purpose sentence generator. Proceedings of INLG’02, pp. 17-24.
Lee, David (1999) Modelling Variation in Spoken And Written English: the Multi-Dimensional Approach Revisited. PhD thesis, University of Lancaster, UK.
McKeown, Kathleen R. (1985) Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press.
Nicolov, Nicolas (1999) Approximate Text Generation from Non-hierarchical Representations in a Declarative Framework. PhD Thesis, University of Edinburgh.
Paiva, Daniel S. (2000) Investigating style in a corpus of pharmaceutical leaflets: results of a factor analysis. Proceedings of the Student Workshop of the 38th Annual Meeting of the Association for Computational Linguistics (ACL'2000), Hong Kong, China.
Paiva, Daniel S. (2004) Using Stylistic Parameters to Control a Natural Language Generation System. PhD Thesis, University of Brighton, Brighton, UK.
Paiva, Daniel S.; and R. Evans (2004) A Framework for Stylistically Controlled Generation. In Proceedings of the 3rd International Conference on Natural Language Generation (INLG’04). New Forest, UK.
Sigley, Robert (1997) Text categories and where you can stick them: a crude formality index. International Journal of Corpus Linguistics, volume 2, number 2, pp. 199-237.
Walker, Marilyn; O. Rambow; and M. Rogati (2002) Training a Sentence Planner for Spoken Dialogue Using Boosting. Computer Speech and Language, Special Issue on Spoken Language Generation. July.
Weisberg, Sanford (1985) Applied Linear Regression, 2nd edition. John Wiley & Sons.
