Searching for Grammaticality: Propagating Dependencies in the Viterbi
Algorithm
Stephen Wan 1,2, Robert Dale 1, Mark Dras 1
1 Centre for Language Technology
Division of Information and Communication Sciences
Macquarie University
Sydney, Australia
{swan,rdale,madras}@ics.mq.edu.au
Cécile Paris 2
2 Information and Communication Technologies
CSIRO
Sydney, Australia
Cecile.Paris@csiro.au
Abstract
In many text-to-text generation scenarios (for in-
stance, summarisation), we encounter human-
authored sentences that could be composed by re-
cycling portions of related sentences to form new
sentences. In this paper, we couch the generation
of such sentences as a search problem. We in-
vestigate a statistical sentence generation method
which recombines words to form new sentences.
We propose an extension to the Viterbi algorithm
designed to improve the grammaticality of gener-
ated sentences. Within a statistical framework, the
extension favours those partially generated strings
with a probable dependency tree structure. Our
preliminary evaluations show that our approach
generates less fragmented text than a bigram base-
line.
1 Introduction
In abstract-like automatic summary generation, we often re-
quire the generation of new and previously unseen summary
sentences given some closely related sentences from a source
text. We refer to these as Non-Verbatim Sentences. These
sentences are used instead of extracted sentences for a variety
of reasons including improved conciseness and coherence.
The need for a mechanism to generate such sentences is
supported by recent work showing that sentence extraction is
not sufficient to account for the scope of written human sum-
maries. Jing and McKeown [1999] found that only 42% of
sentences from summaries of news text were extracted sen-
tences. This is also supported by the work of Knight and
Marcu [2002] (cited by [Daumé III and Marcu, 2004]), which
finds that only 2.7% of human summary sentences are ex-
tracts. In our own work on United Nations Humanitarian
Aid Proposals, we noticed that only 30% of sentences are
extracted from the source document, either verbatim or with
trivial string replacements.
While the figures vary, they show that additional mechanisms
beyond sentence extraction are needed. In response to this, our
general research problem is one in which, given a set of related
sentences, a single summary sentence is produced by recycling
words from the input sentence set.
In this paper, we adopt a statistical technique to allow easy
portability across domains. The Viterbi algorithm [Forney,
1973] is used to search for the best traversal of a network of
words, effectively searching through the sea of possible word
sequences. We modify the algorithm so that it narrows the
search space not only to those sequences with a semblance of
grammaticality (via n-grams), but further still to those that are
grammatical sentences preserving the dependency structure
found in the source material.
Consider the following ungrammatical word sequence typical
of that produced by an n-gram generator, "The quick brown fox
jumped over the lazy dog slept by the log". One diagnosis of the
problem is that the word dog is also used as the subject of the
second verb slept. Ideally, we want to avoid such sequences since
they introduce text fragments, in this case "slept by the log". We
could, for example, record the fact that dog is already governed
by the verb jumped, and thus avoid appending a second governing
word slept.
To do so, our extension propagates dependency features
during the search and uses these features to influence the
choice of words suitable for appending to a partially gener-
ated sentence. Dependency relations are derived from shal-
low syntactic dependency structure [Kittredge and Mel’cuk,
1983]. Specifically, we use representations of relations be-
tween a head and modifier, ignoring relationship labels for
the present.
Within the search process, we constrain the choice of fu-
ture words by preferring words that are likely to attach to
the dependency structure of the partially generated sentence.
Thus, sequences with plausible structures are ranked higher.
The remainder of the paper is structured as follows. In Sec-
tion 2, we describe the problem in detail and our approach.
We outline our use of the Viterbi algorithm in Section 3. In
Section 4, we describe how this is extended to cater for de-
pendency features. We compare related research in Section 5.
A preliminary evaluation is presented and discussed in Sec-
tion 6. Finally, we conclude with future work in Section 7.
2 Narrowing the Search Space: A Description
of the Statistical Sentence Generation
Problem
In this work, sentence generation is essentially a search for
the most probable sequence of words, given some source text.
However, this constitutes an enormous space which requires
efficient searching. Whilst reducing a vocabulary to a suit-
able subset narrows this space somewhat, we can use statis-
tical models, representing properties of language, to prune
the search space of word sequences further to those strings
that reflect real language usage. For example, n-gram models
limit the word sequences examined to those that seem gram-
matically correct, at least for small windows of text.
However, n-grams alone often result in sentences that,
whilst near-grammatical, are just gibberish. When com-
bined with a (word) content selection model, we narrow the
search space even further to those sentences that appear to
make sense. Accordingly, approaches such as Witbrock and
Mittal [1999] and Wan et al. [2003] have investigated models
that improve the choice of words in the sentence. Witbrock
and Mittal’s content model chooses words that make good
headlines, whilst that of Wan et al. attempts to ensure that,
given a short document like a news article, only words from
sentences of the same subtopic are combined to form a new
sentence. In this paper, we narrow the search space to those
sequences that conserve dependency structures from within
the input text.
Our algorithm extension essentially passes along the long-distance
context of dependency head information from the preceding word
sequence, in order to influence the choice of the next word appended
to the sentence. This dependency structure is constructed statistically
by an O(n) algorithm, which is folded into the Viterbi algorithm;
the extension thus yields an O(n^4) algorithm. The use of dependency
relations further constrains the search space. Competing paths through
the search space are ranked taking into account the proposed
dependency structures of the partially generated word se-
quences. Sentences with probable dependency structures are
ranked higher. To model the probability of a dependency re-
lation, we use the statistical dependency models inspired by
those described in Collins [1996].
3 Using The Viterbi Algorithm for Sentence
Generation
We assume that the reader is familiar with the Viterbi al-
gorithm. The interested reader is referred to Manning and
Schütze [1999] for a more complete description. Here, we
summarise our re-implementation (described in [Wan et al.,
2003]) of the Viterbi algorithm for summary sentence gener-
ation, as first introduced by Witbrock and Mittal [1999].
In this work, we begin with a Hidden Markov Model
(HMM) where the nodes (ie, states) of the graph are uniquely
labelled with words from a relevant vocabulary. To obtain a
suitable subset of the vocabulary, words are taken from a set
of related sentences, such as those that might occur in a news
article (as is the case for the original work by Witbrock and
Mittal). In this work, we use the clusters of event-related sen-
tences from the Information Fusion work by Barzilay et al.
[1999]. The edges between nodes in the HMM are typically
weighted using bigram probabilities extracted from a related
corpus.
The three probabilities of the unmodified Viterbi algorithm
are defined as follows:

Transition Probability (using the Maximum Likelihood Esti-
mate to model bigram probabilities)1:

  p_tr_ngram(w_i+1 | w_i) = count(w_i, w_i+1) / count(w_i)

Emission Probability (for the purposes of testing the new
transition probability function described in Section 4, this is
set to 1 in this paper):

  p_em(w) = 1

Path Probability is defined recursively as:

  p_path(w_0, ..., w_i+1) = p_tr_ngram(w_i+1 | w_i) × p_em(w) × p_path(w_0, ..., w_i)
The unmodified Viterbi algorithm as outlined here would
generate word sequences just using a bigram model. As noted
above, such sequences will often be ungrammatical.
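To make the bigram-only search concrete, the following sketch (ours, not the implementation used in this or the earlier work) collects MLE bigram counts from a handful of related sentences and returns the most probable word sequence of a fixed length, with the emission probability fixed to 1 as above. The toy corpus, the function names and the unrestricted choice of start word are illustrative assumptions.

  from collections import defaultdict

  def bigram_counts(sentences):
      # Count bigrams (w_i, w_i+1) and history counts for w_i.
      big, uni = defaultdict(int), defaultdict(int)
      for sent in sentences:
          for w1, w2 in zip(sent, sent[1:]):
              big[(w1, w2)] += 1
              uni[w1] += 1
      return big, uni

  def p_tr_ngram(big, uni, w1, w2):
      # MLE bigram transition probability: count(w1, w2) / count(w1).
      return big[(w1, w2)] / uni[w1] if uni[w1] else 0.0

  def viterbi_generate(sentences, length):
      # Most probable word sequence of the given length under the
      # bigram model alone (emission probability fixed to 1).
      big, uni = bigram_counts(sentences)
      vocab = sorted(uni)
      # best[w] holds (score, path) for the best path ending in w.
      best = {w: (1.0, [w]) for w in vocab}
      for _ in range(length - 1):
          new_best = {}
          for w2 in vocab:
              cands = [(score * p_tr_ngram(big, uni, path[-1], w2), path + [w2])
                       for score, path in best.values()]
              new_best[w2] = max(cands, key=lambda c: c[0])
          best = new_best
      return max(best.values(), key=lambda b: b[0])

  # Example: recombine words from two related sentences.
  sents = [["the", "dog", "chased", "the", "cat"],
           ["the", "cat", "slept", "on", "the", "mat"]]
  print(viterbi_generate(sents, 5))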
4 A Mechanism for Propagating Dependency
Features in the Extended Viterbi Algorithm
In our extension, we modify the definition of the Transition
Probability such that not only do we consider bigram prob-
abilities but also dependency-based transition probabilities.
Examining the dependency head of the preceding string then
allows us to consider long-distance context when append-
ing a new word. The algorithm ranks highly those words
with a plausible dependency relation to the preceding string,
with respect to the source text being generated from (or sum-
marised).
However, instead of considering just the likelihood of a
dependency relation between adjacent pairs of words, we can
consider the likelihood of a word attaching to the dependency
tree structure of the partially generated sentence. Specifically,
it is the rightmost root-to-leaf branch that can still be modified
or governed by the appending of a new word to the string.
This rightmost branch is stored as a stack. It is updated and
propagated to the end of the path each time we add a word.
Thus, our extension has two components: Dependency
Transition and Head Stack Reduction. Aside from these
modifications, the Viterbi algorithm remains the same.
In the remaining subsections, we describe in detail how the
dependency relations are computed and how the stack is re-
duced. In Figure 3, we present pseudo-code for the extended
Viterbi algorithm.
4.1 Scoring a Dependency Transition
Dependency Parse Preprocessing of Source Text
The Dependency Transition is simply an additional weight on
the HMM edge. The transition probability is the average of
the two transition weights based on bigrams and dependen-
cies:

  p_tr(w_i+1 | w_i) = average(p_tr_ngram(w_i+1 | w_i), p_tr_dep(w_i+1 | w_i))
Before we begin the generation process, we first use a depen-
dency parser to parse all the sentences from the source text to
1Here the subscripts refer to the fact that this is a transition prob-
ability based on n-grams. We will later propose an alternative using
dependency transitions.
obtain dependency trees. A traversal of each dependency tree
yields all parent-child relationships, and we update an adja-
cency matrix of connectivity accordingly. Because the status
of a word as a head or modifier depends on word order in
English, we consider relative word positions to determine
whether a relation has a forward or backward2 direction.
Forward and backward directional relations are stored in
separate matrices. The Forward matrix stores relations in
which the head is to the right of the modifier in the sentence.
Conversely, the Backward matrix stores relations in which the
head is to the left of the modifier. This distinction is required
later in the stack reduction step.
As an example, given the two strings (using characters
in lieu of words) "d b e a c" and "b e d c a" and the
corresponding dependency trees (drawn here with each head
above its indented dependents):

  a                 (tree for "d b e a c")
  ├── b
  │   ├── d
  │   └── e
  └── c

  c                 (tree for "b e d c a")
  ├── d
  │   └── b
  │       └── e
  └── a
we obtain the following adjacency matrices:
Forward (or Right-Direction) Adjacency Matrix
      a  b  c  d  e
  a   0  1  0  0  0
  b   0  0  0  1  0
  c   0  0  0  1  0
  d   0  1  0  0  0
  e   0  0  0  0  0
Backward (or Left-Direction) Adjacency Matrix
      a  b  c  d  e
  a   0  0  1  0  0
  b   0  0  0  0  2
  c   1  0  0  0  0
  d   0  0  0  0  0
  e   0  0  0  0  0
We refer to the matrices as Adjright and Adjleft respectively.
The cell value in each matrix indicates the number of times
word i (that is, the row index) governs word j (that is, the
column index).
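To illustrate this preprocessing step concretely, the fragment below is a sketch under our own assumptions about data structures (it is not the authors' code): it fills two dictionaries playing the role of Adjright and Adjleft from (head, modifier) pairs annotated with word positions. In a real run these pairs would be read off the dependency parser's output; the relations listed here encode one reading of the two toy trees above.

  from collections import defaultdict

  # Each relation is (head, head_position, modifier, modifier_position),
  # as it would be read off a parsed source sentence.  These encode the
  # toy trees for "d b e a c" (positions 0-4) and "b e d c a".
  relations = [
      ("a", 3, "b", 1), ("a", 3, "c", 4), ("b", 1, "d", 0), ("b", 1, "e", 2),
      ("c", 3, "d", 2), ("c", 3, "a", 4), ("d", 2, "b", 0), ("b", 0, "e", 1),
  ]

  adj_right = defaultdict(int)  # head is to the right of its modifier
  adj_left = defaultdict(int)   # head is to the left of its modifier

  for head, h_pos, mod, m_pos in relations:
      if h_pos > m_pos:
          adj_right[(head, mod)] += 1
      else:
          adj_left[(head, mod)] += 1

  print(adj_right[("a", "b")], adj_left[("b", "e")])  # 1 2, as in the matrices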
Computing the Dependency Transition Probability
We define the Dependency Transition weight as:

  p_tr_dep(w_i+1 | w_i) = p(Dep_sym(w_i+1, headStack(w_i)))

where Dep_sym is the symmetric relation stating that some
dependency relation occurs between a word and any of the
words in the stack, irrespective of which is the head. Intu-
itively, the stack is a compressed representation of the depen-
dency tree corresponding to the preceding words. The prob-
ability indicates how likely it is that the new word can attach
itself to this incrementally built dependency tree, either as a
modifier or a governor. Since the stack is cumulatively passed
on at each point, we need only consider the stack stored at the
preceding word.

This is estimated as follows:

  p(Dep_sym(w_i+1, headStack(w_i))) = max_{h ∈ headStack(w_i)} p(Dep_sym(w_i+1, h))
2These are defined analogously to similar concepts in Combina-
tory Categorial Grammar [Steedman, 2000].
[Figure 1 shows a single path for the word sequence "the quick brown
fox jumps over the lazy dog .", with the head stack drawn above each
word as the sequence is extended.]
Figure 1: A path through a lattice. Although separated on
two lines, it represents a single sequence of words. The stack
(oriented upwards) grows and shrinks as we add words. Note
that the modifiers to dog are popped off before it is pushed on.
Note also that modifiers of existing items on the stack, such
as over, are merely pushed on. Words with no connection to
previously seen stack items are also pushed on (e.g. quick) in
the hope that a head will be found later.
Here, we assume that a word can only attach to the tree
once at a single node; hence, we find the node that max-
imises the probability of node attachment. The relation-
ship Dep_sym(a, b) is modelled using a simplified version of
Collins' [1996] dependency model.
Because the status of a word as the head relies on the
preservation of word order, we keep track of the direction-
ality of a relation. For two words a and b where a precedes b
in the generated string,

  p(Dep_sym(a, b)) ≈ (Adjright(a, b) + Adjleft(b, a)) / cnt(co-occur(a, b))

where Adjright and Adjleft are the right and left adjacency
matrices. Recall that row indices are heads and column in-
dices are modifiers.
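As a worked illustration of the two estimates above (again our own sketch reusing the toy counts, not code from the paper), the fragment below computes the pairwise probability literally as written and then takes the maximum over a head stack. Reading cnt(co-occur(a, b)) as the number of source sentences containing both words is our assumption.

  from collections import defaultdict
  from itertools import combinations

  # Toy data from the worked example above: the two example strings and
  # the dependency counts shown in the adjacency matrices.
  sentences = [["d", "b", "e", "a", "c"], ["b", "e", "d", "c", "a"]]
  adj_right = {("a", "b"): 1, ("b", "d"): 1, ("c", "d"): 1, ("d", "b"): 1}
  adj_left = {("a", "c"): 1, ("b", "e"): 2, ("c", "a"): 1}

  # cnt(co-occur(a, b)): number of source sentences containing both words.
  cooccur = defaultdict(int)
  for sent in sentences:
      for w1, w2 in combinations(set(sent), 2):
          cooccur[frozenset((w1, w2))] += 1

  def p_depsym(a, b):
      # p(Dep_sym(a, b)) for a preceding b in the generated string,
      # computed as (Adjright(a, b) + Adjleft(b, a)) / cnt(co-occur(a, b)).
      pairs = cooccur[frozenset((a, b))]
      if pairs == 0:
          return 0.0
      return (adj_right.get((a, b), 0) + adj_left.get((b, a), 0)) / pairs

  def p_attach(new_word, head_stack):
      # Maximum over stack items h of p(Dep_sym(h, new_word)): how well
      # the new word attaches to the rightmost branch of the partial tree.
      return max((p_depsym(h, new_word) for h in head_stack), default=0.0)

  print(p_depsym("d", "b"))         # (1 + 0) / 2 = 0.5
  print(p_attach("b", ["d", "a"]))  # best attachment over the stack: 0.5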
4.2 Head Stack Reduction
Once we decide that a newly considered path is better than
any other previously considered one, we update the head
stack to represent the extended path. At any point in time,
the stack represents the rightmost root-to-leaf branch of the
dependency tree (for the generated sentence) that can still
be modified or governed by concatenating new words to the
string.3 Within the stack, older words may be modified by
newer words. Our rules for modifying the stack are designed
to cater for a projective4 dependency grammar.
There are three possible alternative outcomes of the reduc-
tion. The first is that the proposed top-of-stack (ToS) has
no dependency relation to any of the existing stack items, in
which case the stack remains unchanged. For the second and
third cases, we check each item on the stack and keep a record
3Note that we can scan through the stack as well as push onto
and pop from the top; this is thus the same type of stack as used in,
for example, Nested Stack Automata.
4That is, if w_i depends on w_j, all words in between w_i and w_j
are also dependent on w_j.
reduceHeadStack(aNode, aStack) returns aStack
  Node_new ← aNode
  Stack ← aStack                      # duplicate, used for scanning
  Node_max ← NULL
  Edge_prob ← 0
  # Find best chunk
  While notEmpty(Stack)
    Head ← pop(Stack)
    if p(dep_sym(Node_new, Head)) > Edge_prob
      Node_max ← Head
      Edge_prob ← p(dep_sym(Node_new, Head))
  # Keep only best chunk
  if Node_max ≠ NULL
    While top(aStack) ≠ Node_max
      pop(aStack)
    # Determine new head of existing string
    if isReduced(Node_new, Node_max)
      pop(aStack)
  push(Node_new, aStack)
Figure 2: Pseudocode for the Head Stack Reduction operation
only of the best probable dependency between the proposed
ToS and the appropriate stack item. The second outcome,
then, is that the proposed ToS is the head of some item on
the stack. All items up to and including that stack item are
popped off and the proposed ToS is pushed on. The third out-
come is that it modifies some item on the stack. All stack items up to (but not including) the stack item are popped off
and the proposed ToS is pushed on. The pseudocode is pre-
sented in Figure 2. An example of stack manipulation is pre-
sented in Figure 1. We rely on two external functions. The
first function, dep_sym/2, has already been presented above.
The second function, isReduced/2, relies on an auxiliary
function returning the probability of one word being governed
by the other, given the relative order of the words. This is in
essence our parsing step, determining which word governs
the other. The function is defined as follows:

  isReduced(w_1, w_2) = p(isHeadRight(w_1, w_2)) > p(isHeadLeft(w_1, w_2))

where w_1 precedes w_2, and:

  p(isHeadRight(w_1, w_2)) ≈ Adjright(w_1, w_2) / cnt(hasRelation(w_1, w_2, where i(w_1) < i(w_2)))

and similarly,

  p(isHeadLeft(w_1, w_2)) ≈ Adjleft(w_2, w_1) / cnt(hasRelation(w_1, w_2, where i(w_1) < i(w_2)))

where hasRelation/2 is the number of times we see
the two words in a dependency relation, and where i(w_i)
returns a word's position in the corpus sentence. The
function isReduced/2 makes calls to p(isHeadRight/2)
and p(isHeadLeft/2). It returns true if the first parameter
viterbiSearch(maxLength, stateGraph) returns bestPath
  numStates ← getNumStates(stateGraph)
  viterbi ← a matrix[numStates+2, maxLength+2]
  viterbi[0,0].score ← 1.0
  for each time step t from 0 to maxLength do
    # Termination Condition
    if ((viterbi[endState, t].score ≠ 0)
        AND isAcceptable(endState.headStack))
      # Backtrace from endState and return path
    # Continue appending words
    for each state s from 0 to numStates do
      for each transition s' from s
        newScore ← viterbi[s,t].score × p_tr(s'|s) × p_em(s')
        if ((viterbi[s',t+1].score = 0) OR
            (newScore > viterbi[s',t+1].score))
          viterbi[s',t+1].score ← newScore
          viterbi[s',t+1].headStack ←
            reduceHeadStack(s', viterbi[s,t].headStack)
          backPointer[s',t+1] ← s
  Backtrace from viterbi[endState,t] and return path
Figure 3: Extended Viterbi Algorithm
is the head of the second, and false otherwise. In the com-
parison, the denominator is constant. We thus need only the
numerator in these auxiliary functions.
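For readers who prefer running code, the sketch below is a compact Python rendering of the numerator-only isReduced comparison and of the reduction in Figure 2 (our own rendering, not the authors' code): a list plays the role of the stack with its last element as the ToS, the scoring functions are passed in as stand-ins, and a word with no related stack item is simply pushed, following the behaviour described in the Figure 1 caption.

  # Toy counts taken from the worked example matrices in Section 4.1.
  adj_right = {("a", "b"): 1, ("b", "d"): 1, ("c", "d"): 1, ("d", "b"): 1}
  adj_left = {("a", "c"): 1, ("b", "e"): 2, ("c", "a"): 1}

  def is_reduced(w1, w2):
      # True if w1 is taken to govern w2.  The denominators of
      # p(isHeadRight) and p(isHeadLeft) are shared, so we compare
      # only the numerators.
      return adj_right.get((w1, w2), 0) > adj_left.get((w2, w1), 0)

  def reduce_head_stack(new_word, stack, p_depsym, is_reduced):
      # Update the rightmost-branch head stack after appending new_word.
      # The last element of the list is the top of stack (ToS).
      stack = list(stack)  # work on a copy
      # Find the stack item with the strongest dependency to the new word.
      best_item, best_prob = None, 0.0
      for item in stack:
          prob = p_depsym(item, new_word)
          if prob > best_prob:
              best_item, best_prob = item, prob
      if best_item is not None:
          # Pop everything above the related item.
          while stack[-1] != best_item:
              stack.pop()
          # If the new word governs that item, pop it as well.
          if is_reduced(new_word, best_item):
              stack.pop()
      # The new word is now the rightmost word, so it goes on top.
      stack.append(new_word)
      return stack

  # The numerator comparison on the toy counts:
  print(is_reduced("c", "d"))   # True: "c" governs "d" in the toy trees
  # The Figure 1 step in which "dog" is appended after "... jumps over
  # the lazy": the modifiers "lazy" and "the" are popped before "dog"
  # is pushed, leaving ['jumps', 'over', 'dog'].
  p = lambda a, b: 1.0 if (a, b) == ("over", "dog") else 0.0
  no_gov = lambda a, b: False   # here "dog" does not govern "over"
  print(reduce_head_stack("dog", ["jumps", "over", "the", "lazy"], p, no_gov))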
Collins’ distance heuristics [1996] weight the probability
of a dependency relation between two words based on the
distance between them. We could implement a similar strat-
egy by favouring small reductions in the head stack. Thus a
reduction with a more recent stack item which is closer to the
proposed ToS would be less penalised than an older one.
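Putting the pieces together, the following much simplified rendering of the search in Figure 3 is our own sketch rather than the authors' system: it carries a head stack with every hypothesis, averages the bigram and dependency transition weights as in Section 4.1, keeps only one hypothesis per word per time step, fixes the output length in advance, and omits the end-of-sentence acceptability check. The helper functions are assumed to be those sketched earlier.

  def extended_viterbi(vocab, p_bigram, p_depsym, reduce_head_stack,
                       is_reduced, start, length):
      # Generate a sequence of `length` words, carrying a head stack with
      # every partial hypothesis; each transition is scored as the average
      # of the bigram probability and the dependency-attachment probability.
      beam = {start: (1.0, [start], [start])}  # (score, path, head_stack)
      for _ in range(length - 1):
          new_beam = {}
          for nxt in vocab:
              best = None
              for score, path, stack in beam.values():
                  # Attachment probability against the predecessor's stack.
                  p_dep = max((p_depsym(h, nxt) for h in stack), default=0.0)
                  p_tr = (p_bigram(path[-1], nxt) + p_dep) / 2.0
                  cand = (score * p_tr, path + [nxt],
                          reduce_head_stack(nxt, stack, p_depsym, is_reduced))
                  if best is None or cand[0] > best[0]:
                      best = cand
              new_beam[nxt] = best
          beam = new_beam
      # Return the highest-scoring hypothesis and its final head stack.
      return max(beam.values(), key=lambda h: h[0])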
5 Related Work
There is a wealth of relevant research related to sentence gen-
eration. We focus here on a discussion of related work from
statistical sentence generation and from summarisation.
In recent years, there has been a steady stream of research
in statistical text generation. We focus here on work which
generates sentences from some sentential semantic represen-
tation via a statistical method. For examples of related sta-
tistical sentence generators see Langkilde and Knight [1998]
and Bangalore and Rambow [2000]. These approaches be-
gin with a representation of sentence semantics that closely
resembles that of a dependency tree. This semantic represen-
tation is turned into a word lattice. By ranking all traversals
of this lattice using an n-gram model, the best surface realisa-
tion of the semantic representation is chosen. The system then
searches for the best path through this lattice. Our approach
differs in that we do not start with a semantic representation.
Instead, we paraphrase the original text. We search for the
best word sequence and dependency tree structure concur-
rently.
Research in summarisation has also addressed the prob-
lem of generating non-verbatim sentences; see [Jing and
McKeown, 1999], [Barzilay et al., 1999] and more recently
[Daumé III and Marcu, 2004]. Jing presented an HMM for
learning alignments between summary and source sentences,
trained using examples of summary sentences generated by
humans. Daumé III also provides a mechanism for sub-
sentential alignment but allows for alignments between multi-
ple sentences. Both these approaches provide models for later
recombining sentence fragments. Our work differs primarily
in granularity. Using words as a basic unit potentially offers
greater flexibility in pseudo-paraphrase generation since we
are able to modify the word sequence within the phrase.
It should be noted, however, that a successful execution of
our algorithm is likely to conserve constituent structure (i.e., a
coarser granularity) via the use of dependencies, whilst still
making flexibility available at the word level. Addition-
ally, our use of dependencies allows us to generate not only a
string but a dependency tree for that sentence.
6 Evaluation
In this section, we outline our preliminary evaluation of gram-
maticality, in which we compare our dependency-based gener-
ation method against a baseline consisting of sentences gen-
erated using a bigram model.
In the evaluation, we do not use any smoothing algorithms
for dependency counts. For both our approach and the base-
line, Katz’s Back-off smoothing algorithm is used for bigram
probabilities.
For our evaluation cases, we use the Information Fusion
data collected by [Barzilay et al., 1999]. This data is made
up of news articles that have been first grouped by topic,
and then component sentences further clustered by similar-
ity of events. There are 100 sentence clusters and on average
there are 4 sentences per cluster. Each sentence in the cluster
is initially passed through the Connexor dependency parser
(www.connexor.com) to obtain dependency relations. Each
sentence cluster forms an evaluation case in which we gener-
ate a single sentence. Example output and the original text of
the cluster are presented in Figure 4.
To give both our approach and the baseline the greatest
chance of generating a sentence, we obtain our bigrams from
our evaluation cases.5 Aside from this preprocessing to col-
lect input sentence bigrams and dependencies, there is no
training as such. For each evaluation case, both our system
and the baseline method generate a set of answer strings,
from 3 to 40 words in length.
For each generated output of a given sentence length, we
count the number of times the Connexor parser resorts to re-
turning partial parses. This count, albeit a noisy one, is used
as our measure of ungrammaticality. We calculate the aver-
age ungrammaticality score across evaluation cases for each
sentence length.
5Note that this is permissible in this case because we are not
making any claims about the coverage of our model.
Original Text
A military transporter was scheduled to take off in the afternoon from Yokota air base
on the outskirts of Tokyo and fly to Osaka with 37,000 blankets .
Mondale said the United States, which has been flying in blankets and is sending a
team of quake relief experts, was prepared to do more if Japan requested .
United States forces based in Japan will take blankets to help earthquake survivors
Thursday, in the U.S. military's first disaster relief operation in Japan since it set up
bases here.
Our approach with Dependencies and End of Sentence Check
6: united states forces based in blankets
8: united states which has been flying in blankets
11: a military transporter was prepared to osaka with 37,000 blankets
18: mondale said the afternoon from yokota air base on the united states which has been
flying in blankets
20: mondale said the outskirts of tokyo and is sending a military transporter was pre-
pared to osaka with 37,000 blankets
23: united states forces based in the afternoon from yokota air base on the outskirts of
tokyo and fly to osaka with 37,000 blankets
27: mondale said the afternoon from yokota air base on the outskirts of tokyo and is
sending a military transporter was prepared to osaka with 37,000 blankets
29: united states which has been flying in the afternoon from yokota air base on the
outskirts of tokyo and is sending a team of quake relief operation in blankets
31: united states which has been flying in the afternoon from yokota air base on the out-
skirts of tokyo and is sending a military transporter was prepared to osaka with 37,000
blankets
34: mondale said the afternoon from yokota air base on the united states which has
been flying in the outskirts of tokyo and is sending a military transporter was prepared
to osaka with 37,000 blankets
36: united states which has been flying in japan will take off in the afternoon from
yokota air base on the outskirts of tokyo and is sending a military transporter was pre-
pared to osaka with 37,000 blankets
Figure 4: A cluster of related sentences and sample generated
output from our system. Leftmost numbers indicate sentence
length.
[Figure 5 plot: "Ungrammaticality Errors across Sentence Lengths",
showing the ungrammaticality score (y-axis, roughly 1 to 5) against
sentence length (x-axis, 3 to 38 words) for the Baseline and our System.]
Figure 5: Ungrammaticality scores for generated output.
Higher scores indicate worse performance.
The results are presented in Figure 5. Our approach almost
always performs better than the baseline, producing fewer er-
rors per sentence length. Using the Wilcoxon Signed Rank
Test (alpha = 0.5), we found that for sentences of length
greater than 12, the differences were usually significant.
7 Conclusion and Future Work
In this paper, we presented an extension to the Viterbi al-
gorithm that statistically determines the dependency structure of
partially generated sentences and selects words that are
likely to attach to this structure. The resulting sentence is
more grammatical than that generated using a bigram base-
line. In future work, we intend to conduct experiments to see
whether the smoothing approaches chosen are successful in
parsing without introducing spurious dependency relations.
We would also like to re-integrate the emission probability
(that is, the word content selection model). We are also in
the process of developing a measure of consistency. Finally,
we intend to provide a comparison evaluation with Barzilay’s
Information Fusion work.
8 Acknowledgements
This work was funded by the Centre for Language Technol-
ogy at Macquarie University and the CSIRO Information and
Communication Technology Centre. We would like to thank
the research groups of both organisations for useful com-
ments and feedback.
References
[Bangalore and Rambow, 2000] Srinivas Bangalore and
Owen Rambow. Exploiting a probabilistic hierarchical
model for generation. In Proceedings of the 18th Con-
ference on Computational Linguistics (COLING’2000),
July 31 - August 4 2000, Universität des Saarlandes,
Saarbrücken, Germany, 2000.
[Barzilay et al., 1999] Regina Barzilay, Kathleen R. McKe-
own, and Michael Elhadad. Information fusion in the con-
text of multi-document summarization. In Proceedings of
the 37th conference on Association for Computational Lin-
guistics, pages 550–557, Morristown, NJ, USA, 1999. As-
sociation for Computational Linguistics.
[Collins, 1996] Michael John Collins. A new statistical
parser based on bigram lexical dependencies. In Aravind
Joshi and Martha Palmer, editors, Proceedings of the
Thirty-Fourth Annual Meeting of the Association for Com-
putational Linguistics, pages 184–191, San Francisco,
1996. Morgan Kaufmann Publishers.
[Daumé III and Marcu, 2004] Hal Daumé III and Daniel
Marcu. A phrase-based hmm approach to docu-
ment/abstract alignment. In Dekang Lin and Dekai Wu,
editors, Proceedings of EMNLP 2004, pages 119–126,
Barcelona, Spain, July 2004. Association for Computa-
tional Linguistics.
[Forney, 1973] G. David Forney. The Viterbi algorithm. Pro-
ceedings of the IEEE, 61(3):268–278, 1973.
[Jing and McKeown, 1999] Hongyan Jing and Kathleen
McKeown. The decomposition of human-written sum-
mary sentences. In Research and Development in Infor-
mation Retrieval, pages 129–136, 1999.
[Kittredge and Mel’cuk, 1983] Richard I. Kittredge and Igor
Mel’cuk. Towards a computable model of meaning-text
relations within a natural sublanguage. In IJCAI, pages
657–659, 1983.
[Knight and Marcu, 2002] Kevin Knight and Daniel Marcu.
Summarization beyond sentence extraction: a probabilis-
tic approach to sentence compression. Artif. Intell.,
139(1):91–107, 2002.
[Langkilde and Knight, 1998] Irene Langkilde and Kevin
Knight. The practical value of N-grams in generation. In
Eduard Hovy, editor, Proceedings of the Ninth Interna-
tional Workshop on Natural Language Generation, pages
248–255, New Brunswick, New Jersey, 1998. Association
for Computational Linguistics.
[Manning and Schütze, 1999] Christopher D. Manning and
Hinrich Schütze. Foundations of Statistical Natural Lan-
guage Processing. The MIT Press, Cambridge, Mas-
sachusetts, 1999.
[Steedman, 2000] Mark Steedman. The syntactic process.
MIT Press, Cambridge, MA, USA, 2000.
[Wan et al., 2003] Stephen Wan, Mark Dras, Cecile Paris,
and Robert Dale. Using thematic information in statistical
headline generation. In The Proceedings of the Workshop
on Multilingual Summarization and Question Answering
at ACL 2003, Sapporo, Japan, July 2003.
[Witbrock and Mittal, 1999] Michael J. Witbrock and
Vibhu O. Mittal. Ultra-summarization (poster abstract):
a statistical approach to generating highly condensed
non-extractive summaries. In SIGIR ’99: Proceedings of
the 22nd annual international ACM SIGIR conference on
Research and development in information retrieval, pages
315–316, New York, NY, USA, 1999. ACM Press.