Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 1105–1112,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Stochastic Language Generation Using WIDL-expressions and its
Application in Machine Translation and Summarization
Radu Soricut
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
radu@isi.edu
Daniel Marcu
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
marcu@isi.edu
Abstract
We propose WIDL-expressions as a flex-
ible formalism that facilitates the integra-
tion of a generic sentence realization sys-
tem within end-to-end language process-
ing applications. WIDL-expressions rep-
resent compactly probability distributions
over finite sets of candidate realizations,
and have optimal algorithms for realiza-
tion via interpolation with language model
probability distributions. We show the ef-
fectiveness of a WIDL-based NLG system
in two sentence realization tasks: auto-
matic translation and headline generation.
1 Introduction
The Natural Language Generation (NLG) com-
munity has produced over the years a consid-
erable number of generic sentence realization
systems: Penman (Matthiessen and Bateman,
1991), FUF (Elhadad, 1991), Nitrogen (Knight
and Hatzivassiloglou, 1995), Fergus (Bangalore
and Rambow, 2000), HALogen (Langkilde-Geary,
2002), Amalgam (Corston-Oliver et al., 2002), etc.
However, when it comes to end-to-end, text-to-
text applications – Machine Translation, Summa-
rization, Question Answering – these generic sys-
tems either cannot be employed, or, in instances
where they can be, the results are significantly
below those of state-of-the-art, application-specific
systems (Hajic et al., 2002; Habash, 2003). We
believe two reasons explain this state of affairs.
First, these generic NLG systems use input rep-
resentation languages with complex syntax and se-
mantics. These languages involve deep, semantic-
based subject-verb or verb-object relations (such
as ACTOR, AGENT, PATIENT, etc., for Penman
and FUF), syntactic relations (such as subject,
object, premod, etc., for HALogen), or lexi-
cal dependencies (Fergus, Amalgam). Such inputs
cannot be accurately produced by state-of-the-art
analysis components from arbitrary textual input
in the context of text-to-text applications.
Second, most of the recent systems (starting
with Nitrogen) have adopted a hybrid approach
to generation, which has increased their robust-
ness. These hybrid systems use, in a first phase,
symbolic knowledge to (over)generate a large set
of candidate realizations, and, in a second phase,
statistical knowledge about the target language
(such as stochastic language models) to rank the
candidate realizations and find the best scoring
one. The disadvantage of the hybrid approach
– from the perspective of integrating these sys-
tems within end-to-end applications – is that the
two generation phases cannot be tightly coupled.
More precisely, input-driven preferences and tar-
get language–driven preferences cannot be inte-
grated in a true probabilistic model that can be
trained and tuned for maximum performance.
In this paper, we propose WIDL-expressions
(WIDL stands for Weighted Interleave, Disjunc-
tion, and Lock, after the names of the main op-
erators) as a representation formalism that facil-
itates the integration of a generic sentence real-
ization system within end-to-end language appli-
cations. The WIDL formalism, an extension of
the IDL-expressions formalism of Nederhof and
Satta (2004), has several crucial properties that
differentiate it from previously-proposed NLG
representation formalisms. First, it has a sim-
ple syntax (expressions are built using four oper-
ators) and a simple, formal semantics (probability
distributions over finite sets of strings). Second,
it is a compact representation that grows linearly
in the number of words available for generation
(see Section 2). (In contrast, representations such
as word lattices (Knight and Hatzivassiloglou,
1995) or non-recursive CFGs (Langkilde-Geary,
2002) require exponential space in the number
of words available for generation (Nederhof and
Satta, 2004).) Third, it has good computational
properties, such as optimal algorithms for intersection
with $n$-gram language models (Section 3).
Fourth, it is flexible with respect to the amount of
linguistic processing required to produce WIDL-
expressions directly from text (Sections 4 and 5).
Fifth, it allows for a tight integration of input-
specific preferences and target-language prefer-
ences via interpolation of probability distributions
using log-linear models. We show the effec-
tiveness of our proposal by directly employing
a generic WIDL-based generation system in two
end-to-end tasks: machine translation and auto-
matic headline generation.
2 The WIDL Representation Language
2.1 WIDL-expressions
In this section, we introduce WIDL-expressions, a
formal language used to compactly represent prob-
ability distributions over finite sets of strings.
Given a finite alphabet of symbols $\Sigma$, atomic
WIDL-expressions are of the form $a$, with $a \in \Sigma$.
For a WIDL-expression $\pi = a$, its semantics is
a probability distribution $S_{WIDL}(\pi) : Dom_\pi \to [0,1]$,
where $Dom_\pi = \{a\}$ and $S_{WIDL}(\pi)(a) = 1$.
Complex WIDL-expressions are created from
other WIDL-expressions, by employing the following
four operators, as well as operator distribution
functions $\delta_i$ from an alphabet $\Delta$.
Weighted Disjunction. If $\pi_1, \ldots, \pi_n$ are
WIDL-expressions, then $\pi = \vee_{\delta_0}(\pi_1, \ldots, \pi_n)$,
with $\delta_0 : \{1, \ldots, n\} \to [0,1]$, specified
such that $\sum_{i \in Dom(\delta_0)} \delta_0(i) = 1$, is a
WIDL-expression. Its semantics is a probability
distribution $S_{WIDL}(\pi) : Dom_\pi \to [0,1]$, where
$Dom_\pi = \bigcup_{i=1}^{n} Dom_{\pi_i}$, and the probability
values are induced by $\delta_0$ and $S_{WIDL}(\pi_i)$,
$1 \le i \le n$. For example, if $\pi = \vee_{\delta_0}(a, b)$,
$\delta_0 = \{1 \to 0.8,\ 2 \to 0.2\}$, its semantics is a
probability distribution $S_{WIDL}(\pi)$ over $Dom_\pi = \{a, b\}$,
defined by $S_{WIDL}(\pi)(a) = \delta_0(1) = 0.8$ and
$S_{WIDL}(\pi)(b) = \delta_0(2) = 0.2$.
Precedence. If $\pi_1, \pi_2$ are WIDL-expressions,
then $\pi = \pi_1 \cdot \pi_2$ is a WIDL-expression. Its
semantics is a probability distribution
$S_{WIDL}(\pi) : Dom_\pi \to [0,1]$, where $Dom_\pi$ is the set of all
strings that obey the precedence imposed over
the arguments, and the probability values are
induced by $S_{WIDL}(\pi_1)$ and $S_{WIDL}(\pi_2)$. For example, if
$\pi_1 = \vee_{\delta_1}(a, b)$, $\delta_1 = \{1 \to 0.8,\ 2 \to 0.2\}$, and
$\pi_2 = \vee_{\delta_2}(c, d)$, $\delta_2 = \{1 \to 0.6,\ 2 \to 0.4\}$, then
$\pi = \pi_1 \cdot \pi_2$ represents a probability distribution
$S_{WIDL}(\pi)$ over the set $Dom_\pi = \{ac, ad, bc, bd\}$, defined
by $S_{WIDL}(\pi)(ac) = \delta_1(1)\,\delta_2(1) = 0.48$,
$S_{WIDL}(\pi)(ad) = \delta_1(1)\,\delta_2(2) = 0.32$, etc.
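The two operators defined so far can be illustrated with a minimal Python sketch. It assumes that a WIDL-expression's semantics can be stored as a plain dictionary from strings to probabilities, and that "induced by" means multiplying the relevant probabilities, as the precedence example suggests. The function names `atom`, `disjunction`, and `precedence` are hypothetical and are not taken from the paper.

```python
from itertools import product

def atom(symbol):
    """Atomic expression: a single string with probability 1."""
    return {symbol: 1.0}

def disjunction(delta, *args):
    """Weighted disjunction; delta maps a 1-based argument index to its probability."""
    assert abs(sum(delta.values()) - 1.0) < 1e-9, "delta must sum to 1"
    dist = {}
    for i, arg in enumerate(args, start=1):
        for string, prob in arg.items():
            dist[string] = dist.get(string, 0.0) + delta[i] * prob
    return dist

def precedence(left, right):
    """Precedence: every string of `left` followed by every string of `right`."""
    dist = {}
    for (s1, p1), (s2, p2) in product(left.items(), right.items()):
        dist[s1 + s2] = dist.get(s1 + s2, 0.0) + p1 * p2
    return dist

# The precedence example from the text (symbols are single characters here).
pi1 = disjunction({1: 0.8, 2: 0.2}, atom("a"), atom("b"))
pi2 = disjunction({1: 0.6, 2: 0.4}, atom("c"), atom("d"))
pi = precedence(pi1, pi2)

for string, prob in sorted(pi.items()):
    print(string, round(prob, 2))
```

Enumerating the full distribution like this is only practical for tiny expressions; the point of the formalism is precisely that the expression stays compact while the set of strings it denotes grows quickly.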
Weighted Interleave. If $\pi_1, \ldots, \pi_n$ are
WIDL-expressions, then $\pi = \|_{\delta_0}(\pi_1, \pi_2, \ldots, \pi_n)$, with
$\delta_0 : P \cup \{otherperms\} \cup \{shuffles\} \to [0,1]$, $P \subseteq Perm_n$,
specified such that $\sum_{x \in Dom(\delta_0)} \delta_0(x) = 1$, is a
WIDL-expression. Its semantics is a probability
distribution $S_{WIDL}(\pi) : Dom_\pi \to [0,1]$, where
$Dom_\pi$ consists of all the possible interleavings of
strings from $Dom_{\pi_i}$, $1 \le i \le n$, and the
probability values are induced by $\delta_0$ and $S_{WIDL}(\pi_i)$. The
distribution function $\delta_0$ is defined either explicitly,
over $P \subseteq Perm_n$ (where $Perm_n$ is the set of all permutations of $n$
elements), or implicitly, as $\delta_0(otherperms)$.
Because the set of argument permutations is a subset
of all possible interleavings, $\delta_0$ also needs to
specify the probability mass for the strings that
are not argument permutations, $\delta_0(shuffles)$. For
example, if $\pi = \|_{\delta_0}(a \cdot b,\ c)$,
$\delta_0 = \{1\,2 \to 0.80,\ otherperms \to 0.15,\ shuffles \to 0.05\}$, its
semantics is a probability distribution $S_{WIDL}(\pi)$,
with domain $Dom_\pi = \{abc, cab, acb\}$, defined
by $S_{WIDL}(\pi)(abc) = \delta_0(1\,2) = 0.80$,
$S_{WIDL}(\pi)(cab) = \delta_0(otherperms)/1 = 0.15$,
$S_{WIDL}(\pi)(acb) = \delta_0(shuffles)/1 = 0.05$.
Lock. If $\pi'$ is a WIDL-expression, then
$\pi = \times(\pi')$ is a WIDL-expression. The semantic
mapping $S_{WIDL}(\pi)$ is the same as $S_{WIDL}(\pi')$, except
that $Dom_\pi$ contains strings in which no additional
symbol can be interleaved. For example,
if $\pi = \|_{\delta_0}(\times(a \cdot b),\ c)$,
$\delta_0 = \{1\,2 \to 0.80,\ otherperms \to 0.20\}$, its semantics is a
probability distribution $S_{WIDL}(\pi)$, with domain
$Dom_\pi = \{cab, abc\}$, defined by
$S_{WIDL}(\pi)(abc) = \delta_0(1\,2) = 0.80$,
$S_{WIDL}(\pi)(cab) = \delta_0(otherperms)/1 = 0.20$.
$\|_{\delta_1}(\ \times(turkish \cdot government),\ \vee_{\delta_2}(\times(rebels \cdot fighting),\ \times(attacked \cdot rebels)),\ in \cdot iraq\ )$,
$\delta_1 = \{2\,1\,3 \to 0.2,\ otherperms \to 0.7,\ shuffles \to 0.1\}$,
$\delta_2 = \{1 \to 0.65,\ 2 \to 0.35\}$
Figure 1: An example of a WIDL-expression.
In Figure 1, we show a more complex WIDL-expression.
The probability distribution $\delta_1$ associated
with the operator $\|_{\delta_1}$ assigns probability 0.2
to the argument order $2\,1\,3$; from a probability
mass of 0.7, it assigns uniformly, for each of the
remaining $3! - 1 = 5$ argument permutations, a
permutation probability value of $0.7/5 = 0.14$. The
remaining probability mass of 0.1 is left for the
12 shuffles associated with the unlocked expression
$in \cdot iraq$, for a shuffle probability of
$0.1/12 \approx 0.008$. The list below enumerates some of the
$\langle string, p(string) \rangle$ pairs that belong to the
probability distribution defined by our example:
rebels fighting turkish government in iraq 0.130
in iraq attacked rebels turkish government 0.049
in turkish government iraq rebels fighting 0.005
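The three listed values can be re-derived from the distribution functions of Figure 1. The short Python check below is only an arithmetic illustration: it assumes each string's probability is the product of the interleave weight for its argument order and the disjunction weight of the chosen alternative, and the variable names are hypothetical.

```python
from math import factorial

# Distribution functions as given in Figure 1.
delta1 = {"2 1 3": 0.2, "otherperms": 0.7, "shuffles": 0.1}
delta2 = {1: 0.65, 2: 0.35}

n_args = 3          # the interleave operator has three arguments
n_shuffles = 12     # number of shuffles, as stated in the text

other_perm_prob = delta1["otherperms"] / (factorial(n_args) - 1)  # 0.7 / 5
shuffle_prob = delta1["shuffles"] / n_shuffles                    # 0.1 / 12

# "rebels fighting turkish government in iraq": order 2 1 3, first disjunct.
p_first = delta1["2 1 3"] * delta2[1]
# "in iraq attacked rebels turkish government": another permutation, second disjunct.
p_second = other_perm_prob * delta2[2]
# "in turkish government iraq rebels fighting": a shuffle, first disjunct.
p_third = shuffle_prob * delta2[1]

print(round(p_first, 3), round(p_second, 3), round(p_third, 3))
```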
The following result characterizes an important
representation property for WIDL-expressions.
Theorem 1 A WIDL-expression $\pi$ over $\Sigma$ and $\Delta$
using $n$ atomic expressions has space complexity
$O(n)$, if the operator distribution functions of $\pi$
have space complexity at most $O(n)$.
For proofs and more details regarding WIDL-
expressions, we refer the interested reader
to (Soricut, 2006). Theorem 1 ensures that high-
complexity hypothesis spaces can be represented
efficiently by WIDL-expressions (Section 5).
2.2 WIDL-graphs and Probabilistic
Finite-State Acceptors
WIDL-graphs. Equivalent at the representation
level with WIDL-expressions, WIDL-graphs allow
for formulations of algorithms that process
them. For each WIDL-expression $\pi$, there exists
an equivalent WIDL-graph $G_\pi$. As an example,
we illustrate in Figure 2(a) the WIDL-graph
corresponding to the WIDL-expression in Figure 1.
WIDL-graphs have an initial vertex $v_s$ and a final
vertex $v_e$. Vertices $v_0$, $v_6$, and $v_{20}$ with in-going
edges labeled $\vdash^1_{\delta_1}$, $\vdash^2_{\delta_1}$, and $\vdash^3_{\delta_1}$, respectively, and
vertices $v_5$, $v_{19}$, and $v_{23}$ with out-going edges
labeled $\dashv^1_{\delta_1}$, $\dashv^2_{\delta_1}$, and $\dashv^3_{\delta_1}$, respectively, result from
the expansion of the $\|_{\delta_1}$ operator. Vertices $v_7$
and $v_{13}$ with in-going edges labeled $(^1_{\delta_2}$, $(^2_{\delta_2}$,
respectively, and vertices $v_{12}$ and $v_{11}$ with out-going
edges labeled $)^1_{\delta_2}$, $)^2_{\delta_2}$, respectively, result from the
expansion of the $\vee_{\delta_2}$ operator.
With each WIDL-graph $G_\pi$, we associate a
probability distribution. The domain of this
distribution is the finite collection of strings that can
be generated from the paths of a WIDL-specific
traversal of $G_\pi$, starting from $v_s$ and ending in $v_e$.
Each path (and its associated string) has a
probability value induced by the probability distribution
functions associated with the edge labels of $G_\pi$. A
WIDL-expression $\pi$ and its corresponding
WIDL-graph $G_\pi$ are said to be equivalent because they
represent the same distribution $S_{WIDL}(\pi)$.
WIDL-graphs and Probabilistic FSA. Probabilistic
finite-state acceptors (pFSA) are a well-known
formalism for representing probability
distributions (Mohri et al., 2002). For a
WIDL-expression $\pi$, we define a mapping, called
UNFOLD, between the WIDL-graph $G_\pi$ and a
pFSA $A_\pi$. A state $q$ in $A_\pi$ is created for each
set of WIDL-graph vertices that can be reached
simultaneously when traversing the graph. State
$q$ records, in what we call a $\|$-stack (interleave
stack), the order in which $\vdash^i_{\delta}$, $\dashv^i_{\delta}$–bordered
sub-graphs are traversed. Consider Figure 2(b), in
which state $[v_0 v_9 v_{23}, \{\langle_{\delta_1} 3\,2\}]$ (at the bottom)
corresponds to reaching vertices $v_0$, $v_9$, and $v_{23}$ (see
the WIDL-graph in Figure 2(a)), by first reaching
vertex $v_{23}$ (inside the $\vdash^3_{\delta_1}$, $\dashv^3_{\delta_1}$–bordered
sub-graph), and then reaching vertex $v_9$ (inside the
$\vdash^2_{\delta_1}$, $\dashv^2_{\delta_1}$–bordered sub-graph).
A transition labeled $a$ between two $A_\pi$ states
$q_1$ and $q_2$ in $A_\pi$ exists if there exists a vertex $v_x$
in the description of $q_1$ and a vertex $v_y$ in the
description of $q_2$ such that there exists a path in $G_\pi$
between $v_x$ and $v_y$, and $a$ is the only $\Sigma$-labeled
transition in this path. For example, transition
$[v_0 v_9 v_{23}, \{\langle_{\delta_1} 3\,2\}] \xrightarrow{rebels} [v_0 v_{19} v_{23}, \{\langle_{\delta_1} 3\,2\}]$
(Figure 2(b)) results from unfolding the path
$v_9 \xrightarrow{\epsilon} v_{10} \xrightarrow{rebels} v_{11} \xrightarrow{\epsilon} v_{12} \xrightarrow{)^1_{\delta_2}} v_{19}$
(Figure 2(a)). A transition labeled $\epsilon$ between two $A_\pi$
states $q_1$ and $q_2$ in $A_\pi$ exists if there exists a vertex
$v_x$ in the description of $q_1$ and vertices
$v^1_y, \ldots, v^n_y$ in the description of $q_2$, such that
$v_x \xrightarrow{\vdash^i_{\delta}} v^i_y \in G_\pi$, $1 \le i \le n$
(see transition $[v_s, ] \xrightarrow{\epsilon} [v_0 v_6 v_{20}, \{\langle_{\delta_1} \rangle_{\delta_1}\}]$), or if
there exist vertices $v^1_x, \ldots, v^n_x$ in the description
of $q_1$ and a vertex $v_y$ in the description of $q_2$, such
that $v^i_x \xrightarrow{\dashv^i_{\delta}} v_y \in G_\pi$, $1 \le i \le n$. The
$\epsilon$-transitions are responsible for adding and removing,
respectively, the $\langle_{\delta}$, $\rangle_{\delta}$ symbols in the $\|$-stack. The
probabilities associated with $A_\pi$ transitions are
computed using the vertex set and the $\|$-stack of each
$A_\pi$ state, together with the distribution functions
of the $\vee$ and $\|$ operators. For a detailed
presentation of the UNFOLD relation we refer the reader
to (Soricut, 2006).
Figure 2: The WIDL-graph corresponding to the WIDL-expression in Figure 1 is shown in (a). The
probabilistic finite-state acceptor (pFSA) that corresponds to the WIDL-graph is shown in (b).
3 Stochastic Language Generation from
WIDL-expressions
3.1 Interpolating Probability Distributions in
a Log-linear Framework
Let us assume a finite set $E$ of strings over a
finite alphabet $\Sigma$, representing the set of possible
sentence realizations. In a log-linear framework,
we have a vector of feature functions
$f = \langle f_0\, f_1 \ldots f_M \rangle$, and a vector of parameters
$\lambda = \langle \lambda_0\, \lambda_1 \ldots \lambda_M \rangle$. For any $e \in E$, the interpolated
probability $P(e)$ can be written under a log-linear
model as in Equation 1:
$$P(e) = \frac{\exp\left[\sum_{m=0}^{M} \lambda_m f_m(e)\right]}{\sum_{e'} \exp\left[\sum_{m=0}^{M} \lambda_m f_m(e')\right]} \qquad (1)$$
We can formulate the search problem of finding
the most probable realization a66 under this model
as shown in Equation 2, and therefore we do not
need to be concerned about computing expensive
normalization factors.
$$\arg\max_{e} P(e) = \arg\max_{e} \exp\left[\sum_{m=0}^{M} \lambda_m f_m(e)\right] \qquad (2)$$
For a given WIDL-expression $\pi$ over $\Sigma$, the set $E$
is defined by $Dom(S_{WIDL}(\pi))$, and feature function
$f_0$ is taken to be $S_{WIDL}(\pi)$. Any language model
we want to employ may be added in Equation 2 as
a feature function $f_i$, $i \ge 1$.
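A small Python sketch can make the relation between Equations 1 and 2 concrete. It assumes, purely for illustration, that each feature function returns a log-probability and that the candidate set is small enough to enumerate; the toy distributions and the names `loglinear_score`, `best_realization`, and `normalized_probability` are hypothetical.

```python
import math

def loglinear_score(e, features, weights):
    """Unnormalized log-linear score: sum over m of lambda_m * f_m(e)."""
    return sum(w * f(e) for f, w in zip(features, weights))

def best_realization(candidates, features, weights):
    """Equation 2: the argmax needs no normalization factor."""
    return max(candidates, key=lambda e: loglinear_score(e, features, weights))

def normalized_probability(e, candidates, features, weights):
    """Equation 1: normalized over the whole candidate set (costly in general)."""
    z = sum(math.exp(loglinear_score(c, features, weights)) for c in candidates)
    return math.exp(loglinear_score(e, features, weights)) / z

# Toy stand-ins for the WIDL distribution (f_0) and one language model (f_1).
widl_dist = {"a b": 0.6, "b a": 0.4}
lm_dist = {"a b": 0.1, "b a": 0.9}
features = [lambda e: math.log(widl_dist[e]), lambda e: math.log(lm_dist[e])]
weights = [1.0, 1.0]
candidates = list(widl_dist)

best = best_realization(candidates, features, weights)
print(best, round(normalized_probability(best, candidates, features, weights), 3))
```

Because the denominator of Equation 1 is the same for every candidate, ranking by the unnormalized score selects the same string as ranking by the normalized probability.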
3.2 Algorithms for Intersecting
WIDL-expressions with Language
Models
Algorithm WIDL-NGLM-A* (Figure 3) solves
the search problem defined by Equation 2 for a
WIDL-expression $\pi$ (which provides feature
function $f_0$) and $M$ $n$-gram language models (which
provide feature functions $f_1, \ldots, f_M$). It does
so by incrementally computing UNFOLD for $G_\pi$
(i.e., on-demand computation of the corresponding
pFSA $A_\pi$), by keeping track of a set of active
states, called active. The set of newly UNFOLDed
states is called unfold. Using Equation 1
(unnormalized), we EVALUATE the current $P(e)$ scores
for the unfold states. Additionally, EVALUATE
uses an admissible heuristic function to compute
future (admissible) scores for the unfold states.
The algorithm PUSHes each state from the current
unfold into a priority queue $Q$, which sorts
the states according to their total score (current +
admissible). In the next iteration, active is a
singleton set containing the state POPed out from the
top of $Q$. The admissible heuristic function we use
is the one defined in (Soricut and Marcu, 2005),
using Equation 1 (unnormalized) for computing
the event costs. Given the existence of the
admissible heuristic and the monotonicity property
of the unfolding provided by the priority queue $Q$,
the proof for A* optimality (Russell and Norvig,
1995) guarantees that WIDL-NGLM-A* finds a
path in $A_\pi$ that provides an optimal solution.
WIDL-NGLM-A*($G_\pi$, $f$, $\lambda$)
1 active ← {[$v_s$, {}]}
2 flag ← 1
3 while flag
4   do unfold ← UNFOLD($G_\pi$, active)
5      EVALUATE(unfold, $f$, $\lambda$)
6      if active = {[$v_e$, {}]}
7        then flag ← 0
8      for each state in unfold
         do PUSH($Q$, state)
       active ← POP($Q$)
9 return active
Figure 3: A* algorithm for interpolating
WIDL-expressions with $n$-gram language models.
An important property of the
WIDL-NGLM-A* algorithm is that the UNFOLD
relation (and, implicitly, the $A_\pi$ acceptor) is
computed only partially, for those states for
which the total cost is less than the cost of the
optimal path. This results in important savings,
both in space and time, over simply running a
single-source shortest-path algorithm for directed
acyclic graphs (Cormen et al., 2001) over the full
acceptor $A_\pi$ (Soricut and Marcu, 2005).
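The on-demand behaviour described above can be sketched with a generic best-first search in Python. This is not the paper's UNFOLD or EVALUATE: it assumes a hypothetical `unfold` callback that returns successor states with non-negative step costs (for example, negative log scores), a toy graph in place of the pFSA, and a zero heuristic, which is trivially admissible.

```python
import heapq
from itertools import count

def a_star(start, is_final, unfold, heuristic):
    """Best-first search that expands states only when they are popped."""
    tie = count()  # tie-breaker so the heap never compares states directly
    queue = [(heuristic(start), next(tie), 0.0, start, [])]
    expanded = 0
    while queue:
        _, _, cost, state, path = heapq.heappop(queue)
        if is_final(state):
            return cost, path, expanded
        expanded += 1
        for label, step_cost, nxt in unfold(state):
            new_cost = cost + step_cost
            total = new_cost + heuristic(nxt)  # current + admissible estimate
            heapq.heappush(queue, (total, next(tie), new_cost, nxt, path + [label]))
    return None

# Toy acyclic graph standing in for the acceptor (hypothetical costs).
graph = {
    "s": [("turkish", 1.0, "x"), ("rebels", 0.5, "y")],
    "x": [("government", 1.0, "e")],
    "y": [("fighting", 2.5, "e")],
    "e": [],
}

result = a_star("s", lambda q: q == "e", lambda q: graph[q], lambda q: 0.0)
print(result)
```

With a more informative admissible heuristic, fewer states would be popped before the final state is reached, which is the source of the savings the text refers to.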
4 Headline Generation using
WIDL-expressions
We employ the WIDL formalism (Section 2) and
the WIDL-NGLM-A* algorithm (Section 3) in a
summarization application that aims at producing
both informative and fluent headlines. Our head-
lines are generated in an abstractive, bottom-up
manner, starting from words and phrases. A more
common, extractive approach operates top-down,
by starting from an extracted sentence that is com-
pressed (Dorr et al., 2003) and annotated with ad-
ditional information (Zajic et al., 2004).
Automatic Creation of WIDL-expressions for
Headline Generation. We generate WIDL-expressions
starting from an input document.
First, we extract a weighted list of topic keywords
from the input document using the algorithm of
Zhou and Hovy (2003). This list is enriched
with phrases created from the lexical dependencies
the topic keywords have in the input document.
We associate probability distributions with
these phrases using their frequency (we assume
that higher frequency is indicative of increased
importance) and their position in the document (we
assume that proximity to the beginning of the
document is also indicative of importance). In
Figure 4, we present an example of input keywords
and lexical-dependency phrases automatically
extracted from a document describing incidents at
the Turkey-Iraq border.
Keywords {iraq 0.32, syria 0.25, rebels 0.22,
kurdish 0.17, turkish 0.14, attack 0.10}
Phrases
iraq {in iraq 0.4, northern iraq 0.5, iraq and iran 0.1},
syria {into syria 0.6, and syria 0.4}
rebels {attacked rebels 0.7, rebels fighting 0.3}
. . .
⇓ WIDL-expression & trigram interpolation
TURKISH GOVERNMENT ATTACKED REBELS IN IRAQ AND SYRIA
Figure 4: Input and output for our automatic
headline generation system.
The algorithm for producing WIDL-expressions
combines the lexical-dependency
phrases for each keyword using a $\vee$ operator with
the associated probability values for each phrase
multiplied with the probability value of each
topic keyword. It then combines all the $\vee$-headed
expressions into a single WIDL-expression using
a $\|$ operator with uniform probability. The
WIDL-expression in Figure 1 is a (scaled-down) example
of the expressions created by this algorithm.
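The weighting step of this construction can be sketched in Python using the Figure 4 values. The sketch only computes the per-phrase weights (phrase probability times keyword probability); how the paper normalizes or stores these weights is not stated here, so the dictionary layout and the name `build_disjunctions` are assumptions.

```python
# Keyword and phrase weights taken from Figure 4 (truncated to three keywords).
keywords = {"iraq": 0.32, "syria": 0.25, "rebels": 0.22}
phrases = {
    "iraq": {"in iraq": 0.4, "northern iraq": 0.5, "iraq and iran": 0.1},
    "syria": {"into syria": 0.6, "and syria": 0.4},
    "rebels": {"attacked rebels": 0.7, "rebels fighting": 0.3},
}

def build_disjunctions(keywords, phrases):
    """One disjunction per keyword; each phrase weight is scaled by its keyword weight."""
    return {
        kw: {ph: p * keywords[kw] for ph, p in alternatives.items()}
        for kw, alternatives in phrases.items()
    }

disjunctions = build_disjunctions(keywords, phrases)
for kw, alternatives in disjunctions.items():
    print(kw, {ph: round(w, 3) for ph, w in alternatives.items()})
```

The resulting per-keyword disjunctions would then be the arguments of a single interleave operator with a uniform distribution function, as the text describes.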
On average, a WIDL-expression created by this
algorithm, using $k = 6$ keywords and an average
of $m = 4$ lexical-dependency phrases per keyword,
compactly encodes a candidate set of about 3
million possible realizations. As the specification
of the $\|_{\delta}$ operator takes space $O(1)$ for uniform $\delta$,
Theorem 1 guarantees that the space complexity
of these expressions is $O(km)$.
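One way to arrive at a figure of this magnitude is sketched below. It assumes that a realization picks one of 4 phrases for each of 6 keywords and orders the 6 chosen phrases in any of the 6! permutations; this counting scheme is an illustrative assumption rather than a derivation given in the text.

```python
from math import factorial

n_keywords = 6   # keywords per expression, from the text
n_phrases = 4    # average phrases per keyword, from the text

# Assumed counting: one phrase choice per keyword, times all orderings of the choices.
n_realizations = factorial(n_keywords) * n_phrases ** n_keywords
# Size of the expression itself grows with the number of phrases, not realizations.
n_phrase_slots = n_keywords * n_phrases

print(n_realizations, n_phrase_slots)
```

Under these assumptions roughly 2.9 million realizations are encoded by an expression that mentions only 24 phrases, which illustrates the compactness claimed by Theorem 1.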
Finally, we generate headlines from
WIDL-expressions using the WIDL-NGLM-A* algorithm,
which interpolates the probability distributions
represented by the WIDL-expressions with
$n$-gram language model distributions. The output
presented in Figure 4 is the most likely headline
realization produced by our system.
Headline Generation Evaluation. To evaluate
the accuracy of our headline generation system,
we use the documents from the DUC 2003
evaluation competition. Half of these documents
are used as development set (283 documents),
and the other half is used as test set (273
documents). We automatically measure performance
by comparing the produced headlines against one
reference headline produced by a human using
ROUGE$_2$ (Lin, 2004).
ALG            #(uni)  #(bi)  Len.  Rouge$_1$  Rouge$_2$
Extractive
Lead10         458     114    9.9   20.8       11.1
HedgeTrimmer†  399     104    7.4   18.1       9.9
Topiary‡       576     115    9.9   26.2       12.5
Abstractive
Keywords       585     22     9.9   26.6       5.5
Webcl          311     76     7.3   14.1       7.5
WIDL-A*        562     126    10.0  25.5       12.9
Table 1: Headline generation evaluation. We compare
extractive algorithms against abstractive
algorithms, including our WIDL-based algorithm.
For each input document, we train two language
models, using the SRI Language Model Toolkit
(with modified Kneser-Ney smoothing). A gen-
eral trigram language model, trained on 170M
English words from the Wall Street Journal, is
used to model fluency. A document-specific tri-
gram language model, trained on-the-fly for each
input document, accounts for both fluency and
content validity. We also employ a word-count
model (which counts the number of words in a
proposed realization) and a phrase-count model
(which counts the number of phrases in a proposed
realization), which allow us to learn to produce
headlines that have restrictions in the number of
words allowed (10, in our case). The interpolation
weights $\lambda$ (Equation 2) are trained discriminatively
(Och, 2003), with ROUGE$_2$ as the
objective function, on the development set.
The results are presented in Table 1. We com-
pare the performance of several extractive algo-
rithms (which operate on an extracted sentence
to arrive at a headline) against several abstractive
algorithms (which create headlines starting from
scratch). For the extractive algorithms, Lead10
is a baseline which simply proposes as headline
the lead sentence, cut after the first 10 words.
HedgeTrimmera6 is our implementation of the Hedge
Trimmer system (Dorr et al., 2003), and Topiarya7 is
our implementation of the Topiary system (Zajic
et al., 2004). For the abstractive algorithms, Key-
words is a baseline that proposes as headline the
sequence of topic keywords, Webcl is the system
described in (Zhou and Hovy, 2003), and WIDL-Aa8 is the
algorithm described in this paper.

THREE GORGES PROJECT IN CHINA HAS WON APPROVAL
WATER IS LINK BETWEEN CLUSTER OF E. COLI CASES
SRI LANKA 'S JOINT VENTURE TO EXPAND EXPORTS
OPPOSITION TO EUROPEAN UNION SINGLE CURRENCY EURO
OF INDIA AND BANGLADESH WATER BARRAGE
Figure 5: Headlines generated automatically using
a WIDL-based sentence realization system.
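The Lead10 baseline is simple enough to state in a few lines. This is a minimal sketch assuming whitespace tokenization and a document already split into sentences; the function name and the placeholder document are illustrative only.

```python
def lead_baseline(document_sentences, max_words=10):
    """Return the lead sentence cut after the first max_words tokens."""
    if not document_sentences:
        return []
    return document_sentences[0].split()[:max_words]

# Placeholder document: a 12-token lead sentence followed by a second sentence.
document = ["a b c d e f g h i j k l", "x y z"]
headline = lead_baseline(document)
```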
This evaluation shows that our WIDL-based
approach to generation is capable of obtaining
headlines that compare favorably, in both content
and fluency, with extractive, state-of-the-art re-
sults (Zajic et al., 2004), while it outperforms a
previously-proposed abstractive system by a wide
margin (Zhou and Hovy, 2003). Also note that our
evaluation makes these results directly compara-
ble, as they use the same parsing and topic identi-
fication algorithms. In Figure 5, we present a sam-
ple of headlines produced by our system, which
includes both good and not-so-good outputs.
5 Machine Translation using
WIDL-expressions
We also employ our WIDL-based realization en-
gine in a machine translation application that uses
a two-phase generation approach: in a first phase,
WIDL-expressions representing large sets of pos-
sible translations are created from input foreign-
language sentences. In a second phase, we use
our generic, WIDL-based sentence realization engine
to intersect WIDL-expressions with an n-gram
language model. In the experiments reported
here, we translate between Chinese (source lan-
guage) and English (target language).
Automatic Creation of WIDL-expressions for MT. We generate
WIDL-expressions from Chinese strings by exploiting a phrase-based
translation table (Koehn et al., 2003). We use an algorithm resembling
probabilistic bottom-up parsing to build a WIDL-expression for an input
Chinese string: each contiguous span $(i, j)$ over a Chinese string
$c_{i,j}$ is considered a possible "constituent", and the "non-terminals"
associated with each constituent are the English phrase translations
$e^{k}_{i,j}$ that correspond in the translation table to the Chinese
string $c_{i,j}$. Multiple-word English phrases, such as $e_1 e_2 e_3$,
are represented as WIDL-expressions using the precedence ($\cdot$) and
lock ($\times$) operators, as $\times(e_1 \cdot e_2 \cdot e_3)$. To limit
the number of possible translations $e^{k}_{i,j}$ corresponding to a
Chinese span $c_{i,j}$, we use a probabilistic beam and a histogram beam
to beam out low-probability translation alternatives. At this point, each
$c_{i,j}$ span is "tiled" with likely translations $e^{k}_{i,j}$ taken
from the translation table.

[Figure 6 graphic omitted. Surviving labels: "WIDL-expression & trigram
interpolation"; output: "gunman was killed by police ."]
Figure 6: A Chinese string is converted into a
WIDL-expression, which provides a translation as
the best scoring hypothesis under the interpolation
with a trigram language model.
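The tiling step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the phrase table is a plain dictionary, the source tokens `c1`, `c2`, `c3` are placeholders for Chinese words, the probabilistic beam is interpreted as a threshold relative to the best option for a span, and the nested-tuple encoding of the lock and precedence operators is hypothetical.

```python
def tile_spans(source_words, phrase_table, prob_beam=0.01, hist_beam=10):
    """Map each span (i, j) to its surviving (translation, probability) options."""
    tiles = {}
    n = len(source_words)
    for i in range(n):
        for j in range(i + 1, n + 1):
            options = phrase_table.get(tuple(source_words[i:j]))
            if not options:
                continue
            ranked = sorted(options.items(), key=lambda kv: kv[1], reverse=True)
            best = ranked[0][1]
            # Probabilistic beam (assumed relative): drop options far below the best.
            kept = [(e, p) for e, p in ranked if p >= prob_beam * best]
            # Histogram beam: keep at most hist_beam options per span.
            tiles[(i, j)] = kept[:hist_beam]
    return tiles

def as_lock(english_phrase):
    """Encode a multi-word phrase as a locked precedence sequence."""
    words = english_phrase.split()
    return ("lock", ("prec", *words)) if len(words) > 1 else words[0]

# Toy phrase table with placeholder source tokens and made-up probabilities.
phrase_table = {
    ("c1",): {"gunman": 0.7, "the gunman": 0.29, "bandit": 0.001},
    ("c2", "c3"): {"was killed": 0.6, "killed": 0.4},
}
tiles = tile_spans(["c1", "c2", "c3"], phrase_table, prob_beam=0.01, hist_beam=2)
locked = as_lock("was killed")
```

Locking a multi-word phrase keeps its words contiguous and in order when tiles are later reordered.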
Tiles that are adjacent are joined together in a larger tile by a
$\|_{\delta}$ operator, where $\delta$ is a distortion-based probability
distribution over the orderings of the component tiles. That is,
reordering of the component tiles is permitted by the $\|_{\delta}$
operators (assigned non-zero probability), but the longer the movement
from the original order of the tiles, the lower the probability. (This
distortion model is similar to the one used in (Koehn, 2004).) When
multiple tiles are available for the same span $(i, j)$, they are joined
by a $\vee_{\delta}$ operator, where $\delta$ is given by the probability
distributions specified in the translation table. Usually, statistical
phrase-based translation tables specify not only one, but multiple
distributions that account for context preferences. In our experiments,
we consider four probability distributions: $p(c|e)$, $p(e|c)$,
$p_{lex}(c|e)$, and $p_{lex}(e|c)$, where $c$ and $e$ are Chinese-English
phrase translations as they appear in the translation table. In Figure 6,
we show an example of a WIDL-expression created by this algorithm
(footnote 1).
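To make the reordering behaviour concrete, the sketch below builds one possible distribution over tile orderings in which probability decays with how far tiles move from their original positions. The exponential decay and the `decay` parameter are assumptions made for illustration; the exact form of the distribution used in the paper is not reproduced here. Enumerating all permutations is only feasible for a handful of tiles and is done purely to show the shape of the distribution.

```python
import itertools

def ordering_distribution(num_tiles, decay=0.5):
    """Probability for every ordering of tiles 0..num_tiles-1, decaying with displacement."""
    weights = {}
    for perm in itertools.permutations(range(num_tiles)):
        # Total distance the tiles moved away from their original positions.
        displacement = sum(abs(pos - tile) for pos, tile in enumerate(perm))
        weights[perm] = decay ** displacement
    total = sum(weights.values())
    return {perm: w / total for perm, w in weights.items()}

dist = ordering_distribution(3, decay=0.5)
most_likely = max(dist, key=dist.get)
```

Every ordering receives non-zero probability, and the original order is the most likely one, which matches the qualitative description in the text.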
On average, a WIDL-expression created by this
algorithm, using an average of a10 a6 a226a48a36 tiles per
sentence (for an average input sentence length of
30 words) and an average of a81a183a6a101a100 possible trans-
lations per tile, encodes a candidate set of about
10a233 a67 possible translations. As the specification
of the a154a162a63 operators takes space a11a195a18 a38 a21 , Theorem 1
1English reference: the gunman was shot dead by the police.
guarantees that these WIDL-expressions encode
compactly these huge spaces in a11a183a18a12a10 a81a47a21 .
In the second phase, we employ our WIDL-based realization engine to
interpolate the distribution probabilities of WIDL-expressions with a
trigram language model. In the notation of Equation 2, we use four
feature functions $f_1, \ldots, f_4$ for the WIDL-expression
distributions (one for each probability distribution encoded); a feature
function $f_5$ for a trigram language model; a feature function $f_6$
for a word-count model; and a feature function $f_7$ for a phrase-count
model.
As acknowledged in the Machine Translation literature (Germann et al.,
2003), full A$^*$ search is not usually possible, due to the large size
of the search spaces. We therefore use an approximation algorithm, called
WIDL-NGLM-A$^*_k$, which considers for unfolding only those nodes
extracted from the priority queue that have already unfolded a path of
length greater than or equal to the maximum length already unfolded minus
$k$ (we used $k = 2$ in the experiments reported here).
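The pruning rule can be illustrated on a generic best-first search. This is a minimal sketch, not the WIDL-NGLM search itself: the graph, costs, and function names are hypothetical, path length is measured in nodes rather than realized words, and the heuristic is set to zero.

```python
import heapq

def pruned_best_first(start, is_goal, successors, heuristic, k=2):
    """Best-first search that skips queue entries whose path lags the longest
    unfolded path by more than k."""
    queue = [(heuristic(start), 0.0, [start])]
    max_len = 0  # length of the longest path unfolded so far
    while queue:
        _, cost, path = heapq.heappop(queue)
        if len(path) < max_len - k:
            continue  # too far behind the frontier: discard instead of unfolding
        node = path[-1]
        if is_goal(node):
            return cost, path
        max_len = max(max_len, len(path))
        for nxt, step_cost in successors(node):
            new_cost = cost + step_cost
            heapq.heappush(queue, (new_cost + heuristic(nxt), new_cost, path + [nxt]))
    return None

# Toy graph with made-up edge costs.
graph = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("g", 5.0)],
    "b": [("g", 1.0)],
    "g": [],
}
result = pruned_best_first(
    start="s",
    is_goal=lambda node: node == "g",
    successors=lambda node: graph[node],
    heuristic=lambda node: 0.0,
    k=2,
)
```

Discarding short, stale paths bounds how far the search backtracks, at the price of giving up the optimality guarantee of full A$^*$ search.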
MT Performance Evaluation. We compare against the state-of-the-art,
phrase-based decoder Pharaoh (Koehn, 2004) under the same experimental
conditions: a translation table trained on the FBIS corpus (7.2M Chinese
words and 9.2M English words of parallel text); a trigram language model
trained on 155M words of English newswire; interpolation weights
(Equation 2) trained using discriminative training (Och, 2003) on the
2002 NIST MT evaluation set; the probabilistic beam set to 0.01; and the
histogram beam set to 10. With BLEU (Papineni et al., 2002) as our
metric, the WIDL-NGLM-A$^*_2$ algorithm produces translations that have a
BLEU score of 0.2570, while Pharaoh translations have a BLEU score of
0.2635. The difference is not statistically significant at the 95%
confidence level.
These results show that the WIDL-based ap-
proach to machine translation is powerful enough
to achieve translation accuracy comparable with
state-of-the-art systems in machine translation.
6 Conclusions
The approach to sentence realization we advocate
in this paper relies on WIDL-expressions, a for-
mal language with convenient theoretical proper-
ties that can accommodate a wide range of gener-
ation scenarios. In the worst case, one can work
with simple bags of words that encode no context
preferences (Soricut and Marcu, 2005). One can
also work with bags of words and phrases that en-
code context preferences, a scenario that applies to
current approaches in statistical machine transla-
tion (Section 5). And one can also encode context
and ordering preferences typically used in summa-
rization (Section 4).
The generation engine we describe enables
a tight coupling of content selection with sen-
tence realization preferences. Its algorithm comes
with theoretical guarantees about its optimality.
Because the requirements for producing WIDL-
expressions are minimal, our WIDL-based genera-
tion engine can be employed, with state-of-the-art
results, in a variety of text-to-text applications.
Acknowledgments This work was partially sup-
ported under the GALE program of the Defense
Advanced Research Projects Agency, Contract
No. HR0011-06-C-0022.
References
Srinivas Bangalore and Owen Rambow. 2000. Using
TAG, a tree model, and a language model for gen-
eration. In Proceedings of the Fifth International
Workshop on Tree-Adjoining Grammars (TAG+).
Thomas H. Cormen, Charles E. Leiserson, Ronald L.
Rivest, and Clifford Stein. 2001. Introduction to
Algorithms. The MIT Press and McGraw-Hill.
Simon Corston-Oliver, Michael Gamon, Eric K. Ring-
ger, and Robert Moore. 2002. An overview of
Amalgam: A machine-learned generation module.
In Proceedings of the INLG.
Bonnie Dorr, David Zajic, and Richard Schwartz.
2003. Hedge trimmer: a parse-and-trim approach
to headline generation. In Proceedings of the HLT-
NAACL Text Summarization Workshop, pages 1–8.
Michael Elhadad. 1991. FUF User manual — version
5.0. Technical Report CUCS-038-91, Department
of Computer Science, Columbia University.
Ulrich Germann, Mike Jahr, Kevin Knight, Daniel
Marcu, and Kenji Yamada. 2003. Fast decoding and
optimal decoding for machine translation. Artificial
Intelligence, 154(1–2):127–143.
Nizar Habash. 2003. Matador: A large-scale Spanish-
English GHMT system. In Proceedings of AMTA.
J. Hajic, M. Cmejrek, B. Dorr, Y. Ding, J. Eisner,
D. Gildea, T. Koo, K. Parton, G. Penn, D. Radev,
and O. Rambow. 2002. Natural language genera-
tion in the context of machine translation. Summer
workshop final report, Johns Hopkins University.
K. Knight and V. Hatzivassiloglou. 1995. Two level,
many-path generation. In Proceedings of the ACL.
Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003.
Statistical phrase based translation. In Proceedings
of the HLT-NAACL, pages 127–133.
Philipp Koehn. 2004. Pharaoh: a beam search decoder
for phrase-based statistical machine translation
models. In Proceedings of the AMTA, pages 115–124.
I. Langkilde-Geary. 2002. A foundation for general-
purpose natural language generation: sentence re-
alization using probabilistic models of language.
Ph.D. thesis, University of Southern California.
Chin-Yew Lin. 2004. ROUGE: a package for auto-
matic evaluation of summaries. In Proceedings of
the Workshop on Text Summarization Branches Out
(WAS 2004).
Christian Matthiessen and John Bateman. 1991.
Text Generation and Systemic-Functional Linguistics.
Pinter Publishers, London.
Mehryar Mohri, Fernando Pereira, and Michael Ri-
ley. 2002. Weighted finite-state transducers in
speech recognition. Computer Speech and Lan-
guage, 16(1):69–88.
Mark-Jan Nederhof and Giorgio Satta. 2004. IDL-
expressions: a formalism for representing and pars-
ing finite languages in natural language processing.
Journal of Artificial Intelligence Research, pages
287–317.
Franz Josef Och. 2003. Minimum error rate training
in statistical machine translation. In Proceedings of
the ACL, pages 160–167.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-
Jing Zhu. 2002. BLEU: a method for automatic
evaluation of machine translation. In Proceedings
of the ACL, pages 311–318.
Stuart Russell and Peter Norvig. 1995. Artificial Intel-
ligence. A Modern Approach. Prentice Hall.
Radu Soricut and Daniel Marcu. 2005. Towards devel-
oping generation algorithms for text-to-text applica-
tions. In Proceedings of the ACL, pages 66–74.
Radu Soricut. 2006. Natural Language Generation for
Text-to-Text Applications Using an Information-Slim
Representation. Ph.D. thesis, University of South-
ern California.
David Zajic, Bonnie J. Dorr, and Richard Schwartz.
2004. BBN/UMD at DUC-2004: Topiary. In Pro-
ceedings of the NAACL Workshop on Document Un-
derstanding, pages 112–119.
Liang Zhou and Eduard Hovy. 2003. Headline sum-
marization at ISI. In Proceedings of the NAACL
Workshop on Document Understanding.
