Time Mapping with Hypergraphs
Jan W. Amtrup
Computing Research Laboratory
New Mexico State University
Las Cruces, NM 88003, USA
email: jamtrup@crl.nmsu.edu
Volker Weber
University of Hamburg,
Computer Science Department,
Vogt-Kölln-Str. 30, D-22527 Hamburg, Germany
email: weber@informatik.uni-hamburg.de
Abstract 
Word graphs are able to represent a large num- 
ber of different utterance hypotheses in a very 
compact manner. However, usually they con- 
tain a huge amount of redundancy in terms 
of word hypotheses that cover almost identi- 
cal intervals in time. We address this problem 
by introducing hypergraphs for speech process- 
ing. Hypergraphs can be classified as an ex- 
tension to word graphs and charts, their edges 
possibly having several start and end vertices. 
By converting ordinary word graphs to hyper- 
graphs one can reduce the number of edges 
considerably. We define hypergraphs formally, 
present an algorithm to convert word graphs 
into hypergraphs and state consistency proper- 
ties for edges and their combination. Finally, we 
present some empirical results concerning graph 
size and parsing efficiency. 
1 Introduction 
The interface between a word recognizer and 
language processing modules is a crucial issue 
with modern speech processing systems. Given 
a sufficiently high word recognition rate, it suf- 
fices to transmit the most probable word se- 
quence from the recognizer to a subsequent 
module (e.g. a parser). A slight extension over 
this best chain mode would be to deliver n-best 
chains to improve language processing results. 
However, it is usually not enough to deliver
just the best 10 or 20 utterances, at least not
for reasonably sized applications given today's
speech recognition technology. To overcome this
problem, most current systems use word graphs
as the speech-language interface. Word graphs
offer a simple and efficient means to represent
a very high number of utterance hypotheses
in an extremely compact way (Oerder and
Ney, 1993; Aubert and Ney, 1995).
Figure 1: Two families of edges in a word graph,
labeled und (and) and dann (then)
Although they are compact, the use of word 
graphs leads to problems by itself. One of them 
is the current lack of a reasonable measure for 
word graph size and evaluation of their contents 
(Amtrup et al., 1997). The problem we want to 
address in this paper is the presence of a large 
number of almost identical word hypotheses. By 
almost identical we mean that the start and end 
vertices of edges differ only slightly. 
Consider figure 1 as an example section of a 
word graph. There are several word hypothe- 
ses representing the words und (and) and dann 
(then). The start and end points of them differ 
by small numbers of frames, each of them 10ms 
long. The reasons for the existence of these fam- 
ilies of edges are at least twofold: 
• Standard HMM-based word recognizers try
to start (and finish) word models at each 
individual frame. Since the resolution is 
quite high (10ms, in many cases shorter 
than the word onset), a word model may 
have boundaries at several points in time. 
• Natural speech (and in particular spon- 
taneously produced speech) tends to blur 
word boundaries. This effect is in part re- 
sponsible for the dramatic decrease in word 
recognition rate, given fluent speech as in- 
put in contrast to isolated words as in- 
put. Figure 1 demonstrates the inaccuracy 
of word boundaries by containing several
meeting points between und and dann, which is
emphasized by the fact that und ends and dann
starts with the same consonant.
Thus, for most words, there is a whole set 
of word hypotheses in a word graph which re- 
sults in several meets between two sets of hy- 
potheses. Both facts are disadvantageous for 
speech processing: Many word edges result in 
a high number of lexical lookups and basic op- 
erations (e.g. bottom-up proposals of syntactic 
categories); many meeting points between edges 
result in a high number of possibly complex op- 
erations (like unifications in a parser). 
The most obvious way to reduce the number 
of neighboring, identically labeled edges is to 
reduce the time resolution provided by a word 
recognizer (Weber, 1992). If a word edge is 
to be processed, the start and end vertices are 
mapped to the more coarse grained points in 
time used by linguistic modules and a redun- 
dancy check is carried out in order to prevent 
multiple copies of edges. This is easily
done, but one has to face the drawback of
introducing many more paths through the graph
due to artificially constructed overlaps.
Furthermore, it is not simple to choose a correct
resolution, as the intervals effectively appearing
with word onsets and offsets change considerably
with the words spoken. Also, the introduction
of cycles has to be avoided.
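The resolution reduction just described can be illustrated with a minimal sketch. It assumes that edges are given as (start frame, end frame, label) triples and that a fixed integer `factor` (a hypothetical parameter) defines the coarser time grid; neither the representation nor the names are taken from (Weber, 1992).

```python
def coarsen(edges, factor):
    """Map edge boundaries to a coarser time grid and drop duplicates.

    edges  -- iterable of (start_frame, end_frame, label) triples
    factor -- number of original frames per coarse time step (assumed)
    """
    seen = set()
    result = []
    for start, end, label in edges:
        mapped = (start // factor, end // factor, label)
        # An edge whose boundaries collapse onto one point (or swap)
        # would introduce a cycle, so it is skipped.
        if mapped[0] >= mapped[1]:
            continue
        # Redundancy check: keep only one copy of each mapped edge.
        if mapped not in seen:
            seen.add(mapped)
            result.append(mapped)
    return result


family = [(10, 31, "und"), (11, 32, "und"), (12, 33, "und"), (31, 52, "dann")]
print(coarsen(family, 5))
```

With a factor of 5, the three und hypotheses collapse into one edge. The choice of the factor is exactly the difficulty mentioned above: a value that suits one word may be too coarse or too fine for another.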
A more sophisticated scheme would use interval
graphs to encode word graphs. Edges of
interval graphs do not have individual start and
end vertices, but instead use intervals to denote
the range of applicability of an edge. The major
problem with interval graphs lies with the
complexity of edge access methods. However, many
formal statements shown below will use interval
arithmetic, as the argument will be easier to
follow.
The approach we take in this paper is to use 
hypergraphs as representation medium for word 
graphs. What one wants is to carry out oper- 
ations only once and record the fact that there 
are several start and end points of words. Hy- 
pergraphs (Gondran and Minoux, 1984, p. 30) 
are generalizations of ordinary graphs that al- 
low multiple start and end vertices of edges. 
We extend the approach of H. Weber (Weber, 
1995) for time mapping. Weber considered sets 
of edges with identical start vertices but slightly 
different end vertices, for which the notion fam- 
ily was introduced. We use full hypergraphs as 
representation and thus additionally allow sev- 
eral start vertices, which results in a further de- 
crease of 6% in terms of resulting chart edges 
while parsing (cf. section 3). Figure 2 shows 
the example section using hyperedges for the 
two families of edges. We adopt the way of 
dealing with different acoustical scores of word 
hypotheses from Weber. 
Figure 2: Two hyperedges representing families
of edges
2 Word Graphs and Hypergraphs 
As described in the introduction, word graphs 
consist of edges representing word hypotheses 
generated by a word recognizer. The start and 
end point of edges usually denote points in time. 
Formally, a word graph is a directed, acyclic, 
weighted, labeled graph with distinct root and 
end vertices. It is a quadruple $G = (\mathcal{V}, \mathcal{E}, \mathcal{W}, \mathcal{L})$
with the following components:
• A nonempty set of graph vertices
$\mathcal{V} = \{v_1, \ldots, v_n\}$. To associate vertices with
points in time, we use a function $t : \mathcal{V} \rightarrow \mathbb{N}$
that returns the frame number for a
given vertex.
• A nonempty set of weighted, labeled,
directed edges $\mathcal{E} = \{e_1, \ldots, e_m\} \subseteq \mathcal{V} \times \mathcal{V} \times \mathcal{W} \times \mathcal{L}$.
To access the components of an
edge $e = (v, v', w, l)$, we use functions $\alpha$,
$\beta$, $w$ and $l$, which return the start vertex
($\alpha(e) = v$), the end vertex ($\beta(e) = v'$), the
weight ($w(e) = w$) and the label ($l(e) = l$)
of an edge, respectively.
• A nonempty set of edge weights
$\mathcal{W} = \{w_1, \ldots, w_p\}$. Edge weights normally
represent the acoustic score assigned to the
word hypothesis by an HMM-based word
recognizer.
• A nonempty set of labels $\mathcal{L} = \{l_1, \ldots, l_o\}$,
which represents information attached to
an edge, usually words.
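We define the relation of reachability for vertices
($\rightarrow$) as
$\forall v, w \in \mathcal{V} : v \rightarrow w \iff \exists e \in \mathcal{E} : \alpha(e) = v \wedge \beta(e) = w$
The transitive closure of the reachability relation
$\rightarrow$ is denoted by $\xrightarrow{*}$.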
We already stated that a word graph is acyclic 
and distinctly rooted and ended. 
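The definitions of this section can be made concrete with a small sketch. It assumes that vertices are identified with their frame numbers, so that $t(v) = v$; the names `Edge`, `reachable` and `is_acyclic` are illustrative and not part of the formalism.

```python
from collections import namedtuple

# (start vertex, end vertex, weight, label); vertices are frame numbers
# here, so t(v) = v (an assumption made for brevity).
Edge = namedtuple("Edge", "start end weight label")


def reachable(edges, v):
    """Return all vertices w that can be reached from v via one or more edges."""
    successors = {}
    for e in edges:
        successors.setdefault(e.start, set()).add(e.end)
    found, stack = set(), [v]
    while stack:
        for w in successors.get(stack.pop(), ()):
            if w not in found:
                found.add(w)
                stack.append(w)
    return found


def is_acyclic(edges):
    """A graph is acyclic if no vertex can reach itself."""
    vertices = {e.start for e in edges} | {e.end for e in edges}
    return all(v not in reachable(edges, v) for v in vertices)


graph = [Edge(0, 3, 0.4, "und"), Edge(3, 7, 0.6, "dann"), Edge(0, 7, 0.9, "und")]
print(sorted(reachable(graph, 0)))
print(is_acyclic(graph))
```

The function `reachable` computes the transitive closure for a single vertex; for the small graphs used as examples this simple search is sufficient.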
2.1 Hypergraphs
Hypergraphs differ from graphs by allowing sev- 
eral start and end vertices for a single edge. In 
order to apply this property to word graphs, the 
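definition of edges has to be changed. The set of
edges $\mathcal{E}$ becomes a nonempty set of weighted,
labeled, directed hyperedges
$\mathcal{E} = \{e_1, \ldots, e_m\} \subseteq \mathcal{V}^* \setminus \emptyset \times \mathcal{V}^* \setminus \emptyset \times \mathcal{W} \times \mathcal{L}$.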
Several notions and functions defined for ordi- 
nary word graphs have to be adapted to reflect 
edges having sets of start and end vertices. 
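• The accessor functions for start and end
vertices have to be adapted to return sets of
vertices. Consider an edge $e = (V, V', w, l)$,
then we redefine
$\alpha : \mathcal{E} \rightarrow \mathcal{V}^* \setminus \emptyset, \quad \alpha(e) := V$ (1)
$\beta : \mathcal{E} \rightarrow \mathcal{V}^* \setminus \emptyset, \quad \beta(e) := V'$ (2)
• Two hyperedges $e, e'$ are adjacent, if they
share a common vertex:
$\beta(e) \cap \alpha(e') \neq \emptyset$ (3)
• The reachability relation is now
$\forall v, w \in \mathcal{V} : v \rightarrow w \iff \exists e \in \mathcal{E} : v \in \alpha(e) \wedge w \in \beta(e)$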
Additionally, we define accessor functions for
the first and last start and end vertex of an
edge $e = (V, V', w, l)$. We resort to the association
of vertices with frame numbers, which is a slight
simplification (in general, there is no need for a total
ordering on the vertices in a word graph)¹.
Furthermore, the intervals covered by start and end
vertices are defined.
$\alpha_<(e) := \arg\min\{t(v) \mid v \in V\}$ (4)
$\alpha_>(e) := \arg\max\{t(v) \mid v \in V\}$ (5)
$\beta_<(e) := \arg\min\{t(v) \mid v \in V'\}$ (6)
$\beta_>(e) := \arg\max\{t(v) \mid v \in V'\}$ (7)
$\alpha_{[]}(e) := [t(\alpha_<(e)), t(\alpha_>(e))]$ (8)
$\beta_{[]}(e) := [t(\beta_<(e)), t(\beta_>(e))]$ (9)
¹ The total ordering on vertices is naturally given
through the linearity of speech.
In contrast to interval graphs, we do not re- 
quire the sets of start and end vertices to be con- 
tiguous, i.e. there may be vertices that fall in 
the range of the start or end vertices of an edge 
which are not members of that set. If we are not
interested in the individual members of $\alpha(e)$ or
$\beta(e)$, we merely talk about interval graphs.
2.2 Edge Consistency 
Just like word graphs, we demand that hypergraphs
are acyclic, i.e. $\forall v \xrightarrow{*} w : v \neq w$.
In terms of edges, this corresponds to
$\forall e : t(\alpha_>(e)) < t(\beta_<(e))$.
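As an illustration, the accessor functions, the adjacency condition (3) and the edge consistency requirement can be sketched as follows. The sketch assumes that vertices are represented directly by their frame numbers; the class name `HyperEdge` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperEdge:
    starts: frozenset   # alpha(e): start vertices, given as frame numbers
    ends: frozenset     # beta(e): end vertices, given as frame numbers
    weight: float
    label: str

    def alpha_interval(self):
        """Interval covered by the start vertices, cf. definition (8)."""
        return (min(self.starts), max(self.starts))

    def beta_interval(self):
        """Interval covered by the end vertices, cf. definition (9)."""
        return (min(self.ends), max(self.ends))

    def is_consistent(self):
        """The last start vertex strictly precedes the first end vertex."""
        return max(self.starts) < min(self.ends)


def adjacent(e1, e2):
    """Condition (3): beta(e1) and alpha(e2) share at least one vertex."""
    return bool(e1.ends & e2.starts)


und = HyperEdge(frozenset({10, 11, 13}), frozenset({31, 32}), 0.4, "und")
dann = HyperEdge(frozenset({31, 32, 33}), frozenset({52, 54}), 0.5, "dann")
print(und.alpha_interval(), und.is_consistent(), adjacent(und, dann))
```

Note that the start vertices of `und` are not contiguous (frame 12 is missing), which is permitted for hyperedges but could not be expressed by the interval alone.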
2.3 Adding Edges to Hypergraphs 
Adding a simple word edge to a hypergraph is a 
simplification of merging two hyperedges bear- 
ing the same label into a new hyperedge. There- 
fore we are going to explain the more general 
case for hyperedge merging first. We analyze 
which edges of a hypergraph may be merged to 
form a new hyperedge without loss of linguistic 
information. This process has to follow three 
main principles: 
• Edge labels have to be identical 
• Edge weights (scores) have to be combined 
to a single value 
• Edges have to be compatible in their start 
and end vertices and must not introduce 
cycles to the resulting graph 
Simple Rule Set for Edge Merging 
Let $e_1, e_2 \in \mathcal{E}$ be two hyperedges to be checked
for merging, where $e_1 = (V_1, V_1', w_1, l_1)$ and
$e_2 = (V_2, V_2', w_2, l_2)$. Then $e_1$ and $e_2$ could be merged
into a new hyperedge $e_3 = (V_3, V_3', w_3, l_3)$ iff
$l(e_1) = l(e_2)$ (10)
$\min(t(\beta_<(e_1)), t(\beta_<(e_2))) > \max(t(\alpha_>(e_1)), t(\alpha_>(e_2)))$ (11)
where $e_3$ is:
$l_3 = l_1 \ (= l_2)$ (12)
$w_3 = \mathrm{scorejoin}(w_1, w_2)$ ² (13)
$V_3 = V_1 \cup V_2$ (14)
$V_3' = V_1' \cup V_2'$ (15)
$e_1$ and $e_2$ have to be removed from the hypergraph
while $e_3$ has to be inserted.
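The rule set can be sketched in a few lines. The sketch assumes that vertices are frame numbers and that scorejoin defaults to the minimum of both scores, which follows the convention stated in the paragraph on score normalization; the names `can_merge` and `merge` are illustrative.

```python
from collections import namedtuple

# Vertices are represented by their frame numbers (assumption: t(v) = v).
HyperEdge = namedtuple("HyperEdge", "starts ends weight label")


def can_merge(e1, e2):
    """Conditions (10) and (11)."""
    same_label = e1.label == e2.label
    first_end = min(min(e1.ends), min(e2.ends))
    last_start = max(max(e1.starts), max(e2.starts))
    return same_label and first_end > last_start


def merge(e1, e2, scorejoin=min):
    """Build e3 according to (12)-(15); return None if merging is not allowed.

    scorejoin=min is an assumption (a smaller score is the better one).
    """
    if not can_merge(e1, e2):
        return None
    return HyperEdge(e1.starts | e2.starts, e1.ends | e2.ends,
                     scorejoin(e1.weight, e2.weight), e1.label)


a = HyperEdge(frozenset({10}), frozenset({31}), 0.40, "und")
b = HyperEdge(frozenset({12}), frozenset({33}), 0.35, "und")
c = HyperEdge(frozenset({40}), frozenset({60}), 0.30, "und")
print(merge(a, b))   # merged: start and end vertices are united
print(merge(a, c))   # None: c starts after a has ended
```

The edges `a` and `b` belong to the same family and are merged, whereas `c` is an independent hypothesis of the same word later in the utterance and is rejected by condition 11.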
Sufficiency of the Rule-Set 
Why is this set of two conditions sufficient for
hyperedge merging? First of all, it is clear that
we can merge only hyperedges with the same
label (this is prescribed by condition 10).
Condition 11 determines which hyperedges may
be combined and prevents cycles from being
introduced into the hypergraph. An analysis of the
occurring cases shows that this condition is
reasonable. Without loss of generality, we assume
that $t(\beta_>(e_1)) \leq t(\beta_>(e_2))$.
1. $\alpha_{[]}(e_1) \cap \beta_{[]}(e_2) \neq \emptyset \vee \alpha_{[]}(e_2) \cap \beta_{[]}(e_1) \neq \emptyset$:
This is the case where either the start vertices
of $e_1$ and the end vertices of $e_2$ or the
start vertices of $e_2$ and the end vertices of $e_1$
overlap each other. Merging two such
hyperedges would result in a
hyperedge $e_3$ where $t(\alpha_>(e_3)) \geq t(\beta_<(e_3))$.
This could introduce cycles to the hypergraph.
So this case is excluded by condition 11.
2. $\alpha_{[]}(e_1) \cap \beta_{[]}(e_2) = \emptyset \wedge \alpha_{[]}(e_2) \cap \beta_{[]}(e_1) = \emptyset$:
This is the complementary case to 1.
(a) $t(\alpha_<(e_2)) \geq t(\beta_>(e_1))$
This is the case where all vertices of
hyperedge $e_1$ occur before all vertices
of hyperedge $e_2$, or in other words the
case where two individual independent
word hypotheses with the same label occur
in the word graph. This case must
also not result in an edge merge, since
$\beta_{[]}(e_1) \subseteq [t(\alpha_<(e_1)), t(\alpha_>(e_2))]$ in the
merged edge. This merge is prohibited
by condition 11, since here no vertex
of $\beta(e_1)$ lies after any vertex of $\alpha(e_2)$,
so that the minimum in condition 11
cannot exceed the maximum.
(b) $t(\alpha_<(e_2)) < t(\beta_>(e_1))$
This is the complementary case to (a).
i. $t(\alpha_<(e_1)) \geq t(\beta_>(e_2))$
This is only a theoretical case, because
$t(\alpha_<(e_1)) \leq t(\alpha_>(e_1)) < t(\beta_<(e_1)) \leq t(\beta_>(e_1)) \leq t(\beta_>(e_2))$
is required ($e_2$ contains the last end
vertex).
ii. $t(\alpha_<(e_1)) < t(\beta_>(e_2))$
This is the complementary case to
i. As a result of the empty intersections
and the cases (b) and
ii we get $t(\alpha_>(e_1)) < t(\beta_<(e_2))$
and $t(\alpha_>(e_2)) < t(\beta_<(e_1))$. That
is in other words
$\forall t_\alpha \in \alpha_{[]}(e_1) \cup \alpha_{[]}(e_2), t_\beta \in \beta_{[]}(e_1) \cup \beta_{[]}(e_2) : t_\alpha < t_\beta$
and just the case demanded by
condition 11.
² Examples for the scorejoin operation are given later
in the paragraph about score normalization.
After analyzing all cases of intersections
between start and end vertices of two
hyperedges to be merged, we turn to the insertion
of word hypotheses into a hypergraph. Of course, a word
hypothesis can be seen as an interval edge with
trivial intervals or as a hyperedge with only one
start and one end vertex. Since this case of
adding an edge to a hypergraph is rather easy
to depict and is heavily used while parsing word
graphs incrementally, we discuss it in more
detail.
Figure 3: Cases for adding an edge to the graph
(existing hyperedge $e_g$ and candidate edges $e_1$ to $e_5$)
The speech decoder we use delivers word
hypotheses incrementally, ordered by the time
stamps of their end vertices. For practical reasons
we further sort hypotheses with equal
end vertex by the time of their start vertices. Under this
precondition we get the cases shown in figure 3.
The situation is such that $e_g$ is a hyperedge
already constructed and $e_1$ to $e_5$ are candidates for
insertion.
It is not possible to add $e_1$ and $e_2$ to the
hyperedge $e_g$, since they would introduce an
overlap between the sets of start and end vertices
of the potential new hyperedge. The resulting
hyperedges of adding $e_3$ to $e_5$ are shown in figure 3.
Score Normalization
Score normalization is a necessary means if
one wants to compare hypotheses of different
lengths. Thus, edges in word graphs are assigned
normalized scores that account for words
of different extensions in time. The usual measure
is the score per frame, which is computed as
$\frac{\text{score per word}}{\text{length of word in frames}}$.
When combining several word edges as we
do by constructing hyperedges, the combination
should be assigned a single value that reflects
a certain useful aspect of the originating
edges. In order not to exclude certain hypotheses
from consideration in score-driven language
processing modules, the score of the hyperedge
is inherited from the best-rated word hypothesis
(cf. (Weber, 1995)). We use the minimum of
the source acoustic scores, which corresponds to
the highest recognition probability.
Introducing a Maximal Time Gap
The algorithm depicted in figure 4 can be
sped up for practical purposes. Each vertex
between the graph root and the start vertex of
the new edge could be one of the start vertices
of a matching hyperedge. In practice this is not
needed and we can check a smaller number of
vertices. We do this by introducing a maximal
time gap which determines how far (measured
in time) we look backwards from the start
vertex of a new edge to be inserted into the
hypergraph in order to find a compatible hyperedge
of the hypergraph.
function AddEdge(G: Hypergraph, $e_n$: Wordhypothesis)
    → HyperGraph
begin
[1] if $\exists e_k \in \mathcal{E}(G)$ with
      $l(e_k) = l(e_n) \wedge t(\beta_<(e_k)) > t(\alpha(e_n))$ then
[2]   Modify edge $e_k$
      $e_k' := (\alpha(e_k) \cup \{\alpha(e_n)\},$
          $\beta(e_k) \cup \{\beta(e_n)\}, \max(w(e_k),$
          $\mathrm{normalize}(w(e_n), \alpha(e_n), \beta(e_n))), l(e_k))$
      return
      $G' := (\mathcal{V} \cup \{\alpha(e_n), \beta(e_n)\}, \mathcal{E} \setminus \{e_k\} \cup \{e_k'\},$
          $\mathcal{W} \setminus \{w(e_k)\} \cup \{w(e_k')\}, \mathcal{L})$
[3] else
[4]   Add edge $e_n'$
      $e_n' := (\{\alpha(e_n)\}, \{\beta(e_n)\},$
          $\mathrm{normalize}(w(e_n), \alpha(e_n), \beta(e_n)),$
          $l(e_n))$
      return
      $G' := (\mathcal{V} \cup \{\alpha(e_n), \beta(e_n)\}, \mathcal{E} \cup \{e_n'\},$
          $\mathcal{W} \cup \{w(e_n')\}, \mathcal{L} \cup \{l(e_n')\})$
end
Figure 4: An algorithm for adding edges to
hypergraphs
Additional Paths
Figure 5: Additional paths by time mapping
It is possible to introduce additional paths
into a graph by performing time mapping.
Consider fig. 5 as an example. Taken as a normal
word graph, it contains two label sequences,
namely a-c-d and b-c-e. However, if time
mapping is performed for the edges labelled c,
two additional sequences are introduced: a-c-e
and b-c-d. Thus, time mapping by hypergraphs
is not information preserving in a strong
sense. For practical applications this does not
present any problems. The situations in which
additional label sequences are introduced are
quite rare; we did not observe any linguistic
difference in our experiments.
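The insertion procedure of figure 4, restricted by a maximal time gap, might be sketched as follows. This is an illustration under several assumptions: vertices are frame numbers, hyperedges are plain dictionaries, hypotheses arrive ordered by their end vertices, and the time gap is measured from the last start vertex of a candidate hyperedge. The parameters `max_gap` and `join` are hypothetical; `join=max` mirrors figure 4, while the text on score normalization suggests the minimum.

```python
def normalize(score, start, end):
    """Score per frame: word score divided by word length in frames."""
    return score / (end - start)


def add_edge(hyperedges, start, end, score, label, max_gap=None, join=max):
    """Insert one word hypothesis into a list of hyperedges (in place).

    Each hyperedge is a dict with keys 'starts', 'ends', 'weight', 'label'.
    """
    weight = normalize(score, start, end)
    for e in hyperedges:
        if e["label"] != label:
            continue
        # Line [1] of figure 4: the new start vertex has to lie before
        # the first end vertex of the candidate hyperedge.
        if min(e["ends"]) <= start:
            continue
        # Maximal time gap (assumed reading): ignore hyperedges whose
        # start vertices all lie more than max_gap frames in the past.
        if max_gap is not None and start - max(e["starts"]) > max_gap:
            continue
        e["starts"].add(start)
        e["ends"].add(end)
        e["weight"] = join(e["weight"], weight)
        return hyperedges
    # Lines [3] and [4]: no compatible hyperedge exists, add a new one.
    hyperedges.append({"starts": {start}, "ends": {end},
                       "weight": weight, "label": label})
    return hyperedges


graph = []
add_edge(graph, 1, 3, 20.0, "c")
add_edge(graph, 2, 4, 30.0, "c")
add_edge(graph, 50, 53, 15.0, "c", max_gap=10)
print(graph)
```

Inserting the first two c hypotheses yields a single hyperedge with start vertices {1, 2} and end vertices {3, 4}. This is the configuration in which an edge ending at vertex 1 can be continued by an edge starting at vertex 4, i.e. the source of the additional paths discussed above.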
2.4 Edge Combination 
Besides merging, the combination of hyperedges
to construct edges with new content is an important
task within any speech processing module,
e.g. for parsing. The assumption we adopt
here is that two hyperedges $e_1, e_2 \in \mathcal{E}$ may be
combined if they are adjacent, i.e. they share a
common vertex: $\beta(e_1) \cap \alpha(e_2) \neq \emptyset$. The label
of the new edge $e_n$ (which may be used to represent
linguistic content) is determined by the
component building it, whereas the start and
end vertices are determined by
$\alpha(e_n) := \alpha(e_1)$ (16)
$\beta(e_n) := \beta(e_2)$ (17)
This approach is quite analogous to edge com- 
bination methods for normal graphs, e.g. in 
chart parsing, where two edges are equally re- 
quired to have a meeting point. However, score 
computation for hyperedge combination is more 
difficult. The goal is to determine a score per
frame by selecting the smallest possible score
under all possible meeting vertices. It is derived
by examining all possible connecting vertices
(all elements of $I := \beta(e_1) \cap \alpha(e_2)$) and
computing the resulting score of the new edge:
If $w(e_1) < w(e_2)$, we use
$w(e_n) := \frac{w(e_1) \cdot (t_< - t(\alpha_>(e_1))) + w(e_2) \cdot (t(\beta_>(e_2)) - t_<)}{t(\beta_>(e_2)) - t(\alpha_>(e_1))}$,
where $t_< = \min\{t(v) \mid v \in I\}$. If, on the other
hand, $w(e_1) > w(e_2)$, we use
$w(e_n) := \frac{w(e_1) \cdot (t_> - t(\alpha_<(e_1))) + w(e_2) \cdot (t(\beta_<(e_2)) - t_>)}{t(\beta_<(e_2)) - t(\alpha_<(e_1))}$,
where $t_> = \max\{t(v) \mid v \in I\}$.
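The score computation for combined edges can be transcribed into a short function. The sketch assumes that vertices are frame numbers and that an edge is given as a triple of start vertices, end vertices and score per frame; the name `combine_score` is hypothetical, and the two branches follow the two formulas as they are stated above.

```python
def combine_score(e1, e2):
    """Score per frame of the edge built from adjacent hyperedges e1, e2.

    Each edge is a tuple (starts, ends, weight) with sets of frame numbers.
    """
    starts1, ends1, w1 = e1
    starts2, ends2, w2 = e2
    meeting = ends1 & starts2            # I := beta(e1) ∩ alpha(e2)
    if not meeting:
        raise ValueError("edges are not adjacent")
    if w1 < w2:
        # First formula: earliest meeting vertex, last start of e1,
        # last end of e2.
        t, a, b = min(meeting), max(starts1), max(ends2)
    else:
        # Second formula: latest meeting vertex, first start of e1,
        # first end of e2. For w1 == w2 both formulas yield w1.
        t, a, b = max(meeting), min(starts1), min(ends2)
    # Duration-weighted average of both scores per frame.
    return (w1 * (t - a) + w2 * (b - t)) / (b - a)


e1 = ({0, 2}, {10, 12}, 1.0)
e2 = ({10, 12, 14}, {20, 24}, 2.0)
print(combine_score(e1, e2))
```

In both branches the result is a weighted average of the two scores per frame, where the weights are the durations that each edge contributes to the combined edge.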
3 Experiments with Hypergraphs 
The method of converting word graphs to hy- 
pergraphs has been used in two experiments so 
far. One of them is devoted to the study of 
connectionist unification in speech applications 
(Weber, forthcoming). The other one, from 
which the performance figures in this section 
are drawn, is an experimental speech transla- 
tion system focusing on incremental operation 
and uniform representation (Amtrup, 1997). 
We want to show the effect of hypergraphs
regarding edge reduction and parsing effort. In
order to provide real-world figures, we used word
graphs produced by the Hamburg speech recognition
system (Huebener et al., 1996). The
test data consisted of one dialogue within the
Verbmobil domain. There were 41 turns with
an average length of 4.65s speaking time per
turn. The word graphs contained 1828 edges
on the average. Figure 6 shows the amount of
reduction in the number of edges by converting
the graphs into hypergraphs. On the average,
1671 edges were removed (mapped), leaving 157
edges in hypergraphs, approximately 91% less
than the original word graphs.
Figure 6: Word edge reduction (horizontal axis: # Edges)
Next, we used both sets of graphs (the original
word graphs and hypergraphs) as input
to the speech parser used in (Amtrup, 1997).
This parser is an incremental active chart parser
which uses a typed feature formalism to describe
linguistic entities. The grammar is focused on
partial parsing and contains rules mainly for
noun phrases, prepositional phrases and such.
The integration of complete utterances is neglected.
Figure 7 shows the reduction in terms
of chart edges at completion time.
Figure 7: Chart edge reduction (horizontal axis: # Edges;
one series labeled Maximum gaps)
The amount of reduction concerning parsing 
effort is much less impressive than pure edge 
reduction. On the average, parsing of complete 
graphs resulted in 15547 chart edges, while pars- 
ing of hypergraphs produced 3316 chart edges, 
a reduction of about 79%. Due to edge combi- 
nations, one could have expected a much higher 
value. The reason for this fact lies mainly with 
the redundancy test used in the parser. There 
are many instances of edges which are not in- 
serted into the chart at all, because identical 
hypotheses are already present. 
Consequently, the amount of reduction in 
parse time is within the same bounds. Pars- 
ing ordinary graphs took 87.7s, parsing of hy- 
pergraphs 6.4s, a reduction of 93%. There are 
some extreme cases of word graphs, where hy- 
pergraph parsing was 94 times faster than word 
graph parsing. One of the turns had to be
excluded from the test set, because it could not
be fully parsed as a word graph.
Figure 8: Parsing time reduction (horizontal axis: # Edges;
one series labeled "With mapping")
4 Conclusion 
In this paper, we have proposed the application 
of hypergraph techniques to word graph pars- 
ing. Motivated by linguistic properties of spon- 
taneously spoken speech, we argued that bun- 
dles of edges in word graphs should be treated in 
an integrated manner. We introduced interval 
graphs and directed hypergraphs as representa- 
tion devices. Directed hypergraphs extend the 
notion of a family of edges in that they are able 
to represent edges having several start and end 
vertices. 
We gave a formal definition of word graphs 
and the necessary extensions to cover hyper- 
graphs. The conditions that have to be fulfilled 
in order to merge two hyperedges and to com- 
bine two adjacent hyperedges were stated in a 
formal way; an algorithm to integrate a word 
hypothesis into a hypergraph was presented. 
We proved the applicability of our mecha- 
nisms by parsing one dialogue of real-world spo- 
ken utterances. Using hypergraphs resulted in a 
91% reduction of initial edges in the graph and 
a 79% reduction in the total number of chart 
edges. Parsing hypergraphs instead of ordinary 
word graphs reduced the parsing time by 93%. 

References 
Jan W. Amtrup, Henrik Heine, and Uwe Jost. 
1997. What's in a Word Graph -- Evaluation 
and Enhancement of Word Lattices. In Proc. 
of Eurospeech 1997, Rhodes, Greece, Septem- 
ber. 
Jan W. Amtrup. 1997. Layered Charts for 
Speech Translation. In Proceedings of the 
Seventh International Conference on Theo- 
retical and Methodological Issues in Machine 
Translation, TMI '97, Santa Fe, NM, July. 
Xavier Aubert and Hermann Ney. 1995. Large 
Vocabulary Continuous Speech Recognition 
Using Word Graphs. In ICASSP 95. 
Michel Gondran and Michel Minoux. 1984. 
Graphs and algorithms. Wiley-Interscience 
Series in Discrete Mathematics. John Wiley 
& Sons, Chichester. 
Kai Huebener, Uwe Jost, and Henrik Heine. 
1996. Speech Recognition for Spontaneously 
Spoken German Dialogs. In ICSLP96, 
Philadelphia. 
Martin Oerder and Hermann Ney. 1993. 
Word Graphs: An Efficient Interface Be- 
tween Continuous-Speech Recognition and 
Language Understanding. In Proceedings 
of the 1993 IEEE International Conference 
on Acoustics, Speech & Signal Processing,
ICASSP, pages II/119-II/122, Minneapolis, 
MN. 
Hans Weber. 1992. Chartparsing in ASL-Nord:
Berichte zu den Arbeitspaketen P1 bis P9.
Technical Report ASL-TR-28-92/UER,
Universität Erlangen-Nürnberg, Erlangen,
December.
Hans Weber. 1995. LR-inkrementelles,
Probabilistisches Chartparsing von
Worthypothesengraphen mit Unifikationsgrammatiken:
Eine Enge Kopplung von Suche und Analyse.
Ph.D. thesis, Universität Hamburg.
Volker Weber. forthcoming. Funktionales Kon- 
nektionistisches Unifikationsbasiertes Pars- 
ing. Ph.D. thesis, Univ. Hamburg. 
