Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in NLP, pages 24–31,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
Using Semantic and Syntactic Graphs for Call Classi cation
Dilek Hakkani-Tcurrency1ur Gokhan Tur
AT&T Labs  Research
Florham Park, NJ, 07932
a0 dtur,gtur
a1 @research.att.com
Ananlada Chotimongkol
Carnegie Mellon University
Pittsburgh, PA 15213
ananlada@cs.cmu.edu
Abstract
In this paper, we introduce a new data
representation format for language pro-
cessing, the syntactic and semantic graphs
(SSGs), and show its use for call classi -
cation in spoken dialog systems. For each
sentence or utterance, these graphs in-
clude lexical information (words), syntac-
tic information (such as the part of speech
tags of the words and the syntactic parse of
the utterance), and semantic information
(such as the named entities and seman-
tic role labels). In our experiments, we
used written language as the training data
while computing SSGs and tested on spo-
ken language. In spite of this mismatch,
we have shown that this is a very promis-
ing approach for classifying complex ex-
amples, and by using SSGs it is possible
to reduce the call classi cation error rate
by 4.74% relative.
1 Introduction
Goal-oriented spoken dialog systems aim to iden-
tify intents of humans, expressed in natural lan-
guage, and take actions accordingly to satisfy their
requests. The intent of each speaker is identi ed
using a natural language understanding component.
This step can be seen as a multi-label, multi-class
call classi cation problem for customer care appli-
cations (Gorin et al., 1997; Chu-Carroll and Carpen-
ter, 1999; Gupta et al., To appear, among others).
As an example, consider the utterance I would like
to know my account balance, from a  nancial do-
main customer care application. Assuming that the
utterance is recognized correctly by the automatic
speech recognizer (ASR), the corresponding intent
(call-type) would be Request(Balance) and the ac-
tion would be telling the balance to the user after
prompting for the account number or routing this
call to the billing department.
Typically these application speci c call-types are
pre-designed and large amounts of utterances man-
ually labeled with call-types are used for training
call classi cation systems. For classi cation, gen-
erally word a2 -grams are used as features: In the
How May I Help You?a3a5a4 (HMIHY) call routing sys-
tem, selected word a2 -grams, namely salient phrases,
which are salient to certain call-types play an im-
portant role (Gorin et al., 1997). For instance, for
the above example, the salient phrase  account bal-
ance is strongly associated with the call-type Re-
quest(Balance). Instead of using salient phrases, one
can leave the decision of determining useful features
(word a2 -grams) to a classi cation algorithm used as
described in (Di Fabbrizio et al., 2002) and (Gupta
et al., To appear). An alternative would be using
a vector space model for classi cation where call-
types and utterances are represented as vectors in-
cluding word a2 -grams (Chu-Carroll and Carpenter,
1999).
Call classi cation is similar to text categorization,
except the following:
a6 The utterances are much shorter than typical
documents used for text categorization (such as
broadcast news or newspaper articles).
24
0 1<bos> 2WORD:I 3WORD:paid 4WORD:six 5WORD:dollars 6<eos>
Figure 1: An example utterance represented as a single path FSM.
a6 Since it deals with spontaneous speech, the ut-
terances frequently include dis uencies or are
ungrammatical, and
a6 ASR output is very noisy, typically one out of
every four words is misrecognized (Riccardi
and Hakkani-Tcurrency1ur, 2003).
Even though the shortness of the utterances may
imply the easiness of the call classi cation task, un-
fortunately this is not the case. The call classi -
cation error rates typically range between 15% to
30% depending on the application (Gupta et al., To
appear). This is mainly due to the data sparseness
problem because of the nature of the input. Even for
simple call-types like Request(Balance), there are
many ways of uttering the same intent. For instance,
in one of the applications we used in our experi-
ments, as a response to the greeting prompt, there
are 2,697 unique utterances out of 3,547 utterances
for that call-type. Some examples include:
a6 I would like to know my account balance
a6 How much do I owe you
a6 How much is my bill
a6 What is my current bill
a6 account balance
a6 You can help me by telling me what my phone
bill is
a6 ...
Given this data sparseness, current classi cation ap-
proaches require an extensive amount of labeled data
in order to train a call classi cation system with a
reasonable performance. In this paper, we present
methods for extending the classi er’s feature set by
generalizing word sequences using syntactic and se-
mantic information represented in compact graphs,
called syntactic and semantic graphs (SSGs). For
each sentence or utterance, these graphs include
lexical information (words), syntactic information
(such as the part of speech tags of the words and the
syntactic parse of the utterance), and semantic in-
formation (such as the named entities and semantic
role labels). The generalization is expected to help
reduce the data sparseness problem by applying var-
ious groupings on word sequences. Furthermore, the
classi er is provided with additional syntactic and
semantic information which might be useful for the
call classi cation task.
In the following section, we describe the syntac-
tic and semantic graphs. In Section 3, we describe
our approach for call classi cation using SSGs. In
Section 4, we present the computation of syntactic
and semantic information for SSGs. In the last Sec-
tion, we present our experiments and results using
a spoken dialog system AT&T VoiceTone Ra7 Spoken
Dialog System (Gupta et al., To appear).
2 Semantic and Syntactic Graphs
Consider the typical case, where only lexical infor-
mation, i.e. word a2 -grams are used for call classi -
cation. This is equivalent to representing the words
in an utterance as a directed acyclic graph where
the words are the labels of the transitions and then
extracting the transition a2 -grams from it. Figure 1
shows the graph for the example sentence I paid six
dollars, where a8 bosa9 and a8 eosa9 denote the begin-
ning and end of the sentence, respectively.
Syntactic and semantic graphs are also directed
acyclic graphs, formed by adding transitions encod-
ing syntactic and semantic categories of words or
word sequences to the word graph. The  rst addi-
tional information is the part of speech tags of the
words. In the graph, as a parallel transition for each
word of the utterance, the part of speech category
of the word is added, as shown in Figure 2 for the
example sentence. Note that, the word is pre xed
by the token WORD: and the part-of-speech tag is
pre xed by the token POS:, in order to distinguish
between different types of transitions in the graph.
The other type of information that is encoded in
these graphs is the syntactic parse of each utterance,
namely the syntactic phrases with their head words.
For example in the sentence I paid six dollars, six
dollars is a noun phrase with the head word dollars.
In Figure 2, the labels of the transitions for syntactic
phrases are pre xed by the token PHRASE:. There-
25
0 1<bos>
2
POS:PRP
WORD:I
SRL:pay.A0
PHRASE:NP_I
5
PHRASE:S_paid
PHRASE:VP_paid
3
POS:VBD
WORD:paid
SRL:pay.V
6<eos>
NE:m
SRL:pay.A1
PHRASE:NP_dollars
4
POS:CD
WORD:six
POS:NNS
WORD:dollars
Figure 2: The SSG for the utterance I paid six dollars, where words (WORD:), part-of-speech tags (POS:),
syntactic parse (PHRASE:), named entities (NE:) and semantic roles (SRL:) are included.
S:S0 321
S:e
e:e
S:e
e:e
S:SS:S
Figure 3: The FST used to extract unigram, bigram and trigrams. a10 represents the alphabet, a11 represents the
epsilon transition.
fore, six dollars is also represented by the transition
labeled PHRASE:NP dollars. As an alternative, one
may drop the head word of the phrase from the rep-
resentation, or insert an epsilon transition parallel to
the transitions of the modi ers of the head word to
eliminate them from some a2 -grams.
Generic named entity tags, such as person, lo-
cation and organization names and task-dependent
named entity tags, such as drug names in a medical
domain, are also incorporated into the graph, where
applicable. For instance, for the example sentence,
six dollars is a monetary amount, so the arc NE:m is
inserted parallel to that sequence.
As another source of semantic information, se-
mantic role labels of the utterance components are
incorporated to the SSGs. The semantic role labels
represent the predicate/argument structure of each
sentence: Given a predicate, the goal is to identify
all of its arguments and their semantic roles. For
example, in the example sentence the predicate is
pay, the agent of this predicate is I and the amount
is six dollars. In the graph, the labels of the tran-
sitions for semantic roles are pre xed by the token
SRL: and the corresponding predicate. For exam-
ple, the sequence six dollars is the amount of the
predicate pay, and this is shown by the transition
with label SRL:pay.A1 following the PropBank no-
tation (Kingsbury et al., 2002)1.
In this work, we were only able to incorporate
part-of-speech tags, syntactic parses, named entity
tags and semantic role labels in the syntactic and se-
mantic graphs. Insertion of further information such
as supertags (Bangalore and Joshi, 1999) or word
stems can also be bene cial for further processing.
3 Using SSGs for Call Classi cation
In this paper we propose extracting all a2 -grams from
the SSGs to use them for call classi cation. The a2 -
grams in an utterance SSG can be extracted by con-
verting it to a  nite state transducer (FST), a12a14a13 . Each
transition of a12a15a13 has the labels of the arcs on the SSG
as input and output symbols2. Composing this FST
with another FST, a12a17a16 , representing all the possible
a2 -grams, forms the FST, a12a15a18 , which includes all a2 -
grams in the SSG:
a12a19a18a21a20a22a12 a13a24a23 a12 a16
1A1 or Arg1 indicates the object of the predicate, in this case
the amount.
2Instead of the standard notation where  : is used to sepa-
rate the input and output symbols in  nite state transducers, we
use  : to separate the type of the token and its value.
26
Then, extracting the a2 -grams in the SSG is equiva-
lent to enumerating all paths of a12 a18 . For a2a25a20a27a26 , a12a28a16
is shown in Figure 3. The alphabet a10 contains all
the symbols in a12a15a13 .
We expect the SSGs to help call classi cation be-
cause of the following reasons:
a6 First of all, the additional information is ex-
pected to provide some generalization, by al-
lowing new a2 -grams to be encoded in the utter-
ance graph since SSGs provide syntactic and
semantic groupings. For example, the words
a and the both have the part-of-speech tag
category DT (determiner), or all the numbers
are mapped to a cardinal number (CD), like
the six in the example sentence. So the bi-
grams WORD:six WORD:dollars and POS:CD
WORD:dollars will both be in the SSG. Simi-
larly the sentences I paid six dollars and I paid
seventy  ve dollars and sixty  ve cents will both
have the trigram WORD:I WORD:paid NE:m in
their SSGs.
a6 The head words of the syntactic phrases and
predicate of the arguments are included in the
SSGs. This enables the classi er to handle long
distance dependencies better than using other
simpler methods, such as extracting all gappy
a2 -grams. For example, consider the following
two utterances: I need a copy of my bill and
I need a copy of a past due bill. As shown
in Figures 4 and 5, the a2 -gram WORD:copy
WORD:of PHRASE:NP bill appears for both
utterances, since both subsequences my bill and
a past due bill are nothing but noun phrases
with the head word bill.
a6 Another motivation is that, when using simply
the word a2 -grams in an utterance, the classi-
 er is only given lexical information. Now the
classi er is provided with more and different
information using these extra syntactic and se-
mantic features. For example, a named entity
of type monetary amount may be strongly as-
sociated with some call-type.
a6 Furthermore, there is a close relationship be-
tween the call-types and semantic roles. For
example, if the predicate is order this is most
probably the call-type Order(Item) in a retail
domain application. The simple a2 -gram ap-
proach would consider all the appearances of
the unigram order as equal. However consider
the utterance I’d like to check an order of a dif-
ferent call-type, where the order is not a pred-
icate but an object. Word a2 -gram features will
fail to capture this distinction.
Once the SSG of an utterance is formed, all the
a2 -grams are extracted as features, and the decision
of which one to select/use is left to the classi er.
4 Computation of the SSGs
In this section, the tools used to compute the in-
formation in SSGs are described and their perfor-
mances on manually transcribed spoken dialog ut-
terances are presented. All of these components may
be improved independently, for the speci c applica-
tion domain.
4.1 Part-of-Speech Tagger
Part-of-speech tagging has been very well studied
in the literature for many languages, and the ap-
proaches vary from rule-based to HMM-based and
classi er-based (Church, 1988; Brill, 1995, among
others) tagging. In our framework, we employ a
simple HMM-based tagger, where the most prob-
able tag sequence, a29a30 , given the words, a31 , is out-
put (Weischedel et al., 1993):
a29
a30
a20a22a32a34a33a36a35a38a37a39a32a41a40
a42 a43a45a44
a30a47a46
a31a49a48a17a20a22a32a34a33a50a35a38a37a39a32a41a40
a42 a43a45a44
a31
a46a30
a48
a43a51a44
a30
a48
Since we do not have enough data which is manually
tagged with part-of-speech tags for our applications,
we used Penn Treebank (Marcus et al., 1994) as our
training set. Penn Treebank includes data from Wall
Street Journal, Brown, ATIS, and Switchboard cor-
pora. The  nal two sets are the most useful for our
domain, since they are also from spoken language
and include dis uencies. As a test set, we manu-
ally labeled 2,000 words of user utterances from an
AT&T VoiceTone spoken dialog system application,
and we achieved an accuracy of 94.95% on manu-
ally transcribed utterances. When we examined the
errors, we have seen that the frequent word please
is mis-labeled or frequently occurs as a verb in the
training data, even when it is not. Given that the lat-
est literature on POS tagging using Penn Treebank
reports an accuracy of around 97% with in-domain
27
0 1<bos>
2
POS:PRP
WORD:I
SRL:need.A0
PHRASE:NP_I
8
PHRASE:S_need
PHRASE:VP_need
3
POS:VBP
WORD:need
SRL:need.V 9<eos>
SRL:need.A1
PHRASE:NP-A_copy
4
POS:DT
WORD:a
5PHRASE:NP_copy
POS:NN
WORD:copy
PHRASE:PP_of
6
POS:IN
WORD:of
PHRASE:NP_bill
7
POS:PRP$
WORD:my
POS:NN
WORD:bill
Figure 4: An example SSG for the utterance I need a copy of my bill.
0 1<bos>
2
POS:PRP
WORD:I
SRL:need.A0
PHRASE:NP_I
10
PHRASE:S_need
PHRASE:VP_need
3
POS:VBP
WORD:need
SRL:need.V 11<eos>
SRL:need.A1
PHRASE:NP-A_copy
4
POS:DT
WORD:a
5PHRASE:NP_copy
POS:NN
WORD:copy
PHRASE:PP_of
6
POS:IN
WORD:of
PHRASE:NP_bill
7
POS:DT
WORD:a 8
POS:JJ
WORD:past
9POS:JJ
WORD:due
POS:NN
WORD:bill
Figure 5: An example SSG for the utterance I need a copy of a past due bill.
training data (van Halteren et al., 2001), we achieve
a very reasonable performance, considering these er-
rors.
4.2 Syntactic Parser
For syntactic parsing, we use the Collins’
parser (Collins, 1999), which is reported to
give over 88% labeled recall and precision on
Wall Street Journal portion of the Penn Treebank.
We use Buchholz’s chunklink script to extract
information from the parse trees3. Since we do not
have any data from our domain, we do not have a
performance  gure for this task for our domain.
4.3 Named Entity Extractor
For named entity extraction, we tried using a sim-
ple HMM-based approach, a simpli ed version of
BBN’s name  nder (Bikel et al., 1999), and a
classi er-based tagger using Boostexter (Schapire
and Singer, 2000). In the simple HMM-based ap-
proach, which is the same as the part-of-speech tag-
ging, the goal is to  nd the tag sequence, a29a30 , which
maximizes
a43a45a44
a30a47a46
a31a49a48 for the word sequence, a31 . The
tags in this case are named entity categories (such
as P and p for Person names, O and o for Orga-
nization names, etc. where upper-case indicates
the  rst word in the named entity) or NA if the
word is not a part of a named entity. In the sim-
pli ed version of BBN’s name  nder, the states of
3http://ilk.kub.nl/
a52 sabine/chunklink/chunklink 2-2-
2000 for conll.pl
the model were word/tag combinations, where the
tag a53a54a13 for word a55a56a13 is the named entity category of
each word. Transition probabilities consisted of tri-
gram probabilities
a43a45a44
a55a57a13a59a58a60a53a54a13
a46
a55a61a13a63a62a65a64a66a58a60a53a67a13a63a62a65a64a69a68a70a55a61a13a63a62a24a71a72a58a60a53a67a13a63a62a24a71a72a48
over these combined tokens. In the  nal version,
we extended this model with an unknown words
model (Hakkani-Tcurrency1ur et al., 1999). In the classi er-
based approach, we used simple features such as the
current word and surrounding 4 words, binary tags
indicating if the word considered contains any dig-
its or is formed from digits, and features checking
capitalization (Carreras et al., 2003).
To test these approaches, we have used data from
an AT&T VoiceTone spoken dialog system applica-
tion for a pharmaceutical domain, where some of
the named entity categories were person, organiza-
tion, drug name, prescription number, and date. The
training and test sets contained around 11,000 and
5,000 utterances, respectively. Table 1 summarizes
the overall F-measure results as well as F-measure
for the most frequent named entity categories. Over-
all, the classi er based approach resulted in the best
performance, so it is also used for the call classi ca-
tion experiments.
4.4 Semantic Role Labeling
The goal of semantic role labeling is to extract all
the constituents which  ll a semantic role of a tar-
get verb. Typical semantic arguments include Agent,
Patient, Instrument, etc. and also adjuncts such as
Locative, Temporal, Manner, Cause, etc. In this
28
Category Count HMM IF Boostexter
Org. 132 62.0 73.8 70.9
Person 150 45.0 62.4 54.4
Date 178 51.4 61.9 72.0
Drug 220 65.7 62.3 63.1
Overall 836 54.5 56.8 64.0
Table 1: F-Measure results for named entity extrac-
tion with various approaches. HMM is the sim-
ple HMM-based approach, IF is the simpli ed ver-
sion of BBN’s name  nder with an unknown words
model.
work, we use the semantic roles and annotations
from the PropBank corpus (Kingsbury et al., 2002),
where the arguments are given mnemonic names,
such as Arg0, Arg1, Arg-LOC, etc. For example,
for the sentence I have bought myself a blue jacket
from your summer catalog for twenty  ve dollars last
week, the agent (buyer, or Arg0) is I, the predicate
is buy, the thing bought (Arg1) is a blue jacket, the
seller or source (Arg2) is from your summer catalog,
the price paid (Arg3) is twenty  ve dollars, the bene-
factive (Arg4) is myself, and the date (ArgM-TMP)
is last week4.
Semantic role labeling can be viewed as a multi-
class classi cation problem. Given a word (or
phrase) and its features, the goal is to output the
most probable semantic label. For semantic role la-
beling, we have used the exact same feature set that
Hacioglu et al. (2004) have used, since their sys-
tem performed the best among others in the CoNLL-
2004 shared task (Carreras and M arquez, 2004).
We have used Boostexter (Schapire and Singer,
2000) as the classi er. The features include token-
level features (such as the current (head) word, its
part-of-speech tag, base phrase type and position,
etc.), predicate-level features (such as the predicate’s
lemma, frequency, part-of-speech tag, etc.) and
argument-level features which capture the relation-
ship between the token (head word/phrase) and the
predicate (such as the syntactic path between the to-
ken and the predicate, their distance, token position
relative to the predicate, etc.).
In order to evaluate the performance of semantic
role labeling, we have manually annotated 285 utter-
ances from an AT&T VoiceTone spoken dialog sys-
4See http://www.cis.upenn.edu/
a52 dgildea/Verbs for more
details
tem application for a retail domain. The utterances
include 645 predicates (2.3 predicates/utterance).
First we have computed recall and precision rates for
evaluating the predicate identi cation performance.
The precision is found to be 93.04% and recall is
91.16%. More than 90% of false alarms for pred-
icate extraction are due to the word please, which
is very frequent in customer care domain and erro-
neously tagged as explained above. Most of the false
rejections are due to dis uencies and ungrammatical
utterances. For example in the utterance I’d like to
order place an order, the predicate place is tagged
as a noun erroneously, probably because of the pre-
ceding verb order. Then we have evaluated the argu-
ment labeling performance. We have used a stricter
measure than the CoNLL-2004 shared task. The la-
beling is correct if both the boundary and the role of
all the arguments of a predicate are correct. In our
test set, we have found out that our SRL tool cor-
rectly tags all arguments of 57.4% of the predicates.
5 Call Classi cation Experiments and
Results
In order to evaluate our approach, we carried out call
classi cation experiments using human-machine di-
alogs collected by the spoken dialog system used
for customer care. We have only considered utter-
ances which are responses to the greeting prompt
How may I help you? in order not to deal with con r-
mation and clari cation utterances. We  rst describe
this data, and then give the results obtained by the se-
mantic classi er. We have performed our tests using
the Boostexter tool, an implementation of the Boost-
ing algorithm, which iteratively selects the most dis-
criminative features for a given task (Schapire and
Singer, 2000).
5.1 Data
Table 2 summarizes the characteristics of our appli-
cation including the amount of training and test data,
total number of call-types, average utterance length,
and call-type perplexity. Perplexity is computed us-
ing the prior distribution over all the call-types in the
training data.
5.2 Results
For call classi cation, we have generated SSGs for
the training and test set utterances using the tools
29
Training Data Size 3,725 utterances
Test Data Size 1,954 utterances
Number of Call-Types 79
Call-Type Perplexity 28.86
Average Utterance Length 12.08 words
Table 2: Characteristics of the data used in the ex-
periments.
Baseline Using SSG Increase
Unigram 2,303 6,875 2.99 times
Bigram 15,621 112,653 7.21 times
Trigram 34,185 705,673 20.64 times
Total 52,109 825,201 15.84 times
Table 3: A comparison of number of features.
described above. When a2 -grams are extracted from
these SSGs, instead of the word graphs (Baseline),
there is a huge increase in the number of features
given to the classi er, as seen in Table 3. The clas-
si er has now 15 times more features to work with.
Although one can apply a feature selection approach
before classi cation as frequently done in the ma-
chine learning community, we left the burden of an-
alyzing 825,201 features to the classi er.
Table 4 presents the percentage of the features se-
lected by Boostexter using SSGs for each informa-
tion category. As expected the lexical information is
the most frequently used, and 54.06% of the selected
features have at least one word in its a2 -gram. The to-
tal is more than 100%, since some features contain
more than one category, as in the bigram feature ex-
ample: POS:DT WORD:bill. This shows the use of
other information sources as well as words.
Table 5 presents our results for call classi cation.
As the evaluation metric, we use the top class error
rate (TCER), which is the ratio of utterances, where
the top scoring call-type is not one of the true call-
types assigned to each utterance by the human la-
belers. The baseline TCER on the test set using only
word a2 -grams is 23.80%. When we extract features
from the SSGs, we see a 2.14% relative decrease in
the error rate down to 23.29%. When we analyze
these results, we have seen that:
a6 For  easy to classify utterances, the classi er
already assigns a high score to the true call-type
Category Frequency
Lexical Words 54.06%
Syntactic Part-of-Speech 49.98%
Syntactic Parse 27.10%
Semantic Named Entity 1.70%
Semantic Role Label 11.74%
Table 4: The percentage of the features selected by
the classi er for each information category
Baseline SSGs Decrease
All utterances 23.80% 23.29% 2.14%
Low con dence
utterances 68.77% 62.16% 9.61%
All utterances
(Cascaded) 23.80% 22.67% 4.74%
Table 5: Call classi cation error rates using words
and SSGs.
using just word a2 -grams.
a6 The syntactic and semantic features extracted
from the SSGs are not 100% accurate, as pre-
sented earlier. So, although many of these fea-
tures have been useful, there is certain amount
of noise introduced in the call classi cation
training data.
a6 The particular classi er we use, namely Boost-
ing, is known to handle large feature spaces
poorer than some others, such as SVMs. This
is especially important with 15 times more fea-
tures.
Due to this analysis, we have focused on a sub-
set of utterances, namely utterances with low con -
dence scores, i.e. cases where the score given to the
top scoring call-type by the baseline model is be-
low a certain threshold. In this subset we had 333
utterances, which is about 17% of the test set. As
expected the error rates are much higher than the
overall and we get much larger improvement in per-
formance when we use SSGs. The baseline for this
set is 68.77%, and using extra features, this reduces
to 62.16% which is a 9.61% relative reduction in the
error rate.
This  nal experiment suggests a cascaded ap-
proach for exploiting SSGs for call classi cation.
30
That is,  rst the baseline word a2 -gram based clas-
si er is used to classify all the utterances, then if
this model fails to commit on a call-type, we per-
form extra feature extraction using SSGs, and use
the classi cation model trained with SSGs. This cas-
caded approach reduced the overall error rate of all
utterances from 23.80% to 22.67%, which is 4.74%
relative reduction in error rate.
6 Conclusions
In this paper, we have introduced syntactic and se-
mantic graphs (SSGs) for speech and language pro-
cessing. We have described their use for the task of
call classi cation. We have presented results show-
ing 4.74% improvement, using utterances collected
from AT&T VoiceTone spoken dialog system. SSGs
can also be useful for text classi cation and other
similar language processing applications. Our fu-
ture work includes feature selection prior to classi -
cation and developing methods that are more robust
to ASR errors while computing the SSGs. We also
plan to improve the syntactic and semantic process-
ing components by adapting the models with some
amount of labeled in-domain spoken dialog data.

References
Srinivas Bangalore and Aravind K. Joshi. 1999. Su-
pertagging: An approach to almost parsing. Compu-
tational Linguistics, 25(2), June.
Daniel M. Bikel, Richard Schwartz, and Ralph M.
Weischedel. 1999. An algorithm that learns what’s
in a name. Machine Learning Journal Special Issue
on Natural Language Learning, 34(1-3):211 231.
Eric Brill. 1995. Transformation-based error-driven
learning and natural language processing: A case
study in part of speech tagging. Computational Lin-
guistics, 21(4):543 565, December.
Xavier Carreras and Llu· s M arquez. 2004. Introduction
to the CoNLL-2004 shared task: Semantic role label-
ing. In Proceedings of the Conference on Computa-
tional Natural Language Learning (CoNLL), Boston,
MA, May.
Xavier Carreras, Llu· s M arquez, and Llu· s Padr·o. 2003.
A simple named entity extractor using AdaBoost.
In Proceedings of the Conference on Computational
Natural Language Learning (CoNLL), Edmonton,
Canada.
Jennifer Chu-Carroll and Bob Carpenter. 1999. Vector-
based natural language call routing. Computational
Linguistics, 25(3):361 388.
Kenneth W. Church. 1988. A stochastic parts program
and noun phrase parser for unrestricted text. In Second
Conference on Applied Natural Language Processing
(ANLP), pages 136 143, Austin, Texas.
Michael Collins. 1999. Head-Driven Statistical Models
for Natural Language Parsing. Ph.D. thesis, Univer-
sity of Pennsylvania, Computer and Information Sci-
ence, Philadelphia, PA.
Giuseppe Di Fabbrizio , Dawn Dutton, Narendra Gupta,
Barbara Hollister, Mazin Rahim, Giuseppe Riccardi,
Robert Schapire, and Juergen Schroeter. 2002. AT&T
help desk. In Proceedings of the International Confer-
ence on Spoken Language Processing (ICSLP), Den-
ver, CO, September.
Allen L. Gorin, Giuseppe Riccardi, and Jerry H. Wright.
1997. How May I Help You? . Speech Communica-
tion, 23:113 127.
Narendra Gupta, Gokhan Tur, Dilek Hakkani-Tcurrency1ur, Srini-
vas Bangalore, Giuseppe Riccardi, and Mazin Rahim.
To appear. The AT&T spoken language understand-
ing system. IEEE Transactions on Speech and Audio
Processing.
Kadri Hacioglu, Sameer Pradhan, Wayne Ward, James H.
Martin, and Dan Jurafsky. 2004. Semantic role label-
ing by tagging syntactic chunks. In Proceedings of
the Conference on Computational Natural Language
Learning (CoNLL), Boston, MA, May.
Dilek Hakkani-Tcurrency1ur, Gokhan Tur, Andreas Stolcke, and
Elizabeth Shriberg. 1999. Combining words and
prosody for information extraction from speech. In
Proceedings of the EUROSPEECH’99, a73a41a74a76a75 European
Conference on Speech Communication and Technol-
ogy, Budapest, Hungary, September.
Paul Kingsbury, Mitch Marcus, and Martha Palmer.
2002. Adding semantic annotation to the Penn Tree-
Bank. In Proceedings of the Human Language Tech-
nology Conference (HLT), San Diego, CA, March.
Mitchell P. Marcus, Beatrice Santorini, and Mary Ann
Marcinkiewicz. 1994. Building a large annotated cor-
pus of English: The Penn Treebank. Computational
Linguistics, 19(2):313 330.
Giuseppe Riccardi and Dilek Hakkani-Tcurrency1ur. 2003. Ac-
tive and unsupervised learning for automatic speech
recognition. In Proceedings of the European Confer-
ence on Speech Communication and Technology (EU-
ROSPEECH), Geneva, Switzerland, September.
Robert E. Schapire and Yoram Singer. 2000. Boostex-
ter: A boosting-based system for text categorization.
Machine Learning, 39(2-3):135 168.
Hans van Halteren, Jakub Zavrel, and Walter Daele-
mans. 2001. Improving accuracy in word class tag-
ging through combination of machine learning sys-
tems. Computational Linguistics, 27(2):199 230.
Ralph Weischedel, Richard Schwartz, Jeff Palmucci,
Marie Meteer, and Lance Ramshaw. 1993. Coping
with ambiguity and unknown words through proba-
bilistic models. Computational Linguistics, Special Is-
sue on Using Large Corpora, 19(2):361 382, June.
