A Little Goes a Long Way: Quick Authoring of Semantic Knowledge
Sources for Interpretation
Carolyn Penstein Ros´e
Carnegie Mellon University
cprose@cs.cmu.edu
Brian S. Hall
University of Pittsburgh
mosesh@pitt.edu
Abstract
In this paper we present an evaluation of
Carmel-Tools, a novel behavior oriented ap-
proach to authoring and maintaining do-
main specific knowledge sources for ro-
bust sentence-level language understanding.
Carmel-Tools provides a layer of abstraction
between the author and the knowledge sources,
freeing up the author to focus on the desired
language processing behavior that is desired in
the target system rather than the linguistic de-
tails of the knowledge sources that would make
this behavior possible. Furthermore, Carmel-
Tools offers greater flexibility in output rep-
resentation than the context-free rewrite rules
produced by previous semantic authoring tools,
allowing authors to design their own predicate
language representations.
1 Introduction
One of the major obstacles that currently makes it imprac-
tical for language technology applications to make use of
sophisticated approaches to natural language understand-
ing, such as deep semantic analysis and domain level rea-
soning, is the tremendous expense involved in authoring
and maintaining domain specific knowledge sources. In
this paper we describe an evaluation of Carmel-Tools as a
proof of concept for a novel behavior oriented approach
to authoring and maintaining domain specific knowledge
sources for robust sentence-level language understand-
ing. What we mean by behavior oriented is that Carmel-
Tools provides a layer of abstraction between the author
and the knowledge sources, freeing up the author to fo-
cus on the desired language processing behavior that is
desired in the target system rather than the linguistic de-
tails of the knowledge sources that would make this be-
havior possible. Thus, Carmel-Tools is meant to make
the knowledge source engineering process accessible to
a broader audience. Carmel-Tools is used to author do-
main specific semantic knowledge sources for the Carmel
Workbench (Ros´e, 2000; Ros´e et al., 2002) that con-
tains broad coverage domain general syntactic and lexical
knowledge sources for robust language understanding in
English. Our evaluation demonstrates how Carmel-Tools
can be used to interpret sentences in the physics domain
as part of a content-based approach to automatic essay
grading.
Sentence: The man is moving horizontally at a constant
velocity with the pumpkin.
Predicate Language Representation:
((velocity id1 man horizontal constant non-zero)
(velocity id2 pumpkin ?dir ?mag-change ?mag-zero)
(rel-value id3 id1 id2 equal))
Gloss: The constant, nonzero, horizontal velocity of the
man is equal to the velocity of the pumpkin.
Figure 1: Simple example of how Carmel-Tools builds
knowledge sources capable of assigning representations
to sentences that are not constrained to mirror the exact
wording, structure, or literal surface meaning of the text.
While much work has been done in the area of ro-
bust semantic interpretation, current authoring tools for
building semantic knowledge sources (Cunningham et
al., 2003; Jay et al., 1997) are tailored for informa-
tion extraction tasks that emphasize the identification of
named entities such as people, locations, and organiza-
tions. While regular expression based recognizers, such
as JAPE (Cunningham et al., 2000), used for information
extraction systems, are not strictly limited to these stan-
dard entity types, it is not clear how they would handle
concepts expressing complex relationships between enti-
ties, where the complexity in the meaning can be real-
ized with a much greater degree of surface syntactic vari-
ation. Outside of the information extraction domain, a
concept acquisition authoring environment called SGStu-
dio (Wang and Acero, 2003) offers similar functionality
to JAPE for building language understanding modules for
dialogue systems, with similar limitations. Carmel-Tools
is more flexible in that it allows a wider range of linguis-
tics expression that communicate the same idea to match
against the same pattern. It accomplishes this by inducing
patterns that match against a deep syntactic parse rather
than a stream of words, in order to normalize as much
surface syntactic variation as possible, and thus reduc-
ing the number of patterns that the learned rules must
account for. Furthermore, Carmel-Tools offers greater
flexibility in output representation than the context-free
rewrite rules produced by previous semantic authoring
tools, allowing authors to design their own predicate lan-
guage representations that are not constrained to follow
the structure of the input text (See Figure 1 for a simple
example and Figure 2 for a more complex example.). See
Section 3 and (Ros´e, 2000; Ros´e et al., 2002) for more
details about CARMEL’s knowledge source representa-
tion.
Note that the predicate language representation uti-
lized by Carmel-Tools is in the style of Davidsonian event
based semantics (Hobbs, 1985). For example, in Figure
1 notice that the first argument of each predicate is an
identification token that represents the whole predicate.
These identification tokens can then be bound to argu-
ments of other predicates, and in that way be used to rep-
resent relationships between predicates. For example, the
rel-value predicate expresses the idea that the predi-
cates indicated by id1 and id2 are equal in value.
While language understanding systems with this style
of analysis are not a new idea, the contribution of this
work is a set of authoring tools that simplify the semantic
knowledge sources authoring process.
2 Motivation
While the technology presented in this paper is not spe-
cific to any particular application area, this work is mo-
tivated by a need within a growing community of re-
searchers working on educational applications of Natu-
ral Language Processing to extract detailed information
from student language input to be used for formulating
specific feedback directed at the details of what the stu-
dent has uttered. Such applications include tutorial dia-
logue systems (Zinn et al., 2002; Popescue et al., 2003)
and writing coaches that perform detailed assessments of
writing content (Ros´e et al., 2003; Wiemer-Hastings et
al., 1998; Malatesta et al., 2002) as opposed to just gram-
mar (Lonsdale and Strong-Krause, 2003), and provide
detailed feedback rather than just letter grades (Burstein
et al., 1998; Foltz et al., 1998). Because of the important
role of language in the learning process (Chi et al., 2001),
and because of the unique demands educational applica-
tions place on the technology, especially where detailed
feedback based on student language input is offered to
students, educational applications present interesting op-
portunities for this community.
The area of automated essay grading has enjoyed a
great deal of success at applying shallow language pro-
cessing techniques to the problem of assigning general
quality measures to student essays (Burstein et al., 1998;
Foltz et al., 1998). The problem of providing reliable, de-
tailed, content-based feedback to students is a more diffi-
cult problem, however, that involves identifying individ-
ual pieces of content (Christie, 2003), sometimes called
“answer aspects” (Wiemer-Hastings et al., 1998). Previ-
ously, tutorial dialogue systems such as AUTO-TUTOR
(Wiemer-Hastings et al., 1998) and Research Methods
Tutor (Malatesta et al., 2002) have used LSA to per-
form an analysis of the correct answer aspects present
in extended student explanations. While straightfor-
ward applications of bag of words approaches such as
LSA have performed successfully on the content analy-
sis task in domains such as Computer Literacy (Wiemer-
Hastings et al., 1998), they have been demonstrated to
perform poorly in causal domains such as research meth-
ods (Malatesta et al., 2002) and physics (Ros´e et al.,
2003) because they base their predictions only on the
words included in a text and not on the functional rela-
tionships between them. Key phrase spotting approaches
such as (Christie, 2003) fall prey to the same problem. A
hybrid rule learning approach to classification involving
both statistical and symbolic features has been shown to
perform better than LSA and Naive Bayes classification
(McCallum and Nigam, 1998) for content analysis in the
physics domain (Ros´e et al., 2003). Nevertheless, trained
approaches such as this perform poorly on low-frequency
classes and can be too coarse grained to provide enough
information to the system for it to provide the kind of
detailed feedback human tutors offer students (Lepper et
al., 1993) unless an extensive hierarchy of classes that
represent subtle differences in content is used (Popescue
et al., 2003). Popescue et al. (2003) present impressive
results at using a symbolic classification approach involv-
ing hand-written rules for performing a detailed assess-
ment of student explanations in the Geometry domain.
Rule based approaches have also shown promise in non-
educational domains. For example, an approach to adapt-
ing the generic rule based MACE system for informa-
tion extraction has achieved an F-measure of 82.2% at
the ACE task (Maynard et al., 2002). Authoring tools for
speeding up and simplifying the task of writing symbolic
rules for assessing the content in student essays would
make it more practical to take advantage of the benefits
of rule based assessment approaches.
3 Carmel-Tools Interpretation Framework
Sentence: During the fall of the elevator the man and
the keys have the same constant downward acceleration
that the elevator has.
Predicate Language Representation:
((rel-time id0 id1 id2 equal)
(body-state id1 elevator freefall)
(and id2 id3 id4)
(rel-value id3 id5 id7 equal)
(rel-value id4 id6 id6 equal)
(acceleration id5 man down constant non-zero)
(acceleration id6 keys down constant non-zero)
(acceleration id7 elevator down constant non-zero))
Gloss: The elevator is in a state of freefall at the same
time when there is an equivalence between the elevator’s
acceleration and the constant downward nonzero
acceleration of both the man and the keys
Figure 2: Example of how deep syntactic analysis facili-
tates uncovering complex relationships encoded syntacti-
cally within a sentence
One of the goals behind the design of Carmel-Tools is
to leverage off of the normalization over surface syntac-
tic variation that deep syntactic analysis provides. While
our approach is not specific to a particular framework for
deep syntactic analysis, we have chosen to build upon
the publicly available LCFLEX robust parser (Ros´e et al.,
2002), the CARMEL grammar and semantic interpreta-
tion framework (Ros´e, 2000), and the COMLEX lexicon
(Grishman et al., 1994). This same broad coverage, do-
main general interpretation framework has already been
used in a number of educational applications including
(Zinn et al., 2002; VanLehn et al., 2002).
Syntactic feature structures produced by the CARMEL
grammar normalize those aspects of syntax that modify
the surface realization of a sentence but do not change
its deep functional analysis. These aspects include tense,
negation, mood, modality, and syntactic transformations
such as passivization and extraction. Thus, a sentence
and it’s otherwise equivalent passive counterpart would
be encoded with the same set of functional relationships,
but the passive feature would be negative for the active
version of the sentence and positive for the passive ver-
sion. A verb’s direct object is assigned the obj role re-
gardless of where it appears in relation to the verb. Fur-
thermore, constituents that are shared between more than
one verb, for example a noun phrase that is the object of
a verb as well as the subject of a relative clause modi-
fier, will be assigned both roles, in that way “undoing”
the relative clause extraction. In order to do this analy-
sis reliably, the component of the grammar that performs
the deep syntactic analysis of verb argument functional
relationships was generated automatically from a feature
representation for each of 91 of COMLEX’s verb subcat-
egorization tags (Ros´e et al., 2002). Altogether there are
519 syntactic configurations of a verb in relation to its ar-
guments covered by the 91 subcategorization tags, all of
which are covered by the CARMEL grammar.
CARMEL provides an interface to allow semantic in-
terpretation to operate in parallel with syntactic interpre-
tation at parse time in a lexicon driven fashion (Ros´e,
2000). Domain specific semantic knowledge is encoded
declaratively within a meaning representation specifica-
tion. Semantic constructor functions are compiled au-
tomatically from this specification and then linked into
lexical entries. Based on syntactic head/argument rela-
tionships assigned at parse time, the constructor func-
tions enforce semantic selectional restrictions and assem-
ble meaning representation structures by composing the
meaning representation associated with the constructor
function with the meaning representation of each of its
arguments. After the parser produces a semantic feature
structure representation of the sentence, predicate map-
ping rules then match against that representation in or-
der to produce a predicate language representation in the
style of Davidsonian event based semantics (Davidson,
1967; Hobbs, 1985), as mentioned above. The predicate
mapping stage is the key to the great flexibility in repre-
sentation that Carmel-Tools is able to offer. The mapping
rules perform two functions. First, they match a feature
structure pattern to a predicate language representation.
Next, they express where in the feature structure to look
for the bindings of the uninstantiated variables that are
part of the associated predicate language representation.
Because the rules match against feature structure patterns
and are thus above the word level, and because the pred-
icate language representations associated with them can
be arbitrarily complex, the mapping process is decompo-
sitional in manner but is not constrained to rigidly follow
the structure of the text.
Figure 2 illustrates the power in the pairing between
deep functional analyses and the predicate language rep-
resentation. The deep syntactic analysis of the sentence
makes it possible to uncover the fact that the expression
“constant downward acceleration” applies to the acceler-
ation of all three entities mentioned in the sentence. The
coordination in the subject of the sentence makes it pos-
sible to infer that both the acceleration of the man and of
the keys are individually in an equative relationship with
the acceleration of the elevator. The identification to-
ken of the and predicate allows the whole representation
of the matrix clause to be referred to in the rel-time
predicate that represents the fact that the equative rela-
tionships hold at the same time as the elevator is in a state
Figure 3: Predicate Language Definition Page
Figure 4: Example Map Page
of freefall. But individual predicates, each representing
a part of the meaning of the whole sentence, can also be
referred to individually if desired using their own identi-
fication tokens.
4 Carmel-Tools Authoring Process
The purpose of Carmel-Tools is to insulate the author
from the details of the underlying domain specific knowl-
edge sources. If an author were building knowledge
sources by hand for this framework, the author would be
responsible for building an ontology for the semantic fea-
ture structure representation produced by the parser, link-
ing pointers into this hierarchy into entries in the lexicon,
and writing predicate mapping rules. With Carmel-Tools,
the author never has to deal directly with these knowledge
sources. The Carmel-Tools authoring process involves
designing a Predicate Language Definition, augmenting
the base lexical resources by either loading raw human
tutoring corpora or entering example texts by hand, and
annotating example texts with their corresponding rep-
resentation in the defined Predicate Language Defini-
tion. From this authored knowledge, CARMEL’s se-
mantic knowledge sources can be generated and com-
piled. The knowledge source inference algorithm ensures
that knowledge coded redundantly across multiple exam-
ples is represented only once in the compiled knowledge
sources. The authoring interface allows the author or au-
thors to test the compiled knowledge sources and then
continue the authoring process by updating the Predicate
Language Definition, loading additional corpora, anno-
tating additional examples, or modifying already anno-
tated examples.
The Carmel-Tools authoring process was designed to
eliminate the most time-consuming parts of the authoring
Figure 5: Text Annotation Page Page
process. In particular, its GUI interface guides authors in
such a way as to prevent them from introducing inconsis-
tencies between knowledge sources, which is particularly
crucial when multiple authors work together. For exam-
ple, a GUI interface for entering propositional representa-
tions for example texts insures that the entered represen-
tation is consistent with the author’s Predicate Language
Definition. Compiled knowledge sources contain point-
ers back to the annotated examples that are responsible
for their creation. Thus, it is also able to provide trou-
bleshooting facilities to help authors track down potential
sources for incorrect analyses generated from compiled
knowledge sources. When changes are made to the Pred-
icate Language Definition, Carmel-Tools tests whether
each proposed change would cause conflicts with any an-
notated example texts. An example of such a change
would be deleting an argument from a predicate type
where some example has as part of its analysis an instan-
tiation of a predicate with that type where that argument
is bound. If so, it lists these example texts for the author
and requires the author to modify the annotated examples
first in such a way that the proposed change will not cause
a conflict, in this case that would mean uninstantiating the
variable that the author desires to remove. In cases where
changes would not cause any conflict, such as adding an
argument to a predicate type, renaming a predicate, to-
ken, or type, or removing an argument that is not bound
in any instantiated proposition, these changes are made
throughout the database automatically.
4.1 Defining the Predicate Language Definition
The author begins the authoring process by designing
the propositional language that will be the output repre-
sentation from CARMEL using the authored knowledge
sources. This is done on the Predicate Language Defi-
nition page of the Carmel-Tools interface, displayed in
Figure 3. The author is free to develop a representation
language that is as simple or complex as is required by the
type of reasoning, if any, that will be applied to the output
representations by the tutoring system as it formulates its
response to the student’s natural language input.
The interface includes facilities for defining a list of
predicates and Tokens to be used in constructing proposi-
tional analyses. Each predicate is associated with a basic
predicate type, which is a associated with a list of argu-
ments. Each basic predicate type argument is itself asso-
ciated with a type that defines the range of atomic values,
which may be tokens or identifier tokens referring to in-
stantiated predicates, that can be bound to it. Thus, to-
kens also have types. Each token has one or more basic
token types. Besides basic predicate types and basic to-
ken types, we also allow the definition of abstract types
that can subsume other types.
4.2 Generating Lexical Resources and Annotating
Example Sentences
When the predicate language definition is defined, the
next step is to generate the domain specific lexical re-
sources and annotate example sentences with their corre-
sponding representation within this defined predicate lan-
guage. The author begins this process on the Example
Map Page, displayed in Figure 4.
Carmel-Tools provides facilities for loading a raw hu-
man tutoring corpus file. Carmel-Tools then makes a list
of each unique morpheme it finds in the file and then aug-
ments both its base lexicon (using entries from COM-
LEX), in order to include all morphemes found in the
transcript file that were not already included in the base
lexicon, and the spelling corrector’s word list, so that it
includes all morphological forms of the new lexical en-
tries. It also segments the file into a list of student sen-
tence strings, which are then loaded into a Corpus Ex-
amples list, which appears on the right hand side of the
interface. Searching and sorting facilities are provided to
make it easy for authors to find sentences that have cer-
tain things in common in order to organize the list of sen-
tences extracted from the raw corpus file in a convenient
way. For example, a Sort By Similarity button
causes Carmel-Tools to sort the list of sentences accord-
ing to their respective similarity to a given text string ac-
cording to an LSA match between the example string and
each corpus sentence. The interface also includes the To-
ken List and the Predicate List, with all defined tokens
and predicates that are part of the defined predicate lan-
guage. When the author clicks on a predicate or token,
the Examples list beside it will display the list of anno-
tated examples that have been annotated with an analysis
containing that token or predicate.
Figure 5 displays how individual texts are annotated.
The Analysis box displays the propositional representa-
tion of the example text. This analysis is constructed
using the Add Token, Delete, Add Predicate,
and Modify Predicatebuttons, as well as their sub-
windows, which are not shown. Once the analysis is en-
tered, the author may indicate the compositional break-
down of the example text by associating spans of text
with parts of the analysis by means of the Optional
Match and Mandatory Match buttons. For exam-
ple, the noun phrase “the man” corresponds to the man
token, which is bound in two places. Each time a match
takes place, the Carmel-Tools internal data structures cre-
ate one or more templates that show how pieces of syn-
tactic analyses corresponding to spans of text are matched
up with their corresponding propositional representation.
From this match Carmel-Tools infers both that “the man”
is a way of expressing the meaning of the man token in
text and that the subject of the verbholdcan be bound to
the ?body1 argument of the become predicate. By de-
composing example texts in this way, Carmel-Tools con-
structs templates that are general and can be reused in
multiple annotated examples. It is these created templates
that form the basis for all compiled semantic knowledge
sources. Thus, even if mappings are represented redun-
dantly in annotated examples, they will not be repre-
sented redundantly in the compiled knowledge sournces.
The list of templates that indicates the hierarchical break-
down of this example text are displayed in the Templates
list on the right hand side of Figure 5. Note that while the
author matches spans to text to portions of the meaning
representation, the tool stores mappings between feature
structures and portions of meaning representation, which
is a more general mapping.
Templates can be generalized by entering paraphrases
for portions of template patterns. Internally what this ac-
complishes is that all paraphrases listed can be interpreted
by CARMEL as having the same meaning so that they
can be treated as interchangeable in the context of this
template. A paraphrase can be entered either as a specific
string or as a Defined Type, including any type defined
in the Predicate Language Definition. What this means is
that the selected span of text can be replaced by any span
of text that can be interpreted in such a way that its pred-
icate representation’s type is subsumed by the indicated
type.
4.3 Compiling Knowledge Sources
Each template that is created during the authoring pro-
cess corresponds to one or more elements of each of
the required domain specific knowledge sources, namely
the ontology, the lexicon with semantic pointers, and the
predicate mapping rules. Using the automatically gener-
ated knowledge sources, most of the “work” for mapping
a novel text onto its predicate language representation is
done either by the deep syntactic analysis, where a lot of
surface syntactic variation is factored out, and during the
predicate mapping phase, where feature structure patterns
are mapped onto their corresponding predicate language
representations. The primary purpose of the sentence
level ontology that is used to generate a semantic fea-
ture structure at parse time is primarily for the purpose of
limiting the ambiguity produced by the parser. Very little
generalization is obtained by the semantic feature struc-
tures created by the automatically generated knowledge
sources over that provided by the deep syntactic analysis
alone. By default, the automatically generated ontology
contains a semantic concept corresponding to each word
appearing in at least one annotated example. A semantic
pointer to that concept is then inserted into all lexical en-
tries for the associated word that were used in one of the
annotated examples. An exception occurs where para-
phrases are entered into feature structure representations.
In this case, a semantic pointer is entered not only into the
entry for the word from the sentence, but also the words
from the paraphrase list, allowing all of the words in the
paraphrase list to be treated as equivalent at parse time.
The process is a bit more involved in the case of verbs.
In this case it is necessary to infer based on the parses of
the examples where the verb appears which set of sub-
categorization tags are consistent, thus limiting the set
of verb entries for each verb that will be associated with
a semantic pointer, and thus which entries can be used
at parse time in semantic interpretation mode. Carmel-
Tools makes this choice by considering both which argu-
ments are present with that verb in the complete database
of annotated examples as well as how the examples were
broken down at the matching stage. All non-extracted
arguments are considered mandatory. All extracted argu-
ments are considered optional. Each COMLEX subcat
tag is associated with a set of licensed arguments. Thus,
subcat tags are considered consistent if the set of licensed
arguments contains at least all mandatory arguments and
doesn’t license any arguments that are not either manda-
tory or optional. Predicate mapping rules are generated
for each template by first converting the corresponding
syntactic feature structure into the semantic representa-
tion defined by the automatically generated ontology and
lexicon with semantic pointers. Predicate mapping rules
are then created that map the resulting semantic feature
structure into the associated predicate language represen-
tation.
5 Evaluation
A preliminary evaluation was run for the physics domain.
We used for our evaluation a corpus of essays written by
students in response to 5 simple qualitative physics ques-
tions such as “If a man is standing in an elevator hold-
ing his keys in front of his face, and if the cable hold-
ing the elevator snaps and the man then lets go of the
keys, what will be the relationship between the position
of the keys and that of the man as the elevator falls to the
ground? Explain why.” A predicate language definition
was designed consisting of 40 predicates, 31 predicate
types, 160 tokens, 37 token types, and 15 abstract types.
The language was meant to be able to represent phys-
ical objects mentioned in our set of physics problems,
body states (e.g., freefall, contact, non-contact), quanti-
ties that can be measured (e.g., force, velocity, acceler-
ation, speed, etc.), features of these quantities (e.g., di-
rection, magnitude, etc.), comparisons between quanti-
ties (equivalence, non-equivalence, relative size, relative
time, relative location), physics laws, and dependency re-
lations. An initial set of 250 example sentences was then
annotated, including sentences from each of a set of 5
physics problems.
Next a set of 202 novel test sentences, each between 4
and 64 words long, was extracted from the corpus. Since
comparisons, such as between the accelerations of objects
in freefall together, are important for the reasoning in all
of the questions used for corpus collection, we focused
the coverage evaluation specifically on sentences pertain-
ing to comparisons, such as in Figures 1 and 2. The goal
of the evaluation was to test the extent to which knowl-
edge generated from annotated examples generalizes to
novel examples.
Since obtaining the correct predicate language repre-
sentation requires obtaining a correct syntactic parse, we
first evaluated CARMEL’s syntactic coverage over the
corpus of test sentences to obtain an upper bound for ex-
pected performance. We assigned the syntactic interpre-
tation of each sentence a score of None, Bad, Partial, or
Acceptable. A grade of None indicates that no interpreta-
tion was built by the grammar. Bad indicates that parses
were generated, but they contained errorfull functional
relationships between constituents. Partial indicates that
no parse was generated that covered the entire sentence,
ut the portions that were completely correct for at least
one interpretation of the sentence. Acceptable indicates
that a complete parse was built that contained no incor-
rect functional relationships. If any word of the sentence
was not covered, it was one that would not change the
meaning of the sentence. For example, “he had the same
velocity as you had” is the same as “he had the same ve-
locity as you”, so if “did” was not part of the final parse
but other than that, the parse was fine, it was counted as
Acceptable. Overall the coverage of the grammar was
very good. 166 sentences were graded Acceptable, which
is about 83% of the corpus. 8 received a grade of Partial,
26 Bad, and 1 None.
We then applied the same set of grades to the quality of
the predicate language output. Note that that the grade as-
signed to an analysis represents the correctness and com-
pleteness of the predicate representation the system ob-
tained for that sentence. In this case, a grade of Accept-
able meant that all aspects of intended meaning were ac-
counted for, and no misleading information was encoded.
Partial indicated that some non-trivial part of the intended
meaning was communicated. Any interpretation contain-
ing any misleading information was counted as Bad. If
no predicate language representation was returned, the
sentence was graded as None. As expected, grades for
semantic interpretation were not as high as for syntactic
analysis. In particular, 107 were assigned a grade of Ac-
ceptable, 45 were assigned a grade of Partial, 36 were
assigned a grade of Bad, and 14 received a nil interpre-
tation. Our evaluation demonstrates that knowledge gen-
erated from annotated examples can be used to interpret
novel sentences, however, there are still gaps in the cov-
erage of the automatically generated knowledge sources
that need to be filled in with new annotated examples.
Furthermore, the small but noticeable percentage of bad
interpretations indicates that some previously annotated
examples need to be modified in order to prevent these
bad interpretations from being generated.
6 Current Directions
In this paper we have introduced Carmel-Tools, a tool
set for quick authoring of semantic knowledge sources.
Our evaluation demonstrates that the semantic knowledge
sources inferred from examples annotated using Carmel-
Tools generalize to novel sentences. We are continuing
to work to enhance the ability of Carmel-Tools to learn
generalizable knowledge from examples as well as to im-
prove the user friendliness of the interface.

References
J. Burstein, K. Kukich, S. Wolff, C. Lu, M. Chodorow,
L. Braden-Harder, and M. D. Harris. 1998. Au-
tomated scoring using a hybrid feature identification
technique. In Proceedings of COLING-ACL’98, pages
206–210.
M. T. H. Chi, S. A. Siler, H. Jeong, T. Yamauchi, and
R. G. Hausmann. 2001. Learning from human tutor-
ing. Cognitive Science, (25):471–533.
J. Christie. 2003. Automated essay marking for content:
Does it work? In CAA Conference Proceedings.
H. Cunningham, D. Maynard, and V. Tablan. 2000. Jape:
a java annotations patterns engine. Institute for Lan-
guage, Speach, and Hearing, University of Sheffield,
Tech Report CS-00-10.
H. Cunningham, D. Maynard, K. Bontcheva, and
V. Tablan. 2003. Gate: an architecture for develop-
ment of robust hlt applications. In Recent Advanced in
Language Processing.
D. Davidson. 1967. The logical form of action sentences.
In N. Rescher, editor, The Logic of Decision and Ac-
tion.
P. W. Foltz, W. Kintsch, and T. Landauer. 1998. The
measurement of textual coherence with latent semantic
analysis. Discourse Processes, 25(2-3):285–307.
R. Grishman, C. Macleod, and A. Meyers. 1994. COM-
LEX syntax: Building a computational lexicon. In
Proceedings of the 15th International Conference on
Computational Linguistics (COLING-94).
J. R. Hobbs. 1985. Ontological promiscuity. In Proceed-
ings of the 23rd Annual Meeting of the Association for
Computational Linguistics.
D. Jay, J. Aberdeen, L. Hirschman, R. Kozierok,
P. Robinson, and M. Vilain. 1997. Mixed-initiative
development of language processing systems. In Fifth
Conference on Applied Natural Language Processing.
M. Lepper, M. Woolverton, D. Mumme, and J. Gurtner.
1993. Motivational techniques of expert human tutors:
Lessons for the design of computer based tutors. In
S. P. Lajoie and S. J. Derry, editors, Computers as Cog-
nitive Tools, pages 75–105.
D. Lonsdale and D. Strong-Krause. 2003. Automated
rating of esl essays. In Proceedings of the HLT-NAACL
2003 Workshop: Building Educational Applications
Using Natural Language Processing.
K. Malatesta, P. Wiemer-Hastings, and J. Robertson.
2002. Beyond the short answer question with research
methods tutor. In Proceedings of the Intelligent Tutor-
ing Systems Conference.
D. Maynard, K. Bontcheva, and H. Cunningham. 2002.
Towards a semantic extraction of named entities. In
40th Anniversary meeting of the Association for Com-
putational Linguistics.
A. McCallum and K. Nigam. 1998. A comparison of
event models for naive bayes text classification. In
Proceedings of the AAAI-98 Workshop on Learning for
Text Classification.
O. Popescue, V. Aleven, and K. Koedinger. 2003. A
kowledge based approach to understanding students’
explanations. In Proceedings of the AI in Education
Workshop on Tutorial Dialogue Systems: With a View
Towards the Classroom.
C. P. Ros´e, D. Bhembe, A. Roque, and K. VanLehn.
2002. An efficient incremental architecture for ro-
bust interpretation. In Proceedings of the Human Lan-
guages Technology Conference, pages 307–312.
C. P. Ros´e, A. Roque, D. Bhembe, and K VanLehn. 2003.
A hybrid text classification approach for analysis of
student essays. In Proceedings of the HLT-NAACL
2003 Workshop: Building Educational Applications
Using Natural Language Processing.
C. P. Ros´e. 2000. A framework for robust semantic in-
terpretation. In Proceedings of the First Meeting of the
North American Chapter of the Association for Com-
putational Linguistics, pages 311–318.
K. VanLehn, P. Jordan, C. P. Ros´e, and The Natural Lan-
guag e Tutoring Group. 2002. The architecture of
why2-atlas: a coach for qualitative physics essay writ-
ing. In Proceedings of the Intelligent Tutoring Systems
Conference, pages 159–167.
Y. Wang and A. Acero. 2003. Concept acquisition in
example-based grammar authoring. In Proceedings
of the International Conference on Acoustics, Speech,
and Signal Processing.
P. Wiemer-Hastings, A. Graesser, D. Harter, and the Tu-
toring Res earch Group. 1998. The foundations
and architecture of autotutor. In B. Goettl, H. Halff,
C. Redfield, and V. Shute, editors, Intelligent Tutor-
ing Systems: 4th International Conference (ITS ’98 ),
pages 334–343. Springer Verlag.
C. Zinn, J. D. Moore, and M. G. Core. 2002. A 3-
tier planning architecture for managing tutorial dia-
logue. In Proceedings of the Intelligent Tutoring Sys-
tems Conference, pages 574–584.
