A New E-learning Paradigm through Annotating Operations
Hiroaki Saito, Kyoko Ohara, Kengo Sato, Kazunari Ito, Shinsuke Hizuka
Masaya Soga, Tomoya Nishino, Yuji Nomura, Hideaki Shirakawa, Hiroyuki Okamoto
Dept. of Computer Science, Keio University
3-14-1, Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
hxs@ics.keio.ac.jp
Abstract
This paper proposes a new e-learning paradigm
which enables the user to type in arbitrary sentences.
Current NLP technologies, however, are not mature
enough to perform fully automatic semantic or
discourse analysis. Thus, we take a different approach:
an instructor corrects the contents, and the corrections
are embedded into the contents in an XML format called
KeML. The key/mouse operations of the user are also
embedded as annotations. Thus, the contents can be
incomplete at the initial stage and gradually become
solid as they are used. We have implemented an
e-learning system of group discussion for a foreign
language class, which is demonstrated at the workshop.
1 Introduction
Many older e-learning systems asked the user to click
a button or to type a unique answer. The domains
of these systems were limited by this approach,
and their contents were fixed. This paper
introduces a new approach to expand the domain.
Namely, the users themselves annotate the contents, in
addition to the instructor.
Annotated contents can be used for further learning
such as example-based learning and group learning.
The burden of building e-learning contents is
so heavy that this "annotated contents become other
contents" scheme is important for practical applications.
Annotations are attached in an XML format.
This project can be considered another application
of XML technologies, like MATE [1] [2] or Anvil
[3], to name a few. The principal difference is that
some annotations are implicitly attached and used
for NLP.
2 System Overview
Here we consider a debate in a foreign
language class as an example. This course was
originally taught in a regular classroom and through
an electronic chat board supervised by a human
instructor. One student posts his/her thoughts in
English and others express positive or negative
responses to them. Since this is a foreign language
apprehension class, the instructor corrects the students’
English if necessary. Since students express various
opinions, we cannot prepare for them in advance.
The instructor was occupied with correcting syntactic
errors and therefore could not pay thorough
attention to the flow of the debate or to
whether students had appropriately expressed their
opinions.
Figure 1 shows example debate submissions
on the topic "English should be taught at an elementary
school." #n indicates the order of submission, and
P1, P2, ... stand for the identifiers of the debaters.
(We will further explain Figure 1 later.)
Our system is designed for multi-user discussion,
not for self-learning. Thus, we divide the system
into the server and the client machines as shown in
Figure 2 (only one client machine is drawn in the
figure). The server machine manages the contents
and handles computationally heavy NLP, while each
client machine is responsible for user interface.
We have developed an e-learning system which supports
the process above. Here we describe five important
modules:
• Sentence analysis module: In this module the
  input sentences are parsed and tagged syntactically
  and semantically in the GDA (Global
  Document Annotation) [4] format. We have
  adopted the Charniak parser [5], customized
  so that the head word is identified, because
  GDA tagging requires the attachment
  direction between phrases. The GDA tagger
  consults WordNet [6] to find the root form
  of a word. GDA tags can be utilized for
  further NLP such as high-quality summarization or
  information retrieval [7].
• KWIC module: Novice English learners often
  make mistakes in collocations and in
  phrasal verbs. These word-usage mistakes can
  be resolved more effectively by looking at example
  sentences than by consulting a regular
  dictionary. This module presents the corpus
  sentences that include the specified words in the
  KWIC (KeyWord in Context) format. Although any
  corpus will do for KWIC, we have chosen the EDR
  English corpus [8], which consists of 125,820 tagged
  sentences. Because the root form of each word is
  described as a tag, conjugated forms are also searched.
• Annotation module: The instructor corrects
  wrong usages in the students’ English. This
  operation is recorded as an annotation rather than
  overwriting the originals. Preserving the originals
  is effective for education; it can prevent other
  students from making similar mistakes. When
  a debater expresses an opinion for or against
  someone else’s, that operation is also observed
  and attached to the contents, to be exploited
  by NLP.
• Interface module: This module enables the user
  to type in sentences, specify what part he/she is
  arguing about, express his/her attitude, etc.,
  efficiently. It displays the contents effectively
  according to the needs of the user with
  the help of annotations. A snapshot of our current
  interface is shown in Figure 3.
• Debate flow module: It is important to know
  the debate flow when one expresses his/her
  opinion. Since the relations among statements
  are annotated, precise analysis of the debate
  flow is possible.

[Figure 1: Submission Statements and their Relations.
The figure shows the proposition "English should be
taught as early as at an elementary school." followed by
six statements, #1 P1 (pro), #2 P2 (con), #3 P3 (pro),
#4 P4 (con), #5 P1 (pro), and #6 P3 (pro), linked by
arrows labeled approval, refutation, supplement, question,
and correction. In one statement the misspelling "tought"
is corrected to "taught".]

[Figure 2: System Architecture. Labeled components
include: NL resources (EDR corpus, dictionary); NLP
modules (Charniak parser, GDA tagger, KWIC, Retriever,
Summarizer); the contents DB holding the document in
KeML; the user profile DB; and the user interface, with
user operations, commands with data, and results passing
between client and server.]

[Figure 3: A Snapshot of Interface Window]
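The lookup performed by the KWIC module can be sketched as follows. This is a minimal illustration in Python, not the system's implementation: the three-sentence `CORPUS`, the `(surface form, root form)` token pairs, and the function name `kwic` are all assumptions made for the example, and the actual EDR corpus format is not reproduced here.

```python
from typing import List, Tuple

# Hypothetical mini-corpus: each sentence is a list of
# (surface form, root form) pairs, standing in for a corpus
# whose tokens carry a root-form tag.
Token = Tuple[str, str]
CORPUS: List[List[Token]] = [
    [("She", "she"), ("took", "take"), ("part", "part"),
     ("in", "in"), ("the", "the"), ("debate", "debate")],
    [("They", "they"), ("take", "take"), ("English", "English"),
     ("lessons", "lesson")],
    [("He", "he"), ("teaches", "teach"), ("English", "English")],
]


def kwic(root: str, corpus: List[List[Token]], width: int = 2) -> List[str]:
    """Return one KWIC line per token whose root form equals `root`."""
    lines = []
    for sentence in corpus:
        for i, (surface, token_root) in enumerate(sentence):
            if token_root != root:
                continue
            # Up to `width` words of context on each side of the keyword.
            left = " ".join(s for s, _ in sentence[max(0, i - width):i])
            right = " ".join(s for s, _ in sentence[i + 1:i + 1 + width])
            # Bracket the keyword so it stands out in its context.
            lines.append(f"{left} [{surface}] {right}".strip())
    return lines


for line in kwic("take", CORPUS):
    print(line)
```

Because matching is done on the root form rather than the surface form, a query for "take" also finds "took", which is the behavior the module description relies on.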
In the following sections, the annotation module
is explained in detail.
3 Annotation by the Instructor and
Students
When a student expresses his/her opinion in response
to someone else’s, he/she can specify
what part he/she is arguing about. This linkage is
annotated by the user and recorded in the contents.
The corrections/comments by the instructor are also
stored in the learning contents as annotations. Arrows
in Figure 1 show the relations between statements,
where a dotted line expresses a linkage denoted by
the instructor, and solid lines mean that the debater
specified those relations.
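Once such linkages are recorded, the flow of a debate can be followed mechanically. The sketch below is only an illustration under stated assumptions: the `LINKS` records are hypothetical and do not reproduce the arrows of Figure 1, each statement is assumed to carry at most one linkage, and the helper names `build_index` and `trace` are invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical linkage records: (statement id, relation, target id).
# Target id 0 stands for the proposition. These values are
# illustrative and are not taken from Figure 1.
Link = Tuple[int, str, int]
LINKS: List[Link] = [
    (1, "approval", 0),
    (2, "refutation", 1),
    (3, "refutation", 2),
    (4, "supplement", 3),
]


def build_index(links: List[Link]) -> Dict[int, Tuple[str, int]]:
    """Map each statement id to the (relation, target id) it points at."""
    return {source: (relation, target) for source, relation, target in links}


def trace(statement_id: int, index: Dict[int, Tuple[str, int]]) -> List[Link]:
    """Follow linkages from a statement back toward the proposition."""
    path: List[Link] = []
    seen = set()
    current = statement_id
    # Stop at an unlinked node (such as the proposition) or on a cycle.
    while current in index and current not in seen:
        seen.add(current)
        relation, target = index[current]
        path.append((current, relation, target))
        current = target
    return path


INDEX = build_index(LINKS)
for step in trace(3, INDEX):
    print(step)
```

Tracing from a statement yields the chain of approvals and refutations leading back to the proposition, which is one simple view of the debate flow.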
4 The Tag Set for Debate
We have defined a tag set for annotating debates
in an XML format called KeML (Keio e-learning
Markup Language). Here we describe our tag set
along with how each tag is attached through operations
by the instructor or students.
<debate> encloses the whole debate and is attached
  when a new debate starts. No attribute
  is allowed. Possible child nodes are one
  <proposition> and zero or more <statement>s.
<proposition> is attached when a new proposition
  is submitted. Its mandatory attribute is ‘id’,
  whose value is always ‘0’. Its child node
  is <su> of GDA. The instructor or students
  should mark the proposition as pro or con.
<statement> is attached when a statement
  on a proposition or on other statements is
  submitted by the instructor or students. Its
  mandatory attributes are ‘attitude’, whose value
  is pro or con; ‘person’, whose value indicates
  who submitted the statement; ‘time’,
  which indicates when the statement was given;
  and ‘id’, a number (an integer). The values of the
  first two attributes are given explicitly by the user,
  while those of the last two are filled in by the
  system. Its optional attributes are ‘approval’,
  ‘refutation’, ‘supplement’, ‘summary’, ‘question’,
  and ‘answer’ (some of those attributes appear
  in Figure 1). They are expressed as
  "approval=target id", for example. Its child node is
  <su> of GDA.
Tags below the <su> level, such as <np> or <v>,
are attached by the parser according to the
GDA specifications. Every tag must have an ‘id’
attribute, whose value is filled automatically by
the server.
The Appendix shows the annotated contents of the
debate example in Figure 1.
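To make the tag set concrete, the following sketch builds a small KeML-like document with Python's standard `xml.etree.ElementTree`. It is an illustration under assumptions, not the system's code: the helper `add_statement`, the sequential id numbering, and the ISO 8601 time format are invented for the example, the relation values are illustrative, and the ids on `<su>` and lower-level tags that the server would fill in are omitted. The sample sentences are adapted from Figure 1.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

MANDATORY = ("attitude", "person", "time", "id")
RELATIONS = ("approval", "refutation", "supplement",
             "summary", "question", "answer")


def add_statement(debate, text, attitude, person,
                  relation=None, target_id=None):
    """Append a <statement>; 'id' and 'time' are filled in here,
    standing in for the values the system would supply."""
    if attitude not in ("pro", "con"):
        raise ValueError("attitude must be 'pro' or 'con'")
    if relation is not None and relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    # Assumed numbering: the proposition holds id 0, so the
    # first statement gets 1, the second 2, and so on.
    next_id = len(debate.findall("statement")) + 1
    statement = ET.SubElement(debate, "statement", {
        "attitude": attitude,
        "person": person,
        # Assumed time format (ISO 8601, UTC).
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "id": str(next_id),
    })
    if relation is not None:
        # Optional attributes take the form relation="target id".
        statement.set(relation, str(target_id))
    ET.SubElement(statement, "su").text = text
    return statement


debate = ET.Element("debate")  # <debate> carries no attributes
proposition = ET.SubElement(debate, "proposition", {"id": "0"})
ET.SubElement(proposition, "su").text = (
    "English should be taught at an elementary school.")

add_statement(debate, "I am for teaching English at an elementary school.",
              "pro", "P1", "approval", 0)
add_statement(debate, "I disagree with teaching English at an elementary school.",
              "con", "P2", "refutation", 1)

print(ET.tostring(debate, encoding="unicode"))
```

Rejecting unknown attitudes and relations at insertion time keeps the stored document within the tag set, so later analysis does not need to re-validate it.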
5 Preserving Corrected Contents
So that a novice student can observe mistakes
made by other students, our system preserves the
original contents and shows them effectively when
needed. While some mistakes are obvious, others
are not. Only the instructor can correct or comment on
those errors, and KeML offers two levels of correction
preservation. Obvious mistakes are stored as the
value of the ‘original’ attribute; ‘<np original="tought"
..>taught</np>’ for instance. Less obvious mistakes
are commented on in the value of the ‘comment’ attribute;
‘<su comment="This is a comment for this sentence."
..>..</su>’ for example. When a further correction
is made to already corrected contents,
only the very first version is preserved. Our current
implementation allows corrections/comments
under <su> nodes.
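The two preservation levels, and the rule that only the very first version is kept, can be sketched as below. This is a hedged illustration using Python's standard `xml.etree.ElementTree`; the function names `correct` and `comment`, the sample fragment, and its id values are assumptions for the example rather than part of KeML.

```python
import xml.etree.ElementTree as ET


def correct(node: ET.Element, new_text: str) -> None:
    """Replace the node's text, preserving only the very first version."""
    # If 'original' is already set, an earlier correction recorded the
    # first version, so it must not be overwritten.
    if node.get("original") is None:
        node.set("original", node.text or "")
    node.text = new_text


def comment(node: ET.Element, remark: str) -> None:
    """Attach an instructor remark without changing the text."""
    node.set("comment", remark)


# Hypothetical fragment; the id values are made up for illustration.
su = ET.fromstring(
    '<su id="s1">more effectively <np id="n1">tought</np></su>')
word = su.find("np")

correct(word, "taugt")    # a first, still imperfect correction
correct(word, "taught")   # a further correction of corrected contents
comment(su, "This is a comment for this sentence.")

print(ET.tostring(su, encoding="unicode"))
```

After both corrections the element holds the latest text, while its ‘original’ attribute still holds the student's first version.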
6 Conclusions
We have implemented an e-learning system which
facilitates group discussion for second language learning.
Plain texts become solid as they are used, because
of the explicit and implicit annotations embedded by
the instructor and students. The accumulated contents
will be a good resource for statistical analysis and
example-based learning.

References
[1] MATE Workbench Homepage:
http://www.cogsci.ed.ac.uk/~dmck/MateCode/
[2] MATE Homepage:
http://mate.nis.sdu.dk
[3] Anvil Homepage:
http://www.dfki.de/~kipp/anvil/
[4] The GDA Tag Set Homepage:
http://www.i-content.org/gda/
[5] Charniak, E. "A Maximum-Entropy-Inspired
Parser", NAACL 2000. (For software, see
http://www.cs.brown.edu/people/ec/)
[6] WordNet
http://www.cogsci.princeton.edu/~wn/
[7] Miyata, T. and Hasida, K. "Information Retrieval
Based on Linguistic Structure" in Proceedings of
the Japanese-German Workshop on Natural Language
Processing, July 2003.
[8] EDR Electronic Dictionary, EDR English Corpus
http://www2.crl.go.jp/kk/e416/EDR/
