Proceedings of HLT/EMNLP 2005 Demonstration Abstracts, pages 4–5,
Vancouver, October 2005.
Classummary:
Introducing Discussion Summarization to Online Classrooms
Liang Zhou, Erin Shaw, Chin-Yew Lin, and Eduard Hovy
University of Southern California
Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
{liangz, shaw, hovy}@isi.edu
Abstract
This paper describes a novel summariza-
tion system, Classummary, for interactive
online classroom discussions. This system
is originally designed for Open Source
Software (OSS) development forums.
However, this new application provides
valuable feedback on designing summari-
zation systems and applying them to eve-
ryday use, in addition to the traditional
natural language processing evaluation
methods. In our demonstration at HLT,
new users will be able to direct this sum-
marizer themselves.
1 Introduction
The availability of many chat forums reflects the
formation of globally dispersed virtual communi-
ties, one of which is the very active and growing
movement of Open Source Software (OSS) devel-
opment. Working together in a virtual community
in non-collocated environments, OSS developers
communicate and collaborate using a wide range
of web-based tools including Internet Relay Chat
(IRC), electronic mailing lists, and more.
Another similarly active virtual community is
the distributed education community. Whether
courses are held entirely online or mostly on-
campus, online asynchronous discussion boards
play an increasingly important role, enabling class-
room-like communication and collaboration
amongst students, tutors and instructors. The Uni-
versity of Southern California, like many other
universities, employs a commercial online course
management system (CMS).  In an effort to bridge
research and practice in education, researchers at
ISI replaced the native CMS discussion board with
an open source board that is currently used by se-
lected classes. The board provides a platform for
evaluating new teaching and learning technologies.
Within the discussion board teachers and students
post messages about course-related topics. The
discussions are organized chronologically within
topics and higher-level forums. These ‘live’ dis-
cussions are now enabling a new opportunity, the
opportunity to apply and evaluate advanced natural
language processing (NLP) technology.
Recently we designed a summarization system
for technical chats and emails on the Linux kernel
(Zhou and Hovy, 2005). It clusters discussions ac-
cording to subtopic structures on the sub-message
level, identifies immediate responding pairs using
machine-learning methods, and generates subtopic-
based mini-summaries for each chat log. Incorpo-
ration of this system into the ISI Discussion Board
framework, called Classummary, benefits both
distance learning and NLP communities. Summa-
ries are created periodically and sent to students
and teachers via their preferred medium (emails,
text messages on mobiles, web, etc). This relieves
users of the burden of reading through a large vol-
ume of messages before participating in a particu-
lar discussion. It also enables users to keep track of
all ongoing discussions without much effort. At the
same time, the discussion summarization system
can be measured beyond the typical NLP evalua-
4
tion methodologies, i.e. measures on content cov-
erage. Teachers and students’ willingness and con-
tinuing interest in using the software will be a
concrete acknowledgement and vindication of such
research-based NLP tools. We anticipate a highly
informative survey to be returned by users at the
end of the service.
2  Summarization Framework
In this section, we will give a brief description of
the discussion summarization framework that is
applied to online classroom discussions.
One important component in the original system
(Zhou and Hovy, 2005) is the sub-message clus-
tering. The original chat logs are in-depth technical
discussions that often involve multiple sub-topics,
clustering is used to model this behavior. In Clas-
summary, the discussions are presented in an orga-
nized fashion where users only respond to and
comment on specific topics. Thus, it eliminates the
need for clustering.
 All messages in a discussion are related to the
central topic, but to varying degrees. Some are an-
swers to previously asked questions, some make
suggestions and give advice where they are re-
quested, etc. We can safely assume that for this
type of conversational interactions, the goal of the
participants is to seek help or advice and advance
their current knowledge on various course-related
subjects. This kind of interaction can be modeled
as one problem-initiating message and one or more
corresponding problem-solving messages, formally
defined as Adjacent Pairs (AP). A support vector
machine, pre-trained on lexical and structural fea-
tures for OSS discussions, is used to identify the
most relevant responding messages to the initial
post within a topic.
Having obtained all relevant responses, we
adopt the typical summarization paradigm to ex-
tract informative sentences to produce concise
summaries. This component is modeled after the
BE-based multi-document summarizer (Hovy et
al., 2005). It consists of three steps. First, impor-
tant basic elements (BEs) are identified according
to their likelihood ratio (LR). BEs are automati-
cally created minimal semantic units of the form
head-modifier-relation (for example, “Libyans |
two | nn”, “indicted | Libyans | obj”, and “indicted
| bombing | for”). Next, each sentence is given a
score which is the sum of its BE scores, computed
in the first step, normalized by its length. Lastly,
taking into consideration the interactions among
summary sentences, a MMR (Maximum Marginal
Relevancy) model (Goldstein et al., 1999) is used
to extract sentences from the list of top-ranked
sentences computed from the second step.
3 Accessibility
Classummary is accessible to students and teachers
while classes are in session. At HLT, we will dem-
onstrate an equivalent web-based version. Discus-
sions are displayed on a per-topic basis; and
messages belonging to a specific discussion are
arranged in ascending order according to their
timestamps. While viewing a new message on a
topic, the user can choose to receive a summary of
the discussion so far or an overall summary on the
topic. Upon receiving the summary (for students,
at the end of an academic term), a list of questions
is presented to the user to gather comments on
whether Classummary is useful. We will show the
survey results from the classes (which will have
concluded by then) at the conference.
References
Hovy, E., C.Y. Lin, and L. Zhou. 2005. A BE-based
multi-document summarizer with sentence compres-
sion. To appear in Proceedings of Multilingual Sum-
marization Evaluation (ACL 2005), Ann Arbor, MI.
Goldstein, J., M. Kantrowitz, V. Mittal, and J. Car-
bonell. Summarizing Text Documents: Sentence Se-
lection and Evaluation Metrics. Proceedings of the
22
nd
 International ACM Conference on Research and
Development in Information Retrieval (SIGIR-99),
Berkeley, CA, 121-128.
Zhou, L. and E. Hovy. 2005. Digesting virtual “geek”
culture: The summarization of technical internet re-
lay chats. To appear in Proceedings of Association of
Computational Linguistics (ACL 2005), Ann Arbor,
MI.
5
