Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL, pages 15–22,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
“Language and Computers”
Creating an Introduction for a General Undergraduate Audience
Chris Brew
Department of Linguistics
The Ohio State University
cbrew@ling.osu.edu
Markus Dickinson
Department of Linguistics
The Ohio State University
dickinso@ling.osu.edu
W. Detmar Meurers
Department of Linguistics
The Ohio State University
dm@ling.osu.edu
Abstract
This paper describes the creation
of Language and Computers, a new
course at the Ohio State University de-
signed to be a broad overview of topics
in computational linguistics, focusing
on applications which have the most
immediate relevance to students. This
course satisfies the mathematical and
logical analysis requirement at Ohio
State by using natural language sys-
tems to motivate students to exercise
and develop a range of basic skills in
formal and computational analysis. In
this paper we discuss the design of the
course, focusing on the success we have
had in offering it, as well as some of the
difficulties we have faced.
1 Introduction
In the autumn of 2003, we created Language
and Computers (Linguistics 384), a new course
at the Ohio State University that is designed to
be a broad overview of topics in computational
linguistics, focusing on applications which have
the most immediate relevance to students. Lan-
guage and Computers is a general enrollment
course designed to meet the Mathematical and
Logical Analysis requirement that is mandated
for all undergraduates at the Ohio State Uni-
versity (OSU), one of the largest universities in
the US. We are committed to serving the av-
erage undergraduate student at OSU, including
those for whom this is the first and last Lin-
guistics course. Some of the students take the
course because it is an alternative to calculus,
others because of curiosity about the subject
matter. The course was first taught in Win-
ter 2004, drawing a wide range of majors, and
has since expanded to three sections of up to 35
students each. In this paper we will discuss the
design of the course, focusing on the success we
have had in offering it, as well as some of the
difficulties we have faced.
2 General Context
The Linguistics Department at OSU is the home
of a leading graduate program in which 17 grad-
uate students are currently specializing in com-
putational linguistics. From the perspective of
the graduate program, the goalofthe new course
development was to create more appropriate
teaching opportunities for the graduate students
specializing in computational linguistics. Much
of the undergraduate teaching load in Linguis-
tics at OSU is borne by graduate teaching assis-
tants (GTAs) who receive stipends directly from
the department. After a training course in the
first year, most such GTAs act as instructors on
the Department’s “Introduction to Language,”
which is taught in multiple small sections. In-
structors are given considerable responsibility
for all aspects of course design, preparation, de-
livery, and grading. This works very well and
produces many superb instructors, but by 2003
it was apparent that increasing competition was
reducing the pool of undergraduates who want
to take this general overview course.
The Ohio State University has a distribution
requirement, the General Education Curricu-
15
lum (GEC), that is designed to ensure adequate
breadth in undergraduate education. The twin
demands of the student’s major and the distri-
bution requirement are sufficient to take up the
vast majority of the credit hours required for
graduation. In practice this means that students
tend to make course selections motivated pri-
marily by the goal of completing the necessary
requirements as quickly and efficiently as they
can, possibly at the expense of curiosity-driven
exploration. Linguistics, as an interdisciplnary
subject, can create courses that satisfy both cu-
riosity and GEC requirements.
To fill this interdisciplinary niche, the OSU
Department of Linguistics has created a range
of new courses such as Language and Gender,
Language and the Mind, Language and the Law,
and the Language and Computers course dis-
cussed in this paper. In addition to filling a dis-
tribution requirement niche for undergraduates,
the courses also allow the linguistics GTAs to
teach courses on topics that are related to their
area of specialization, which can be beneficial
both to the instructors and to those instructed.
Prior to creation of the new Language and Com-
puters course, there were virtually no opportu-
nities for student members of the computational
linguistics group to teach material close to their
focus.
3 Course overview
The mission statement for our course reads:
In the past decade, the widening use
of computers has had a profound influ-
ence on the way ordinary people com-
municate, search and store informa-
tion. For the overwhelming majority
of people and situations, the natural
vehicle for such information is natu-
ral language. Text and to a lesser ex-
tent speech are crucial encoding for-
mats for the information revolution.
This course will give students insight
into the fundamentals of how comput-
ers are used to represent, process and
organize textual and spoken informa-
tion, as well as providing tips on how
to effectively integrate this knowledge
into their working practice. The course
will cover the theory and practice of
human language technology.
The course was designed to meet the Math-
ematical and Logical Analysis (MLA) require-
ment for students at the Ohio State University,
which is characterized in the following way:
A student in a B.A. program must take
one course that focuses on argument in
a context that emphasizes natural lan-
guage, mathematics, computer science
or quantitative applications not pri-
marily involving data. Courses which
emphasize the nature of correct argu-
mentation either in natural languages
or in symbolic form would satisfy this
requirement, as would many mathe-
matics or computer science courses.
...The courses themselves should em-
phasize the logical processes involved
in mathematics, inductive or deductive
reasoning, or computing and the the-
ory of algorithms.
Linguistics 384 responds to this specification
by using natural language systems to motivate
students to exercise and develop a range of ba-
sic skills in formal and computational analysis.
The course combines lectures with group work
and in-class discussions, resulting in a seminar-
like environment. We enrol no more than 35
students per section, often significantly fewer at
unpopular times of day.
The course philosophy is to ground abstract
concepts in real world examples. We intro-
duce strings, regular expressions, finite-state
and context-free grammars, as well as algo-
rithms defined over these structures and tech-
niques for probing and evaluating systems that
rely on these algorithms. This meets the MLA
objective to emphasize the nature of correct ar-
gumentation in symbolic form as well as the logi-
cal processes involved in computing and the the-
ory of algorithms. These abstract ideas are em-
bedded in practical applications: web searching,
16
spelling correction, machine translation and di-
alogue systems. By covering the technologies
behind these applications, the course addresses
the requirement to sharpen a student’s ability
to reason critically, construct valid arguments,
think creatively, analyze objectively, assess ev-
idence, perceive tacit assumptions, and weigh
evidence.
Students have impressions about the quality
of such systems, but the course goes beyond
merely subjective evaluation of systems and em-
phasizes the use of formal reasoning to draw and
argue for valid conclusions about the design, ca-
pabilities and behavior of natural language sys-
tems.
In ten weeks, we cover eight topics, using a
data projector in class, with copies of the slides
being handed out to the student before each
class. There is no textbook, and there are rel-
atively few assigned readings, as we have been
unable to locate materials appropriate for an av-
erage student without required background who
may never take another (computational) linguis-
tics class. The topics covered are the following,
in this order:
• Text and speech encoding
• (Web-)Searching
• Spam filtering (and other classification
tasks, such as language identification)
• Writers’ aids (Spelling and grammar correc-
tion)
• Machine translation (2 weeks)
• Dialogue systems (2 weeks)
• Computer-aided language learning
• Social context of language technology use
In contrast to the courses of which we are
aware that offer computational linguistics to un-
dergraduates, our Language and Computers is
supposed to be accessible without prerequisites
to students from every major (a requirement for
GEC courses). For example, we cannot assume
any linguistic background or language aware-
ness. Like Lillian Lee’s Cornell course (Lee,
2002), the course cannot presume programming
ability. But the GEC regulations additionally
prohibit us from requiring anything beyond high
school level abilities in algebraic manipulation.
We initially hoped that this meant that we
would be able to rely on the kind of math knowl-
edge that we ourselves acquired in secondary
school, but soon found that this was not real-
istic. The sample questions from Lee’s course
seem to us to be designed for students who ac-
tively enjoy math. Our goal is different: we
want to exercise and extend the math skills of
the general student population, ensuring that
the course is as accessible to the well-motivated
dance major as it is to the geekier people with
whom we are somewhat more familiar. This is
hard, but worthwhile.
The primary emphasis is on discrete math-
ematics, especially with regard to strings and
grammars. In addition, the text classification
and spam-filtering component exercise the abil-
ity to reason clearly using probabilities. All of
this can be achieved for students with no colle-
giate background in mathematics.
Specifically, Linguistics 384 uses non-trivial
mathematics at a level at or just beyond algebra
1 in the following contexts:
• Reasoning about finite-state automata and
regular expressions (in the contexts of web
searching and of information management).
Students reason about relationships be-
tween specific and general search terms.
• Reasoning about more elaborate syntactic
representations (such as context-free gram-
mars) and semantic representations (such
as predicate calculus), in order to better
understand grammar checking and machine
translation errors.
• Reasoning about the interaction between
components of natural language systems (in
the contexts of machine translation and of
dialog systems).
• Understanding the basics of dynamic pro-
gramming via spelling correction (edit dis-
tance) and applying algebraic thinking to
algorithm design.
17
• Simple probabilistic reasoning (in the con-
text of text classification).
There is also an Honors version of the course,
which is draws on a somewhat different pool
of students. In 2004 the participants in Hon-
ors 384 were equally split between Linguistics
majors looking for a challenging course, people
with a computer background and some interest
in language and people for whom the course was
a good way of meeting the math requirement at
Honors level. Most were seniors, so there was lit-
tle feed-through to further Linguistics courses.
The Honors course, which used to be
called Language Processing Technology, pre-
dates Language and Computers, and includes
more hands-on material. Originally the first half
of this course was an introduction to phonetics
and speech acoustics through Praat, while the
second was a Prolog-based introduction to sym-
bolic NLP. We took the opportunity to redesign
this course when we created the non-honors ver-
sion. In the current regime, the hands-on aspect
is less important than the opportunities offered
by the extra motivation and ability of these stu-
dents. Two reading assignments in the honors
version were Malcolm Gladwell’s book review on
the Social Life of Paper (Gladwell, 2001) and
Turing’s famous paper on the Imitation Game
(Turing, 1950). We wondered whether the ec-
centricity and dated language of the latter would
be a problem, but it was not.
Practical assignments in the laboratory are
possible in the honors course, because the class
size can be limited. One such assignment was
a straightforward run-through of the clock tu-
torial from the Festival speech synthesis system
and another a little machine translation system
between digits and number expressions. Having
established that they can make a system that
turns 24 into ’twenty four’, and so on, the stu-
dents are challenged to adapt it to speak “Fairy
Tale English”: that is, to make it translate 24
into ’four and twenty’, and vice-versa.
1
1For a complete overview of the course materials,
there are several course webpages to check out. The web-
page for the first section of the course (Winter 2004)
4 General themes of the course
Across the eight different topics that are taught,
we try to maintain a cohesive feel by emphasiz-
ing and repeating different themes in computa-
tional linguistics. Each theme allows the stu-
dents to see that certain abstract ideas are quite
powerful and can inform different concrete tasks.
The themes which have been emphasized to this
point are as follows:
• There are both statistical and rule-based
methods for approaching a problem in nat-
ural language processing. We show this
most clearly in the spam filtering unit and
the machine translation unit with different
types of systems.
• There is a tension between developing tech-
nology in linguistically-informed ways and
developing technology so that a product is
effective. In the context of dialogue sys-
tems, for example, the lack of any linguistic
knowledge in ELIZA makes it fail quickly,
but an ELIZA with a larger database and
still no true linguistic knowledge could have
more success.
• Certain general techniques, such as n-gram
analysis, can be applied to different compu-
tational linguistic applications.
• Effective technology does not have to solve
every problem; focusing on a limited do-
main is typically more practical for the ap-
plications we look at. In machine transla-
tion, this means that a machine translation
system translating the weather (e.g., the
METEO system) will perform better than
a general-purpose system.
• Intelligent things are being done to improve
natural language technology, but the task is
a very difficult one, due to the complexities
of language. Part of each unit is devoted to
is at http://ling.osu.edu/~dickinso/384/wi04/. A
more recent section (Winter 2005) can be found at http:
//ling.osu.edu/~dm/05/winter/384/. For the honors
course, the most recent version is located at http:
//ling.osu.edu/~cbrew/2005/spring/H384/. A list of
weblinks to demos, software, and on-line tutorials cur-
rently used in connection with the course can be found
at http://ling.osu.edu/~xflu/384/384links.html
18
showing that the problem the technology is
addressing is a complex one.
5 Aspects of the course that work
The course has been a positive experience, and
students overall seemed pleased with it. This
is based on the official student evaluation of
instruction, anonymous, class specific question-
naires we handed out at the end of the class,
personal feedback, and new students enrolling
based on recommendations from students who
took the course. We attribute the positive re-
sponse to several different aspects of the course.
5.1 Topics they could relate to
Students seem to most enjoy those topics which
were most relevant to their everyday life. On the
technological end, this means that the units on
spam filtering, web searching, and spell check-
ing are generally the most well-received. The
more practical the focus, the more they seem
to appreciate it; for web searching, for instance,
they tend to express interest in becoming better
users of the web. On the linguistic end, discus-
sions of how dialogue works and how language
learning takes place, as part of the units on di-
alogue systems and CALL, respectively, tend to
resonate with many students. These topics are
only sketched out insofar as they were relevant
to the NLP technology in question, but this has
the advantage of not being too repetitive for the
few students who have had an introductory lin-
guistics class before.
5.2 Math they can understand
Students also seem to take pride in being able
to solve what originally appear to be difficult
mathematical concepts. To many, the concept
and look of a binary number is alien, but they
consistently find this to be fairly simple. The
basics of finite-state automata and boolean ex-
pressions (even quite complicated expressions)
provide opportunities for students to understand
that they are capable of learning concepts of log-
ical thinking. Students with more interest and
more of an enjoyment for math are encouraged
to go beyond the material and, e.g., figure out
the nature of more complicated finite-state au-
tomata. In this way, more advanced students are
able to stay interested without losing the other
students.
More difficult topics, such as calculating the
minimum edit distance between a word and its
misspelling via dynamic programming, can be
frustrating, but they just as often are a source
of a greater feeling of success for students. After
some in-class exercises, when it becomes appar-
ent that the material is learnable and that there
is a clear, well-motivated point to it, students
generally seem pleased in conquering somewhat
more difficult mathematical concepts.
5.3 Interactive demos
In-class demos of particular software are also
usually well-received, in particular when they
present applications that students themselves
can use. These demos often focus on the end
result of a product, such as simply listening to
the output of several text-to-speech synthesiz-
ers, but they can also be used for understanding
how the applications works. For example, some
sections attempt to figure out as a class where
a spelling checker fails and why. Likewise, an
in-class discussion with ELIZA has been fairly
popular, and students are able to deduce many
of the internal properties of ELIZA.
5.4 Fun materials
In many ways, we have tried to keep the tone
of the course fairly light. Even though we
are teaching mathematical and logical concepts,
these concepts are still connected to the real
world, and as such, there is much opportunity
to present the material in a fun and engaging
manner.
Group work One such way to make the learn-
ing process more enjoyable was to use group
work. In the past few quarters, we have been
refining these exercises. Because of the nature
of the topics, some topics are easier to derive
group exercises for than others. The more math-
ematical topics, such as regular expressions, suit
themselves well for straightforward group work
on problem sets in class; others can be more
19
creative. The group exercises usually serve as a
way for students to think about issues they al-
ready know something about, often as a way to
introduce the topic.
For example, on the first day, they are given
a sheet and asked to evaluate sets of opposing
claims, giving arguments for both sides, such as
the following:
1. A person will have better-quality papers if
they use a spell checker.
A person will have worse-quality papers if
they use a spell checker.
2. An English-German dictionary is the main
component needed to automatically trans-
late from English to German.
An English-German dictionary is not the
main component needed to automatically
translate from English to German.
3. Computers can make you sound like a na-
tive speaker of another language.
Computers cannot make you sound like a
native speaker of another language.
To take another example, to get students
thinking about the social aspects of the use of
language technology, they are asked in groups to
consider some of the implications of a particu-
lar technology. The following is an excerpt from
one such handout.
You work for a large software company
and are in charge of a team of com-
putational linguists. One day, you are
told: “We’d like you and your team to
develop a spell checker for us. Do you
have any questions?” What questions
do you have for your boss?
...
Somehow or another, the details of
your spell checker have been leaked to
the public. This wouldn’t be too bad,
except that it’s really ticked some lin-
guists off. “It’s just a big dictionary!”
they yell. “It’s like you didn’t know
anything about morphology or syntax
or any of that good stuff.” There’s
a rumor that they might sue you for
defamation of linguistics. What do you
do?
Although the premise is somewhat ridiculous,
with such group work, students are able to con-
sider important topics in a relaxed setting. In
this case, they have to first consider the speci-
fications needed for a technology to work (who
will be using it, what the expectations are, etc.)
and, secondly, what the relationship is between
the study of language and designing a product
which is functional.
Fun homework questions In the home-
works, students are often instructed to use a
technology on the internet, or in some way to
take the material presented in class a step far-
ther. Additionally, most homework assignments
had at least one lighter question which allowed
students to be more creative in their responses
while at the same time reinforcing the material.
For example, instructors have asked students
to send them spam, and the most spam-worthy
message won a prize. Other homework ques-
tions have included sketching out what it would
take to convert an ELIZA system into a hostage
negotiator—and what the potential dangers are
in such a use. Although some students put down
minimal answers, many students offer pages of
detailed suggestions to answer such a question.
This gives students a taste of the creativity in-
volved in designing new technology without hav-
ing to deal with the technicalities.
6 Challenges for the course
Despite the positive response, there are several
aspects to the course which have needed im-
provement and continue to do so. Teaching
to a diverse audience of interests and capabili-
ties presents obstacles which are not easily over-
come. To that end, here we will review aspects
of the course which students did not generally
enjoy and which we are in the process of adapt-
ing to better suit our purposes and our students’
needs.
20
6.1 Topics they do not relate to
For such a range of students, there is the diffi-
culty of presenting abstract concepts. Although
we try to relate everything to something which
students actually use or could readily use, we
sometimes include topics from computational
linguistics that make one better able to think
logically in general and which we feel will be
of future use for our students. One such topic
is that of regular expressions, in the context of
searching for text in a document or corpus. As
most students only experience searching as part
of what they do on the web, and no web search
engine (to the best of our knowledge) currently
supports regular expression searching, students
often wonder what the point of the topic is. In
making most topics applicable to everyday life,
we had raised expect. In this particular case,
students seemed to accept regular expressions
more once it they saw that Microsoft Word has
something roughly analogous.
Another difficulty that presented itself for a
subset of the students was that of using for-
eign language text to assist in teaching ma-
chine translation and computer-aided language
learning. Every example was provided with an
English word-by-word gloss, as well as a para-
phrase, yet the examples can still be difficult to
understand without a basic appreciation for the
relevant languages. If the students know Span-
ish, the example is in Spanish and the instruc-
tor has a decent Spanish accent, things can go
well. But students tend to blame difficulties in
the machine translation homework on not know-
ing the languages used in the examples. Under-
standing the distinction between different kinds
of machine translation systems requires some
ability to grasp how languages can differ, so we
certainly must (unless we use proxies like fairy-
tale English) present some foreign material, but
we are in dire need of means to do this as gently
as possible
6.2 Math they do not understand
While some of the more difficult mathemati-
cal concepts were eventually understood, oth-
ers continued to frustrate students. The al-
ready mentioned regular expressions, for exam-
ple, caused trouble. Firstly, even if you do
understand them, they are not necessarily life-
enhancing, unless you are geeky enough to write
your papers in a text editor that properly sup-
ports them. Secondly, and more importantly,
many students saw them as unnecessarily ab-
stract and complex. For instance, some stu-
dents were simply unable to understand the no-
tion that the Kleene star is to be interpreted as
an operator rather than as a special character
occurring in place of any string.
Even though we thought we had calibrated
our expectations to respect the fact that our
students knew no math beyond high school, the
amount that they had retained from high school
was often less than we expected. For exam-
ple, many students behaved exactly as if they
had never seen Venn diagrams before, so time
had to be taken away from the main material
in order to explain them. Likewise, figuring
out how to calculate probabilities for a bag of
words model of statistical machine translation
required a step-by-step explanation of where
each number comes from. A midterm ques-
tion on Bayesian spam filtering needed the same
treatment, revealing that even good students
may have significant difficulties in deploying the
high school math knowledge they almost cer-
tainly possess.
6.3 Technology which did not work
Most assignments required students to use the
internet or the phone in some capacity, usu-
ally to try out a demo. With such tasks, there
is always the danger that the technology will
not work. For example, during the first quar-
ter the course was taught, students were asked
to call the CMU Communicator system and in-
teract with it, to get a feel for what it is like
to interact with a computer. As it turns out,
halfway through the week the assignment was
due, the system was down, and thus some stu-
dents could not finish the exercise. Follow-
ing this episode, homework questions now come
with alternate questions. In this case, if the sys-
tem is down, the first alternate is to listen to a
pre-recorded conversation to see how the Com-
21
municator works. Since some students are un-
able to listen to sounds in the campus computer
labs, the second alternate is to read a transcript.
Likewise, students were instructed to view the
page source code for ELIZA. However, some
campus computer labs at OSU do not allow stu-
dents to view the source of a webpage. In re-
sponse to this, current versions of the assign-
ment have a separate webpage with the source
code written out as plain text, so all students
can view it.
One final note is that students have often com-
plainedof weblinks failingto work, butthis “fail-
ure” is most often due to students mistyping
the link provided in the homework. Providing
links directly on the course webpage or including
them in the web- or pdf-versions of the home-
work sheets is the simplest solution for this prob-
lem.
7 Summary and Outlook
We have described the course Language and
Computers (Linguistics 384), a general introduc-
tion to computational linguistics currently being
taught at OSU. While there are clear lessons
to be learned for developing similar courses at
other universities, there are also more general
points to be made. In courses which assume
some CS background, for instance, it is still
likely the case that students will want to see
some practical use of what they are doing and
learning.
There are several ways in which this course
can continue to be improved. The most pressing
priority is to develop a course packet and pos-
sibly a textbook. Right now, students rely only
on the instructor’s handouts, and we would like
to provide a more in-depth and cohesive source
of material. Along with this, we want to de-
velop a wider range of readings for students (e.g.
Dickinson, to appear) to provide students with
a wider variety of perspectives and explanations
for difficult concepts.
To address the wide range of interests and ca-
pabilities of the students taking this course as a
general education requirement, it would be good
to tailor some of the sections to audiences with
specific backgrounds—but given the lack of a
dedicated free time slot for all students of a par-
ticular major, etc., it is unclear whether this is
feasible in practice.
We are doing reasonably well in integrating
mathematical thinking into the course, but we
would like to give students more experience of
thinking about algorithms. Introducing a ba-
sic form of pseudocode might go some way to-
wards achieving this, provided we can find a mo-
tivating linguistic example that is both simple
enough to grasp and complex enough to justify
the overhead of introducing a new topic. Fur-
ther developments might assist us in developing
a course between Linguistics 384 and Linguistics
684, our graduate-level computational linguis-
tics course, as we currently have few options for
advanced undergraduates.
Acknowledgements We would like to thank
the instructors of Language and Computers for
their discussions and insights into making it a
better course: Stacey Bailey, Anna Feldman, Xi-
aofei Lu, Crystal Nakatsu, and Jihyun Park. We
are also grateful to the two ACL-TNLP review-
ers for their detailed and helpful comments.

References
Markus Dickinson, to appear. Writers’ Aids. In
Keith Brown (ed.), Encyclopedia of Language
and Linguistics. Second Edition, Elsevier, Ox-
ford.
Malcolm Gladwell, 2001. The Social Life
of Paper. New Yorker. available from
http://www.gladwell.com/archive.html.
Lillian Lee, 2002. A non-programming introduc-
tion to computer science via NLP, IR, and
AI. In ACL Workshop on Effective Tools
and Methodologies for Teaching Natural Lan-
guage Processing and Computational Linguis-
tics. pp. 32–37.
A.M. Turing, 1950. Computing Machinery and
Intelligence. Mind, 59(236):433–460.
