A non-programming introduction to computer science via
NLP, IR, and AI
Lillian Lee
Department of Computer Science
Cornell University
Ithaca, NY, USA, 14853-7501
llee@cs.cornell.edu
Abstract
This paper describes a new Cornell
University course serving as a non-
programming introduction to com-
puter science, with natural language
processing and information retrieval
forming a crucial part of the syllabus.
Material was drawn from a wide vari-
ety of topics (such as theories of dis-
course structure and random graph
models of the World Wide Web) and
presented at some technical depth, but
was massaged to make it suitable for
a freshman-level course. Student feed-
back from the flrst running of the class
was overall quite positive, and a grant
from the GE Fund has been awarded
to further support the course’s devel-
opment and goals.
1 Introduction
Algorithmic concepts and programming tech-
niques from computer science are very useful to
researchers in natural language processing. To
ensure the continued strength of our fleld, then,
it is important to encourage undergraduates in-
terested in NLP to take courses conveying com-
puter science content. This is especially true for
students not intending to become computer sci-
ence majors.
Usually, one points beginning students inter-
ested in NLP towards the flrst programming
course (henceforth \CS101"). However, at many
institutions, CS101 is mandatory for a large por-
tion of the undergraduates (e.g., all engineering
students) and is designed primarily to trans-
mit speciflc programming skills. Experience
suggests that a signiflcant fraction of students
flnd CS101’s emphasis on skills rather than con-
cepts unstimulating, and therefore decide not to
take further computer science courses. Unfortu-
nately, down the road this results in fewer en-
tering NLP graduate students having been ed-
ucated in important advanced computer-science
concepts. Furthermore, fewer students are intro-
duced to NLP at all, since the subject is often
presented only in upper-level computer-science
classes.
In an attempt to combat these problems, I cre-
ated a new freshman-level course, Computation,
Information, and Intelligence1, designed to in-
troduce entering undergraduates to some of the
ideas and goals of AI (and hence computer sci-
ence). The premise was that if freshmen flrst
learned something of what artiflcial intelligence
is about, what the technical issues are, what
has been accomplished, and what remains to be
done, then they would be much more motivated
when taking CS101, because they would under-
stand what they are learning programming for.
Three major design decisions were made at
the outset:
† No programming: Teaching elementary pro-
gramming would be a needless reduplication of
efiort, since programming pedagogy is already
well-honed in CS101 and other such classes.
Moreover, it was desirable to attract students
having little or no programming experience: the
new course would ofier them an opportunity for
1http://www.cs.cornell.edu/courses/cs172/2001fa
                     July 2002, pp. 33-38.  Association for Computational Linguistics.
              Natural Language Processing and Computational Linguistics, Philadelphia,
         Proceedings of the Workshop on Effective Tools and Methodologies for Teaching
initial exploration at a conceptual level. Indeed,
for the flrst edition of the class, students with
programming experience were actively discour-
aged from enrolling, in order to ensure a more
level playing fleld for those without such back-
ground.2
† Emphasis on technically challenging material:
Although no programming would be involved,
the course would nevertheless bring students
face-to-face with substantial technical material
requiring mathematical and abstract reasoning
(indeed, topics from graduate courses in NLP
and machine learning were incorporated). To
achieve this aim, the main coursework would
involve challenging pencil-and-paper problems
with signiflcant mathematical content.3
Of course, one had to be mindful that the in-
tended audience was college freshmen, and thus
one could only assume basic calculus as a pre-
requisite. Even working within this constraint,
though, it was possible to craft problem sets
and exams in which students explored concepts
in some depth; the typical homework problem
asked them not just to demonstrate comprehen-
sion of lecture material but to investigate alter-
native proposals. Sample questions are included
in the appendix.
† Substantial NLP and IR content:4 Because
many students have a lot of experience with
search engines, and, of course, all students have
a great deal of experience with language, NLP
and IR are topics that freshmen can easily relate
to without being introduced to a lot of back-
ground flrst.
2Students who have programmed previously are more
likely to happily enroll in further computer science
courses, and thus are already well-served by the standard
curriculum.
3An alternative to a technically- and mathematically-
oriented course would have been a \computers and the
humanities" class, but Cornell already ofiers classes on
the history of computing, the philosophy of AI, and the
social implications of living in an information society.
One of the goals for Computation, Information, and In-
telligence was that students learn what \doing AI" is re-
ally like.
4In this class, I treated information retrieval as a spe-
cial type of NLP for simplicity’s sake.
2 Course content
The course title, Computation, Information,
and Intelligence, re ects its organization, which
was inspired by Herb Simon’s (1977) statement
that \Knowledge without appropriate proce-
dures for its use is dumb, and procedure without
suitable knowledge is blind." More speciflcally,
the flrst 15 lectures were mainly concerned with
algorithms and computation (game-tree search,
perceptron learning, nearest-neighbor learning,
Turing machines, and the halting problem). For
the purposes of this workshop, though, this pa-
per focuses on the remaining 19 lectures, which
were devoted to information, and in particular,
to IR and NLP. As mentioned above, sample
homework problems for each of the units listed
below can be found in the appendix.
We now outline the major topics of the last
22 lectures. Observe that IR was presented be-
fore NLP, because the former was treated as a
special, simpler case of the latter; that is, we
flrst treated documents as bags of words before
considering relations between words.
Document retrieval [3 lectures]. Students
were flrst introduced to the Boolean query re-
trieval model, and hence to the concepts of index
data structures (arrays and B-trees) and binary
search. We then moved on to the vector space
model5, and considered simple term-weighting
schemes like tf-idf.
The Web [4 lectures]. After noting how Van-
nevar Bush’s (1945) famous \Memex" article an-
ticipated the development of the Web, we stud-
ied the Web’s global topology, brie y consider-
ing the implications of its so-called \bow-tie"
structure (Broder et al., 2000) for web crawlers
| students were thus introduced to graph-
theoretic notions of strongly-connected compo-
nents and node degrees. Then, we investigated
Kleinberg’s (1998) hubs and authorities algo-
rithm as an alternative to mere in-link counting:
5This does require some linear algebra background in
that one needs to compute inner products, but this was
covered in the section of the course on perceptrons. Since
trigonometry is actually relatively fresh in the minds of
flrst-year students, their geometric intuitions tended to
serve them fairly well.
fortunately, the method is simple enough that
students could engage in hand simulations. Fi-
nally, we looked at the suitability of various ran-
dom graph generation models (e.g., the \rich-
get-richer" (Barab¶asi et al., 1999) and \copy-
ing" models (Kumar et al., 2000)) for capturing
the local structure of the Web, such as the phe-
nomenon of in-degree distributions following a
power law | conveniently, these concepts could
be presented in such a way that only required
the students to have intuitive notions of proba-
bility and the ability to take derivatives.
Language structure [7 lectures]. Relying on
students’ instincts about and experience with
language, we considered evidence for the exis-
tence of hidden language structure; such clues
included possible and impossible syntactic and
discourse ambiguities, and movement, prosody
and pause cues for constituents. To describe
this structure, we formally deflned context-free
grammars. We then showed how (a tiny frag-
ment of) X-bar theory can be modeled by a
context-free grammar and, using its structural
assignments and the notion of heads of con-
stituents, accounted for some of the ambiguities
and non-ambiguities in the linguistic examples
we previously examined.
The discussion of context-free grammars nat-
urally led us to pushdown automata (which pro-
vided a nice contrast to the Turing machines
we studied earlier in the course). And, hav-
ing thus introduced stacks, we then investigated
the Grosz and Sidner (1986) stack-based theory
of discourse structure, showing that language
structures exist at granularities beyond the sen-
tence level.
Statistical language processing [6 lectures]
We began this unit by considering word fre-
quency distributions, and in particular, Zipf’s
law | note that our having studied power-law
distributions in the Web unit greatly facilitated
this discussion. In fact, because we had pre-
viously investigated generative models for the
Web, it was natural to consider Miller’s (1957)
\monkeys" model which demonstrates that very
simple generative models can account for Zipf’s
law. Next, we looked at methods taking advan-
tage of statistical regularities, including the IBM
Candide statistical machine translation system,
following Knight’s (1999) tutorial and treating
probabilities as weights. It was interesting to
point out parallels with the hubs and authorities
algorithm | both are iterative update proce-
dures with auxiliary information (alignments in
one case, hubs in the other). We also discussed
an intuitive algorithm for Japanese segmenta-
tion drawn from one of my own recent research
collaborations (Ando and Lee, 2000), and how
word statistics were applied to determining the
authorship of the Federalist Papers (Mosteller
and Wallace, 1984). We concluded with an ex-
amination of human statistical learning, focus-
ing on recent evidence indicating that human
infants can use statistics when learning to seg-
ment continuous speech into words (Safiran et
al., 1996).
The Turing test [2 lectures] Finally, we
ended the course with a consideration of intel-
ligence in the large. In particular, we focused
on Turing’s (1950) proposal of the \imitation
game", which can be interpreted as one of the
flrst appearances of the claim that natural lan-
guage processing is \AI-complete", and Searle’s
(1980) \Chinese Room" rebuttal that  uent lan-
guage behavior is not a su–cient indication of
intelligence. Then, we concluded with an exam-
ination of the flrst running of the Restricted Tur-
ing Test (Shieber, 1994), which served as an ob-
ject lesson as to the importance of careful eval-
uation in NLP, or indeed any science.
3 Experience
Twenty-three students enrolled, with only one-
third initially expressing interest in majoring in
computer science. By the end, I was approached
by four students asking if there were research op-
portunities available in the topics we had cov-
ered; interestingly, one of these students had
originally intended to major in electrical engi-
neering. Furthermore, because of the class’s
promise in drawing students into further com-
puter science study, the GE Fund awarded a
grant for the purpose of bringing in a senior out-
side speaker and supporting teaching assistants
in future years.
One issue that remains to be resolved is the
lack, to my knowledge, of a textbook or text-
books that would both cover the syllabus topics
and employ a level of presentation suitable for
freshmen. For flrst-year students to learn efiec-
tively, some sort of reference seems crucial, but
a signiflcant portion of the course material was
drawn from research papers that would proba-
bly be too di–cult. In the next edition of the
course, I plan to write up and distributeformal
lecture notes.
Overall, although Computation, Information,
and Intelligence proved quite challenging for the
students, for the most part they felt that they
had learned a lot from the experience, and based
on this evidence and the points outlined in the
previous paragraph, I believe that the course did
make deflnite progress towards its goal of inter-
esting students in taking further computer sci-
ence courses, especially in AI, IR, and NLP.
Acknowledgments
I thank my chair Charles Van Loan for en-
couraging me to develop the course described
in this paper, for discussing many aspects of
the class with me, and for contacting the GE
Fund, which I thank for supplying a grant sup-
porting the future development of the class.
Thanks to Jon Kleinberg for many helpful dis-
cussions, especially regarding curriculum con-
tent, and to the anonymous reviewers for their
feedback. Finally, I am very grateful to my
teaching assistants, Amanda Holland-Minkley,
Milo Polte, and Neeta Rattan, who helped im-
mensely in making the flrst outing of the course
run smoothly.

References
Rie Kubota Ando and Lillian Lee. 2000. Mostly-
unsupervised statistical segmentation of Japanese.
In First Conference of the North American Chap-
ter of the Association for Computational Linguis-
tics (NAACL), pages 241{248.
Albert-L¶aszl¶o Barab¶asi, R¶eka Albert, and Hawoong
Jeong. 1999. Mean-fleld theory for scale-free ran-
dom networks. Physica, 272:173{187.
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prab-
hakar Raghavan, Sridhar Rajagopalan, Raymie
Stata, Andrew Tomkins, and Janet Wiener. 2000.
Graph structure in the web. In Proceedings of the
Ninth International World Wide Web Conference,
pages 309{430.
Vannevar Bush. 1945. As we may think. The At-
lantic Monthly, 176(1):101{108.
Ralph Grishman. 1986. Computational Linguistics:
An Introduction. Studies in Natural Language
Processing. Cambridge.
Barbara J. Grosz and Candace L. Sidner. 1986. At-
tention, intentions, and the structure of discourse.
Computational Linguistics, 12(3):175{204.
Jon Kleinberg. 1998. Authoritative sources in a hy-
perlinked environment. In Proceedings of the 9th
ACM-SIAM Symposium on Discrete Algorithms
(SODA), pages 668{677.
Kevin Knight. 1999. A statistical MT tutorial work-
book. http://www.isi.edu/natural-language/-
mt/wkbk.rtf, August.
Ravi Kumar, Prabhakar Raghavan, Sridhar Ra-
jagopolan, D. Sivakumar, Andrew Tomkins, and
Eli Upfal. 2000. Stochastic models for the web
graph. In Proceedings of the 41st IEEE Sympo-
sium on the Foundations of Computer Science,
pages 57{65.
George A. Miller. 1957. Some efiects of intermittent
silence. American Journal of Psychology, 70:311{
313.
Frederick Mosteller and David L. Wallace. 1984. Ap-
plied Bayesian and Classical Inference: The Case
of the Federalist Papers. Springer-Verlag.
Jenny R. Safiran, Richard N. Aslin, and Elissa L.
Newport. 1996. Statistical learning by 8-month-
old infants. Science, 274(5294):1926{1928, De-
cember.
John R. Searle. 1980. Minds, brains, and programs.
Behavioral and Brain Sciences, 3(3):417{457.
Stuart M. Shieber. 1994. Lessons from a re-
stricted Turing test. Communications of the
ACM, 37(6):70{78.
Herb A. Simon. 1977. Artiflcial intelligence systems
that understand. In Proceedings of the Fifth In-
ternational Joint Conference on Artiflcial Intelli-
gence, volume 2, pages 1059{1073.
Alan M. Turing. 1950. Computing machinery and
intelligence. Mind, LIX:433{60.
