<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0207">
  <Title>Pasteur's Quadrant, Computational Linguistics, LSA, Education</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Introduction
</SectionTitle>
    <Paragraph position="0"> In my outsider's opinion--I'm not a linguist and this is my first ACL meeting--this workshop marks an important turn in the study of language.</Paragraph>
    <Paragraph position="1"> Here is why I think so.</Paragraph>
    <Paragraph position="2"> Donald Stokes, in Pasteur's Quadrant (1997), argues that the standard view that science progresses from pure to applied research to engineering implementations is often wrong. This doctrine was the brainchild of Vannevar Bush, who was Roosevelt's science advisor during World War II. It has, of course, since been enshrined in the DoD's 6.1, 6.2, 6.3 funding structure, and modeled in the national research institutes and large industrial laboratories such as Bell Labs, IBM and Microsoft. Stokes shows that while this trajectory is sometimes followed, often with dramatic success, over the whole course of scientific advance it has been the exception rather than the rule, and for good reasons. Stokes summarized his view of the real relations in a two-by-two table much like the one in the figure, in which I have made a few minor additions and modifications.</Paragraph>
    <Paragraph position="3"> [Figure: two-by-two table. Upper left: Pure research. Upper right: Pasteur's quadrant. Lower left: (random walk research). Lower right: Pragmatic engineering. Caption: Stokes' conception of science, slightly modified.]</Paragraph>
    <Paragraph position="4"> The upper left quadrant is &quot;pure&quot; research, driven by a desire to understand nature, its problems chosen by what natural phenomena are most pervasive, mysterious or intuitively interesting.</Paragraph>
    <Paragraph position="5"> Particle physics is its standard bearer. The lower right quadrant is empirical engineering, incremental cut and try, each improvement based on lessons learned from the successes and failures of previous attempts. Internal combustion engines are a type case.</Paragraph>
    <Paragraph position="6"> The upper right quadrant, Pasteur's, is research driven by the desire to solve practical problems: for Pasteur, preventing the spoilage of vinegar, beer, wine and milk, and conquering diseases in silkworms, sheep, chickens, cattle and humans. Such problems inspire and set concrete goals for research. To solve them it is often necessary to delve into empirical facts and first causes. The quadrant also offers an important way to evaluate scientific success, because failure proves a lack of full understanding.</Paragraph>
    <Paragraph position="7"> Stokes doesn't name the lower left quadrant, but it might be dubbed &quot;random walk&quot; science. It resembles theological scholasticism, where the next problem is chosen by flaws in the answer to the last. In my field, cognitive psychology, it is exemplified by 100 years of experiment, thousands of papers, and dozens of quantitative models about how people remember lists of words.</Paragraph>
    <Paragraph position="8"> Of course, these activities bleed into one another and sometimes evince the Bush progression.</Paragraph>
    <Paragraph position="9"> Even list learning has produced basic principles that can be used effectively in education and the treatment of dementia. Nonetheless, the argument is that efforts in Pasteur's quadrant, because they avoid the dangers of excessive abstraction, simplification and irrelevance, are the most productive, both of scientific advance and of practical value.</Paragraph>
    <Paragraph position="10"> I believe that the Pasteur attitude is especially important in psychology, because identifying problems that are critical for understanding the human mind is anything but easy. Human minds do many unique and currently unexplainable things.</Paragraph>
    <Paragraph position="11"> Their first-cause mechanisms are hidden deeply in the intricate connections of billions of neurons and billions of experiences. Better keys to the secrets of the mind are needed than hunches of the kind that have motivated list-learning research. To be surer that what we study is actually relevant to the real topic of interest, we need to try to solve problems at the level of normal, representative mental functions. Although there are other good candidates, such as automobile driving and economic decision making, education is particularly apt. This is partly because cognitive psychology already knows quite a lot about learning, but more importantly because education is the primary venue in which society intentionally focuses on making a cognitive function happen well, and where success and failure can tell us what we do and do not know, and do so with some guarantee that the knowing is important to understanding the target phenomena.</Paragraph>
    <Paragraph position="12"> It seems to me that computational linguistics is in much the same position. Much traditional linguistics has concerned itself with descriptions of abstract properties of language whose actual role in the quotidian human use of language is not often studied, and, therefore, whose promise to explain how language is acquired and works for its users is sometimes hard to evaluate. Computational linguistics itself appears to have been devoted mostly to the upper left and lower right quadrants; on one hand it has spent much of its effort automating or supporting traditional linguistic analyses such as parsing, part-of-speech tagging and semantic role classification. On the other hand, it has developed practical tools, such as dictionaries, ontologies and n-gram language models for doing practical language engineering tasks, such as speech-to-text conversion and machine translation. There has been relatively little effort to use the successes and failures of computer automations to guide, illuminate, or test models of how human language works.</Paragraph>
    <Paragraph position="13"> This workshop represents an important step northeast in Stokes' map. Not only is education accomplished primarily through the use of language, it is also a critical source of advanced abilities to use language--reading, writing, and thinking--and is the primary medium by which the fruits of education are made useful. Thus trying to improve education is just the kind of thing that the Pasteur approach exploits: compelling reasons to understand, a laboratory for exploration, and strong, broad, relevant tests of success. Putting this argument starkly, it is too easy to treat language as an isolated abstract system and ignore its functional role in human life, and it is too easy to treat education as a humanity, where abstract philosophical arguments, ethical principles or historical precedent guide practice. Attempts to enhance the role of language in education through computation, which makes exquisitely specific what we are doing, should lead to new understanding of the nature of language--and vice versa.</Paragraph>
    <Paragraph position="14"> Now for a few words on my own work, and some ways in which it has, at least in part, followed the Pasteur path, plus a few words on how computational linguistics in education might make use of some of its outcomes. This will take the form of a review of Latent Semantic Analysis (LSA): its origins and history, its computationally simulated mental mechanisms, its applications in education, and some implications it may have for understanding how the mind does language. I'll briefly describe where LSA came from, how it works, what it does and doesn't do, some educational applications in which what it does is useful, some things that limit its usefulness and beg for better basic science, and some nitty-gritty on how and how not to apply it.</Paragraph>
  </Section>
</Paper>