<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0304">
  <Title>Using Lexical Semantic Techniques to Classify Free-Responses</Title>
  <Section position="2" start_page="20" end_page="22" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> There is a movement in testing to augment the conventional multiple-choice items (i.e., test questions) with short-answer free-response items. Due to the large volume of tests administered yearly by Educational Testing Service (ETS), hand-scoring of these tests with these types of items is costly and time-consuming for practical testing programs. ETS is currently working on natural language understanding systems which could be used for computer-assisted scoring of short-answer free-responses (see Kaplan and Bennett (1994) and Burstein and Kaplan (1995))) The overall goal of our current research is to develop a scoring system that can handle short-answer free-response items. Such a scoring system has to be able to identify the relevant content of a response and assign it to an appropriate content category. Another consideration in the development of a scoring system is that the data sets that are available to us are relatively small, and the responses in these data sets lack lexico-~syntactic patterning. The items which we work with are either experimental, or have been administered as paper-and-pencil exams. In the former case, there is a limited subject pool, and in the latter case, we rely on what has been put into electronic form. The response sets typically range from 300-700 responses which we have to use for training and testing. This is quite a different scenario from natural language understanding systems which can be designed using large corpora from full text sources, such as the AP News and the Wall Street Journal. This paper discusses a case study that examined how lexical semantic techniques could be used to build scoring systems, based on small data sets. Previous attempts to classify these responses using lexically-based statistical techniques and structure-independent content grammars were not reliable (Burstein and Kaplan (1995)). The results of this case study illustrate the reliability of lexical semantic methods.</Paragraph>
    <Paragraph position="1"> For this study, a concept-based lexicon and a concept grammar were built to represent a response set.</Paragraph>
    <Paragraph position="2"> The lexicon can best be characterized by Bergler's (1995) layered lexicon in that the list of lexical entry words and terms can remain constant, while the features associated with each entry are modular, so that they can be replaced as necessary. Concepts in the concept grammars were linked to the lexicon. In this paper, concepts are superordinate terms which contain one or more subordinate, metonymic terms. A prototype was implemented to test our hypothesis that a lexical semantics approach to scoring would yield accurate results.</Paragraph>
    <Paragraph position="3">  2. Test Item Types, Response Sets, and Lexical Semantics</Paragraph>
    <Section position="1" start_page="20" end_page="21" type="sub_section">
      <SectionTitle>
2.1 Test Item Types and Response Sets
</SectionTitle>
      <Paragraph position="0"> Our previous research with regard to language use in test items revealed that different test items use domain-specific language (Kaplan and Bennett (1994)). Lexicons restricted to dictionary knowledge of words are not sufficient for interpreting the meaning of responses for unique items. Concept knowledge bases built from an individual data set of examinee responses can be useful for representing domain-specific language. To illustrate the use of such knowledge bases in the development of scoring systems, linguistic information from the response set of an inferencing item ~In this paper, a response refers to an examinees 15 - 20 word answer to an item which can be either in the form of a complete sentence or sentence fragment.</Paragraph>
      <Paragraph position="1">  will be discussed. For this item type, examinees are reliant on real-world knowledge with regard to item topic, and responses are based on an examinees own ability to draw inferences.</Paragraph>
      <Paragraph position="2"> Responses do not appear show typical features of sublanguage in that there are no domain-specific structures, and the vocabulary is not as restricted. Therefore, sublanguage techniques such as Sager (1981) and Smadja (1993) do not work. In situations where lexico-syntactic patterning is deficient, a lexicon with specified metonymic relations can be developed to yield accurate scoring of response content. We define metonyms as words which can be used in place of one another when they have a domain-specific relation (Gerstl (1991))</Paragraph>
    </Section>
    <Section position="2" start_page="21" end_page="22" type="sub_section">
      <SectionTitle>
2.2 Using Lexical Semantics for Response Representation
</SectionTitle>
      <Paragraph position="0"> Our goal in building a scoring system for free-responses is to be able to classify individual responses by content, as well as to determine when responses have duplicate meaning (i.e., one response is the paraphrase of another response). In previous research, we used a concept-based approach similar to the one described in this study. The difference between the previous system and our current prototype is that in the previous system, concepts were not represented with regard to structure, and the lexicon was domain-independent. The underspecification of concept-structure relationships, and the lack of a domain-specific lexicon degraded the performance of that system (Kaplan and Bennett (1994). A second lexically-based, statistical approach performed poorly for the same reasons described above. The second approach looked at similarity measures between responses based on lexical overlap. Again, structure was not considered, and the lexicon was domain-independent which contributed to the system's poor performance (Burstein and Kaplan (1995)).</Paragraph>
      <Paragraph position="1"> Any system we build must have the ability to analyze the concept-structure patterning in a response, so that response content can be recognized for sconng purposes. Given our small data set, our assumption was that a lexical semantic approach which employed domain-specific language and concept grammars with concept-structure patterns would facilitate reliable scoring. Our hypothesis is that this type of representation would denote the content of a response based on its lexical meanings and their relationship to the syntactic structure of the response.</Paragraph>
      <Paragraph position="2"> It would appear that Jackendoff's (1983) Lexical Conceptual Structure (LCS) representation may be applicable to our problem. These structures are considered to be conceptual universals and have been successfully used by Dorr, et al (1995) and Holland (1994) in natural language understanding tasks.</Paragraph>
      <Paragraph position="3"> Holland points out, however, that LCSs cannot represent domain knowledge, nor can they handle the interpretation of negation and quantification, all of which are necessary in our scoring systems. Holland also states that LCSs could not represent a near-match between the two sentences, The person bought a vehicle, and The man bought a car. As is discussed later in the paper, our scoring systems must be able to deal with such near-match responses. Based on the above-mentioned limitations of LCSs, the use of such representation for scoring systems does not seem compatible with our response classification problem.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>