<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2609"> <Title>Learning to Identify Definitions using Syntactic Features</Title> <Section position="2" start_page="0" end_page="64" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Answering definition questions is a challenge for question answering systems. Much work in QA has focused on answering factoid questions, which are characterized by the fact that, given the question, one can typically make strong predictions about the type of expected answer (e.g. a date, the name of a person, or an amount). Definition questions require a different approach, as a definition can be a phrase or sentence for which only very global characteristics hold.</Paragraph> <Paragraph position="1"> In the CLEF 2005 QA task, 60 out of 200 questions asked for the definition of a named entity (a person or organization), such as Who is Goodwill Zwelithini? or What is IKEA? Answers are phrases such as current king of the Zulu nation, or Swedish home furnishings retailer. For answering definition questions restricted to named entities, it generally suffices to search for noun phrases consisting of the named entity and a preceding or following nominal phrase. Bouma et al. (2005) extract all such noun phrases from the Dutch CLEF corpus off-line, and return the most frequent heads of co-occurring nominal phrases, expanded with adjectival or prepositional modifiers, as the answer to named entity definition questions. The resulting system answers 50% of the CLEF 2005 definition questions correctly.</Paragraph> <Paragraph position="2"> For a Dutch medical QA system, which is being developed as part of the IMIX project1, several sets of test questions were collected. Approximately 15% of the questions are definition questions, such as What is a runner's knee? and What is cerebrovascular accident?. 
Answers to such questions (asking for the definition of a concept) are typically found in sentences such as A runner's knee is a degenerative condition of the cartilage surface of the back of the knee cap, or patella or A cerebrovascular accident is a decrease in the number of circulating white blood cells (leukocytes) in the blood. One approach to finding answers to concept definitions simply searches the corpus for sentences consisting of a subject, a copular verb, and a predicative phrase. If the concept matches the subject, the predicative phrase can be returned as the answer. A preliminary evaluation of this technique in Tjong Kim Sang et al. (2005) revealed that only 18% of the extracted sentences (from a corpus consisting of a mixture of encyclopedic texts and web documents) are actually definitions.</Paragraph> <Paragraph position="3"> For instance, sentences such as RSI is a major problem in the Netherlands, every suicide attempt is an emergency or an infection of the lungs is the most serious complication are of the relevant syntactic form, but do not constitute definitions.</Paragraph> <Paragraph position="4"> In this paper, we concentrate on a method for improving the precision of recognizing definition sentences. In particular, we investigate to what extent machine learning techniques can be used to distinguish definitions from non-definitions in a corpus of sentences containing a subject, a copular verb, and a predicative phrase. A manually annotated subsection of the corpus was divided into definition and non-definition sentences. Next, we trained various classifiers using unigram and bigram features, as well as various syntactic features. The best classifier achieves a 60% error reduction compared to our baseline system.</Paragraph> </Section> </Paper>