<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2609"> <Title>Learning to Identify Definitions using Syntactic Features</Title> <Section position="9" start_page="710" end_page="710" type="concl"> <SectionTitle> 8 Conclusions and future work </SectionTitle> <Paragraph position="0"> We have presented an experiment in identifying definition sentences using syntactic properties and learning-based methods. Our method concentrates on improving the precision of recognizing definition sentences. The first step is extracting candidate definition sentences from a fully parsed text using syntactic properties of definitions. To distinguish definition from non-definition sentences, we investigated several machine learning methods, namely naive Bayes, maximum entropy, and SVMs. We also experimented with several attribute configurations. In this selection, we combine text properties, document properties, and syntactic properties of the sentences. We have shown that adding syntactic properties, in particular the position of subjects in the sentence and the type of determiner of each subject and predicative complement, improves the accuracy of most machine learning techniques and leads to the most accurate result overall.</Paragraph> <Paragraph position="1"> Our method has been evaluated on a subset of manually annotated data from Wikipedia. The combination of highly structured text material and a syntactic filter leads to a relatively high initial baseline.</Paragraph> <Paragraph position="2"> Our results on the performance of SVMs do not confirm the superiority of this learning method for (text) classification tasks. Naive Bayes, which is well known for its simplicity, appears to give reasonably high accuracy. Moreover, it achieves high accuracy on simple attribute configuration sets (containing no syntactic properties). 
In general, our method will give the best result if all properties except named entity classes and root forms are used as attributes and maximum entropy is applied as a classifier.</Paragraph> <Paragraph position="3"> We are currently working on using more syntactic patterns to extract candidate definition sentences. This will increase the number of definition sentences that we can identify from text.</Paragraph> </Section> </Paper>