File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/n03-1032_concl.xml
Size: 1,629 bytes
Last Modified: 2025-10-06 13:53:30
<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1032"> <Title>Frequency Estimates for Statistical Word Similarity Measures</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> Using a large corpus and human-oriented tests, we describe a comprehensive study of word similarity measures and co-occurrence estimates, including variations in corpus size. Without any parameter training, we were able to correctly answer at least 75% of the questions in all test sets. Of all combinations of estimates and measures, document retrieval with a maximum window of 16 words combined with pointwise mutual information performs best on average across the three test sets used. However, the document-oriented and window-oriented approaches to frequency estimation produce similar results on average. The impact of corpus size is not conclusive: the results suggest that performance normally reaches an asymptote as the corpus grows, but the point at which this occurs differs among measures and frequency estimates.</Paragraph> <Paragraph position="1"> Our results outperform previously reported results on test sets where no context is used: we correctly answer 81.25% of TOEFL synonym questions, compared with a previous best result of 73.5%. The average human score on the same type of questions is 64.5% (Landauer and Dumais, 1997). We also perform better than previous work on another test set, used as practice questions for TOEFL, obtaining 80% correct answers compared with a best result of 74% from previous work.</Paragraph> </Section> </Paper>