<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2084">
  <Title>Combining Association Measures for Collocation Extraction</Title>
  <Section position="8" start_page="657" end_page="657" type="concl">
    <SectionTitle>
6 Conclusions and discussion
</SectionTitle>
    <Paragraph position="0"> We created and manually annotated a reference data set consisting of 12 232 Czech dependency bigrams. Three annotators agreed that 20.9% of them were collocations. We implemented 82 association measures, employed them for collocation extraction, and evaluated them against the reference data set using averaged precision-recall curves and mean average precision in five-fold cross-validation. The best result was achieved by a method measuring cosine context similarity in a boolean vector space, with a mean average precision of 66.49%.</Paragraph>
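The evaluation measure mentioned above can be illustrated with a minimal sketch. It assumes the standard information-retrieval definition of average precision (precision averaged over the ranks of the true collocations), with the mean taken over folds; the exact definition used in the paper may differ in detail. The function names and the toy data are hypothetical and invented only for illustration.

```python
def average_precision(labels_ranked):
    """Average precision of a ranked list of 0/1 labels (1 = true collocation)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(folds):
    """Mean of per-fold average precision; each fold is a (scores, labels) pair."""
    ap_values = []
    for scores, labels in folds:
        # Rank candidates by descending association score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        ap_values.append(average_precision([labels[i] for i in order]))
    return sum(ap_values) / len(ap_values)


# Toy data, invented purely for illustration (not taken from the paper).
toy_folds = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    ([0.7, 0.6, 0.5, 0.2], [0, 1, 1, 0]),
]
map_score = mean_average_precision(toy_folds)
print(f"MAP on toy folds: {map_score:.4f}")
```

In this sketch each association measure would supply the `scores` for a fold, and the annotated reference data would supply the `labels`.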
    <Paragraph position="1"> We exploited the fact that different subgroups of collocations have different sensitivity to certain association measures and showed that combining these measures aids collocation extraction. All investigated combination methods significantly outperformed the individual association measures. The best results were achieved by a simple neural network with five units in the hidden layer. Its mean average precision was 80.81%, which is a 21.53% relative improvement over the best individual measure. Using more complex neural networks or a quadratic separator in support vector machines led to overtraining and did not improve the performance on test data.</Paragraph>
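The relative improvement quoted above can be checked with a few lines of arithmetic. This sketch uses only the two rounded scores stated in the text; the variable names are illustrative. The rounded inputs give roughly 21.5%, which agrees with the reported 21.53% up to rounding (the reported figure was presumably computed from unrounded scores).

```python
best_single_map = 66.49    # cosine context similarity, as stated in the text
best_combined_map = 80.81  # neural network with five hidden units, as stated in the text

# Absolute gain in percentage points, and gain relative to the best single measure.
absolute_gain = best_combined_map - best_single_map
relative_gain = 100.0 * absolute_gain / best_single_map

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
```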
    <Paragraph position="2"> We proposed a stepwise feature selection algorithm reducing the number of predictors in combination models and tested it with the neural network. We were able to reduce the number of its variables from 82 to 17 without significant degradation of its performance.</Paragraph>
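The idea of stepwise reduction can be sketched as a generic greedy backward elimination. This is not the authors' algorithm, whose details are not given in this section; it is a common textbook variant shown under stated assumptions. The `evaluate` callback, the `tolerance` parameter, and the toy weights are all hypothetical.

```python
def backward_elimination(features, evaluate, tolerance):
    """Greedy backward elimination (generic sketch, not the paper's procedure).

    features  -- list of feature names
    evaluate  -- hypothetical callback: list of features -> score (higher is better)
    tolerance -- largest drop from the full-model score that is still accepted
    """
    selected = list(features)
    baseline = evaluate(selected)
    while len(selected) > 1:
        # Try removing each remaining feature and keep the least harmful removal.
        candidates = [
            (evaluate([f for f in selected if f != removed]), removed)
            for removed in selected
        ]
        best_score, removed = max(candidates)
        if baseline - best_score > tolerance:
            break  # every possible removal degrades the score too much
        selected.remove(removed)
    return selected


# Toy scoring function, invented for illustration: two useful features, two useless ones.
toy_weights = {"a": 5, "b": 3, "c": 0, "d": 0}
kept = backward_elimination(
    list(toy_weights),
    evaluate=lambda feats: sum(toy_weights[f] for f in feats),
    tolerance=0,
)
print(sorted(kept))
```

In a setting like the one described, `evaluate` would train the combination model on the given measures and return a held-out score such as mean average precision, and `tolerance` would encode what counts as a significant degradation.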
    <Paragraph position="3"> We made no attempt in this work to select the &quot;best universal method&quot; for combining association measures, nor to identify the &quot;best association measures&quot; for collocation extraction. These tasks depend heavily on the data, the language, and the notion of collocation itself. We demonstrated that combining association measures is meaningful and improves the precision and recall of the extraction procedure, and that the full performance improvement can be achieved by combining a relatively small number of measures.</Paragraph>
    <Paragraph position="4"> Preliminary results of our research were published in Pecina (2005). In the current work, we used a new version of the Prague Dependency Treebank (PDT 2.0, 2006), and the reference data was improved by additional manual annotation by two linguists.</Paragraph>
  </Section>
</Paper>