<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1011">
  <Title>GIST-IT: Summarizing Email Using Linguistic Knowledge and Machine Learning Evelyne Tzoukermann</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Evaluation and Experimental Results
</SectionTitle>
    <Paragraph position="0"> Since there are many different summaries for each document, evaluating summaries is a difficult problem. Extracting the salient noun phrases is the first key step in the summarization method that we adopt in this paper. Thus, we focus on evaluating the performance of GIST-IT on this task, using three classification schemes and two different feature settings.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Evaluation Scheme
</SectionTitle>
      <Paragraph position="0"> There are several questions that we address in this paper: 4.1.1 What features are important in determining the degree of salience of an NP? Following our assumption that each constituent of the noun phrase is equally meaningful, we evaluate the impact of adding m_htfidf (see section 3.2.3) as an additional feature in the feature vector. This is shown in Table 2 in the two feature vectors fv1 and fv2.</Paragraph>
      <Paragraph position="1"> fv1 - head_focc, head_tfidf, np_focc, np_tfidf, np_length_words, np_length_chars, par_pos, sent_pos; fv2 - head_focc, head_tfidf, m_htfidf, np_focc, np_tfidf, np_length_words, np_length_chars, par_pos, sent_pos (Table 2: two feature settings to evaluate the impact of m_htfidf). 4.1.2 Which classification model is adequate to our task? We evaluate the performance of three different classifiers on the task of extracting salient noun phrases. As measures of performance we use precision (p) and recall (r). The evaluation was performed according to the degree to which the output of the classifiers corresponds to the user judgments.</Paragraph>
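The two feature settings in Table 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (tfidf, np_features) and the exact form of m_htfidf are hypothetical stand-ins for the computations described in section 3.2.3; the key point shown is that fv2 additionally scores the NP modifiers the way heads are scored.

```python
import math

def tfidf(term, doc_tf, df, n_docs):
    # Plain tf*idf: term frequency in the document times inverse
    # document frequency over the collection (smoothed by 1).
    return doc_tf.get(term, 0) * math.log(n_docs / (1 + df.get(term, 0)))

def np_features(np_tokens, head, doc_tf, df, n_docs, use_modifiers=False):
    # fv1-style features for one noun phrase; with use_modifiers=True
    # the vector also gets m_htfidf, scoring the modifiers like heads
    # (the fv2 setting).  Positional features are omitted for brevity.
    vec = {
        "head_tfidf": tfidf(head, doc_tf, df, n_docs),
        "np_tfidf": sum(tfidf(t, doc_tf, df, n_docs) for t in np_tokens),
        "np_length_words": len(np_tokens),
    }
    if use_modifiers:
        mods = [t for t in np_tokens if t != head]
        vec["m_htfidf"] = sum(tfidf(t, doc_tf, df, n_docs) for t in mods)
    return vec
```

Under this sketch, fv1 and fv2 differ only in the presence of the m_htfidf entry.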
      <Paragraph position="2"> Table 3 (results of the three classifiers) shows our results that answer these two questions. The table rows represent the two feature vectors we are comparing, and the columns correspond to the three classifiers chosen for the evaluation. 4.1.3 Is linguistic filtering an important step in extracting salient NPs? In the third evaluation we analyse the impact of linguistic filtering on the classifiers' performance. The results show major improvements, from 69.2% to 85.7% in precision for fv2, and from 56.25% to 87.9% in recall for fv2. For detailed results, see [Muresan et al. (2001)].</Paragraph>
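The precision and recall measures used above can be computed as a straightforward set comparison between the classifier's output and the NPs judged salient by the users. The sketch below uses invented example phrases; only the p/r definitions follow the text.

```python
def precision_recall(extracted, relevant):
    # precision = fraction of extracted NPs that the judges marked
    # salient; recall = fraction of salient NPs that were extracted.
    extracted, relevant = set(extracted), set(relevant)
    tp = len(extracted & relevant)  # true positives
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(relevant) if relevant else 0.0
    return p, r

p, r = precision_recall(
    ["machine learning", "email message", "batch"],        # system output
    ["machine learning", "email message", "noun phrase"],  # judgments
)
# here two of three extracted NPs match the judgments: p = r = 2/3
```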
      <Paragraph position="3"> 4.1.4 After the filtering and classification, are noun phrases good candidates for representing the gist of an email message? To answer this question, we compare the output of GIST-IT on one email with the results of the KEA system [Witten et al. (1999)], which uses a 'bag-of-words' approach to key phrase extraction (see Table 4).</Paragraph>
      <Paragraph position="4"> The results indicate that the best system performance reached 87.9% recall and 85.7% precision. Although these results are very high, judging NP relevance is a complex and highly variable task. In the future, we will extend the gold standard with more judges and more data, and thus a more precise standard for measurement.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 The right selection of features
</SectionTitle>
      <Paragraph position="0"> Feature selection has a decisive impact on overall performance. As seen in Table 2, fv2 has m_htfidf as an additional feature, and its performance shown in Table 3 is superior to fv1; the DFC classifier shows an increase both in precision and recall. These results support the original hypothesis that in the context of gisting, the syntactic head of the noun phrase is not always the semantic head, and modifiers can also have an important role.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Different classification models
</SectionTitle>
      <Paragraph position="0"> Here we discuss the effectiveness of the different classification schemes in the context of our task. As shown in Table 3, C4.5 performs well, especially in terms of recall. RIPPER, as discussed in [Cohen (1995)], is more appropriate than C4.5 for noisy and sparse data collections, showing an improvement in precision.</Paragraph>
      <Paragraph position="1"> Finally, DFC which is a combination of classifiers, shows improved performance.</Paragraph>
      <Paragraph position="2"> The classifier was run with an augmented feature vector that included pairwise sums, differences, and products of the features.</Paragraph>
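The pairwise augmentation described above can be sketched directly; the function name and the derived-feature naming scheme are illustrative, not taken from the DFC implementation.

```python
from itertools import combinations

def augment(features):
    # Extend a feature vector (name -> value) with pairwise sums,
    # differences, and products, as in the augmented DFC run above.
    # For n base features this adds 3 * n * (n - 1) / 2 new features.
    out = dict(features)
    for a, b in combinations(sorted(features), 2):
        out[f"{a}+{b}"] = features[a] + features[b]
        out[f"{a}-{b}"] = features[a] - features[b]
        out[f"{a}*{b}"] = features[a] * features[b]
    return out
```

For example, augmenting a two-feature vector yields three extra features, one per pairwise operation.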
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Impact of linguistic knowledge
</SectionTitle>
      <Paragraph position="0"> As shown in the previous section, DFC performed best on our task, so we chose only this classifier to present the impact of linguistic knowledge. Linguistic filtering improved precision and recall, playing an especially important role for fv2, where the new feature m_htfidf was used. This is explained by the fact that the filtering presented in section 3.1.2 removed the noise introduced by unimportant modifiers and by common and empty nouns, thus giving this new feature a larger impact.</Paragraph>
      <Paragraph position="1"> 5.4 Noun phrases are better than n-grams Presenting the gist of an email message by phrase extraction raises one obvious question: can any phrasal extract represent the content of a document, or must a well-defined linguistic phrasal structure be used? To answer this question, we compare the results of our system, which extracts linguistically principled phrasal units, with the output of KEA, which extracts bigrams and trigrams as key phrases [Witten et al. (1999)]. Table 4 shows the results of the KEA system.</Paragraph>
      <Paragraph position="2"> Due to the n-gram approach, KEA output contains phrases like sort of batch, extracting lots, wn, and even URLs, which are unlikely to represent the gist of a document.</Paragraph>
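The kind of ill-formed candidate a plain n-gram extractor produces can be illustrated with a naive sketch (the sample sentence is invented; this is not KEA's actual extraction pipeline, which also ranks candidates):

```python
def ngrams(tokens, n):
    # Naive n-gram key-phrase candidates, as in a bag-of-words
    # approach: every contiguous window of n tokens is a candidate,
    # including spans that cross linguistic phrase boundaries.
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

tokens = "run the script as a sort of batch job".split()
candidates = ngrams(tokens, 3)
# candidates include fragments like 'sort of batch', which an
# NP-based extractor with linguistic filtering would never propose
```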
      <Paragraph position="3"> Conclusion and future work In this paper we presented a novel technique for document gisting suitable for domain- and genre-independent collections such as email messages. The method extracts simple noun phrases using linguistic techniques and then uses machine learning to classify them as salient for the document content. We evaluated the system in different experimental settings using three classification models. In analyzing the structure of NPs, we demonstrated that the modifiers of a noun phrase can be semantically as important as the head for the task of gisting. GIST-IT is fully implemented, evaluated, and embedded in an application that allows users to access a set of information including email, finances, etc.</Paragraph>
      <Paragraph position="4"> We plan to extend our work by taking advantage of structured email, by classifying messages into folders, and then by applying information extraction techniques. Since NPs and machine learning techniques are domain and genre independent, we plan to test GIST-IT on different data collections (e.g. web pages), and for other knowledge management tasks, such as document indexing or query refinement.</Paragraph>
      <Paragraph position="5"> Additionally, we plan to test the significance of the output for the user, i.e., whether the system provides informative content and an adequate gist of the message.</Paragraph>
    </Section>
  </Section>
</Paper>