<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1054">
  <Title>Transitivity and Foregrounding in News Articles: experiments in information retrieval and automatic summarising</Title>
  <Section position="6" start_page="369" end_page="370" type="concl">
    <SectionTitle>
4 Experimental Procedure
</SectionTitle>
    <Paragraph position="0"> The feasibility of using transitivity as a tool in text processing will be assessed by two experiments using the same corpus. Clauses in the corpus must be hand-coded for transitivity. The difficulties encountered in this process will determine the basis for future automation. For the information retrieval task, only the clauses containing query/document matching terms will be coded for transitivity. For the automatic summarising experiment all sentences within a text will be coded.</Paragraph>
    <Paragraph position="1"> For the information retrieval experiment, ten queries are put to a newspaper database: a demonstration system running on WAIS (Wide Area Information Server), carrying two weeks of articles from the Times newspaper from 1993 and 1994. The results of the queries are downloaded in their initial ranked order (ranked by a host ranking algorithm) and re-ranked by a serial batch processor written in C++. The processor identifies the transitivity features associated with each matching clause and produces a ranked output of documents based on the weights assigned to each clause in which the search terms occur. The weights assigned to each clause are numerically equivalent to the number of transitivity features associated with each clause. The total transitivity weight for an entire document is the sum of clause weights normalised by document length.</Paragraph>
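The weighting scheme just described can be sketched in a few lines. This is a minimal illustration in Python, not the C++ batch processor itself: the class names, the feature labels, and the choice of words as the unit of document length are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    # Hand-coded transitivity features of one matching clause.
    # The labels used below are hypothetical placeholders.
    features: frozenset


@dataclass
class Document:
    doc_id: str
    length: int  # document length; counting words is an assumption
    matching_clauses: list = field(default_factory=list)


def clause_weight(clause):
    # A clause's weight is the number of transitivity features coded for it.
    return len(clause.features)


def document_weight(doc):
    # Sum of clause weights, normalised by document length.
    if doc.length == 0:
        return 0.0
    return sum(clause_weight(c) for c in doc.matching_clauses) / doc.length


def rerank(docs):
    # Highest weight first; sorted() is stable, so tied documents
    # keep the order given by the host ranking.
    return sorted(docs, key=document_weight, reverse=True)


docs = [
    Document("a", 100, [Clause(frozenset({"f1", "f2"}))]),
    Document("b", 50, [Clause(frozenset({"f1", "f2", "f3"}))]),
]
reranked_ids = [d.doc_id for d in rerank(docs)]
```

Here document "b" carries three features over 50 words and so outranks "a", which carries two features over 100 words.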
    <Paragraph position="2"> The output dataset consists of a total of 185 news articles, an average of 18.5 per batch. Each set of articles is ranked by volunteers. The articles are ranked for their degree of relevance to a query in two ways: on a scale of one to ten; and comparatively, by the degree of relevance of an article against all other articles. All terms are treated as equal so that discrimination between documents is based purely on cumulative transitivity scores. The performance of the ranking technique is evaluated according to two precision measures: the Spearman rank correlation coefficient (rho) and the CRE (Coefficient of Ranking Effectiveness) (Noreault et al., 1977).</Paragraph>
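As an illustration of the first measure, the Spearman coefficient between a system ranking and a volunteer ranking can be computed as below. This sketch uses the standard textbook formula for rankings without ties; how tied ranks are handled in the study is not stated, so that restriction is an assumption, and the function and parameter names are hypothetical.

```python
def spearman_rho(system_ranks, judge_ranks):
    # Spearman rank correlation for two rankings of the same n items,
    # assuming no tied ranks: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    n = len(system_ranks)
    if n != len(judge_ranks):
        raise ValueError("rankings must cover the same items")
    if n in (0, 1):
        raise ValueError("at least two items are needed")
    sum_d2 = sum((s - j) ** 2 for s, j in zip(system_ranks, judge_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


identical = spearman_rho([1, 2, 3, 4], [1, 2, 3, 4])
reversed_order = spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = spearman_rho([1, 2, 3], [1, 3, 2])
```

A value of 1 means the re-ranked output agrees exactly with the volunteers' ordering, and -1 means it is exactly reversed.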
    <Paragraph position="3"> For the automatic summarising experiment, ten articles are taken from the corpus at random. Summaries are produced by extracting clauses according to transitivity scores. In the initial implementation, transitivity scores will be equal to the number of transitivity features associated with the main clause of each sentence. The selection of sentences for a summary will be based, initially, on comparative transitivity scores and a reduction factor which will determine the number of sentences selected based on the length of a document.</Paragraph>
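The selection step might look like the following sketch. It assumes that the reduction factor is a fraction of the number of sentences, that at least one sentence is always kept, and that selected sentences are emitted in their original order; none of these details is fixed by the description above, and the names are hypothetical.

```python
import math


def summarise(sentences, scores, reduction_factor=0.25):
    # scores[i] is the number of transitivity features coded for the
    # main clause of sentences[i].
    if len(sentences) != len(scores):
        raise ValueError("one score per sentence is required")
    if not sentences:
        return []
    # Assumption: summary size is a fixed fraction of the sentence count.
    k = max(1, math.ceil(reduction_factor * len(sentences)))
    # Rank by descending score; earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Restore document order for the selected sentences.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


example = summarise(["s0", "s1", "s2", "s3"], [1, 4, 2, 4], reduction_factor=0.5)
```

With four sentences and a factor of 0.5, the two highest-scoring sentences ("s1" and "s3") are kept.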
    <Paragraph position="4"> Summaries will be analysed and assessed by volunteers for coverage, in terms of the original text, and comprehensibility as a separate text. The summaries will be compared against summaries of the same texts compiled by the syntactic technique mentioned previously and also against summaries consisting of the first paragraph of each news article. The study is currently at the end of the coding stage for the information retrieval experiment.</Paragraph>
  </Section>
</Paper>