File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-1615_intro.xml

Size: 2,451 bytes

Last Modified: 2025-10-06 14:02:40

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1615">
  <Title>FarsiSum - A Persian text summarizer</Title>
  <Section position="3" start_page="0" end_page="2" type="intro">
    <SectionTitle>
2 SweSum
</SectionTitle>
    <Paragraph position="0"> SweSum (Dalianis 2000) is a web-based automatic text summarizer developed at the Royal Institute of Technology (KTH) in Sweden. It summarizes by text extraction, combining statistical, linguistic, and heuristic methods, and its main domain is Swedish HTML-tagged newspaper text.</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
2.1 SweSum's architecture
</SectionTitle>
      <Paragraph position="0"> SweSum is a client/server application. The summarizer is located on the web server. It takes a Swedish text as input and performs summarization in three phases to create the final output (the summarized text).</Paragraph>
      <Paragraph position="1">  identified by searching for periods, exclamation and question marks etc (with the exception of when periods occur in known abbreviations). The sentences are then scored by using statistical, linguistic and heuristic methods. The scoring depends on, for example, the position of the sentence in the text, numerical values in and  SweSum is also available for English, Danish, Norwegian, Spanish, French, German, and now with the implementation described in this paper, Farsi. various formatting of the sentence such as bold, headings, etc.</Paragraph>
      <Paragraph position="2">  word in the sentence is calculated and added to the sentence score. Sentences containing common content words get higher scores.</Paragraph>
      <Paragraph position="3"> Pass 3: In the third pass, the final summary file (HTML format) is created. This file includes: * The highest-ranking sentences up to a pre-set threshold.</Paragraph>
      <Paragraph position="4"> * Optionally, statistical information about the summary, such as the number of words, the number of lines, the most frequent keywords, and the actual compression rate.</Paragraph>
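      <Paragraph> The passes described above can be sketched roughly as follows. This is only an illustrative Python sketch, not SweSum's actual code: the abbreviation list, the weights, the function names, the restoring of document order, and the definition of compression rate as summary words divided by original words are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation list; the real list used by SweSum is not given.
ABBREVIATIONS = {"e.g.", "i.e.", "dr.", "mr.", "prof."}


def split_sentences(text):
    """Pass 1 (first half): end a sentence at . ! or ? unless the token
    is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


def score_sentence(sentence, index, total):
    """Pass 1 (second half): toy heuristics with made-up weights."""
    position_score = (total - index) / total  # earlier sentences score higher
    numeral_score = 0.5 if re.search(r"\d", sentence) else 0.0
    return position_score + numeral_score


def summarize(text, max_sentences=2):
    """Pass 3: keep the highest-ranking sentences up to a threshold."""
    sentences = split_sentences(text)
    total = len(sentences)
    scored = [(score_sentence(s, i, total), i) for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:max_sentences]
    # Assumption: selected sentences are printed in their original order.
    summary = [sentences[i] for i in sorted(i for _, i in top)]
    original_words = len(text.split())
    summary_words = sum(len(s.split()) for s in summary)
    stats = {
        "words": summary_words,
        "compression_rate": summary_words / original_words if original_words else 0.0,
    }
    return summary, stats


EXAMPLE = "Dr. Smith arrived in 2004. He was late! Why was he late? Nobody knows."
SUMMARY, STATS = summarize(EXAMPLE, max_sentences=2)
```

 In this sketch the keyword scores of the second pass would simply be added to the value returned by score_sentence before ranking.</Paragraph>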
      <Paragraph position="5"> For most languages SweSum uses a static lexicon containing many high-frequency open-class words. The lexicon is a data structure for storing key/value pairs where the key is the inflected word and the value is the stem/root of the word. For example, boy and boys are different inflected forms with the same root (lemma).</Paragraph>
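      <Paragraph> A key/value lexicon of this kind maps naturally onto a dictionary. The sketch below is a hypothetical illustration in Python: the four entries, the function names, and the choice to count only lexicon words as keywords are assumptions for the example and are not taken from SweSum.

```python
from collections import Counter

# Hypothetical miniature lexicon: inflected form -> stem/root (lemma).
LEXICON = {
    "boy": "boy",
    "boys": "boy",
    "summary": "summary",
    "summaries": "summary",
}
ROOTS = set(LEXICON.values())


def lemma(word):
    """Return the stored root, or the cleaned word itself if it is unknown."""
    key = word.lower().strip(".,!?;:")
    return LEXICON.get(key, key)


def keyword_counts(text):
    """Count lemmas so that inflected forms of a word are tallied together."""
    return Counter(lemma(w) for w in text.split())


def keyword_score(sentence, counts):
    """Sum the document-wide frequency of every lexicon word in the sentence."""
    return sum(counts[lemma(w)] for w in sentence.split() if lemma(w) in ROOTS)


TEXT = "The boys met a boy. Summaries help."
COUNTS = keyword_counts(TEXT)
FIRST_SCORE = keyword_score("The boys met a boy.", COUNTS)
SECOND_SCORE = keyword_score("Summaries help.", COUNTS)
```

 Because boys and boy share one root, the first sentence accumulates the frequency of that root twice and so outranks the second, which is the effect of giving higher scores to sentences with common content words.</Paragraph>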
    </Section>
  </Section>
</Paper>