<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2151">
  <Title>Automatic Text Summarization Based on the Global Document Annotation</Title>
  <Section position="3" start_page="0" end_page="917" type="intro">
    <SectionTitle>
2 Global Document Annotation
</SectionTitle>
    <Paragraph position="0"> GDA (Global Document Annotation) is a challenging project to make WWW texts machine-understandable on the basis of a new tag set, and to develop content-based presentation, retrieval, question-answering, summarization, and translation systems with much higher quality than before.</Paragraph>
    <Paragraph position="1"> GDA thus proposes an integrated global platform for electronic content authoring, presentation, and reuse.</Paragraph>
    <Paragraph position="2"> The GDA tag set is based on XML (Extensible Markup Language), and is designed to be as compatible as possible with HTML, TEI, EAGLES, and so forth.</Paragraph>
    <Paragraph position="3"> An example of a GDA-tagged sentence is as follows:</Paragraph>
    <Paragraph position="5"> &lt;/adp&gt;&lt;/vp&gt;. &lt;/su&gt; &lt;su&gt; means sentential unit. &lt;n&gt;, &lt;np&gt;, &lt;v&gt;, &lt;vp&gt;, &lt;ad&gt; and &lt;adp&gt; mean noun, noun phrase, verb, verb phrase, adnoun or adverb (including preposition and postposition), and adnominal or adverbial phrase, respectively. The GDA initiative aims at having many WWW authors annotate their on-line documents with this common tag set so that machines can automatically recognize the underlying semantic and pragmatic structures of those documents much more easily than by analyzing traditional HTML files. A huge amount of annotated data is expected to emerge, which should serve not just as tagged linguistic corpora but also as a worldwide, self-extending knowledge base, mainly consisting of examples showing how our knowledge is manifested.</Paragraph>
    <Paragraph position="6"> GDA has three main steps:  1. Propose an XML tag set which allows machines to automatically infer the underlying structure of documents.</Paragraph>
    <Paragraph position="7"> 2. Promote development and spread of NLP/AI applications to turn tagged texts into versatile and intelligent content.</Paragraph>
    <Paragraph position="8"> 3. Motivate thereby the authors of WWW files to annotate their documents using those tags.</Paragraph>
    <Section position="1" start_page="917" end_page="917" type="sub_section">
      <SectionTitle>
2.1 Thematic/Rhetorical Relations
</SectionTitle>
      <Paragraph position="0"> The rel attribute encodes a relationship in which the current element stands with respect to the element that it semantically depends on. Its value is called a relational term. A relational term denotes a binary relation, which may be a thematic role such as agent, patient, recipient, etc., or a rhetorical relation such as cause, concession, etc. Thus we conflate thematic roles and rhetorical relations here, because the distinction between them is often vague. For instance, concession may be both an intrasentential and an intersentential relation.</Paragraph>
      <Paragraph position="1"> Here is an example of a rel attribute: &lt;su ctyp=fd&gt;&lt;name rel=agt&gt;Tom&lt;/name&gt; &lt;vp&gt;came&lt;/vp&gt;. &lt;/su&gt; ctyp=fd means that the first element &lt;name rel=agt&gt;Tom&lt;/name&gt; depends on the second element &lt;vp&gt;came&lt;/vp&gt;. rel=agt means that Tom has the agent role with respect to the event denoted by came.</Paragraph>
      <Paragraph position="2"> rel is an open-class attribute, potentially encompassing all the binary relations lexicalized in natural languages. An exhaustive listing of thematic roles and rhetorical relations appears impossible, as is widely recognized. We are not yet sure how many thematic roles and rhetorical relations are sufficient for engineering applications. However, the appropriate granularity of classification will be determined by the current level of technology.</Paragraph>
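      <Paragraph position="3"> The reading of rel and ctyp=fd described above can be illustrated with a minimal Python sketch. It is not part of GDA or of any GDA tool: the function name dependencies is hypothetical, attribute values are quoted so that a standard XML parser accepts the sentence (an assumption, since the example in the text leaves them unquoted), and only the ctyp=fd case described in the text is handled. <![CDATA[
```python
import xml.etree.ElementTree as ET

# The example sentence from the text. Attribute values are quoted here so that
# a standard XML parser accepts it; the quoting is an assumption of this sketch.
SENTENCE = '<su ctyp="fd"><name rel="agt">Tom</name> <vp>came</vp>.</su>'


def text_of(elem):
    """Return the text covered by an element, without its trailing text."""
    return "".join(elem.itertext()).strip()


def dependencies(gda_xml):
    """Collect (relational term, dependent text, head text) triples.

    The head is filled in only for the case described in the text: when the
    parent carries ctyp="fd", its first child depends on its second child.
    In every other case the head is left as None.
    """
    root = ET.fromstring(gda_xml)
    found = []
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            rel = child.get("rel")
            if rel is None:
                continue
            head = None
            if parent.get("ctyp") == "fd" and index == 0 and len(children) > 1:
                head = text_of(children[1])
            found.append((rel, text_of(child), head))
    return found


if __name__ == "__main__":
    print(dependencies(SENTENCE))
```
]]> For the sentence above, the sketch pairs the relational term agt with the dependent Tom and the head came. Other ctyp values are not covered by the text, so the sketch leaves the head unresolved for them.</Paragraph>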
    </Section>
    <Section position="2" start_page="917" end_page="917" type="sub_section">
      <SectionTitle>
2.2 Anaphora and Coreference
</SectionTitle>
      <Paragraph position="0"> Each element may have an identifier as the value of the id attribute. An anaphoric expression should have the ana attribute with its antecedent's id value. An example follows: &lt;name id=1&gt;John&lt;/name&gt; beats &lt;adp ana=1&gt;his&lt;/adp&gt; dog.</Paragraph>
      <Paragraph position="1"> A non-anaphoric coreference is marked by the crf attribute, whose usage is the same as that of the ana attribute. When the coreference is at the level of the type (kind, sort, etc.) which the referents of the antecedent and the anaphor are tokens of, we use the cotyp attribute as below: You bought &lt;np id=11&gt;a car&lt;/np&gt;.</Paragraph>
      <Paragraph position="2"> I bought &lt;np cotyp=11&gt;one&lt;/np&gt;, too.</Paragraph>
      <Paragraph position="3"> A zero anaphora is encoded by using the appropriate relational term as an attribute name with the referent's id value. Zero anaphors of compulsory elements, which describe the internal structure of the events represented by the verbs or adjectives, are required to be resolved. Zero anaphors of optional elements, such as those with reason and means roles, need not be. Here is an example of a zero anaphora concerning an optional thematic role ben (for beneficiary): Tom visited &lt;name id=111&gt;Mary&lt;/name&gt;.</Paragraph>
      <Paragraph position="4"> He &lt;v ben=111&gt;brought&lt;/v&gt; a present.</Paragraph>
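      <Paragraph position="5"> The id-based linking described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the GDA specification: the doc root element, the function name resolve_links, and the quoting of attribute values are all hypothetical additions, and the rule that any attribute outside a small reserved set whose value matches a known id counts as a zero anaphor is a heuristic chosen for this sketch. <![CDATA[
```python
import xml.etree.ElementTree as ET

# The examples from the text, wrapped in a hypothetical <doc> root and with
# attribute values quoted so that a standard XML parser accepts them.
DOC = """<doc>
<su><name id="1">John</name> beats <adp ana="1">his</adp> dog.</su>
<su>You bought <np id="11">a car</np>.</su>
<su>I bought <np cotyp="11">one</np>, too.</su>
<su>Tom visited <name id="111">Mary</name>.</su>
<su>He <v ben="111">brought</v> a present.</su>
</doc>"""

# Attributes that point at an antecedent's id, as named in the text.
LINK_ATTRS = ("ana", "crf", "cotyp")
# Attributes that are never treated as relational terms in this sketch.
RESERVED = {"id", "rel", "ctyp"} | set(LINK_ATTRS)


def text_of(elem):
    """Return the text covered by an element, without its trailing text."""
    return "".join(elem.itertext()).strip()


def resolve_links(gda_xml):
    """Return (kind, referring text, referent text) triples in document order.

    kind is "ana", "crf" or "cotyp" for explicit links, and "zero:<term>" when
    a relational term is used as an attribute name (heuristic: any attribute
    outside RESERVED whose value is a known id).
    """
    root = ET.fromstring(gda_xml)
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    links = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if value not in by_id:
                continue  # not a reference to any known element
            if name in LINK_ATTRS:
                kind = name
            elif name not in RESERVED:
                kind = "zero:" + name
            else:
                continue  # id, rel and ctyp are not links
            links.append((kind, text_of(elem), text_of(by_id[value])))
    return links


if __name__ == "__main__":
    for link in resolve_links(DOC):
        print(link)
```
]]> On the five example sentences, the sketch links his to John through ana, one to a car through cotyp, and brought to Mary through the relational term ben. A reference whose id value is not present in the document is silently skipped, which a real tool would more likely report.</Paragraph>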
    </Section>
  </Section>
</Paper>