<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0209">
  <Title>Exploiting semantic information for manual anaphoric annotation in Cast3LB corpus</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Tools
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 3LB-SAT
</SectionTitle>
      <Paragraph position="0"> 3LB-SAT (Semantic Annotation Tool) is a tool for the semantic tagging of multilingual corpora. Its main features are: (i) it is word-oriented; (ii) it accepts the main input formats used in corpus annotation, namely treebank format (TBF) and XML; and (iii) it uses EuroWordNet as its lexical resource. For the XML format, a DTD has been defined that describes the information structure of each file in the corpus.</Paragraph>
      <Paragraph position="1"> In the annotation process, monosemic words are annotated automatically, so 3LB-SAT is used to annotate only the polysemic words. When a file is loaded, all the lemmas in the file are shown (Figure 1). The tool uses different colors to indicate the state of the annotation process: (i) no occurrence of the lemma in the file has been annotated, (ii) some occurrences of the lemma in the file have been annotated, or (iii) all the occurrences have been annotated. When the annotator selects a lemma, all its occurrences are shown. The selection of one of</Paragraph>
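      <Paragraph> The automatic tagging of monosemic words and the three annotation states described above can be sketched as follows. This is only an illustration, not the 3LB-SAT implementation: the function names, the (lemma, sense) pair representation and the sense identifiers are assumptions made for the example.

```python
from collections import defaultdict


def annotate_monosemic(tokens, inventory):
    """Give every unannotated occurrence of a one-sense lemma its only sense.

    tokens: list of (lemma, sense) pairs, sense is None when unannotated.
    inventory: dict mapping a lemma to its list of senses (hypothetical
    stand-in for a lexical resource such as EuroWordNet).
    """
    result = []
    for lemma, sense in tokens:
        senses = inventory.get(lemma, [])
        if sense is None and len(senses) == 1:
            sense = senses[0]
        result.append((lemma, sense))
    return result


def lemma_states(tokens):
    """Classify each lemma as 'none', 'some' or 'all' annotated."""
    total = defaultdict(int)
    done = defaultdict(int)
    for lemma, sense in tokens:
        total[lemma] += 1
        if sense is not None:
            done[lemma] += 1
    states = {}
    for lemma, count in total.items():
        if done[lemma] == 0:
            states[lemma] = "none"
        elif done[lemma] == count:
            states[lemma] = "all"
        else:
            states[lemma] = "some"
    return states
```

A viewer could then map each of the three states to a color when listing the lemmas of a file.</Paragraph>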
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 3LB-RAT
</SectionTitle>
      <Paragraph position="0"> 3LB-RAT (Reference Annotation Tool) is a tool developed in the 3LB project for the annotation and supervision of anaphora and coreference at discourse level.</Paragraph>
      <Paragraph position="1"> The tool provides the annotator with two working modes: manual and semiautomatic. In manual mode, the tool locates and shows all possible anaphoric and coreferential elements together with their possible antecedents. The annotator chooses one of these antecedents and indicates the degree of certainty of the selection (standby, certain or uncertain).</Paragraph>
      <Paragraph position="2"> The tool always offers a number of exceptional options: (i) cataphora; (ii) a possible syntactic mistake (these cases will be used to review and correct the syntactic annotation); (iii) an antecedent that has not been located; (iv) an antecedent that does not appear explicitly in the text; and (v) non-anaphora, that is, the system has wrongly identified an expression as anaphoric. In semiautomatic mode, the tool resolves each coreference by means of the enriched anaphora resolution method explained previously, and proposes the most suitable candidate to the annotator. In every case, the annotator can either accept the solution offered by the resolution method or choose another one manually.</Paragraph>
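      <Paragraph> One way to picture the record produced by an annotation decision is the sketch below. It is not taken from 3LB-RAT: the class names, field names and identifiers such as "np3" are hypothetical. It only encodes the three certainty degrees, the exceptional options listed above, and the rule that the proposal of the system is kept unless the annotator chooses another candidate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Certainty(Enum):
    STANDBY = "standby"
    CERTAIN = "certain"
    UNCERTAIN = "uncertain"


class Outcome(Enum):
    ANTECEDENT = "antecedent"
    CATAPHORA = "cataphora"
    SYNTACTIC_MISTAKE = "syntactic_mistake"
    NOT_LOCATED = "antecedent_not_located"
    NOT_EXPLICIT = "antecedent_not_explicit"
    NON_ANAPHORA = "non_anaphora"


@dataclass
class Decision:
    anaphor_id: str
    outcome: Outcome
    certainty: Certainty
    antecedent_id: Optional[str] = None


def decide(anaphor_id, proposed, chosen=None,
           outcome=Outcome.ANTECEDENT, certainty=Certainty.CERTAIN):
    """Build the decision for one anaphoric expression.

    For an ordinary antecedent, the system proposal is kept unless the
    annotator supplies another candidate. For the exceptional outcomes,
    only an element explicitly chosen by the annotator is recorded.
    """
    if outcome is Outcome.ANTECEDENT and chosen is None:
        chosen = proposed
    return Decision(anaphor_id, outcome, certainty, chosen)
```

For example, decide("a1", proposed="np3") accepts the proposed candidate, while passing chosen="np5" overrides it.</Paragraph>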
      <Paragraph position="3"> 3LB-RAT has been developed in Python, which makes it portable across Windows and Unix platforms. It works with XML files: it is designed to read the format used by the 3LB-SAT tool, but it can also accept any other XML specification.</Paragraph>
      <Paragraph position="4"> As stated above, the tool uses syntactic, morphological and semantic information to specify an anaphoric expression and its antecedent. The semantic information used by the tool is limited to ontology concepts and synonyms. From the semantically annotated text, three tables are created, one for each syntactic function: subject, direct object and indirect object. These tables store how frequently nouns co-occur with verbs (each with its correct sense). They are the basis for building the semantic compatibility patterns, which indicate the compatibility between the ontological concept associated with the possible antecedent and the verb of the sentence in which the anaphoric expression appears. To compute this information, both the frequency of occurrence and the degree of conceptual generality in the ontology are considered, with a higher score given to the most specific concepts. For example, the concept "Human" is more informative than the concept "Natural". These patterns are used when semantic preferences are applied: for a given candidate, its semantic compatibility is calculated from the compatible ontological concepts in the patterns, and the candidates with greater compatibility are preferred.</Paragraph>
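      <Paragraph> The idea of the frequency tables and compatibility patterns can be illustrated with a small sketch. The text does not give the actual formula, so the scoring used here (relative frequency of a verb and concept pair multiplied by the depth of the concept in the ontology) is an assumption, as are the function names, the verb sense "eat_1" and the depth values.

```python
from collections import Counter

FUNCTIONS = ("subject", "direct_object", "indirect_object")


def build_tables(observations):
    """Count (verb sense, concept) pairs, one table per syntactic function.

    observations: iterable of (function, verb_sense, concept) triples
    taken from semantically annotated text.
    """
    tables = {function: Counter() for function in FUNCTIONS}
    for function, verb, concept in observations:
        tables[function][(verb, concept)] += 1
    return tables


def compatibility(tables, function, verb, concepts, depth):
    """Score a candidate from its ontological concepts.

    Assumed formula: for each concept, the share of the verb's
    observations that involve it, weighted by the concept depth so that
    more specific concepts count more. Unknown concepts get depth 1.
    """
    table = tables[function]
    verb_total = sum(n for (v, _), n in table.items() if v == verb)
    if verb_total == 0:
        return 0.0
    return sum(table[(verb, c)] / verb_total * depth.get(c, 1)
               for c in concepts)


def rank(tables, function, verb, candidates, depth):
    """Order candidate names from most to least compatible."""
    return sorted(
        candidates,
        key=lambda name: compatibility(
            tables, function, verb, candidates[name], depth),
        reverse=True,
    )
```

With a table in which "Human" subjects of "eat_1" are frequent and "Human" lies deeper in the ontology than "Natural", a candidate tagged with "Human" is ranked above one tagged only with "Natural".</Paragraph>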
      <Paragraph position="5"> When the annotator selects an XML file to open, the possible anaphoric elements of the text and their candidates are located, and each anaphora is resolved.</Paragraph>
      <Paragraph position="6"> The system shows two lists (Figure 2): the lower list shows each anaphora located and its solution.</Paragraph>
      <Paragraph position="7"> When the annotator selects one of these elements, the upper box shows the list of possible candidates alongside the solution suggested by the system. At the same time, the anaphora and the selected candidates are highlighted in different colors in the plain text.</Paragraph>
      <Paragraph position="8"> The annotator can either choose any of the suggested options, indicating the degree of certainty of the choice, or accept the solution given by the system.</Paragraph>
    </Section>
  </Section>
</Paper>