<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-2008">
  <Title>Interactive Paraphrasing Based on Linguistic Annotation</Title>
  <Section position="4" start_page="1" end_page="1" type="metho">
    <SectionTitle>
3 Linguistic Annotation
</SectionTitle>
    <Paragraph position="0"> Semantically embedding word sense definitions into the original document without changing the original context is much more difficult than showing the definition in popup windows.</Paragraph>
    <Paragraph position="1"> For example, simply replacing a word in a sentence with its word sense definition may make the original sentence ungrammatical or less cohesive.</Paragraph>
    <Paragraph position="2"> This is because word sense definitions usually cannot simply replace the original words, owing to their fixed forms. To integrate a word sense definition appropriately into the original context, we apply syntactic annotation (described in the next section) to both the original documents and the word sense definitions so that the machine knows their contexts.</Paragraph>
    <Paragraph position="3"> Thus, we need two types of annotation for Interactive Paraphrasing. One is the word sense annotation, used to retrieve the correct word sense definition for a particular word, and the other is the syntactic annotation, used to integrate word sense definitions smoothly into the original document.</Paragraph>
    <Paragraph position="4"> In this paper, linguistic annotation covers syntactic annotation and word sense annotation.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 Syntactic Annotation
</SectionTitle>
      <Paragraph position="0"> Syntactic annotation makes on-line documents more machine-understandable on the basis of a new tag set, and supports the development of content-based presentation, retrieval, question-answering, summarization, and translation systems of much higher quality than is currently available. The new tag set was proposed by the GDA (Global Document Annotation) project (Hasida, http://www.etl.go.jp/etl/nl/gda/). It is based on XML and is designed to be as compatible as possible with TEI (The Text Encoding Initiative, http://www.uic.edu:80/orgs/tei/) and CES (Corpus Encoding Standard, http://www.cs.vassar.edu/CES/). It specifies modifier-modifiee relations, anaphor-referent relations, etc.</Paragraph>
      <Paragraph position="1"> An example of a GDA-tagged sentence is as follows:</Paragraph>
      <Paragraph position="3"> The tag, &lt;su&gt;, refers to a sentential unit.</Paragraph>
      <Paragraph position="4"> The other tags above, &lt;n&gt;, &lt;np&gt;, &lt;v&gt;, &lt;ad&gt; and &lt;adp&gt;, mean noun, noun phrase, verb, adnoun or adverb (including preposition and postposition), and adnominal or adverbial phrase, respectively. Syntactic annotation is generated by automatic morphological analysis and interactive sentence parsing.</Paragraph>
      <Paragraph position="5"> Some research issues concerning syntactic annotation are related to how the annotation cost can be reduced to a feasible level. We have been developing machine-guided annotation interfaces that conceal the complexity of annotation. Machine learning mechanisms also contribute to reducing the cost because they can gradually increase the accuracy of automatic annotation.</Paragraph>
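      <Paragraph> The tag inventory described above can be illustrated with a minimal sketch. It assumes an illustrative sentence ("Time flies like an arrow") and a bracketing chosen here for demonstration only; neither is taken from the paper's own example, and the names GDA_TAGS, build_sentence and describe are hypothetical. The tree is built programmatically with Python's standard xml.etree.ElementTree module and then listed element by element with the meaning of each tag.

```python
import xml.etree.ElementTree as ET

# Tag meanings as listed in the text; the dictionary name is hypothetical.
GDA_TAGS = {
    "su": "sentential unit",
    "n": "noun",
    "np": "noun phrase",
    "v": "verb",
    "ad": "adnoun or adverb",
    "adp": "adnominal or adverbial phrase",
}


def build_sentence():
    """Build an assumed GDA-style tree for 'Time flies like an arrow'."""
    su = ET.Element("su")
    subject = ET.SubElement(su, "np")
    subject.text = "Time"
    subject.tail = " "          # whitespace between elements lives in tails
    verb = ET.SubElement(su, "v")
    verb.text = "flies"
    verb.tail = " "
    phrase = ET.SubElement(su, "adp")
    head = ET.SubElement(phrase, "ad")
    head.text = "like"
    head.tail = " "
    obj = ET.SubElement(phrase, "np")
    obj.text = "an "
    noun = ET.SubElement(obj, "n")
    noun.text = "arrow"
    return su


def describe(element, depth=0):
    """Return one indented line per element: tag, meaning, covered text."""
    covered = "".join(element.itertext())
    lines = ["  " * depth + f"{element.tag} ({GDA_TAGS[element.tag]}): {covered}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


sentence = build_sentence()
print("\n".join(describe(sentence)))
```

 Representing the sentence as nested elements keeps the original word order intact, so the plain text can always be recovered by concatenating the text of the tree.</Paragraph>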
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.2 Word Sense Annotation
</SectionTitle>
      <Paragraph position="0"> In the computational linguistics field, word sense disambiguation has been one of the biggest issues. For example, to have a better translation of documents, disambiguation of certain polysemic words is essential. Even if an estimation of the word sense is achieved to some extent, incorrect interpretation of certain words can lead to irreparable misunderstanding.</Paragraph>
      <Paragraph position="1"> To avoid this problem, we have been promoting annotation of word sense for polysemic words in the document, so that their word senses can be machine-understandable.</Paragraph>
      <Paragraph position="2"> For this purpose, we need a dictionary of concepts, for which we use existing domain ontologies. An ontology is a set of descriptions of concepts (such as things, events, and relations) that are specified in some way (such as in a specific natural language) in order to create an agreed-upon vocabulary for exchanging information.</Paragraph>
      <Paragraph position="3"> Annotating a word sense is therefore equivalent to creating a link between a word in the document and a concept in a certain domain ontology. We have made a word sense annotating tool for this purpose, which has been integrated with the annotation editor described in the next section.</Paragraph>
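      <Paragraph> The idea of a word sense annotation as a link between a word and a concept can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool described in this paper: the two-entry ontology, the concept IDs, the location string format, and the names ONTOLOGY, SenseAnnotation, annotate and definition_for are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical miniature domain ontology: concept ID mapped to a description.
ONTOLOGY = {
    "bank.finance": "an institution that accepts deposits and lends money",
    "bank.river": "the sloping land beside a body of water",
}


@dataclass(frozen=True)
class SenseAnnotation:
    """A link between one word occurrence and one ontology concept."""
    location: str    # where the word occurs (assumed, simplified format)
    surface: str     # the word as written in the document
    concept_id: str  # key into the ontology


def annotate(location, surface, concept_id, ontology=ONTOLOGY):
    """Create a link, rejecting concept IDs the ontology does not define."""
    if concept_id not in ontology:
        raise KeyError(f"unknown concept: {concept_id}")
    return SenseAnnotation(location, surface, concept_id)


def definition_for(annotation, ontology=ONTOLOGY):
    """Follow the link to retrieve the description of the annotated sense."""
    return ontology[annotation.concept_id]


link = annotate("sentence-3/word-5", "bank", "bank.river")
print(definition_for(link))
```

 Storing only the concept ID in the annotation, rather than the definition text, means that the definition retrieved for a word always reflects the current content of the ontology.</Paragraph>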
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.3 Annotation Editor
</SectionTitle>
      <Paragraph position="0"> Our annotation editor, implemented as a Java application, facilitates linguistic annotation of the document. An example screen of our annotation editor is shown in Figure 5.</Paragraph>
      <Paragraph position="1"> The center window shows some text that was selected on the Web browser, as shown on the right top of the figure. The selected area is automatically assigned an XPointer (i.e., a location identifier in the document) (World Wide Web Consortium, http://www.w3.org/TR/xptr/).</Paragraph>
      <Paragraph position="2"> The right bottom window shows the linguistic structure of the sentence in the selected area. In this window, the user can modify the results of the automatically analyzed sentence structure.</Paragraph>
      <Paragraph position="3"> Using the editor, the user annotates text with linguistic structure (syntactic and semantic structure) and adds a comment to an element in the document. The editor is capable of basic natural language processing and interactive disambiguation.</Paragraph>
      <Paragraph position="4"> The tool also supports word sense annotation, as shown in Figure 6. The ontology viewer appears in the right middle of the figure. The user can easily select a concept in the domain ontology and assign a concept ID to a word in the document as a word sense.</Paragraph>
    </Section>
  </Section>
</Paper>