<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0610">
  <Title>On Distance between Deep Syntax and Semantic Representation</Title>
  <Section position="3" start_page="0" end_page="78" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The Prague Dependency Treebank 2.0 (PDT 2.0), described in Sgall et al. (2004), contains a large amount of Czech text with complex and interlinked morphological (2 million words), syntactic (1.5M words), and complex semantic (tectogrammatical) annotation (0.8M words); in addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level.</Paragraph>
    <Paragraph position="1"> The theoretical basis of the treebank lies in the Functional Generative Description (FGD) of the language system by Sgall et al. (1986).</Paragraph>
    <Paragraph position="2"> PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted for the current computational-linguistics research needs. The corpus itself is embedded into the latest annotation technology. Software tools for corpus search, annotation, and language analysis are included. Extensive documentation (in English) is provided as well.</Paragraph>
    <Paragraph position="3"> An example of a tectogrammatical tree from PDT 2.0 is given in Figure 1. Function words are removed, with their function preserved in node attributes (grammatemes); information structure is annotated in terms of topic-focus articulation; and every node receives a detailed semantic label corresponding to its function in the utterance (e.g., addressee, from where, how often, ...). The square node indicates an obligatory but missing valent.</Paragraph>
    <Paragraph position="4"> The tree represents the following sentence:</Paragraph>
    <Section position="1" start_page="0" end_page="78" type="sub_section">
      <SectionTitle>
1.1 MultiNet
</SectionTitle>
      <Paragraph position="0"> The representational means of Multilayered Extended Semantic Networks (MultiNet), which are described in Helbig (2006), provide a universally applicable formalism for the treatment of semantic phenomena of natural language. To this end, they offer distinct advantages over the use of the classical predicate calculus and its derivatives. The knowledge representation paradigm and semantic formalism MultiNet is used as a common backbone for all aspects of natural language processing (be they theoretical or practical). It is continually used for the development of intelligent information and communication systems and for natural language interfaces to the Internet. Within this framework, it is subject to permanent practical evaluation and further development.</Paragraph>
      <Paragraph position="1"> The semantic representation of natural language expressions by means of MultiNet is largely independent of the language considered. In contrast, the syntactic constructs used in different languages to describe the same content are obviously not identical. To bridge the gap between different languages, we can employ the deep syntactico-semantic representation available in the FGD framework.</Paragraph>
      <Paragraph position="2"> An example of a MultiNet structure is given in Figure 2. The figure represents the following discourse: Max gave his brother several apples.</Paragraph>
      <Paragraph position="3"> This was a generous gift.</Paragraph>
      <Paragraph position="4"> Four of them were rotten. (2) MultiNet is not explicitly model-theoretical, and the extensional level is created only in those situations where the natural language expressions require it. It can be seen that the overall structure of the representation is not a tree, unlike the tectogrammatical representation (TR). The layer information is hidden except for the most important QUANT and CARD values. These attributes convey information that is important with respect to the content of the sentence. TR lacks attributes distinguishing intensional and extensional information, and it has no relations like SUBM, which denotes the relation between a set and its subset.</Paragraph>
      <Paragraph position="5"> Note that the MultiNet representation crosses sentence boundaries. First, the structure representing a sentence is created, and then this structure is assimilated into the existing representation.</Paragraph>
      <Paragraph position="6"> In contrast to CLASSIC (Brachman et al., 1991) and other KL-ONE networks, MultiNet contains a predefined final set of relation types, encapsulation of concepts, and attribute layers concerning cardinality of objects mentioned in discourse.</Paragraph>
      <Paragraph position="7"> In Section 2, we describe our motivation for extending the annotation in FGD to an even deeper level. Section 3 lists the MultiNet structural counterparts of tectogrammatical means. We discuss the related work in Section 4. Section 5 deals with various evaluation techniques and we conclude in Section 6.</Paragraph>
    </Section>
  </Section>
</Paper>