<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0610">
  <Title>On Distance between Deep Syntax and Semantic Representation</Title>
  <Section position="4" start_page="78" end_page="79" type="metho">
    <SectionTitle>
2 FGD layers
</SectionTitle>
    <Paragraph position="0"> PDT 2.0 contains three layers of information about the text (as described in Hajič (1998)): Morphosyntactic Tagging. This layer represents the text in the original linear word order, with a tag assigned unambiguously to each word form occurrence, much like the Brown corpus.</Paragraph>
    <Paragraph position="1"> Syntactic Dependency Annotation. It contains the (unambiguous) dependency representation of every sentence, with features describing the morphosyntactic properties, the syntactic function, and the lexical unit itself. All words from the sentence appear in its representation. Tectogrammatical Representation (TR). At this level of description, we annotate every (autosemantic non-auxiliary) lexical unit with its tectogrammatical function, its position in the scale of communicative dynamism, and its grammatemes (similar to the morphosyntactic tag, but only for categories which cannot be derived from the word's function, like number for nouns, but not case).</Paragraph>
    <Paragraph position="2"> There are several reasons why TR may not be sufficient in a question answering system or MT: 1. The syntactic functors Actor and Patient disallow creating inference rules for cognitive roles like Affected object or State carrier. For example, the axiom stating that an affected object is changed by the event ((v AFF o) → (v SUBS change.2.1)) cannot be used in the TR framework.</Paragraph>
    <Paragraph position="3"> 2. There is no information about sorts of concepts represented by TR nodes. Sorts (the upper conceptual ontology) are an important source of constraints for MultiNet relations. Every relation has its signature, which in turn constrains analysis and inferencing.</Paragraph>
    <Paragraph position="4"> 3. Lexemes of TR have no hierarchy, which limits especially the search for an answer in a question answering system. In TR there is no counterpart of the SUB, SUBR, and SUBS MultiNet relations, which connect subordinate concepts to superordinate ones and individual object representatives to the corresponding generic concepts.</Paragraph>
    <Paragraph position="5"> 4. In TR, each sentence is isolated from the rest of the text, except for coreference arrows heading to preceding sentences. This, in effect, disallows inferences combining knowledge from multiple sentences in one inference rule.</Paragraph>
    <Paragraph position="6"> 5. Nodes in TR always correspond to a word or a group of words in the surface form of the sentence, or to a deleted obligatory valency of another node. There are no means for representing knowledge generated during the inference process if that knowledge does not have the form of a TR. For example, consider the axiom of temporal precedence transitivity (3):</Paragraph>
    <Paragraph position="7"> ((a ANTE b) ∧ (b ANTE c)) → (a ANTE c) (3)</Paragraph>
    <Paragraph position="8"> In TR, we cannot add an edge denoting (a ANTE c). We would have to include a proposition like "a precedes c" as a whole new clause.</Paragraph>
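The transitivity axiom (3) can be applied mechanically once ANTE edges are first-class objects, which is exactly what TR lacks. A minimal sketch, assuming a set-of-pairs encoding of precedence edges (the encoding is illustrative, not part of MultiNet):

```python
def ante_closure(ante):
    """Transitive closure of temporal precedence:
    (a ANTE b) and (b ANTE c) imply (a ANTE c)."""
    closure = set(ante)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (b2, c) in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))   # the derived ANTE edge
                    changed = True
    return closure

print(ante_closure({("a", "b"), ("b", "c")}))
# the derived edge ("a", "c") is now present alongside the two given edges
```

In TR, by contrast, the derived fact would have to be verbalized as a new clause rather than added as an edge.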
    <Paragraph position="9"> For all these reasons, we need to extend our text annotation to a form suitable for more advanced tasks. It is shown in Helbig (2006) that MultiNet is capable of solving all the above-mentioned issues. Helbig (1986) describes a procedure for the automatic translation of natural language utterances into MultiNet structures, used in the WOCADI tool for German. WOCADI uses no theoretical intermediate structures and relies heavily on a semantically annotated dictionary (HagenLex, see Hartrumpf et al. (2003)).</Paragraph>
    <Paragraph position="10"> In our approach, we want to take advantage of existing tools for conversions between layers in FGD. By combining several simpler procedures for translation between adjacent layers, we can improve the robustness of the whole procedure and the modularity of the software tools. Moreover, the process is divided into logical steps corresponding to theoretically sound and well-defined structures. On the other hand, such multistage processing is susceptible to the accumulation of errors made by individual components.</Paragraph>
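The multistage design described above amounts to composing per-layer converters, so that each stage can be developed and tested separately; the stub functions and layer labels below are illustrative assumptions, not real FGD tools:

```python
def compose(*stages):
    """Chain layer-to-layer converters into one analysis pipeline."""
    def pipeline(text):
        result = text
        for stage in stages:
            result = stage(result)   # errors made here propagate to later stages
        return result
    return pipeline

# Stubs standing in for real converters between adjacent FGD layers.
def morphosyntactic_tagging(text): return ("m-layer", text)
def syntactic_annotation(m):       return ("a-layer", m)
def tectogrammatical(a):           return ("t-layer", a)
def multinet(t):                   return ("multinet", t)

analyze = compose(morphosyntactic_tagging, syntactic_annotation,
                  tectogrammatical, multinet)
print(analyze("Fred is going home.")[0])  # multinet
```

The composition also makes the accumulation-of-errors concern concrete: a mistake made by any stage is passed unchanged to every subsequent stage.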
  </Section>
  <Section position="5" start_page="79" end_page="83" type="metho">
    <SectionTitle>
3 Structural Similarities
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="79" end_page="80" type="sub_section">
      <SectionTitle>
3.1 Nodes and Concepts
</SectionTitle>
      <Paragraph position="0"> If we look at examples of TR and MultiNet structures, at first sight we can see that the nodes of TR mostly correspond to concepts in MultiNet.</Paragraph>
      <Paragraph position="1"> However, there is a major difference: TR does not include concept encapsulation. Encapsulation in MultiNet serves to distinguish definitional knowledge from assertional knowledge about a given node; e.g., in the sentence "The old man is sleeping", the connection to "old" will be in the definitional part of "man", while the connection to the state "is sleeping" belongs to the assertional part of the concept representing the man. In TR, these differences in content are represented by differences in the Topic-Focus Articulation (TFA) of the corresponding words.</Paragraph>
      <Paragraph position="2"> There are also TR nodes that correspond to no MultiNet concept (typically, the node representing the verb "be") and TR nodes corresponding to a whole subnetwork, e.g., "Fred" in the sentence "Fred is going home.", where the TR node representing Fred corresponds to the subnetwork in figure 3.</Paragraph>
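The encapsulation distinction from "The old man is sleeping" can be pictured as two partitions attached to one concept. The dictionary layout and the PROP/SCAR relation labels below are illustrative assumptions, not MultiNet's actual data model:

```python
# One concept ("man") with its knowledge split by knowledge type:
# definitional knowledge restricts what the concept refers to ("the OLD man"),
# assertional knowledge states something about it ("... is sleeping").
man = {
    "definitional": [("man", "PROP", "old")],
    "assertional":  [("man", "SCAR", "sleep")],
}

def knowledge_part(concept, triple):
    """Return which partition of the concept a connection belongs to."""
    for part, triples in concept.items():
        if triple in triples:
            return part
    return None

print(knowledge_part(man, ("man", "PROP", "old")))    # definitional
print(knowledge_part(man, ("man", "SCAR", "sleep")))  # assertional
```

In TR the same split is not stored on the node; it has to be recovered from the TFA marking of the corresponding words.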
    </Section>
    <Section position="2" start_page="80" end_page="80" type="sub_section">
      <SectionTitle>
3.2 Edges, relations and functions
</SectionTitle>
      <Paragraph position="0"> An edge of TR between nodes that have conceptual counterparts in MultiNet always corresponds to one or more relations, and possibly also to some functions. In general, the MultiNet representation of a text contains significantly more connections (either as relations or as functions) than TR, and some of them correspond to TR edges.</Paragraph>
    </Section>
    <Section position="3" start_page="80" end_page="80" type="sub_section">
      <SectionTitle>
3.3 Functors and types of relations and functions
</SectionTitle>
      <Paragraph position="0"> There are 67 functor types in TR (see Hajičová et al. (2000) for a description), which correspond to 94 relation types and 19 function types in MultiNet (Helbig, 2006). The mapping of TR functors to MultiNet relations and functions is summarized in tables 1 and 2. There are also MultiNet relations and functions with no counterpart in TR (a star at the beginning denotes a function): ANLG, ANTO, CHEA, CHPA, CHPE, CHPS, CHSA, CHSP, CNVRS, COMPL, CONTR, CORR, DISTG, DPND, EQU, EXT, HSIT, MAJ, MIN, PARS, POSS, PRED0, PRED, PREDR, PREDS, SETOF, SUB, SYNO, VALR, *FLPJ, and *OP.</Paragraph>
      <Paragraph position="1"> From tables 1 and 2, we can conclude that although the mapping is not one-to-one, preprocessing the input text to TR greatly reduces the problem of the appropriate text-to-MultiNet transformation. However, it is not clear how to resolve the remaining ambiguity.</Paragraph>
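The many-to-many character of the mapping can be represented as a dictionary from TR functors to candidate MultiNet relations; wherever a candidate set has more than one member, the remaining ambiguity mentioned above must be resolved by other means. The particular candidate sets below are illustrative assumptions, not the contents of tables 1 and 2:

```python
# TR functor -> candidate MultiNet relations (illustrative, not from the tables).
functor_map = {
    "ACT":   ["AGT", "SCAR", "EXP"],  # Actor may map to several cognitive roles
    "PAT":   ["AFF", "OBJ"],          # Patient
    "TWHEN": ["TEMP"],                # temporal specification, unambiguous
}

def is_ambiguous(functor):
    """Preprocessing to TR narrows the choice; ambiguity remains where
    a functor still has several candidate target relations."""
    return len(functor_map.get(functor, [])) > 1

print(is_ambiguous("ACT"))    # True
print(is_ambiguous("TWHEN"))  # False
```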
    </Section>
    <Section position="4" start_page="80" end_page="80" type="sub_section">
      <SectionTitle>
3.4 Grammatemes and layer information
</SectionTitle>
      <Paragraph position="0"> TR has at its disposal 15 grammatemes, which can be conceived as node attributes. Note that not all grammatemes are applicable to all nodes.</Paragraph>
      <Paragraph position="1"> The grammatemes in TR roughly correspond to layer information in MultiNet, but also to specific MultiNet relations.</Paragraph>
      <Paragraph position="2">  1. NUMBER. This TR grammateme is transformed to QUANT, CARD, and ETYPE attributes in MultiNet.</Paragraph>
      <Paragraph position="3"> 2. GENDER. This syntactic information is not transformed to the semantic representation, with the exception of occurrences where the grammateme distinguishes the gender of an animal or a person, where MultiNet uses the SUB relation with appropriate concepts.</Paragraph>
      <Paragraph position="4"> 3. PERSON. This verbal grammateme is reflected in the cognitive roles connected to the event or state and is semantically superfluous. 4. POLITENESS has no structural counterpart in MultiNet. It can be represented in the conceptual hierarchy of the SUB relation.</Paragraph>
      <Paragraph position="5"> 5. NUMERTYPE, distinguishing e.g. "three" from "third" and "one third", is transformed to the corresponding number and also to the manner in which this number is connected to the network.</Paragraph>
      <Paragraph position="6"> 6. INDEFTYPE corresponds to QUANT and VARIA layer attributes.</Paragraph>
      <Paragraph position="7"> 7. NEGATION is transformed to both FACT layer attribute and *NON function combined with modality relation.</Paragraph>
      <Paragraph position="8"> 8. DEGCMP corresponds to *COMP and *SUPL functions.</Paragraph>
      <Paragraph position="9"> 9. VERBMOD: the imp value is represented by a MODL relation to the imperative; the cdn value is ambiguous not only with respect to the facticity of the condition but also with regard to other criteria distinguishing the CAUS, IMPL, JUST, and COND relations, any of which can result in a sentence with a cdn verb. The FACT layer attribute of several concepts is also affected by this value.</Paragraph>
      <Paragraph position="10"> 10. DEONTMOD corresponds to the MODL relation. 11. DISPMOD is semantically superfluous.</Paragraph>
      <Paragraph position="11"> 12. ASPECT has no direct counterpart in MultiNet. It can be represented by the interplay of temporal specification and the RSLT relation connecting an action to its result.</Paragraph>
      <Paragraph position="12"> 13. TENSE is represented by relations ANTE, TEMP, DUR, STRT, and FIN.</Paragraph>
      <Paragraph position="13"> 14. RESULTATIVE has no direct counterpart and must be expressed using the RSLT relation. 15. ITERATIVENESS should be represented by a combination of DUR and TEMP relations where some of the temporal concepts have the QUANT layer information set to "several".</Paragraph>
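Several of the transfers above are direct attribute rewrites; NUMBER (item 1) is a representative case. A minimal sketch, in which the grammateme values ("sg", "pl") and the target attribute values are assumptions for illustration:

```python
def transfer_number(number_grammateme, cardinality=None):
    """Map the TR NUMBER grammateme to MultiNet-style QUANT/CARD/ETYPE
    layer attributes. Attribute values here are illustrative assumptions."""
    if number_grammateme == "sg":
        return {"QUANT": "one", "CARD": 1, "ETYPE": 0}
    if number_grammateme == "pl":
        return {"QUANT": "mult", "CARD": cardinality, "ETYPE": 1}
    raise ValueError(f"unknown NUMBER value: {number_grammateme}")

print(transfer_number("pl", cardinality=3))
```

Other grammatemes (GENDER, VERBMOD) need context beyond the node itself, which is why they cannot be handled by such a purely local rewrite.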
    </Section>
    <Section position="5" start_page="80" end_page="83" type="sub_section">
      <SectionTitle>
3.5 TFA, quantifiers, and encapsulation
</SectionTitle>
      <Paragraph position="0"> In TR, the information structure of every utterance is annotated in terms of Topic-Focus Articulation (TFA): 1. Every autosemantic word is marked c, t, or f for contrastive topic, topic, or focus, respectively. These values distinguish which part of the sentence belongs to the topic and which part to the focus.</Paragraph>
      <Paragraph position="1"> 2. There is an ordering of all nodes according to communicative dynamism (CD). Nodes with lower values of CD belong to the topic and nodes with greater values to the focus. In this way, the degree of aboutness is distinguished even inside the topic and focus of sentences. MultiNet, on the other hand, does not contain any representational means devoted directly to the representation of information structure. Nevertheless, the differences in the content of sentences differing only in TFA can be represented in MultiNet by other means. The TFA differences can be reflected in these categories: * Relations connecting the topic of a sentence with the remaining concepts in the sentence are usually part of the definitional knowledge about the concepts in the topic, while the relations going to the focus belong to the assertional part of the knowledge about the concepts in focus. In other words, TFA can be reflected in different values of the K-TYPE attribute.</Paragraph>
      <Paragraph position="2"> * TFA has an effect on the identification of presuppositions (Peregrin, 1995a) and allegations (Hajičová, 1984). In the case of presuppositions, we need to know about them in the process of assimilating new information into the existing network in order to detect presupposition failures. In the case of an allegation, there is a difference in the FACT attribute of the allegation.</Paragraph>
      <Paragraph position="3"> * TFA has an influence on the scope of quantifiers (Peregrin, 1995b; Hajičová et al., 1998). This information is fully transformed into the quantifier scopes in MultiNet.</Paragraph>
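The first bullet above suggests a direct rule: relations anchored in the topic default to definitional knowledge, relations anchored in the focus to assertional knowledge. A sketch of that rule, where the mapping itself is the paper's suggestion but the function and value names are assumptions:

```python
def k_type_from_tfa(tfa_value):
    """Map a TR TFA mark to the knowledge-type partition suggested above:
    t/c (topic, contrastive topic) -> definitional, f (focus) -> assertional."""
    if tfa_value in ("t", "c"):
        return "definitional"
    if tfa_value == "f":
        return "assertional"
    raise ValueError(f"unknown TFA value: {tfa_value}")

# "The OLD man is SLEEPING": 'old' sits in the topic, 'sleeping' in the focus.
print(k_type_from_tfa("t"))  # definitional
print(k_type_from_tfa("f"))  # assertional
```

Presuppositions, allegations, and quantifier scope (the other two bullets) require more than this local rewrite, since they depend on how the new information is assimilated into the existing network.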
    </Section>
  </Section>
  <Section position="6" start_page="83" end_page="83" type="metho">
    <SectionTitle>
4 Related Work
</SectionTitle>
    <Paragraph position="0"> There are various approaches to analyzing text into a semantic representation. Some of them use a layered approach and others use a single tool to directly produce the target structure. For German, there is the above-mentioned WOCADI parser to MultiNet; for English, there is a Discourse Representation Theory (DRT) analyzer (Bos, 2005); and for Czech, there is a Transparent Intensional Logic analyzer (Horák, 2001). Among the layered approaches, the DeepThought project (Callmeier et al., 2004) can combine the output of various tools into one representation.</Paragraph>
    <Paragraph position="1"> It would even be possible to incorporate TR and MultiNet into this framework. Meaning-Text Theory (Bolshakov and Gelbukh, 2000) uses an approach similar to Functional Generative Description (Žabokrtský, 2005), but it also has no layer corresponding to MultiNet.</Paragraph>
    <Paragraph position="2"> There were attempts to analyze the semantics of TR, namely in the question answering system TIBAQ (Jirků and Hajič, 1982), which used TR directly as the semantic representation, and by Kruijff-Korbayová (1998), who tried to transform the TFA information in TR into the DRT framework.</Paragraph>
  </Section>
</Paper>