<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-1059">
  <Title>PARSING GERMAN</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
GENERAL APPROACH
</SectionTitle>
    <Paragraph position="0"> A sentence is parsed word by word, from left to right. The parser is largely a data-driven production system. Productions use syntactic and semantic information at all major stages of the process. Noun phrases, for example, are recognized by an ATN that verifies the result of the syntactic analysis semantically; it returns only semantically valid NPs. The parser belongs to the class of semantic parsers as suggested by [1], [4], [7]. It has two main sources of information. One is a semantic net, which propagates the information about selectional restrictions. The other is the parsing-lexicon, which contains for each word its different senses, each associated with the information necessary to distinguish that sense from the others. This information includes syntactic features of the sentence (infinitive, surface cases of dependent noun phrases, ...), semantic restrictions, and words that occur together with the input word.</Paragraph>
    <Paragraph position="1"> The productions make use of a correspondence between syntactic information in the sentence and the roles of the net (see the chapter INTERNAL REPRESENTATION for an explanation of roles). Productions are used not only for generating the internal representation of constituents but also as expectations that guide the analysis of the rest of the sentence.</Paragraph>
    <Paragraph position="2"> The generation of the internal structure corresponding to the sentence is centered around the verb. Since the representation of other constituents can be initiated independently of the verb, the parser builds a semantic structure immediately after a constituent is recognized. These structures are stored in a list, until the main verb of the sentence has been found. Then the parser tries to fill the case-slots of the verb with the given structures. The semantic categories of the structures have to be matched against the value restrictions of the roles of the verb.</Paragraph>
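    <Paragraph> The verb-centered slot filling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names Structure, VerbFrame, fill_slots, the concept READ and the tiny hierarchy ISA are all hypothetical. Structures for recognized constituents wait in a list, and once the verb is known each case slot takes the first waiting structure whose surface case and semantic category fit.

```python
from dataclasses import dataclass, field

# Hypothetical miniature concept hierarchy: child -> parent.
ISA = {"PERSON": "ANIMATE", "BOOK": "PHYSICAL-OBJECT"}


def is_a(concept, restriction):
    """True if concept equals restriction or lies below it in ISA."""
    while concept is not None:
        if concept == restriction:
            return True
        concept = ISA.get(concept)
    return False


@dataclass
class Structure:
    """Semantic structure built as soon as a constituent is recognized."""
    text: str
    case: str      # surface case, e.g. "nom" or "acc"
    concept: str   # semantic category of the constituent


@dataclass
class VerbFrame:
    """Case frame of a verb sense (assumed layout, for illustration)."""
    concept: str
    case_slots: dict                       # role -> (surface case, value restriction)
    fillers: dict = field(default_factory=dict)


def fill_slots(frame, pending):
    """Fill each slot with the first pending structure that matches both
    the surface case and the value restriction of the role."""
    for role, (case, restriction) in frame.case_slots.items():
        for structure in pending:
            if structure.case == case and is_a(structure.concept, restriction):
                frame.fillers[role] = structure
                pending.remove(structure)
                break
    return frame


# Structures stored in a list until the main verb has been found.
pending = [Structure("mein Bruder", "nom", "PERSON"),
           Structure("das Buch", "acc", "BOOK")]
frame = VerbFrame("READ", {"AGENT": ("nom", "ANIMATE"),
                           "OBJECT": ("acc", "PHYSICAL-OBJECT")})
fill_slots(frame, pending)
```

After the call, frame.fillers maps AGENT to the structure for 'mein Bruder' and OBJECT to the structure for 'das Buch', and the pending list is empty.</Paragraph>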
  </Section>
  <Section position="3" start_page="0" end_page="365" type="metho">
    <SectionTitle>
INTERNAL REPRESENTATION
</SectionTitle>
    <Paragraph position="0"> The source of semantic information is a Structural Inheritance Net [2]. This net formalism has the advantage of being epistemologically clear and explicit.</Paragraph>
    <!-- 366 I. STEINACKER and H. TROST -->
    <Paragraph position="1"> SI-Nets are based on a strict discrimination between a few structural components and their content (what is represented). Real-world knowledge is represented in the form of concepts and roles. Roles describe relationships between concepts. A concept is defined by its attributes, each of which consists of two parts: the role and the value restriction. The value restriction is a concept that defines the range of possible fillers for the attribute; the role defines the function of a filler with regard to the concept being defined. Role-filler concepts can be regarded as semantic categories.</Paragraph>
    <Paragraph position="2"> Generic concepts are organized in a hierarchy of super- and subconcepts. A subconcept inherits the attributes of its superconcept. If a concept has more than one superconcept, it inherits the combined set of attributes. When an input is processed, individuals of the addressed concepts are instantiated. These individuals constitute the episodic layer of the net.</Paragraph>
    <Paragraph position="3"> A word sense addresses either a concept or the attribute of a concept. If an input word relates to a concept, as most nouns and verbs do, that concept is instantiated. If it corresponds to a role, both the concept and the attribute are instantiated, i.e. the generic concept, the role defining the attribute, and the value restriction. Most adjectives and most prepositions are mapped into roles (size, colour, location, time, ...), but so are some nouns (e.g. father is the role of a person in the concept family). The net is structured in a way that facilitates the incorporation of results gained in linguistics: attributes of actions are defined in a way that corresponds to the cases of a case grammar. This can best be illustrated by an example. Actions are represented as net-concepts, e.g. DO. The concept DO is defined by attributes with roles like AGENT, OBJECT, GOAL and RESULT, which are restricted by appropriate role-filler concepts. By defining attributes in this way, a correspondence between surface cases in a sentence and roles of the net can easily be established.</Paragraph>
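    <Paragraph> As a rough illustration of the concepts, attributes and inheritance just described, the sketch below models a generic concept as a name, a list of superconcepts and a mapping from roles to value restrictions. The class Concept, the concept names ACTION, MOTION and WALK, and the rule that locally defined attributes override inherited ones are assumptions made for this example; the paper does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Generic concept: attributes are pairs of role and value restriction."""
    name: str
    supers: list = field(default_factory=list)   # superconcepts
    own: dict = field(default_factory=dict)      # role -> value restriction

    def attributes(self):
        """Own attributes plus those inherited from all superconcepts."""
        merged = {}
        for sup in self.supers:
            merged.update(sup.attributes())
        merged.update(self.own)   # assumption: local definitions win
        return merged

    def instantiate(self, episodic):
        """Create an individual of this concept in the episodic layer."""
        individual = (self.name, len(episodic) + 1)
        episodic.append(individual)
        return individual


animate = Concept("ANIMATE")
location = Concept("LOCATION")
action = Concept("ACTION", own={"AGENT": animate})
motion = Concept("MOTION", own={"GOAL": location})
# A concept with two superconcepts inherits the combined set of attributes.
walk = Concept("WALK", supers=[action, motion])

episodic = []
walk_1 = walk.instantiate(episodic)
```

Here walk.attributes() contains both AGENT (restricted to ANIMATE) and GOAL (restricted to LOCATION), and the episodic layer holds the single individual ("WALK", 1).</Paragraph>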
  </Section>
  <Section position="4" start_page="365" end_page="365" type="metho">
    <SectionTitle>
THE PARSING-LEXICON
</SectionTitle>
    <Paragraph position="0"> In the parsing-lexicon each word-sense is associated with productions. These productions reflect the correspondence between surface cases of the sentence and semantic cases within the net.</Paragraph>
    <Paragraph position="1"> The number of tests in a production correlates with the number of senses of a word. By executing these tests, the parser gains the information necessary to choose the correct reading of a word.</Paragraph>
    <Paragraph position="2"> Tests check the syntactic and the semantic context in which an input word is found. Sometimes morphological information and the occurrence of certain words have to be taken into consideration as well. The range of tests reflects our general approach to parsing: combining syntax and semantics at all stages of the parsing process [8].</Paragraph>
    <Paragraph position="3"> Depending on the stage of the process, the failure of a test is interpreted in one of two ways. If the end of the sentence has been reached, the result is taken as false; if parsing is still in progress, the test is repeated at a later stage of the process.</Paragraph>
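    <Paragraph> The two readings of a failed test can be pictured as a three-valued outcome. The following is a minimal sketch, assuming a hypothetical function run_test and a hypothetical surface test has_accusative_np; it is not taken from the system itself.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    RETRY = "retry"    # repeat the test at a later stage of the parse


def run_test(test, constituents, end_of_sentence):
    """Interpret a failing test according to the stage of the parse."""
    if test(constituents):
        return Outcome.PASSED
    return Outcome.FAILED if end_of_sentence else Outcome.RETRY


def has_accusative_np(constituents):
    # Hypothetical surface test: is there an NP in the accusative?
    return any(case == "acc" for _, case in constituents)


seen = [("Ich", "nom")]
early = run_test(has_accusative_np, seen, end_of_sentence=False)
final = run_test(has_accusative_np, seen, end_of_sentence=True)
seen.append(("das Buch", "acc"))
later = run_test(has_accusative_np, seen, end_of_sentence=False)
```

With only the nominative NP seen, the test yields RETRY while parsing is in progress and FAILED at the end of the sentence; once an accusative NP has been added it yields PASSED.</Paragraph>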
    <Paragraph position="4"> Actions associated with the tests mostly deal with semantic structure-building procedures. Some actions are used to control the parsing process. Usually the semantic structure for a constituent of the sentence is built after the constituent is recognized, but actions can delay the creation of net-structures. The reasons for such a delay are explained in the following chapter.</Paragraph>
    <Paragraph position="5"> A verb-sense is recognized by taking into consideration the syntactic surroundings of the verb and the semantic categories that match the selectional restrictions defined by the verb. After a verb-sense has been chosen, expectations are built up regarding missing constituents. The occurrence of certain surface structures also leads to the formation of expectations. Therefore, tests that are associated with verbs first check the surface structure of the sentence (cases, prepositions, ...). The constituents that satisfy these syntactic tests then have to fulfill semantic selectional restrictions. Once these tests have been passed, actions create the semantic representation for the verb and fill its roles with the selected constituents.</Paragraph>
    <Paragraph position="6"> Unless an entry in the lexicon includes a test regarding the subject and object of a sentence, the following default actions are executed automatically: the subject of the sentence is mapped onto the AGENT and the object (accusative) is mapped onto the OBJECT of the action. A Detailed Example: The two senses of 'gehen' in the following example can be disambiguated by using the entries in the parsing-lexicon listed below (parts of the entry which are irrelevant to the example are left out). These sample entries include important kinds of tests and actions.</Paragraph>
    <Paragraph position="7">  (1) 'Ich gehe in den Park.' (I walk into the park.) (2) 'Der Bus geht nach Wien.' (The bus is bound for Vienna.)  In the example, the '+' parameter is an individual of the concept PUBL.-TRANSPORT, and the '*' parameter is the location expressed by the prepositional phrase, namely Vienna. The noun phrase 'Ich' (I) fulfills the restriction ANIMATE, because speakers are always interpreted as humans.</Paragraph>
    <Paragraph position="8"> Surface-tests: Case-tests search for an NP of the surface case indicated by the second parameter. If an NP is found that satisfies the condition, the tests that are connected to the case-test by AND or OR are executed. The constituent of the sentence which satisfies the tests is referred to with an asterisk '*' in the associated action(s). The test PLOC refers to a prepositional phrase that indicates some location. It is a test which uses both syntactic and semantic information.</Paragraph>
    <Paragraph position="9"> Restriction-tests: These semantic tests are used to check selectional restrictions. They are often used in combination with syntactic tests. If both tests are met by a constituent, this is a significant indicator that the correct interpretation has been selected.</Paragraph>
    <Section position="1" start_page="365" end_page="365" type="sub_section">
      <SectionTitle>
Structure-building Actions:
</SectionTitle>
      <Paragraph position="0"> The action CRI(concept) creates an individual of the concept. The action CRV(p1,p2,p3) individuates an attribute of the concept p1.</Paragraph>
      <Paragraph position="1"> The concept p1, the role p2 and the concept p3 as value restriction are instantiated. If p1 or p3 is addressed by '+', the parameter refers to the first concept that was individuated when processing this particular entry in the parsing-lexicon. A '*' parameter refers to the semantic representation of the constituent which satisfies the first test of the production.</Paragraph>
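      <Paragraph> To make the parameter conventions concrete, the sketch below mimics CRI and CRV on a toy episodic layer. It is an approximation under assumptions: the classes Net and EntryContext, the concept CITY and the role name DESTINATION are hypothetical, and the actual lexicon entry for 'gehen' is not reproduced here. Only the behaviour of '+' (first individual created while processing the entry) and '*' (representation of the constituent that satisfied the first test) follows the description above.

```python
import itertools


class Net:
    """Toy episodic layer: records individuals and role links."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.individuals = []
        self.links = []              # (owner, role, filler)

    def individuate(self, concept):
        individual = f"{concept}#{next(self._ids)}"
        self.individuals.append(individual)
        return individual


class EntryContext:
    """State kept while one parsing-lexicon entry is processed."""

    def __init__(self, net, star):
        self.net = net
        self.star = star             # representation bound to '*'
        self.plus = None             # first individual created, bound to '+'

    def resolve(self, param):
        if param == "+":
            return self.plus
        if param == "*":
            return self.star
        individual = self.net.individuate(param)
        if self.plus is None:
            self.plus = individual
        return individual

    def cri(self, concept):
        """CRI(concept): create an individual of the concept."""
        return self.resolve(concept)

    def crv(self, p1, role, p3):
        """CRV(p1,p2,p3): individuate the attribute 'role' of p1 with filler p3."""
        owner = self.resolve(p1)
        filler = self.resolve(p3)
        self.net.links.append((owner, role, filler))
        return owner


net = Net()
vienna = net.individuate("CITY")      # built when 'nach Wien' was recognized
ctx = EntryContext(net, star=vienna)
ctx.cri("PUBL.-TRANSPORT")            # becomes the '+' individual
ctx.crv("+", "DESTINATION", "*")      # role name is an assumption
```

After these calls the net contains the individuals CITY#1 and PUBL.-TRANSPORT#2 and a single link from PUBL.-TRANSPORT#2 to CITY#1 labelled DESTINATION.</Paragraph>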
    </Section>
  </Section>
  <Section position="5" start_page="365" end_page="365" type="metho">
    <SectionTitle>
SPECIAL FEATURES OF GERMAN
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="365" end_page="365" type="sub_section">
      <SectionTitle>
Morphological Ambiguities
</SectionTitle>
      <Paragraph position="0"> We believe that making use of the interaction between syntax and semantics has many advantages over a strictly sequential approach to parsing. Introducing semantic information helps to resolve some ambiguities at an early stage of the analysis and thus to avoid unnecessary backtracking. Typically, morphological ambiguities can be resolved by such an interaction.</Paragraph>
      <Paragraph position="1"> The German language is rich in inflectional forms; therefore the morphological component often comes up with more than one possible stem for an input word. These stems usually belong to different categories of words, e.g. 'meinen' can be interpreted as a verb (to suppose) or it can be reduced to the possessive pronoun 'mein' (my).</Paragraph>
      <Paragraph position="2"> Syntax restricts the type of constituent that is expected at a given point in the analysis. Usually syntactic information is sufficient to resolve morphological ambiguities of this kind.</Paragraph>
      <Paragraph position="3"> If a word is reduced to two different stems of the same category of words, selectional restrictions in the semantic net are used to choose one stem. The parsing-lexicon relates surface cases to semantic restrictions of the attributes of the action. In most cases this information is sufficient for disambiguation.</Paragraph>
      <Paragraph position="4"> The inflected form 'gehoert' is reduced to the two verbs 'hoeren' (to hear) and 'gehoeren' (to belong to).</Paragraph>
      <Paragraph position="5">  (3) Dieses Buch gehoert mir. (This is my book.) (4) Hast du dieses Geraeusch gehoert?  (Did you hear that noise?) In (3) the subject of the sentence has to be a 'POSSESSIBLE-OBJECT'; in (4) the object has to be a subconcept of 'SOUND'. A violation of selectional restrictions is a clear indicator that the wrong interpretation of the verb has been chosen.</Paragraph>
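      <Paragraph> The choice between the two stems of 'gehoert' can be sketched as a filter over selectional restrictions. This is a simplified illustration: the tables ISA and RESTRICTIONS, the concept names BOOK, NOISE and PERSON, and the function choose_stem are hypothetical stand-ins for the semantic net and the parsing-lexicon.

```python
# Hypothetical concept hierarchy (child -> parent).
ISA = {"BOOK": "POSSESSIBLE-OBJECT", "NOISE": "SOUND"}

# Hypothetical restrictions: stem -> (grammatical function, restriction).
RESTRICTIONS = {
    "gehoeren": ("subject", "POSSESSIBLE-OBJECT"),
    "hoeren": ("object", "SOUND"),
}


def is_a(concept, restriction):
    """True if concept equals restriction or lies below it in ISA."""
    while concept is not None:
        if concept == restriction:
            return True
        concept = ISA.get(concept)
    return False


def choose_stem(stems, constituents):
    """Keep only the stems whose selectional restriction is satisfied."""
    viable = []
    for stem in stems:
        function, restriction = RESTRICTIONS[stem]
        concept = constituents.get(function)
        if concept is not None and is_a(concept, restriction):
            viable.append(stem)
    return viable


stems = ["hoeren", "gehoeren"]                          # both analyses of 'gehoert'
sentence_3 = {"subject": "BOOK"}                        # Dieses Buch gehoert mir.
sentence_4 = {"subject": "PERSON", "object": "NOISE"}   # Hast du dieses Geraeusch gehoert?
```

For sentence (3) only 'gehoeren' survives the filter, and for sentence (4) only 'hoeren' does.</Paragraph>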
    </Section>
  </Section>
  <Section position="6" start_page="365" end_page="365" type="metho">
    <SectionTitle>
SPECIAL FEATURES OF GERMAN (continued)
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="365" end_page="365" type="sub_section">
      <SectionTitle>
Disconnected Constituents
</SectionTitle>
      <Paragraph position="0"> Another characteristic feature of the German language is the verb-second phenomenon. In German a verb can occupy three different positions within a sentence: the first in questions and commands, the second in main clauses, and the last in subordinate clauses.</Paragraph>
      <Paragraph position="1"> Compound predicates are divided into two parts. The auxiliary or the modal verb holds the place of the verb, and the rest of the predicate is put at the end of the sentence. One has to deal with a two-piece predicate whenever compound tenses are used, in structures involving the infinitive, etc.</Paragraph>
      <Paragraph position="2"> For a parser that uses a traditional approach of sequential syntactic and semantic processing, these features cause extensive backtracking. The method of combining syntactic and semantic analysis does not avoid backtracking completely, but it makes re-interpretation easier. This claim is supported in the following paragraph using the example of a compound predicate.</Paragraph>
      <Paragraph position="3"> (5) Mein Bruder hat das Buch, von dem du mir erzaehlt hast, schon gelesen.</Paragraph>
      <Paragraph position="4"> (My brother has already read the book that you told me about.) In (5) the object and a relative clause separate the two parts of the predicate. One possible reading of the verb 'haben' is to possess. The object 'das Buch' satisfies the semantic restriction 'POSSESSIBLE-OBJECT'; therefore 'hat' is taken as the predicate, and a possess relation is established between the representations for subject and object. When the past participle 'gelesen' is encountered at the end of the sentence, this decision has to be revised in favour of the compound predicate 'hat gelesen'.</Paragraph>
      <Paragraph position="5"> The possess relation which was established has to be replaced by the concept that is addressed by 'lesen', namely 'INFORMATION-TRANSFER'. The semantic representations of the object 'Buch' (book) and the relative clause are not affected by this change. Book also fits into the hierarchy of 'INFORMATION-SOURCE' and therefore satisfies the selectional restrictions for the object of 'INFORMATION-TRANSFER' as well.</Paragraph>
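      <Paragraph> The revision in example (5) can be sketched as swapping the concept of a relation while reusing the structures that fill its roles. The names Relation, reinterpret and POSSESS, the use of AGENT for the possessor, and the dictionary layout of the constituent structures are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical: BOOK lies below both restriction concepts.
ISA = {"BOOK": ["POSSESSIBLE-OBJECT", "INFORMATION-SOURCE"]}


def is_a(concept, restriction):
    return concept == restriction or restriction in ISA.get(concept, [])


@dataclass
class Relation:
    concept: str
    roles: dict = field(default_factory=dict)


def reinterpret(relation, new_concept, object_restriction):
    """Replace the relation's concept but keep the existing role fillers."""
    obj = relation.roles["OBJECT"]
    if not is_a(obj["concept"], object_restriction):
        raise ValueError("object violates the new selectional restriction")
    return Relation(new_concept, dict(relation.roles))


brother = {"text": "Mein Bruder", "concept": "PERSON"}
book = {"text": "das Buch", "concept": "BOOK"}

# 'hat' is first read as a full verb, so a possess relation is set up.
first = Relation("POSSESS", {"AGENT": brother, "OBJECT": book})
# 'gelesen' arrives at the end: revise to the compound predicate 'hat gelesen'.
revised = reinterpret(first, "INFORMATION-TRANSFER", "INFORMATION-SOURCE")
```

The revised relation carries the concept INFORMATION-TRANSFER, and its OBJECT and AGENT fillers are the very same structures that were built for 'das Buch' and 'Mein Bruder'; nothing about the constituents has to be rebuilt.</Paragraph>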
      <Paragraph position="6"> Separable prefixes also add to the problem of finding the right verb. Syntactically, verb adjuncts are particles that are part of the verb. In some tenses a verb adjunct becomes separated from the verb and is put at the end of the clause. Verb adjuncts can specify the verb, but sometimes they change its sense completely (aufhoeren = to stop, hoeren = to hear).</Paragraph>
      <Paragraph position="7">  (6) Das Kind hoert nach einer Stunde endlich zu weinen auf.</Paragraph>
      <Paragraph position="8">  (After an hour the child finally stops crying.) Such features either cause a delay in the construction of the internal representation for a sentence, or they result in backtracking, because the correct meaning of the verb becomes apparent only at the end of the sentence.</Paragraph>
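      <Paragraph> Why the verb in (6) cannot be fixed before the clause ends can be shown with a very small sketch. The prefix list SEPARABLE_PREFIXES and the function resolve_verb are hypothetical; a real system would consult the parsing-lexicon rather than concatenate strings.

```python
# Hypothetical, deliberately tiny list of separable prefixes.
SEPARABLE_PREFIXES = {"auf", "an", "ab", "zu"}


def resolve_verb(finite_stem, clause_final_word):
    """Return the lexicon key for the verb once the clause end is known."""
    if clause_final_word in SEPARABLE_PREFIXES:
        return clause_final_word + finite_stem
    return finite_stem


# (6) Das Kind hoert nach einer Stunde endlich zu weinen auf.
early_guess = resolve_verb("hoeren", None)    # before the clause end is seen
final_verb = resolve_verb("hoeren", "auf")    # after 'auf' has been read
```

Before the clause-final 'auf' is read the only available key is 'hoeren'; afterwards it is 'aufhoeren', so any structure built for 'hoeren' in the meantime has to be delayed or revised.</Paragraph>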
    </Section>
  </Section>
</Paper>