<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0614">
  <Title>Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation</Title>
  <Section position="3" start_page="0" end_page="117" type="intro">
    <SectionTitle>
2 Data
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="111" type="sub_section">
      <SectionTitle>
2.1 PP interpretation rules
</SectionTitle>
      <Paragraph position="0"> A central component of the disambiguation method presented in this paper is a set of semantic interpretation rules for PPs. A PP interpretation rule consists of a premise and a conclusion. The premise describes the conditions under which the PP interpretation specified by the rule's conclusion can be valid. Two example rules for the local and contents interpretation of 'über' ('about'/'above'/'on'/'over'/'via'/...) are shown in Figure 1. As (at least) five more interpretations of 'über' are possible, the degree of ambiguity for the interpretation of such a PP is (at least) seven.</Paragraph>
      <Paragraph position="1"> The premise of a rule is a set of feature structure constraints (including negated and disjunctive constraints, and defining an underspecified feature structure) that refer to the following features of the preposition's sister NP (nominal phrase) and the preposition's mother NP or V (verb). Features that are referred to only for the sister NP are marked by an S.
case (S): syntactic case: genitive, dative, and accusative for German PPs
num (S): syntactic number: singular and plural in German
sort: a semantic sort value (atomic or disjunctive value) from a predefined ontology (see (Helbig and Schulz, 1997)) comprising 45 sorts. The most important sorts for nouns are object and its subsorts con-object (concrete object, with subsorts dis-object (discrete object) and substance) and abs-object (abstract object, with subsorts tem-abstractum (temporal abstractum), abs-situation (abstract situation), attribute, etc.). Verbs can belong to sort stat-situation (static situation) or sort dyn-situation (dynamic situation, with subsorts action and event). A disjunctive value represents a concept family (as introduced by (Bierwisch, 1983); closely related are dotted types, see for example (Buitelaar, 1998)), e.g., the noun 'book' comprises a physical object variant and an abstract information variant.</Paragraph>
      <Paragraph position="2"> Figure 1:
id: über.loc
explanation: c1 is/happens above the location of c2.
examples: 'Flugzeuge über Seen' ('airplanes above lakes'), ...
premise: c1 (sort (dis object situation))
         c2 (case dat) (sort concrete-object)
conclusion: net (loc c1 c3) (*ueber c3 c2)

id: über.mcont
explanation: c1 contains information about the topic described by c2.
examples: 'Bücher über Seen' ('books on lakes'), ...
premise: c1 (sort (dis object situation)) (info +)
         c2 (case acc) (sort object)
conclusion: net (mcont c1 c2)

The semantic network node c1 corresponds to the mother, the node c2 to the sister, and c3 etc. are additional nodes. A disjunction of feature values is introduced by dis.</Paragraph>
      <Paragraph position="3"> etype extension type for distinguishing individuals ('child', 'table'), sets of individuals ('men', 'group', 'people'), etc.</Paragraph>
      <Paragraph position="4"> The rest of the features are semantic Boolean features as shown in Table 1 (see footnote 2). The conclusion of a rule is a semantic interpretation of the PP, which can be valid if the premise is satisfied by the sister and the mother. The rules' semantic representation uses a multilayered extended semantic network formalism (MESNET, see for example (Helbig and Schulz, 1997)), which has been successfully applied in various areas (e.g., in the Virtual Knowledge Factory, see (Knoll et al., 1998)).</Paragraph>
      <Paragraph position="5"> 2 Of course, other sets of such features are possible; the choice was made by selecting relevant features from the set of semantic features in an existing German inheritance lexicon (see (Hartrumpf and Schulz, 1997)), which contains 7000 lexemes and is used by the disambiguation method.</Paragraph>
      <Paragraph position="6"> Besides the premise and the conclusion, each rule contains a mnemonic identifier like in.loc (which consists of the preposition's orthographic form followed by an abbreviation derived from the semantic interpretation in the conclusion), a short explanation, and a set of example sentences that can be interpreted using this rule.</Paragraph>
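      <Paragraph> The rule layout described above can be illustrated with a small sketch. It is not the authors' implementation: the class name Rule, the helper functions, the tiny sort hierarchy, and the feature values chosen for 'Buecher' and 'Seen' are all assumptions made for illustration. Only the two rule premises and conclusions follow Figure 1 (with 'ueber' written in ASCII and con-object standing in for concrete-object).

```python
from dataclasses import dataclass

# Hypothetical fragment of the sort hierarchy (child -> parent).
PARENT = {"con-object": "object", "abs-object": "object",
          "dis-object": "con-object", "substance": "con-object"}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

@dataclass
class Rule:
    rule_id: str
    mother: dict      # feature name -> set of admissible values (disjunction)
    sister: dict
    conclusion: str

def satisfied(constraints, features):
    """Check every constrained feature against a node's feature values."""
    for name, allowed in constraints.items():
        values = features.get(name, set())
        if name == "sort":
            ok = any(subsumes(a, v) for a in allowed for v in values)
        else:
            ok = not allowed.isdisjoint(values)
        if not ok:
            return False
    return True

def premise_holds(rule, mother, sister):
    return satisfied(rule.mother, mother) and satisfied(rule.sister, sister)

UEBER_LOC = Rule("ueber.loc",
                 mother={"sort": {"object", "situation"}},
                 sister={"case": {"dat"}, "sort": {"con-object"}},
                 conclusion="(loc c1 c3) (*ueber c3 c2)")
UEBER_MCONT = Rule("ueber.mcont",
                   mother={"sort": {"object", "situation"}, "info": {"+"}},
                   sister={"case": {"acc"}, "sort": {"object"}},
                   conclusion="(mcont c1 c2)")

# 'Buecher ueber Seen' with an accusative sister; feature values are assumed.
buecher = {"sort": {"dis-object", "abs-object"}, "info": {"+"}}
seen_acc = {"case": {"acc"}, "sort": {"con-object"}}
fired = [r.rule_id for r in (UEBER_LOC, UEBER_MCONT)
         if premise_holds(r, buecher, seen_acc)]
```

With these assumed feature values, only the contents rule passes: the local rule requires a dative sister. Negated constraints, which the premises may also contain, are left out of the sketch.</Paragraph>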
      <Paragraph position="7"> From a set of rules for 160 German prepositions collected by (Tjaden, 1996), all rules for six important (i. e., frequent) prepositions were taken as a starting point for development and evaluation of a hybrid disambiguation method.</Paragraph>
      <Paragraph position="8"> Sentences were retrieved from a development test corpus to refine these rules.</Paragraph>
    </Section>
    <Section position="2" start_page="111" end_page="114" type="sub_section">
      <SectionTitle>
2.2 Corpus
</SectionTitle>
      <Paragraph position="0"> While PP interpretation rules form the rule component of the hybrid disambiguation method, an annotated corpus serves as the source of the statistical component. For each preposition under investigation, a number of candidate sentences that possibly show attachment ambiguity for this preposition were automatically extracted from a corpus. This corpus is based on the online version of the Süddeutsche Zeitung, starting from August 1997. The corpus is marked up according to the Corpus Encoding Standard (see (Ide et al., 1996)) and word, sentence, and paragraph identifiers are assigned.</Paragraph>
      <Paragraph position="1"> The preposition in a candidate sentence is semiautomatically annotated with five attributes:
sister The position of the right-most word of the preposition's sister NP. Postnominal genitive NPs modifying the main sister NP are included in this annotation.</Paragraph>
      <Paragraph position="2"> Table 1 (semantic Boolean features). Columns: feature name; description of entities with positive (+) value; examples.</Paragraph>
      <Paragraph position="4"> mother The position of the syntactic head word of the mother NP or V.</Paragraph>
      <Paragraph position="5"> amother The list of alternative mothers, each represented by the position of the syntactic head word of an NP or V. An alternative mother is a syntactically possible mother distinct from the (correct) mother. All alternative mothers plus the (correct) mother form the set of candidate mothers for PP attachment.</Paragraph>
      <Paragraph position="6"> c-id A character string that identifies the semantic reading of the preposition and corresponds to the identifier in a PP interpretation rule (see Figure 1).</Paragraph>
      <Paragraph position="7"> c A character string for comments and documentation purposes.</Paragraph>
      <Paragraph position="8"> The preposition in corpus sentence (1) is annotated as shown by the SGML element in (2). The meaning of this annotation can be illustrated as in (3): the PP's sister ends at 'Seite'; the PP attaches to 'gebaut' and could syntactically also be attached to the NP with head 'Depot' or the NP with head 'Museums'; the interpretation of the PP is a local one (auf.loc).
(1) ... gebaut, nachdem die Planungen für die Thüringer Talseite schon fertig waren?
    ... built, after the plannings for the Thuringian valley-side already ready were?
    'And why is the new depot of the German-German Museum built on the Bavarian side, after the planning for the Thuringian side of the valley has already been completed?'
(2) 19971002bay_c.p3.s2.w10 (article bay_c, 1997-10-02, paragraph 3, sentence 2, word 10): &lt;w c-id=&quot;auf.loc&quot; sister=&quot;12&quot; mother=&quot;13&quot; amother=&quot;6/9&quot;&gt;auf&lt;/w&gt;
(3) Und wieso wird das neue Depot_{a1} des Deutsch-Deutschen Museums_{a2} auf_{auf.loc} bayerischer Seite_{s} gebaut_{m}, nachdem die Planungen für die Thüringer Talseite schon fertig waren?
The annotation process is semiautomatic: the machine guesses the attribute values following some heuristics; these guesses have to be checked and possibly extended or corrected by a human annotator. This kind of annotation is, of course, labor-intensive. But thanks to a Tcl/Tk annotation tool optimized for manual annotation speed, the average annotation time per candidate sentence dropped below 30 seconds. Furthermore, the following sections show that a small set of annotated sentences achieves promising results for PP attachment and interpretation. The lexicon (see footnote 2) had to be extended for the nouns and verbs annotated as head words of sisters or candidate mothers that were not in the lexicon and could not be analyzed by a compound analysis module.</Paragraph>
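      <Paragraph> As a small illustration of how the attribute values in (2) relate to the words in (3), the following sketch resolves the annotated positions. It assumes that positions are 1-based word indices within the sentence and that '/' separates the alternative mothers; both assumptions are consistent with the example but are not stated as a specification in the text. The variable and function names are hypothetical.

```python
# First thirteen words of the example sentence, without punctuation.
sentence = ("Und wieso wird das neue Depot des Deutsch-Deutschen Museums "
            "auf bayerischer Seite gebaut").split()

# Attribute values copied from the annotation in (2).
annotation = {"c-id": "auf.loc", "sister": "12", "mother": "13", "amother": "6/9"}

def word(position):
    """Return the word at a 1-based position (assumed indexing)."""
    return sentence[int(position) - 1]

resolved = {
    "preposition": word(10),
    "sister_end": word(annotation["sister"]),
    "mother": word(annotation["mother"]),
    "alternative_mothers": [word(p) for p in annotation["amother"].split("/")],
}
```

Under these assumptions the sister ends at 'Seite', the mother is 'gebaut', and the alternative mothers are 'Depot' and 'Museums', which matches the reading given for (3).</Paragraph>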
      <Paragraph position="9"> Some candidate sentences were excluded from the investigation because the PP involves a problem that is supposed to be solved by other NLP modules (see footnote 4) and could disturb the evaluation of the PP disambiguation module (e.g., by producing noise for the statistical part). All exclusion criteria are listed in Table 2 with percentages of instances of such exclusions relative to the number of candidate sentences. In short, sentences are excluded when their PP ambiguity problem
* can be solved by separate components (for support verb constructions and idioms) or
* can only be solved if the PP attachment and interpretation is supported by another component (for complex named entities, ellipsis resolution, and foreign language expressions).
The first 120 non-excluded candidate sentences for each preposition were chosen and randomly split into eight parts for cross validation. Eight evaluations were carried out, each with one part serving as the evaluation test corpus and the remaining seven parts as the evaluation training corpus.</Paragraph>
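      <Paragraph> The eight-fold split described above can be sketched as follows. The function name, the fixed random seed, and the use of integers as stand-ins for sentences are assumptions for illustration; the text does not say how the random split was implemented.

```python
import random

def cross_validation_folds(sentences, k=8, seed=0):
    """Shuffle the sentences, cut them into k parts, and yield (train, test) pairs."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    parts = [items[i::k] for i in range(k)]
    for i in range(k):
        test = parts[i]
        train = [s for j, part in enumerate(parts) if j != i for s in part]
        yield train, test

# 120 candidate sentences for one preposition, represented by their indices.
folds = list(cross_validation_folds(range(120)))
sizes = [(len(train), len(test)) for train, test in folds]
```

With 120 sentences and eight parts, every evaluation uses 15 test sentences and 105 training sentences.</Paragraph>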
      <Paragraph position="10"> Sometimes, it makes no semantic difference whether a PP in a sentence attaches to an NP or a V. This is known as systematic ambiguity (or systematic indeterminacy, see (Hindle and Rooth, 1993, p. 112)). Two subtypes of this phenomenon are systematic locative ambiguity (see corpus sentence (4)) and systematic contents ambiguity.</Paragraph>
      <Paragraph position="11">  (4) Bis ein Bescheid_{m1} aus_{aus.origl} ...
    Until a notification from ...
4 ... such modules solve these problems.</Paragraph>
      <Paragraph position="12"> The frequency of such ambiguities depends heavily on the preposition; on average, 4.3% of the cases were systematically ambiguous. For English, (Hindle and Rooth, 1993, p. 116) report that 77 out of 880 sentences (8.75%) were systematically ambiguous. In such sentences, an attachment can be considered correct if it is one of the two attachments connected by systematic ambiguity; both parsing results will lead to identical results in an NLP application if it contains sufficiently developed inference components. Table 3 shows for the evaluation corpus (720 sentences) where the PP attaches (columns V, NP1, NP2 (the second closest NP), NP3, NP4), how many attachments are syntactically possible (number of candidate mothers; columns labeled 1 to 5), and how frequent sys...
...lems in NLP. But where a PP attaches is only half of the story of the PP's contribution to an utterance; the other half is how it is to be interpreted. And clearly, these two questions are not independent. So why not tackle both problems at once, trying to achieve results for both that are better than those obtained by an isolated PP attachment component and an isolated PP interpretation component? As both problems depend on each other, there is a strong hope that this is the case. To investigate this hypothesis, such a disambiguation method was developed and evaluated.</Paragraph>
      <Paragraph position="13"> The input to the disambiguation method is the feature structure p for the preposition, the feature structure s for the parse of the preposition's sister NP, and the feature structures cm_i for the (trivial) parses of the syntactic head words of all candidate mothers. The output is the mother the PP is to be attached to and the interpretation that the preposition plus the sister NP contribute to the meaning of the enclosing sentence.
The overall structure of this disambiguation method comprises three steps. First, all sets of possible interpretations PI_i of the PP plus a given candidate mother cm_i are determined by applying the PP interpretation rules. Second, for each set of possible interpretations PI_i, one interpretation si_i is selected using interpretation statistics (on semantics). Third, among all selected si_i, one interpretation is chosen based on attachment statistics (on semantics and syntax) and additional factors. These steps will be presented in more detail in the following three subsections.

Table 2:
short name | description | % of tokens
cne-amother | amother is a complex named entity (titles of books, etc.) | 0.1
cne-mother | mother is a complex named entity (titles of books, etc.) | 0.4
cne-sister | sister is a complex named entity (titles of books, etc.) | 0.6
ell-amother | amother is elliptic | 0.1
ell-mother | mother is elliptic | 0.1
ell-sister | sister is elliptic | 0.5
fle-amother | amother is a foreign language expression | 0.1
fle-mother | mother is a foreign language expression | 0.1
idi-amother | amother is an idiom (or part of an idiom) | 0.1
idi-mother | mother is an idiom | 0.4
idi-pp | PP is an idiom | 3.6
idi-pp-mother | PP plus mother is an idiom | 0.9
idi-pp-v | PP plus verb is an idiom | 0.5
problem | unclassified problem | 0.7
svc | PP is part of a support verb construction | 0.5
svc-amother | amother of the PP is a support verb construction | 0.3
svc-mother | mother of the PP is a support verb construction | 1.0
sum | | 10.1</Paragraph>
    </Section>
    <Section position="3" start_page="114" end_page="115" type="sub_section">
      <SectionTitle>
3.2 Application of interpretation rules
</SectionTitle>
      <Paragraph position="0"> Step 1 of the disambiguation method (determining possible interpretations PI_i) is driven by testing the premises of PP interpretation rules. From the set of interpretations PI'_i whose rule premises are satisfied, interpretations are removed that violate adjunct constraints from the lexicon or constraints from the underlying semantic formalism (see footnote 7 and step 1 in Figure 2).</Paragraph>
      <Paragraph position="1"> 7 Of course, constraints from the semantic formalism could be added to the rules. But this would introduce redundancy which would make the rules difficult to develop and maintain.</Paragraph>
      <Paragraph position="2"> Figure 2:
n is the number of possible attachments (cm_1, ..., cm_n). m is the number of rules for preposition p (r_1, ..., r_m).
1. for each candidate mother cm_i
   (a) PI'_i = {(p, s, cm_i, r_j) | 1 ≤ j ≤ m, premise of rule r_j is satisfied by sister s and cm_i}
   (b) PI_i = set of all (p, s, cm_i, r) ∈ PI'_i which fulfill the following conditions:
       * Semantic relations in the conclusion of r are licensed by compatible relations listed in the feature structure cm_i, which come from lexical entries (or lexical defaults).
       * Semantic relations in the conclusion of r do not violate the signature constraints that are defined for these relations in the underlying semantic network formalism.
2. for each candidate mother cm_i with nonempty PI_i
   (a) si_i = arg max_{pi} rf(r, {r_j | ∃ (p, s, cm_i, r_j) ∈ PI_i}), where pi = (p, s, cm_i, r) ∈ PI_i
3. for each candidate mother cm_i with nonempty PI_i
   (a) d = distance in words between candidate mother cm_i and the PP (p plus s)
   (b) score_{si_i} = rf((r, cat(cm_i)), {(r_j, cat(cm_k)) | 1 ≤ k ≤ n, PI_k ≠ ∅, si_k = (p, s, cm_k, r_j)}) + scoredist(d), where si_i = (p, s, cm_i, r)
   si = arg max_{si_i} score_{si_i}, where 1 ≤ i ≤ n, PI_i ≠ ∅

To simplify Figure 2, the treatment of complements is excluded. Interpretations that are licensed by lexical complement information for candidate mothers are also determined in step 1.</Paragraph>
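      <Paragraph> The control flow of the three steps in Figure 2 can be sketched in a few lines. This is a simplified illustration, not the authors' system: the rule filtering of step 1, the two relative-frequency functions, the category function, and the distance score are passed in as hypothetical callables, and all numbers and the rule id auf.temp in the usage part are made up. The distance score is set to zero here because the sketch does not depend on its definition.

```python
def disambiguate(mothers, possible, rf_int, rf_att, cat, distance, score_dist):
    """Return (mother, rule id) for the best attachment, or None if no rule fired."""
    # Step 1: possible interpretations per candidate mother (already filtered).
    pi = {m: list(possible(m)) for m in mothers}

    # Step 2: for each mother, select the rule with the highest relative
    # frequency among the rules competing for that mother.
    selected = {m: max(rules, key=lambda r: rf_int(r, frozenset(rules)))
                for m, rules in pi.items() if rules}
    if not selected:
        return None

    # Step 3: score every selected (rule, category) pair against all
    # competing pairs, add the distance score, and keep the best attachment.
    competitors = frozenset((r, cat(m)) for m, r in selected.items())

    def score(m):
        return rf_att((selected[m], cat(m)), competitors) + score_dist(distance(m))

    best = max(selected, key=score)
    return best, selected[best]

# Toy usage for the 'auf' example; all statistics below are invented.
mothers = ["gebaut", "Depot", "Museums"]
possible = {"gebaut": ["auf.loc", "auf.temp"], "Depot": ["auf.loc"], "Museums": []}.get
category = {"gebaut": "v", "Depot": "np", "Museums": "np"}.get
distance = {"gebaut": 1, "Depot": 4, "Museums": 1}.get

result = disambiguate(
    mothers, possible,
    rf_int=lambda r, rules: 0.8 if r == "auf.loc" else 0.2,
    rf_att=lambda pair, competing: 0.7 if pair[1] == "v" else 0.3,
    cat=category, distance=distance,
    score_dist=lambda d: 0.0,
)
```

With these invented numbers the verbal mother 'gebaut' wins under the interpretation auf.loc. The treatment of complements is omitted, as in Figure 2.</Paragraph>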
      <Paragraph position="3"> Experiments showed that it is a good strategy to prefer complement interpretations over adjunct interpretations, which are described in the following steps (see footnote 8). Attachment cases where prepositional objects as complements are involved are the easy ones for statistical disambiguation techniques (see for example (Hindle and Rooth, 1993)); in a hybrid system, one can expect such complement information to be in the lexicon, at least in part. The problem is alleviated as the interpretation rules (which are developed for adjuncts) produce correct results for many complements; but this topic needs further research.</Paragraph>
    </Section>
    <Section position="4" start_page="115" end_page="116" type="sub_section">
      <SectionTitle>
3.3 Interpretation disambiguation
</SectionTitle>
      <Paragraph position="0"> The result of step 1 can be viewed as an attachment-interpretation matrix (ai_{i,j}) of size n×m. A matrix element ai_{i,j} corresponds to attaching the PP to candidate mother cm_i under interpretation r_j and represents some kind of preference score.</Paragraph>
      <Paragraph position="1"> 8 In the rare case of two possible complement interpretations, the verbal one is preferred.</Paragraph>
      <Paragraph position="2"> To solve the attachment and interpretation problem (i.e., to select the right matrix element), statistics can be used. There are numerous statistical approaches (see section 1), but in the presented approach a statistical component is combined with a rule component (see step 1).</Paragraph>
      <Paragraph position="3"> This rule component reduces the degree of ambiguity (i.e., marks elements in matrix (ai_{i,j}) as possible or impossible) and delivers high-level semantic information (the possible semantic interpretations of the PP for a given candidate mother) for statistical disambiguation.</Paragraph>
      <Paragraph position="4"> The strategy adopted in this disambiguation method is to do the remaining disambiguation in two steps: first disambiguate the interpretations for each attachment possibility, then disambiguate the attachments based on the first step's result. So, in step 2 of the disambiguation method, one interpretation for each candidate mother is chosen. As Table 4 shows, most of the time the correct rule fires (given the correct mother; see recall column), but false rules fire too (see precision column) because interpretation rules refer only to a limited depth of semantics, which can be delivered by realistic parsers for nontrivial domains. Therefore, there is the need to disambiguate for interpretation.</Paragraph>
      <Paragraph position="5"> Here statistics derived from the annotated corpus come into play: relative frequencies are calculated, which serve as estimated probabilities. As usual in statistical methods for disambiguation, there is a trade-off between depth of learned information (e.g., number and type of features) and non-sparseness of the resulting matrix-like structure representing the learning results: the deeper the information, the sparser the matrix. A good compromise for the problem at hand is to regard only the interpretation (identified by the rule id) and to establish a limit n_int for the number of interpretations. Empirical results showed that three is a reasonable choice for n_int. An example of an entry in the interpretation statistics is given in the first line of Figure 3 and can be paraphrased as follows: The interpretation aus.pars wins in 100% of the learned cases if the interpretations aus.origl and aus.sourc are possible too.</Paragraph>
      <Paragraph position="6"> If there are more than three possible interpretations, standard techniques for reducing to several triples can be used (backed-off estimation, see for example (Katz, 1987), (Collins and Brooks, 1995)). The relative frequency of rule r_i being the correct interpretation among I = {r_1, r_2, ..., r_n} is estimated for n &gt; n_int as in equation (5):
(5) rf(r_i, I) := \frac{\sum_{c \in C_i} rf(r_i, c)}{|C_i|}
where C_i is the set of all subsets of I with n_int elements that contain r_i.</Paragraph>
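      <Paragraph> The averaging over subsets described for equation (5) can be sketched as follows. The function name, the dictionary layout of the learned statistics, and the choice to count unseen subsets as 0.0 are assumptions for illustration. Of the table entries, only the value 1.0 for aus.pars against aus.origl and aus.sourc is taken from the text; the other two values are invented.

```python
from itertools import combinations

def backed_off_rf(rule, competing, rf_table, n_int=3):
    """Estimate the relative frequency of `rule` among the `competing` rules."""
    competing = frozenset(competing)
    if len(competing) > n_int:
        others = sorted(competing - {rule})
        # All subsets with n_int elements that contain `rule`.
        subsets = [frozenset(c) | {rule} for c in combinations(others, n_int - 1)]
        return sum(rf_table.get((rule, s), 0.0) for s in subsets) / len(subsets)
    # Small enough: look the set up directly.
    return rf_table.get((rule, competing), 0.0)

rf_table = {
    ("aus.pars", frozenset({"aus.pars", "aus.origl", "aus.sourc"})): 1.0,
    ("aus.pars", frozenset({"aus.pars", "aus.origl", "aus.temp"})): 0.5,  # invented
    ("aus.pars", frozenset({"aus.pars", "aus.sourc", "aus.temp"})): 0.0,  # invented
}

estimate = backed_off_rf(
    "aus.pars", {"aus.pars", "aus.origl", "aus.sourc", "aus.temp"}, rf_table)
```

For four competing interpretations and n_int = 3, three triples contain the rule in question, so the estimate is the mean of three table entries.</Paragraph>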
      <Paragraph position="7"> In step 2 of the disambiguation algorithm (see middle of Figure 2), the rule that maximizes the (estimated) relative frequency must be found for each candidate mother.</Paragraph>
    </Section>
    <Section position="5" start_page="116" end_page="117" type="sub_section">
      <SectionTitle>
3.4 Attachment disambiguation
</SectionTitle>
      <Paragraph position="0"> After step 2, the attachment-interpretation matrix (ai_{i,j}) contains in each row (attachment) one element marked as selected (see footnote 9). What remains to be done is to choose among all attachments with selected interpretation si_i one interpretation si.</Paragraph>
      <Paragraph position="1"> For this disambiguation task, attachment statistics are employed. As experiments showed, this time the compromise between depth of learned information and non-sparseness can contain more information than just the interpretation id. A three-valued syntactic-semantic feature cat is added. It describes the candidate mother with three possible values:
v: a verb
nps: an NP that describes a situation (at least partially), e.g., 'continuation'
np: an NP that does not describe a situation, e.g., 'house'
The second line of Figure 3 contains an example that expresses the fact that if the interpretation aus.temp for a nominal candidate mother and the interpretation aus.cstr for a verbal candidate mother compete, then the first is correct (in the training corpus) with relative frequency 1. If one adds even more information to attachment statistics (e.g., the position of NP candidate mothers like np2 for the second closest NP), the attachment data for the annotations in this paper becomes too sparse.</Paragraph>
      <Paragraph position="2"> 9 There might be rows where no element is marked because none of the rules fired and passed filtering (see section 3.2).</Paragraph>
      <Paragraph position="3">  As for the interpretation statistics in step 2, standard techniques can reduce tuples that are longer than 2 (n_att) to several shorter ones. The relative frequency of (r_i, cat(cm_i)) belonging to the correct attachment among A = {(r_1, cat(cm_1)), ..., (r_n, cat(cm_n))} is estimated for n &gt; n_att as in equation (6):</Paragraph>
      <Paragraph position="5"> where C_i is the set of all subsets of A with n_att elements that contain (r_i, cat(cm_i)).</Paragraph>
      <Paragraph position="6"> These relative frequencies for the selected interpretations si_i serve as initial values for an attachment score. Other factors can add to this score, so that the attachment decision should improve; of course, the value is then only a score, not a relative frequency any more. Different factors were evaluated (e.g., the distance between candidate mother and the PP; in this way, one can simulate the right-association principle, see (Kimball, 1973)). The distance scoring function scoredist in equation (7) turned out to be useful. d is the number of words between the candidate mother and the PP. md is an upper limit for distances. Longer distances are reduced to md. (10 is a reasonable choice for md.)</Paragraph>
      <Paragraph position="8"> (7) ... for V mothers
Good values for the parameters distw (weight of the distance factor) and distv (modification for verbal mothers) depend on the preposition at hand and are learned by testing pairs of values from the range 0.0 to 2.0 (see Table 5). The last step of the disambiguation algorithm is summarized at the bottom of Figure 2.</Paragraph>
    </Section>
  </Section>
</Paper>