File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-2153_metho.xml
Size: 17,770 bytes
Last Modified: 2025-10-06 14:14:12
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2153"> <Title>Semantic Construction from Parse Forests</Title> <Section position="4" start_page="0" end_page="907" type="metho"> <SectionTitle> 2 Outline of the System </SectionTitle> <Paragraph position="0"> Let us begin with a rough sketch of the architecture of the system. The semantic construction module works on parse forests and presupposes a semantic grammar of a certain kind (see chapter 6). The semantic grammar must be correlated with the syntactic grammar so that there is a one-to-one mapping between lexical entries and rules.</Paragraph> </Section> <Section position="5" start_page="907" end_page="907" type="metho"> <SectionTitle> 3 Packed Shared Forests </SectionTitle> <Paragraph position="0"> In this section a formal description of packed shared forests in the sense of Tomita (Tomita, 1985) is given.</Paragraph> <Paragraph position="1"> Let a context-free grammar $G$ be a quadruple $\langle N, T, R, S \rangle$ where $N$ and $T$ are finite disjoint sets of nonterminal symbols and terminal symbols, respectively, $R$ is a set of rules of the form $A \rightarrow \alpha$ ($A$ is a nonterminal and $\alpha$ a possibly empty string of nonterminal or terminal symbols), and $S$ is a special nonterminal, called the start symbol.</Paragraph> <Paragraph position="2"> An ordered directed graph marked according to grammar $G$ is a triple $\langle V, E, m \rangle$ such that $V$ is a finite set of vertices or nodes, $E$ a finite set of edges $e$ of the form $(v_1, (v_2, \ldots, v_n))$ (with $v_i \in V$ and $n \geq 2$; $e$ starts at $v_1$, and $v_1$ is the predecessor of $v_2, \ldots, v_n$), and $m$ is the marking function which associates with each vertex a terminal or nonterminal symbol or the special symbol $\epsilon$. The function $m$ is restricted so that the vertices on each edge are marked with the symbols of a rule in $G$, the empty string being represented by the additional symbol $\epsilon$. A parse tree is an ordered directed acyclic graph (DAG) satisfying the following constraints. 1. 
There is exactly one vertex without predecessors, called the top vertex or root. The root is marked with the start symbol.</Paragraph> <Paragraph position="3"> 2. For every vertex there is at most one edge starting at the vertex. Vertices that do not begin edges are called leaves; those that do are called inner nodes.</Paragraph> <Paragraph position="4"> 3. Every vertex except the root has exactly one predecessor.</Paragraph> <Paragraph position="5"> A DAG satisfying the constraints (1-2) is called a Shared Forest; a DAG satisfying only (1) is a Packed Shared Forest or parse forest (see figure 1). A packed shared forest for an input string $a$ obeys the further constraint that there must be at most one vertex for each grammar symbol and substring of $a$. Thus, if $a$ consists of $n$ words, there will be at most $k \cdot n^2$ vertices in the parse forest for it ($k$ being constant). Parse forests can be efficiently constructed using conventional parsing algorithms (Tomita, 1985), (Earley, 1970).</Paragraph> </Section> <Section position="6" start_page="907" end_page="910" type="metho"> <SectionTitle> 4 Determining Tree Readings from a Forest </SectionTitle> <Paragraph position="0"> A tree reading of forest $F$ is a tree in $F$ that contains the root and all leaves. Tree readings are treated as objects. An edge is used in a tree reading if it is one of the tree's edges. Let us now define a disambiguated parse forest (DPF for short). A DPF is a quadruple $\langle V, D, E', m \rangle$ such that * $V$ and $D$ are finite disjoint sets. $V$ is the set of vertices and $D$ is the set of tree readings.</Paragraph> <Paragraph position="1"> * $E'$ is a finite set of edges of the form $(v_1, (v_2, \ldots, v_n), \{d_1, \ldots, d_m\})$. 
The third element is a set of tree readings ($\subseteq D$) and encodes the tree readings in which the edge is used.</Paragraph> <Paragraph position="2"> * $m$ is a marking function from vertices to grammar symbols.</Paragraph> <Paragraph position="3"> To derive a DPF from a parse forest every edge must be assigned a set of tree readings. There is no simple way to determine from a parse forest the number of its tree readings. So instead of postulating a fixed set of readings the present approach uses pointers (implemented as Prolog variables) to refer to sets of tree readings. Two operations, disjoint union and multiplication, are defined for these set pointers. Both operations are monotonic in the sense that the pointers are not altered; their value is only specified. Let $s_i$ be a set of tree readings. * $s_1 \uplus s_2$: The operator $\uplus$ differs from the set-theoretic notion of disjoint union in that it is neither commutative nor associative. This is so because on the implementational level commutativity and associativity would necessitate an abstract data type, thus a costly overhead.</Paragraph> <Paragraph position="5"> In general, $s_1$ and $s_2$ correspond to formulae involving atomic sets and $\uplus$ operators: $s_1 = s_{11} \uplus \ldots \uplus s_{1m}$ and $s_2 = s_{21} \uplus \ldots \uplus s_{2n}$.</Paragraph> <Paragraph position="6"> The operation $\times$ introduces $m \cdot n$ new atomic sets $s'_{ij}$ and splits the former atomic sets such that $\forall i: 1 \leq i \leq m: s_{1i} = s'_{i1} \uplus \ldots \uplus s'_{in}$ and $\forall j: 1 \leq j \leq n: s_{2j} = s'_{1j} \uplus \ldots \uplus s'_{mj}$. The sets $s_1$ and $s_2$ are now equal modulo associativity and commutativity. Consider the following example:</Paragraph> <Paragraph position="8"> We begin by associating a particular set pointer $s_1$ with the root vertex. $s_1$ refers to the total set of tree readings of the forest, since the root vertex figures in all trees derivable from the forest. 
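As an aside, the multiplication operation on set pointers introduced above can be sketched in Python. This is only an illustration under stated assumptions, not the Prolog implementation the paper refers to: the class Atom, the function multiply and the generated names are hypothetical, a set pointer is modelled as a list of atoms read as a disjoint union, and no atom is assumed to occur in both operands. An atom is never replaced; splitting only fills in its parts, which mirrors the monotonicity requirement.

```python
from itertools import count

_fresh = count(1)  # supplies names for newly introduced atomic sets


class Atom:
    """An atomic set of tree readings; it can be split but is never replaced."""

    def __init__(self, name=None):
        self.name = name or f"a{next(_fresh)}"
        self.parts = None  # None while unsplit, else the list of sub-atoms

    def leaves(self):
        """Return the unsplit atoms that this set currently consists of."""
        if self.parts is None:
            return [self]
        return [leaf for part in self.parts for leaf in part.leaves()]


def multiply(s1, s2):
    """Equate two unions of atoms by splitting them into m * n new atoms.

    s1 and s2 are lists of Atom objects read as disjoint unions. The sketch
    assumes that no atom occurs in both operands.
    """
    rows = [leaf for atom in s1 for leaf in atom.leaves()]
    cols = [leaf for atom in s2 for leaf in atom.leaves()]
    grid = [[Atom() for _ in cols] for _ in rows]
    for i, row_atom in enumerate(rows):
        row_atom.parts = grid[i]  # s1_i becomes s'_i1 U ... U s'_in
    for j, col_atom in enumerate(cols):
        col_atom.parts = [row[j] for row in grid]  # s2_j becomes s'_1j U ... U s'_mj
    return grid


s1 = [Atom("s11"), Atom("s12")]
s2 = [Atom("s21"), Atom("s22"), Atom("s23")]
grid = multiply(s1, s2)
# Both operands now denote the same six atomic sets, in different orders.
```

After the call, the leaves of s1 and of s2 are the same six new atoms, enumerated row by row and column by column respectively, which corresponds to the two sets being equal modulo associativity and commutativity.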
We then traverse the graph in top-down fashion applying to each new vertex $v$ the following procedure: Let $e_i$ be the set of tree readings at edge $i$ ending in $v$, and $b_j$ the set of tree readings at edge $j$ starting in $v$. Then the following actions must be performed.</Paragraph> <Paragraph position="9"> * Apply the procedure to all successors of $v$. This step yields for each edge $j$ starting in $v$ and for each vertex $u$ at the end of $j$ a set of tree readings $b_{ju}$.</Paragraph> <Paragraph position="10"> * $b_j = b_{j1} \times \ldots \times b_{jn}$ for each edge $j$ starting in $v$ * $(b_1 \uplus \ldots \uplus b_n) \times (e_1 \uplus \ldots \uplus e_m)$. If a vertex $v$ has already been encountered the only action required is to connect the edge information on $v$'s predecessor $w$ with the edge information already present on vertex $v$. In particular, the successors of $v$ need not be checked again.</Paragraph> <Paragraph position="11"> Let $k$ be the edge over which the vertex $v$ was reached from another vertex $w$ in the top-down traversal. Let $e_{k,w}$ be the set of tree readings determined for edge $k$ at vertex $w$ and $e_{k,v}$ the set of tree readings determined for the edge at vertex $v$.</Paragraph> <Paragraph position="12"> In this section an extension to UDRSs (Reyle, 1993) to express referentially underspecified semantic representations is presented.</Paragraph> <Paragraph position="13"> First a definition of UDRSs is given. A UDRS is a quadruple $\langle L, R, C, \leq \rangle$ where $L$ and $R$ are disjoint finite sets of labels and discourse referents, respectively. The order relation $\leq$ forms a semilattice over $L$ with one-element $l_\top$. $C$ is a set of conditions of the following form * $l : x$, where $l \in \mathcal{L}$, $x \in \mathcal{R}$.</Paragraph> <Paragraph position="14"> * $l : p(x_1, \ldots, x_n)$, where $l \in \mathcal{L}$, $x_1, \ldots, x_n \in \mathcal{R}$, and $p$ is an $n$-place predicate</Paragraph> <Paragraph position="16"> In UDRSs $\mathcal{L} = L$ and $\mathcal{R} = R$.</Paragraph> <Paragraph position="17"> To get packed UDRSs the UDRS language is extended by adding reified contexts (semantic readings) to it. 
The idea of using context variables to represent ambiguous structures originally stems from the literature on constraint-based formalisms (Dörre and Eisele, 1990). A packed UDRS is a quintuple $\langle L, R, D, C', \leq \rangle$ where $L$, $R$, $\leq$ are the same as in UDRSs, and $D$ is a finite set of contexts which is disjoint from $L$ and $R$. $C'$ is defined as in UDRSs except that (1) any condition may also be prefixed by a context set, and (2) label arguments may not only be labels but also functions from contexts to labels ($\mathcal{L} = L \cup (D \rightarrow L)$), and the same holds for discourse referents ($\mathcal{R} = R \cup (D \rightarrow R)$). If a function $\{A \rightarrow x_1, B \rightarrow x_2\}$ replaces a discourse referent in a packed UDRS, this intuitively means that the argument slot is filled by $x_1$ in reading $A$ and by $x_2$ in reading $B$.</Paragraph> <Paragraph position="18"> As an example for a packed UDRS consider the following representation for I saw every man with a telescope.</Paragraph> <Paragraph position="20"> In the implementation contexts are represented by Prolog variables. In this way disambiguation is ensured to be monotonic: A context d can be cancelled by grounding the Prolog variable representing d to a specific atom &quot;no&quot;. The formalism also allows any kind of partially disambiguated structures since the variables for the readings do not interact.</Paragraph> <Paragraph position="21"> In the above version of UDRS packing, disjuncts are reified. Another way to represent referential ambiguities is to reify argument slots using additional variable names (L and X below, not to be mistaken as discourse referents). Disjunctions are then represented directly.</Paragraph> <Paragraph position="23"/> <Section position="1" start_page="909" end_page="910" type="sub_section"> <SectionTitle> Representations </SectionTitle> <Paragraph position="0"> UDRS construction (Frank and Reyle, 1992), (Bos, 1995) is different from conventional semantic construction in that embedding is not represented directly but by means of labels. 
The only semantic composition operation is concatenation. In addition labels and discourse referents are matched as specified in the semantic part of the grammar rules (the &quot;semantic grammar&quot;). In the semantic grammar every nonterminal is assigned a list of arguments. For every operator (e.g. an NP) a lower label and a series of upper labels must be given. The lower label points to material which must be in the scope of the operator (e.g. the verb). The upper labels refer to the minimal scope domain the operator must occur in. This domain differs for indefinite NPs and quantifier NPs since these types of NPs are subject to different island constraints (only indefinites can be raised over clause boundaries). Furthermore, the semantic grammar specifies the UDRS conditions introduced by lexical items and rules and determines the arguments to be matched in rules and lexical items. It also gives the direction of this matching by fixing in which lexical item an argument originates (see last slot of lexical entries). If an argument originates in an item (because it is e.g. its instance discourse referent or label) then the value of this argument is unambiguous for the item 2. In adjunction structures, the modified constituent assigns and the modifier receives the shared discourse referent. Consider the following example grammar 3.</Paragraph> <Paragraph position="1"> start symbol (s/[_Event, _VerbL, Top, Top],</Paragraph> <Paragraph position="3"> 2 A similar train of thought lies behind the notion of &quot;focus&quot; proposed by Tomita (Tomita, 1985). A &quot;focus&quot; in a rule is the constituent which gets assigned all arguments from the &quot;background&quot; constituents of the rule. In general this notion of focus must be relativised to individual arguments. 
Constituent 1 can be focus with respect to argument i while constituent 2 is focus for argument j in a rule.</Paragraph> <Paragraph position="4"> 3 The Prolog symbol leq represents the UDRS subordination relation $\leq$.</Paragraph> <Paragraph position="5"> lex (every, det/[X, ResL, VerbL, DomL, _TopL],</Paragraph> <Paragraph position="7"> Let us turn now to the semantic construction component. The tree readings of the DPF correspond to the contexts of the packed UDRS. The motivation behind this layout is that in most cases syntactic ambiguity has some impact on the semantic readings 4. The construction algorithm traverses the DPF and assigns to each vertex the argument list associated with its category in the semantic grammar. The arguments on this list are not arguments proper as they would be if only parse trees were considered, but functions from contexts to arguments proper. These functions are total only for the root and the leaves; for inner nodes $v$ they are restricted to the union $D_1$ of the context sets at the edges starting at $v$. A predicate match1 matches arguments proper as given in the lexical entries and the startsymbol declaration onto functions as used in the rules.</Paragraph> <Paragraph position="8"> Let $D_1$ be a context set $\{d_1, \ldots, d_n\}$, let LexArg be an argument as provided by a lexical item or startsymbol declaration $i$, let Arg be an argument as occurring attached to a nonterminal on the right-hand side of a grammar rule.</Paragraph> <Paragraph position="9"> Then the predicate match1 unifies LexArg with Arg if LexArg does not originate in $i$. If LexArg does, Arg is unified with the function $\{d_1 \rightarrow \mathit{LexArg}, \ldots, d_n \rightarrow \mathit{LexArg}\}$.</Paragraph> <Paragraph position="10"> Let us assume a bottom-up traversal of the parse forest and let $e$ be the edge from $v$ to one of its successors $w$. 
Then the arguments already present 5 at $w$ must be matched with the arguments predicted for $w$ by the semantic rule corresponding to $e$ (predicate match2). Let $D_2$ be the context set assigned to $e$. Then only the argument values of the contexts in $D_2$ are unified. In this way it is guaranteed that argument matching is done as it would be done in the underlying trees: The contexts clearly separate the information flow.</Paragraph> <Paragraph position="11"> 4 If several tree readings correspond to a single context (semantic reading) this is recognised in the last step (determining unambiguous arguments) where the tree readings are merged.</Paragraph> <Paragraph position="12"> 5 The bottom-up assumption makes sure that vertex $w$ has been treated.</Paragraph> <Paragraph position="13"> Let $D_2$ be the context set $\{d_1, \ldots, d_n\}$ at $e$, let UpperArg be an argument as provided by the semantic rule corresponding to edge $e$, let LowerArg be an argument as attached to the vertex $w$.</Paragraph> <Paragraph position="14"> Then the predicate match2 unifies UpperArg with the restriction of the function LowerArg to the contexts in $D_2$, $\{d_1 \rightarrow v_1, \ldots, d_n \rightarrow v_n\}$ (a subset of LowerArg).</Paragraph> <Paragraph position="15"> In the final step the packed UDRS is traversed and functions where all contexts point to a single value are replaced by this value.</Paragraph> </Section> </Section> <Section position="7" start_page="910" end_page="4862" type="metho"> <SectionTitle> 7 Comparison with Other Approaches </SectionTitle> <Paragraph position="0"> This section discusses two evaluation criteria for approaches to semantic underspecification. The present proposal is measured against the criteria, and so are the Minimal Recursion Semantics approach (Egg and Lebeth, 1995), the Radical Underspecification approach (Pinkal, 1995), and the Core Language Engine approach (Alshawi, 1992).</Paragraph> <Paragraph position="1"> The first criterion is coverage. 
Several types of syntactic ambiguities can be distinguished.</Paragraph> <Paragraph position="2"> words (A subcase of this type of ambiguity is the treatment of unknown input words.) The MRS approach is restricted to adjunction ambiguities, while the other approaches are applicable to all the kinds of ambiguities mentioned. A drawback of the MRS approach might be that it generates semantic readings which are not licensed by the syntactic structure. To give an example consider the sentence I saw a man in the apartment with a telescope. MRS produces a spurious reading in which the PP with a telescope adjoins to the NP a man while the PP in the apartment modifies the full sentence. Remember that MRS does not use a parse forest as input structure but an arbitrary parse tree, i.e. one specific syntactic reading. MRS re-ambiguates the parse tree only afterwards within semantic construction. At this point information about positions in the input string is lost.</Paragraph> <Paragraph position="3"> Another test is the usefulness of the representation for further processing. Such processes include sortal disambiguation, transfer, and theorem proving. All these processes can successfully handle scopally underspecified structures (for sortal disambiguation and transfer see the Core Language Engine (Alshawi, 1992), for theorem proving see the Underspecified DRS formalism (Reyle, 1993)). In the Core Language Engine approach to syntactic underspecification the representation must be unpacked to perform disambiguation by sorts. This seems to be true for any approach relying on delay of semantic construction operations: In order to apply the sortal restrictions of, e.g., a verb to one of its argument discourse referents it must be known which discourse referents could possibly fill the argument slot. Moore and Alshawi (Alshawi, 1992) explain their reluctance to apply sort restrictions already in the packed structure with the maintenance overhead in associating semantic records with vertices of the forest. 
In the packed UDRS approach the problem is handled by explicitly enumerating all possible readings. Then, the maintenance effort is reduced to the effort of extrapolating the tree readings from the parse forest. None of the compared approaches makes any claims about theorem proving and transfer. In the packed UDRS approach it is conceivable to delay actual disambiguation as long as possible: Apart from the potential representation of referential ambiguities by functions, packed UDRSs look exactly like UDRSs. So if only referentially unambiguous conditions must be consulted in a proof, a UDRS theorem prover may be used.</Paragraph> </Section> </Paper>