<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1014">
  <Title>A UNIFICATION-BASED PARSER FOR RELATIONAL GRAMMAR</Title>
  <Section position="4" start_page="0" end_page="97" type="metho">
    <SectionTitle>
STRATIFIED FEATURE GRAMMAR
</SectionTitle>
    <Paragraph position="0"> SFG's key innovation is the generalization of the concept feature to a sequence of so-called relational signs (R-signs). The interpretation of a stratified feature is that each R-sign in a sequence denotes a primitive relation in different strata (see footnote 1). For instance, in Joe gave Mary tea there are, at the clause level, four sister arcs (arcs with the same source node), as shown in Figure 1: one arc labeled \[H\] with target gave, indicating gave is the head of the clause; one with label \[1\] and target Joe, indicating Joe is both the predicate-argument and surface subject of the clause; one with label \[3,2\] and target Mary, indicating that Mary is the predicate-argument indirect object, but the surface direct object, of the clause; and one with label \[2,8\] and target tea, indicating tea is the predicate-argument direct object, but surface chômeur, of the clause. Such a structure is called a stratified feature graph (S-graph). Footnote 1: We use the following R-signs: 1 (subject), 2 (direct object), 3 (indirect object), 8 (chômeur), Cat (Category), C (comp), F (flag), H (head), LOC (locative), M (marked), as well as the special Null R-signs 0 and /, explained below.</Paragraph>
    <Paragraph position="1"> This situation could be described in SFG logic with the following formula (the significance of the different label delimiters (, ), \[, \] is explained below):</Paragraph>
  </Section>
  <Section position="5" start_page="97" end_page="98" type="metho">
    <SectionTitle>
R1 := \[H\]:gave ∧ \[1):Joe
</SectionTitle>
    <Paragraph position="0"> ∧ \[3, 2):Mary ∧ \[2, 8):tea.</Paragraph>
    <Paragraph position="1"> In RG, the clause-level syntactic information captured in R1 combines two statements: one characterizing gave as taking an initial 1, initial 2 and initial 3 (Ditransitive); and one characterizing the concomitant "advancement" of the 3 to 2 and the "demotion" of the 2 to 8 (Dative). In SFG, these two statements would be:</Paragraph>
    <Paragraph position="3"> Dative := (3, 2):T &amp; (2, 8):T.</Paragraph>
    <Paragraph position="4"> Ditransitive involves standard Boolean conjunction (∧). Dative, however, involves an operator, &amp;, unique to SFG. Formulas involving &amp; are called extension formulas and they have a more complicated semantics. For example, Dative has the following informal interpretation: Two distinct arcs with labels 3 and 2 may be "extended" to (3,2) and (2,8) respectively. Extension formulas are, in a sense, the heart of the SFG description language, for without them RG analyses could not be properly represented (see footnote 2). Footnote 2: We gloss over many technicalities, e.g., the SFG notion data justification and the formal semantics of stratified features; cf. \[2\].</Paragraph>
    <Paragraph position="5"> RG-style analyses can be captured in terms of rules such as those above. Moreover, since the above formulas state positive constraints, they can be represented as S-graphs corresponding to the minimal satisfying models of the respective formulas. We compile the various rules and their combinations into Rule Graphs and associate sets of these with appropriate lexical anchors, resulting in a lexicalized grammar (see footnote 3). S-graphs are formally feature structures: given a collection of sister arcs, the stratified labels are required to be functional. However, as shown in the example, the individual R-signs are not. Moreover, the lengths of the labels can vary, and this crucial property is how SFG avoids the "carry over" problem. S-graphs also include a strict partial order on arcs to represent linear precedence (cf. \[3\], \[9\]). The SFG description language includes a class of linear precedence statements, e.g., (1\] ≺ (H\] means that in a constituent "the final subject precedes the head".</Paragraph>
    <Paragraph position="6"> Given a set RS of R-signs, a (stratified) feature (or label) is a sequence of R-signs which may be closed on the left or right or both. Closed sides are indicated with square brackets and open sides with parentheses. For example, \[2, 1) denotes a label that is closed on the left and open on the right, and \[3, 2, 1, 0\] denotes a label that is closed on both sides. Labels of the form \[...\] are called (totally) closed; of the form (...) (totally) open; and the others partially closed (open) or closed (open) on the right (left), as appropriate.</Paragraph>
    <Paragraph position="7"> Let BPS denote the set of features over RS. BPS is partially ordered by the smallest relation ⊑ permitting extension along open sides. For example, (3) ⊑ (3,2) ⊑ \[3,2,1) ⊑ \[3,2,1,0\].</Paragraph>
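The label notation and the extension ordering can be made concrete with a small sketch. It is only an illustration under stated assumptions: the names Label, parse_label and subsumes are hypothetical, labels are written without backslashes, and closing an open side without adding R-signs is assumed to count as a (trivial) extension.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Label:
    """A stratified label: a sequence of R-signs plus two bracket flags."""
    signs: Tuple[str, ...]
    left_closed: bool
    right_closed: bool

    def __str__(self) -> str:
        left = "[" if self.left_closed else "("
        right = "]" if self.right_closed else ")"
        return left + ",".join(self.signs) + right


def parse_label(text: str) -> Label:
    """Parse strings such as '[2,1)' (hypothetical helper, no validation)."""
    text = text.strip()
    signs = tuple(s.strip() for s in text[1:-1].split(","))
    return Label(signs, text[0] == "[", text[-1] == "]")


def subsumes(lab: Label, feat: Label) -> bool:
    """True if feat is reachable from lab by extending only open sides.

    Assumption: an open side may also simply be closed off.
    """
    n, m = len(lab.signs), len(feat.signs)
    for start in range(m - n + 1):
        if feat.signs[start:start + n] != lab.signs:
            continue
        end = start + n
        # A closed side of lab must coincide with a closed side of feat.
        left_ok = (start == 0 and feat.left_closed) if lab.left_closed else True
        right_ok = (end == m and feat.right_closed) if lab.right_closed else True
        if left_ok and right_ok:
            return True
    return False
```

For instance, subsumes(parse_label("[3,2,1)"), parse_label("[3,2,1,0]")) holds, while the reverse direction does not, because a totally closed label admits no further extension.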
    <Paragraph position="8"> Each feature l subsuming (⊑) a feature f provides a partial description of f. The left-closed bracket \[ allows reference to the "deepest" (initial) R-sign of a left-closed feature; the right-closed bracket \] to the "most surfacy" (final) R-sign of a right-closed feature. The totally closed features are maximal (completely defined) and with respect to label unification, defined below, act like ordinary (atomic) features.</Paragraph>
    <Paragraph position="9"> Formal definitions of S-graph and other definitions implicit in our work are provided in \[2\]. Footnote 3: We ignore negative constraints here.</Paragraph>
  </Section>
  <Section position="6" start_page="98" end_page="98" type="metho">
    <SectionTitle>
AN EXAMPLE
</SectionTitle>
    <Paragraph position="0"> Figure 2 depicts the essential aspects of the S-graph for John seemed ill. Focus on the features \[0,1\] and \[2,1,0\], both of which have the NP John as target (indicated by the structure-sharing markers). The R-sign 0 is a member of Null, a distinguished set of R-signs, members of which can only occur next to brackets \[ or \]. The prefix \[2,1) of the label \[2,1,0\] is the SFG representation of RG's unaccusative analysis of adjectives. The suffix (1,0\] of \[2,1,0\]; the prefix \[0,1) of the label \[0,1\] in the matrix clause; and the structure-sharing collectively represent the raising of the embedded subject (cf. Figure 3).</Paragraph>
    <Paragraph position="1"> Given an S-graph G, Null R-signs permit the definitions of the predicate-argument graph, and the surface graph, of G. The predicate-argument graph corresponds to all arcs whose labels do not begin with a Null R-sign; the relevant R-signs are the first ones. The surface graph corresponds to all arcs whose labels do not end with a Null R-sign; the relevant R-signs are the final ones. In the example, the arc labeled \[0,1\] is not a predicate-argument arc, indicating that John bears no predicate-argument relation to the top clause. And the arc labeled \[2,1,0\] is not a surface arc, indicating that John bears no surface relation to the embedded phrase headed by ill.</Paragraph>
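The two derived graphs can be illustrated with a short sketch. The arc list is only an approximate reading of the John seemed ill example: the node names "top" and "emb" and the tuple encoding of arcs are assumptions made here, and Figure 2 itself is not reproduced.

```python
# The special Null R-signs listed in the text.
NULL_R_SIGNS = {"0", "/"}


def predicate_argument_arcs(arcs):
    """Keep arcs whose label does not begin with a Null R-sign."""
    return [arc for arc in arcs if arc[1][0] not in NULL_R_SIGNS]


def surface_arcs(arcs):
    """Keep arcs whose label does not end with a Null R-sign."""
    return [arc for arc in arcs if arc[1][-1] not in NULL_R_SIGNS]


# Each arc is (source, R-signs of a totally closed label, target).
# Hypothetical encoding of the example; "top" and "emb" are made-up node names.
SEEMED_GRAPH = [
    ("top", ("H",), "seemed"),
    ("top", ("0", "1"), "John"),
    ("top", ("C",), "emb"),
    ("emb", ("H",), "ill"),
    ("emb", ("2", "1", "0"), "John"),
]

pa_graph = predicate_argument_arcs(SEEMED_GRAPH)
surf_graph = surface_arcs(SEEMED_GRAPH)
```

Under this encoding the arc labeled with 0 followed by 1 drops out of the predicate-argument graph, and the arc labeled 2, 1, 0 drops out of the surface graph, so John is the target of exactly one surface arc.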
    <Paragraph position="2"> The surface graph is shown in Figure 4 and the predicate-argument graph in Figure 5. Notice that the surface graph is a tree. The treehood of surface graphs is part of the definition of S-graph and provides the foundation for our parsing algorithm; it is the SFG analog to the "context-free backbone" typical of unification-based systems \[11\].</Paragraph>
  </Section>
  <Section position="7" start_page="98" end_page="100" type="metho">
    <SectionTitle>
LEXICALIZED SFG
</SectionTitle>
    <Paragraph position="0"> Given a finite collection of rule graphs, we could construct the finite set of S-graphs reflecting all consistent combinations of rule graphs and then associate each word with the collection of derived graphs it anchors. However, we actually only construct all the derived graphs not involving extractions. Since extractions can affect almost any arc, compiling them into lexicalized S-graphs would be impractical. Instead, extractions are handled by a novel mechanism involving multi-rooted graphs (cf. Concluding Remarks).</Paragraph>
    <Paragraph position="1"> We assume that all lexically governed rules such as Passive, Dative Advancement and Raising are compiled into the lexical entries governing them.</Paragraph>
    <Paragraph position="2">  Thus, given has four entries (Ditransitive, Ditransitive + Dative, Passive, Dative + Passive). This aspect of our framework is reminiscent of LFG \[4\] and HPSG \[7\], except that in SFG, relational structure is transparently recorded in the stratified features. Moreover, SFG relies neither on LFG-style annotated CFG rules and equation solving nor on HPSG-style SUBCAT lists.</Paragraph>
    <Paragraph position="3"> We illustrate below the process of constructing a lexical entry for given from rule graphs (ignoring morphology). The rule graphs used are for Ditransitive, Dative and (Agentless) Passive constructions. Combined, they yield a ditransitive-dative-passive S-graph for the use of given occurring in Joe was given tea (cf. Figure 6).</Paragraph>
    <Paragraph position="5"> The idea behind label unification is that two compatible labels combine to yield a label with maximal nonempty overlap. Left (right) closed labels unify with left (right) open labels to yield left (right) closed labels. There are ten types of label unification, determined by the four types of bracket pairs: totally closed (open), closed only on the left (right). However, in parsing (as opposed to building a lexicalized grammar), we stipulate that successful label unification must result in a totally closed label. Additionally, we assume that all labels in well-formed lexicalized graphs (the input graphs to the parsing algorithm) are at least partially closed. This leaves only four cases:
Case 1. \[α\] ⊔ \[α\] = \[α\]
Case 2. \[α) ⊔ \[αβ\] = \[αβ\]
Case 3. (α\] ⊔ \[βα\] = \[βα\]
Case 4. \[αβ) ⊔ (βγ\] = \[αβγ\]
Note: α, β, γ ∈ RS+ and β is the longest common, nonempty string.</Paragraph>
    <Paragraph position="6"> The following list provides examples of each.
1. \[1,0\] ⊔ \[1,0\] = \[1,0\]
2. \[1) ⊔ \[1,0\] = \[1,0\]
3. (1,0\] ⊔ \[2,1,0\] = \[2,1,0\]
4. \[2,1) ⊔ (1,0\] = \[2,1,0\]
Case 1 is the same as ordinary label unification under identity. Besides their roles in unifying rule-graphs, Cases 2, 3 and 4 are typically used in parsing bounded control constructions (e.g., "equi" and "raising") and extractions by means of "splicing" Null R-signs onto the open ends of labels and closing off the labels in the process. We note in passing that cases involving totally open labels may not result in unique unifications, e.g., (1, 2) ⊔ (2, 1) can be either (2,1,2) or (1,2,1). In practice, such aberrant cases seem not to arise. Label unification thus plays a central role in building a lexicalized grammar and in parsing.</Paragraph>
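The four parsing-time cases of label unification can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple encoding and the names label, show and unify are assumptions, the strings α, β and γ are all required to be nonempty as in the note on the cases, and totally open labels are deliberately left unhandled because their unification need not be unique.

```python
from typing import Optional, Tuple

# Assumed encoding: (R-signs, left_closed, right_closed).
Label = Tuple[Tuple[str, ...], bool, bool]


def label(text: str) -> Label:
    """Parse a label written like '[2,1)' (hypothetical helper)."""
    text = text.strip()
    signs = tuple(s.strip() for s in text[1:-1].split(","))
    return (signs, text[0] == "[", text[-1] == "]")


def show(lab: Label) -> str:
    signs, left_closed, right_closed = lab
    left = "[" if left_closed else "("
    right = "]" if right_closed else ")"
    return left + ",".join(signs) + right


def _unify_ordered(a: Label, b: Label) -> Optional[Label]:
    (sa, la, ra), (sb, lb, rb) = a, b
    if la and ra and lb and rb:
        # Case 1: identical totally closed labels.
        return a if sa == sb else None
    if la and not ra and lb and rb:
        # Case 2: a is a proper prefix of the totally closed b.
        return b if len(sb) > len(sa) and sb[:len(sa)] == sa else None
    if not la and ra and lb and rb:
        # Case 3: a is a proper suffix of the totally closed b.
        return b if len(sb) > len(sa) and sb[-len(sa):] == sa else None
    if la and not ra and not lb and rb:
        # Case 4: splice on the longest nonempty overlap, longest first.
        for k in range(min(len(sa), len(sb)) - 1, 0, -1):
            if sa[-k:] == sb[:k]:
                return (sa + sb[k:], True, True)
    return None


def unify(a: Label, b: Label) -> Optional[Label]:
    """Try both argument orders; None signals failure."""
    return _unify_ordered(a, b) or _unify_ordered(b, a)
```

Calling show(unify(label("[2,1)"), label("(1,0]"))) reproduces the fourth example in the list above; any pair outside the four cases yields None.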
  </Section>
  <Section position="8" start_page="100" end_page="102" type="metho">
    <SectionTitle>
THE PARSING ALGORITHM
</SectionTitle>
    <Paragraph position="0"> S-unification is like normal feature structure unification (\[1\], \[11\]), except that in certain cases two arcs with distinct labels l and l' are replaced by a single arc whose label is obtained by unifying l and l'.
2. Unify-Arcs(A, A') consists of the steps:
a. Unify label(A) and label(A')
b. Unify-Nodes(target(A), target(A'))
3. Unify-Sets-of-Arcs(Set1, Set2), where Set1 = {Aj,..., Ak} and Set2 = {Am,..., An}, returns a set of arcs Set3, derived as follows:
a. For each arc Ai ∈ Set1, attempt to find some arc A' ∈ Set2 such that Step 2a of Unify-Arcs(Ai, A') succeeds. If Step 2a succeeds, proceed to Step 2b and remove A' from Set2. There are three possibilities:
i. If no A' can be found, Ai ∈ Set3.</Paragraph>
    <Paragraph position="1"> ii. If Step 2a and 2b both succeed, then</Paragraph>
    <Paragraph position="3"> iii. If Step 2a succeeds, but Step 2b fails, then the procedure fails.</Paragraph>
    <Paragraph position="4"> b. Add each remaining arc in Set2 to Set3. We note that the result of S-unification can be a set of S-graphs. In our experience, the unification of linguistically well-formed lexical S-graphs has never returned more than one S-graph. Hence, S-unification is stipulated to fail if the result is not unique. Also note that due to the nature of label unification, the unification procedure does not guarantee that the unification of two S-graphs will be functional and thus well-formed. To ensure functionality, we filter the output.</Paragraph>
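The arc-set procedure can be sketched roughly as follows. Several things here are assumptions rather than the paper's definitions: a node is modeled as either an atom (a string) or a list of (label, target) arcs, label unification is passed in as a function and defaults to plain identity, and the first matching arc in Set2 is taken, so the possibility of multiple results and the final functionality filter are not modeled.

```python
class UnificationFailure(Exception):
    """Raised when the labels unify (Step 2a) but the targets do not (Step 2b)."""


def unify_labels_identity(label1, label2):
    # Placeholder for label unification: identity only (Case 1).
    return label1 if label1 == label2 else None


def unify_nodes(node1, node2, unify_labels=unify_labels_identity):
    # Assumed node model: an atom (str) or a list of (label, target) arcs.
    if isinstance(node1, str) or isinstance(node2, str):
        if node1 == node2:
            return node1
        raise UnificationFailure(f"{node1!r} does not unify with {node2!r}")
    return unify_sets_of_arcs(node1, node2, unify_labels)


def unify_sets_of_arcs(set1, set2, unify_labels=unify_labels_identity):
    remaining = list(set2)
    set3 = []
    for label1, target1 in set1:
        for index, (label2, target2) in enumerate(remaining):
            new_label = unify_labels(label1, label2)      # Step 2a
            if new_label is not None:
                # Step 2b; a failure here propagates and fails the procedure.
                new_target = unify_nodes(target1, target2, unify_labels)
                set3.append((new_label, new_target))
                del remaining[index]
                break
        else:
            set3.append((label1, target1))                # no partner found
    set3.extend(remaining)                                # Step 3b
    return set3
```

Passing a richer label unifier as the third argument would let arcs with distinct but compatible labels merge into one arc, which is the case that distinguishes S-unification from ordinary feature structure unification.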
    <Paragraph position="5"> We distinguish several classes of Arc: (i) Surface Arc vs. Non-Surface, determined by absence or presence of a Null R-sign in a label's last position; (ii) Structural Arc vs. Constraint Arc (stipulated by the grammar writer); and (iii) Relational Arc vs. Category Arc, determined by the kind of label (category arcs are atomic and have R-signs like Case, Number, Gender, etc.). The parser looks for arcs to complete that are Surface, Structural and Relational (SSR).</Paragraph>
    <Paragraph position="6"> A simplified version of the parsing algorithm is sketched below. It uses the predicates Left-Precedence, Right-Precedence and Complete. A. Left-Precedence(A, n) is true iff: a. All surface arcs which must follow F are incomplete.</Paragraph>
    <Paragraph position="7"> b. A can precede F.</Paragraph>
    <Paragraph position="8"> c. All surface arcs which must both precede F and follow A are complete. B. Right-Precedence(A, n) is true iff: a. All surface arcs which must precede F are complete.</Paragraph>
    <Paragraph position="9"> b. A can follow F.</Paragraph>
    <Paragraph position="10"> c. All surface arcs which must both follow F and precede A are complete. 2. Complete: A node is complete if it is either a lexical anchor or else has (obligatory) out-going SSR arcs, all of which are complete. An arc is complete if its target is complete. The algorithm is head-driven \[8\] and was inspired by parsing algorithms for lexicalized TAGs (\[6\], \[10\]).</Paragraph>
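The Complete predicate is a simple recursion, sketched below under stated assumptions: the Node class and its field names are hypothetical, only SSR arcs are recorded, and each arc is represented by its target node. Because the surface graph is a tree, the recursion terminates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node: either a lexical anchor or a source of SSR arcs."""
    is_anchor: bool = False
    ssr_targets: List["Node"] = field(default_factory=list)


def arc_complete(target: Node) -> bool:
    # An arc is complete if its target is complete.
    return node_complete(target)


def node_complete(node: Node) -> bool:
    if node.is_anchor:
        return True
    # A non-anchor node needs at least one outgoing SSR arc, all complete.
    # The explicit emptiness check matters because all() of an empty
    # sequence is True.
    has_arcs = len(node.ssr_targets) > 0
    return has_arcs and all(arc_complete(t) for t in node.ssr_targets)
```

A node with neither an anchor nor outgoing SSR arcs models an arc target that has not yet been filled in, so any node that dominates it counts as incomplete.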
    <Section position="1" start_page="101" end_page="102" type="sub_section">
      <SectionTitle>
Simplified Parsing Algorithm:
</SectionTitle>
      <Paragraph position="0"> Input: A string of words w1,..., wk.</Paragraph>
      <Paragraph position="1"> Output: A chart containing all possible parses. Method:  A. Initialization: 1. Create a list of k state-sets S1,..., Sk, each empty.</Paragraph>
      <Paragraph position="2"> 2. For c = 1,...,k, for each Graph(ni) of wc, add \[ni, c - 1, c\] to Sc.</Paragraph>
      <Paragraph position="3"> B. Completions:  For c = 1,..., k, do repeatedly until no more states can be added to Sc:
To illustrate, we step through the chart for John seemed ill (cf. Figure 7). In the string 0 John 1 seemed 2 ill 3, where the integers represent string positions, each word w is associated via the lexicalized grammar with a finite set of anchored S-graphs. For expository convenience, we will assume counterfactually that for each w there is only one S-graph Gw with root rw and anchor w. Also in the simplified case, we assume that the anchor is always the target of an arc whose source is the root. This is true in our example, but false in general.</Paragraph>
      <Paragraph position="4"> For each Gw, rw has one or more outgoing SSR arcs, the set of which we denote SSR-Out-Arcs(rw). For each w between integers x and y in the string, the Initialization step (step A of the algorithm) adds \[nw, x, y\] to state set y. We denote state Q in state-set Si as state i:Q. For an input string w = w1,..., wn, initialization creates n state-sets and for 1 ≤ i ≤ n, adds states i:Qj, 1 ≤ j ≤ k, to Si, one for each of the k S-graphs Gj associated with wi. After initialization, the example chart consists of states 1:1, 2:1, 3:1.</Paragraph>
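The initialization step can be sketched as follows. The function name, the dictionary-based lexicon and the graph identifiers such as "G_John" are hypothetical stand-ins; a state is encoded as a tuple (graph root, left position, right position).

```python
def initialize_chart(words, lexicon):
    """Create one state-set per word and seed it with that word's S-graphs.

    lexicon maps each word to the list of S-graph roots it anchors
    (an assumed representation of the lexicalized grammar).
    """
    chart = {c: [] for c in range(1, len(words) + 1)}
    for c, word in enumerate(words, start=1):
        for root in lexicon[word]:
            # The word between string positions c - 1 and c goes into set c.
            chart[c].append((root, c - 1, c))
    return chart


# Hypothetical one-graph-per-word lexicon for the running example.
LEXICON = {"John": ["G_John"], "seemed": ["G_seemed"], "ill": ["G_ill"]}
chart = initialize_chart(["John", "seemed", "ill"], LEXICON)
```

With this lexicon each state-set holds exactly one state, matching the states 1:1, 2:1 and 3:1 of the example chart.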
      <Paragraph position="5"> Then the parser traverses the chart from left to right starting with state-set 1 (step B of the algorithm), using left and right completions, according to whether left or right precedence conditions are used. Each completion looks in a state-set to the left of Sc for a state meeting a set of conditions. In the example, for c = 1, step B of the algorithm does not find any states in any state-set preceding S1 to test, so the parser advances c to 2. A left completion succeeds with Qi = state 2:1 = \[ni, 1, 2\] and Qj = state 1:1 = \[nj, 0, 1\]. State 2:2 = \[nk, 0, 2\] is added to state-set S2, where nk = Unify-at-end-of-Path(ni, nj, \[0, 1)). Label \[0, 1) is closed off to yield \[0, 1\] in the output graph, since no further R-signs may be added to the label once the arc bearing the label is complete.</Paragraph>
      <Paragraph position="6"> The precedence constraints are interpreted as strict partial orders on the sets of outgoing SSR arcs of each node (in contrast to the totally ordered lexicalized TAGs). Arc \[0, 1) satisfies left-precedence because: (i) \[0, 1) is an incomplete terminal arc, where a terminal arc is an SSR arc, the target of which has no incomplete outgoing surface arcs; (ii) all surface arcs (here, only \[C\]) which must follow the \[H\] arc are incomplete; (iii) \[0, 1) can precede \[H\]; and (iv) there are no (incomplete) surface arcs which must occur between \[0, 1) and \[H\]. (We say can in (iii) because the parser accommodates variable word order.) The parser proceeds to state-set S3. A right completion succeeds with Qi = state 2:2 = \[ni, 0, 2\] and Qj = state 3:1 = \[nj, 2, 3\]. State 3:2 = \[ni', 0, 3\] is added to state-set S3, where ni' = Unify-at-end-of-Path(ni, nj, \[C\]). State 3:2 is a successful parse because ni' is complete and spans the entire input string.</Paragraph>
      <Paragraph position="7"> To sum up: a completion finds a state Qi = \[ni, Li, Ri\] and a state Qj = \[nj, Lj, Rj\] in adjacent state-sets (Li = Rj or Ri = Lj) such that ni is incomplete and nj is complete. Each successful completion completes an arc A ∈ SSR-Out-Arcs(ni) by unifying nj with the target of A. Left completion operates on a state Qi = \[ni, Li, c\] in the current state-set Sc looking for a state Qj = \[nj, Lj, Li\] in state-set SLi, to complete some arc A ∈ SSR-Out-Arcs(ni). Right completion is the same as left completion except that the roles of the two states are reversed: in both cases, success adds a new state to the current state-set Sc. The parser completes arcs first leftward from the anchor and then rightward from the anchor.</Paragraph>
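The adjacency condition and the span of the state produced by a completion can be sketched as follows. Only the bookkeeping on string positions is shown; the function name and the tuple encoding of states are assumptions, and the precedence tests, the completeness tests and the unification of the two graphs are deliberately omitted.

```python
def completion_span(qi, qj):
    """Return the (left, right) span of the new state, or None.

    qi is the state with the incomplete node, qj the state with the
    complete node; each state is encoded as (node, left, right).
    """
    (_ni, li, ri), (_nj, lj, rj) = qi, qj
    if li == rj:
        # Left completion: qj ends exactly where qi starts.
        return (lj, ri)
    if ri == lj:
        # Right completion: qj starts exactly where qi ends.
        return (li, rj)
    return None  # the two states are not adjacent


# The two completions of the running example, with placeholder node names.
left_span = completion_span(("n_seemed", 1, 2), ("n_John", 0, 1))
right_span = completion_span(("n_seemed_John", 0, 2), ("n_ill", 2, 3))
```

In both cases the resulting span ends at the position of the current state-set, which is why success always adds the new state there.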
    </Section>
  </Section>
</Paper>