<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1060">
  <Title>Representing Constraints with Automata</Title>
  <Section position="3" start_page="468" end_page="470" type="metho">
    <SectionTitle>
2 Defining Automata with
Constraints
</SectionTitle>
    <Paragraph position="0"> Tree automata. For completeness, we sketch the definitions of trees and tree automata here. An introduction to tree automata can be found in Gécseg and Steinby (1984), as well as in Thatcher and Wright (1968) and Doner (1970).</Paragraph>
    <Paragraph position="1"> Assume an alphabet Σ = Σ0 ∪ Σ2 with Σ0 = {λ} and Σ2 a set of binary operation symbols. We think of (binary) trees over Σ as just the set of terms TΣ constructed from this alphabet. That is, we let λ be the empty tree and let a(t1,t2), for a ∈ Σ2 and t1,t2 ∈ TΣ, denote the tree with label a and subtrees t1, t2. Alternatively, we can think of a tree t as a function from the addresses in a binary tree domain T to labels in Σ. (Footnote 3: The first approach is developed in Thatcher and Wright (1968), the second in Doner (1970). A tree domain is a subset of strings over a linearly ordered set which is closed under prefix and left sister.)</Paragraph>
    <Paragraph position="2"> A deterministic (bottom-up) tree automaton A on binary trees is a tuple ⟨A, Σ, a0, F, α⟩ with A the set of states, a0 ∈ A the initial state, F ⊆ A the final states and α : (A × A × Σ) → A the transition function. The transition function can be thought of as a homomorphism on trees, inductively defined as hα(λ) = a0 and hα(a(t1, t2)) = α(hα(t1), hα(t2), a).</Paragraph>
    <Paragraph position="3"> An automaton A accepts a tree t iff hα(t) ∈ F. The language recognized by A is denoted T(A) = {t | hα(t) ∈ F}.</Paragraph>
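The homomorphism hα just defined can be sketched directly in code. The following is an illustrative Python encoding of our own (the names and the toy automaton are not from the paper): a tree is either None, standing for the empty tree λ, or a triple (label, left, right); an automaton is a dictionary holding a0, F and α.

```python
# Minimal sketch of a deterministic bottom-up tree automaton (illustrative
# encoding; all names are our own).  A tree is None (lambda) or a triple
# (label, left, right) with label drawn from the binary alphabet Sigma2.

def run(automaton, tree):
    """Compute the tree homomorphism h_alpha bottom-up."""
    a0, alpha = automaton["initial"], automaton["delta"]
    if tree is None:                 # h(lambda) = a0
        return a0
    label, left, right = tree        # h(a(t1,t2)) = alpha(h(t1), h(t2), a)
    return alpha[(run(automaton, left), run(automaton, right), label)]

def accepts(automaton, tree):
    return run(automaton, tree) in automaton["final"]

# Toy automaton over Sigma2 = {"a"}: reaches state 1 iff the tree is nonempty.
A = {
    "initial": 0,
    "final": {1},
    "delta": {(q1, q2, "a"): 1 for q1 in (0, 1) for q2 in (0, 1)},
}

print(accepts(A, None))               # False: the empty tree is rejected
print(accepts(A, ("a", None, None)))  # True: any nonempty tree is accepted
```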
    <Paragraph position="4"> Emptiness of the language T(A) is decidable by a fixpoint construction computing the set of reachable states. The reachability algorithm is given below in Figure 1. R contains the reachable states constructed so far, and R' contains the possibly new states constructed on the current pass through the loop.</Paragraph>
    <Paragraph position="5"> 1. R := {a0}, R' := ∅.</Paragraph>
    <Paragraph position="6"> 2. For all (ai, aj) ∈ R × R and for all a ∈ Σ, R' := R' ∪ {α(ai, aj, a)}.</Paragraph>
    <Paragraph position="7"> 3. If R' − R = ∅ then return R; else R := R ∪ R' and go to step 2.</Paragraph>
    <Paragraph position="8"> T(A) is empty if and only if no final state is reachable. Naturally, if we want to test emptiness, we can stop the construction as soon as we encounter a final state in R'. Note that, given an automaton with k states, the algorithm must terminate after at most k passes through the loop, so it performs at most k³ searches through the transition table. Sets of trees which are the language of some tree automaton are called recognizable. (Footnote 4: …string languages, so MSO logics are limited to context-free power. However, the CLP extension discussed below can be used to amplify the power of the formalism where necessary.) The recognizable sets are closed under the boolean operations of conjunction, disjunction and negation, and the automaton constructions which witness these closure results are straightforward generalizations of the corresponding better-known constructions for finite-state automata. The recognizable sets are also closed under projections (mappings from one alphabet to another) and inverse projections, and again the construction is essentially that for finite-state automata. The projection construction yields a nondeterministic automaton, but, again as for FSAs, bottom-up tree automata can be made deterministic by a straightforward generalization of the subset construction. (Note that top-down tree automata do not have this property: deterministic top-down tree automata recognize a strictly narrower family of tree sets.)</Paragraph>
    <Paragraph position="9"> Finally, tree automata can be minimized by a construction which is, yet again, a straightforward generalization of well-known FSA techniques.</Paragraph>
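The reachability loop of Figure 1 admits an equally direct sketch, in the same illustrative encoding as above (the example transition table is invented):

```python
# Fixpoint computation of the reachable states of a bottom-up tree automaton,
# following the three-step loop of Figure 1.  delta maps
# (state, state, symbol) -> state; missing entries are treated as absent.

def reachable_states(initial, sigma, delta):
    R = {initial}                                  # step 1
    while True:
        Rnew = {delta[(ai, aj, a)]                 # step 2
                for ai in R for aj in R for a in sigma
                if (ai, aj, a) in delta} - R
        if not Rnew:                               # step 3: fixpoint reached
            return R
        R |= Rnew

def is_empty(initial, final, sigma, delta):
    # T(A) is empty iff no final state is reachable.
    return not (reachable_states(initial, sigma, delta) & set(final))

delta = {(0, 0, "a"): 1, (1, 1, "a"): 2}
print(reachable_states(0, {"a"}, delta))   # {0, 1, 2}
print(is_empty(0, {2}, {"a"}, delta))      # False: state 2 is reachable
print(is_empty(0, {3}, {"a"}, delta))      # True: state 3 never appears
```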
    <Paragraph position="10"> The weak second order theory of two successor functions. One attraction of monadic second order tree logics is that they give us a principled means of generating automata from a constraint-based theory. The connection allows the linguist to specify ideas about natural language concisely in logic, while at the same time providing a way of "compiling" those constraints into a form which can be used efficiently in natural language processing applications.</Paragraph>
    <Paragraph position="11"> The translation is provided via the weak monadic second order theory of two successor functions (WS2S). The structure of two successor functions, N2, has as its domain the infinite binary branching tree. Standardly the language of WS2S is based on two successor functions (left daughter and right daughter), but, as Rogers (1994) shows, this is intertranslatable with a language based on dominance and precedence relations. Because we choose the monadic second order language over whichever of these two signatures is preferred, we can quantify over sets of nodes in N2. So we can use these sets to pick out arbitrarily large finite trees embedded in N2. Second order variables can also be used to pick out other properties of nodes, such as category or other node-labeling features, and they can be used to pick out higher order substructures such as X-bar projections or chains.</Paragraph>
    <Paragraph position="12"> As usual, satisfiability of a formula in the language of WS2S by N2 is relative to an assignment function, mapping individual variables to members of N2 (as in first order logic) and mapping monadic predicate variables to subsets of N2. Following Büchi (1960), Doner (1970) and Thatcher and Wright (1968) show that assignment functions for such formulas can be coded by a labeling of the nodes in N2 in the following way. First, we treat individual variables as set variables which are constrained to be singleton sets (we can define the singletonhood property in MSO tree logic). So, without loss of generality, we can think of the domain of the assignment function as a sequence X1, …, Xn of the variables occurring in the given formula. We choose our labeling alphabet to be the set of length-n bit strings: {0, 1}^n. Then, for every node n ∈ N2, if we intend to assign n to the denotation of Xi, we indicate this by labeling n with a bit string in which the i-th bit is on. (In effect, we are labeling every node with a list of the sets to which it belongs.) Now every assignment function we might need corresponds uniquely to a labeling function over N2. What Doner, and Thatcher and Wright (and, for strong S2S, Rabin) show is that each formula in the language of WS2S corresponds to a tree automaton which recognizes just the satisfying "assignment labelings", and we can thereby define a notion of "recognizable relation". So the formula is satisfiable just in case the corresponding automaton recognizes a nonempty language. Note that any language whose formulas can be converted to automata in this way is thereby guaranteed to be decidable, though whether it is as strong as the language of WS2S must still be shown.</Paragraph>
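The coding of assignments as labelings can be illustrated concretely. This is a hedged sketch of our own: node addresses, variable names and the example assignment are invented, with addresses written as strings over {"0", "1"} (left/right daughter from the root "").

```python
# Doner / Thatcher-Wright coding of assignments: with variables X1..Xn, each
# node is labeled by a length-n bit string whose i-th bit is 1 iff the node
# belongs to (the set assigned to) Xi.  Illustrative encoding only.

def assignment_to_labeling(variables, assignment, nodes):
    """Map {variable: set of node addresses} to {address: bit string}."""
    labeling = {}
    for node in nodes:
        bits = "".join("1" if node in assignment[v] else "0"
                       for v in variables)
        labeling[node] = bits
    return labeling

# x1 is an individual variable, coded as a singleton set; X2 a set variable.
assignment = {"x1": {"0"}, "X2": {"", "0", "01"}}
nodes = ["", "0", "1", "00", "01"]
print(assignment_to_labeling(["x1", "X2"], assignment, nodes))
```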
    <Paragraph position="13"> This approach to theorem-proving is rather different from more general techniques for higher-order theorem proving in ways that the formalizer must keep in mind. In particular, we are deciding membership in the theory of a fixed structure, N2, and not consequence of an explicit set of tree axioms.</Paragraph>
    <Paragraph position="14"> So, for example, the parse tree shows up in the formalization as a second order variable, rather than simply being a satisfying model (cf. Johnson (1994) on "satisfiability-based" grammar formalisms). As an example, consider the following formula denoting the relation of directed asymmetric c-command in the sense of Kayne (1994). We use the tree logic signature of Rogers (1994), which, in a second order setting, is interdefinable with the language of multiple successor functions. Uppercase letters denote second order variables, lowercase ones first order variables; ◁* denotes reflexive domination, ◁+ proper domination and ≺ proper precedence:</Paragraph>
    <Paragraph position="16"> The corresponding tree automaton is shown in Figure 2. On closer examination of the transitions, we note that we just percolate the initial state as long as we find only nodes which are neither x1 nor x2. From the initial state on both the left and the right subtree we can either go to the state denoting "found x1" (a1) if we read symbol 10, or to the state denoting "found x2" (a2) if we read symbol 01. We can then percolate a2 as long as the other branch does not immediately dominate x1.</Paragraph>
    <Paragraph position="18"> When we have a1 on the left subtree and a2 on the right one, we go to the final state a3, which again can be percolated as long as empty symbols are read. Clearly, the automaton recognizes all trees which have the desired c-command relation between the two nodes. It compactly represents the (infinite) number of possible satisfying assignments.</Paragraph>
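Our reading of the transitions just described can be spelled out as a small program. This is reconstructed from the prose alone, so details may well differ from Figure 2; labels are 2-bit strings (bit 1 marks x1, bit 2 marks x2), and the states are 0 (initial), 1 ("found x1"), 2 ("found x2") and 3 (final).

```python
# Hand-reconstruction (from the prose description of Figure 2, not the figure
# itself) of the automaton for directed asymmetric c-command.  Any missing
# (state, state, symbol) entry sends the run to a rejecting sink, coded -1.

delta = {(0, 0, "00"): 0,   # percolate the initial state past empty symbols
         (0, 0, "10"): 1,   # this node is x1
         (0, 0, "01"): 2,   # this node is x2
         (0, 2, "00"): 2,   # percolate "found x2" ...
         (2, 0, "00"): 2,   # ... on either branch
         (1, 2, "00"): 3,   # x1 on the left, x2 below the right: accept
         (3, 0, "00"): 3,   # percolate the final state
         (0, 3, "00"): 3}

def run(tree):              # tree: None or (label, left, right)
    if tree is None:
        return 0
    label, left, right = tree
    return delta.get((run(left), run(right), label), -1)

x1 = ("10", None, None)
x2 = ("01", None, None)
t = ("00", x1, ("00", x2, None))
print(run(t))                  # 3: final state, x1 c-commands x2
print(run(("00", x1, None)))   # -1: no x2 anywhere, rejected
```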
    <Paragraph position="19"> The proof of the decidability of WS2S furnishes a technique for effectively deriving such automata for recognizable relations. (In fact the above automaton was constructed by a simple implementation of such a compiler which we have running at the University of Tübingen. See Morawietz and Cornell (1997).) The proof is inductive. In the base case, relations defined by atomic formulas are shown to be recognizable by brute force. The induction is then based on the closure properties of the recognizable sets, so that logical operators correspond to automaton constructions in the following way: conjunction and negation just use the obvious corresponding automaton operations, and existential quantification is implemented with the projection construction. The inductive nature of the proof allows us a fairly free choice of signature, as long as our atomic relations are recognizable. We could, for example, investigate theories in which asymmetric c-command was the only primitive, or asymmetric c-command plus dominance.</Paragraph>
    <Paragraph position="20"> The projection construction, as noted above, yields nondeterministic automata as output, and the negation construction requires deterministic automata as input, so the subset construction must be used every time a negated existential quantifier is encountered. The corresponding exponential blowup in the state space is the main cause of the non-elementary complexity of the construction. Since a quantifier prefix of the form ∃…∃∀…∀∃… is equivalent to ∃…∃¬∃…∃¬∃…, we see that the stack of exponents involved is determined by the number of quantifier alternations.</Paragraph>
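The way alternations stack exponents can be illustrated schematically (the numbers below are purely illustrative, not a precise complexity bound):

```python
# Schematic illustration of the non-elementary blowup: each quantifier
# alternation can force a subset construction, i.e. an exponentiation of the
# state space.  The function iterates that exponentiation.

def state_bound(n_states, alternations):
    for _ in range(alternations):
        n_states = 2 ** n_states
    return n_states

print(state_bound(3, 0))   # 3
print(state_bound(3, 1))   # 8
print(state_bound(3, 2))   # 256
```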
    <Paragraph position="21"> It is obviously desirable to keep the automata as small as possible. In our own prototype, we minimize the outputs of all of our automata constructions. Note that this gives us another way of determining satisfiability, since the minimal automaton recognizing the empty language is readily detectable: its only state is the initial state, and it is not final.</Paragraph>
  </Section>
  <Section position="4" start_page="470" end_page="471" type="metho">
    <SectionTitle>
3 Defining Constraints with
Automata
</SectionTitle>
    <Paragraph position="0"> An obvious goal for the discussed approach would be the (offline) generation of a tree automaton representing an entire grammar. That is, in principle, if we can formalize a grammar in an MSO tree logic, we can apply these compilation techniques to construct an automaton which recognizes all and only the trees the grammar licenses. In this setting, the parsing problem becomes the problem of conjoining an automaton recognizing the input with the grammar automaton, the result being an automaton which recognizes all and only the valid parse trees for that input. For example, assume that we have an automaton Gram(X) such that X is a well-formed tree, and suppose we want to recognize the input John sees Mary. Then we conjoin a description of the input with the grammar automaton as given below.</Paragraph>
    <Paragraph position="2"> The recognition problem is just the problem of determining whether or not the resulting automaton recognizes a nonempty language. Since the automaton represents the parse forest, we can run it to generate parse trees for this particular input.</Paragraph>
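The conjunction of the input automaton with the grammar automaton is the standard product construction, which can be sketched as follows (same illustrative encoding as in the earlier sketches; the two toy automata merely stand in for the input description and Gram):

```python
# Product (conjunction) construction for deterministic bottom-up tree
# automata: the product accepts exactly the intersection of the languages.

def product(A, B):
    delta = {}
    for (a1, a2, sym), a in A["delta"].items():
        for (b1, b2, sym2), b in B["delta"].items():
            if sym == sym2:
                delta[((a1, b1), (a2, b2), sym)] = (a, b)
    return {"initial": (A["initial"], B["initial"]),
            "final": {(a, b) for a in A["final"] for b in B["final"]},
            "delta": delta}

def run(M, t):
    if t is None:
        return M["initial"]
    sym, left, right = t
    return M["delta"][(run(M, left), run(M, right), sym)]

# A accepts nonempty trees; B accepts trees with an odd number of nodes.
A = {"initial": 0, "final": {1},
     "delta": {(p, q, "a"): 1 for p in (0, 1) for q in (0, 1)}}
B = {"initial": 0, "final": {1},
     "delta": {(p, q, "a"): (p + q + 1) % 2 for p in (0, 1) for q in (0, 1)}}

AB = product(A, B)
one = ("a", None, None)
two = ("a", one, None)
print(run(AB, one) in AB["final"])   # True: nonempty and odd-sized
print(run(AB, two) in AB["final"])   # False: two nodes is even
```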
    <Paragraph position="3"> Unfortunately, as we have already noted, the problem of generating a tree automaton from an arbitrary MSO formula is of non-elementary complexity. Therefore, it seems unlikely that a formalization of a realistic principle-based grammar could be compiled into a tree automaton before the heat death of the universe. (The formalization of ideas from Relativized Minimality (Rizzi 1990) presented in Rogers (1994) fills an entire chapter without specifying even the beginning of a full lexicon, for example.) Nonetheless there are a number of ways in which these compilation techniques remain useful. First, though the construction of a grammar automaton is almost certainly infeasible for realistic grammars, the construction of a grammar-and-input automaton, which is a very much smaller machine, may not be. We discuss techniques based on constraint logic programming that are applicable to that problem in the next section.</Paragraph>
    <Paragraph position="4"> Another use for such a compiler is suggested by the standard divide-and-conquer strategy for problem solving: instead of compiling an entire grammar formula, we isolate interesting subformulas, and attempt to compile them. Tree automata represent properties of trees and there are many such properties less complex than global well-formedness which are nonetheless important to establish for parse trees. In particular, where the definition of a property of parse trees involves negation or quantification, including quantification over sets of nodes, it may be easier to express this in an MSO tree logic, compile the resulting formula, and use the resulting automaton as a filter on parse trees originally generated by other means (e.g., by a covering phrase structure grammar).</Paragraph>
    <Paragraph position="5"> At the moment, at least, the question of which grammatical properties can be compiled in a reasonable time is largely empirical. It is made even more difficult by the lack of high quality software tools.</Paragraph>
    <Paragraph position="6"> This situation should be alleviated in the near future when work on MONA++ at the University of Aarhus is completed; the usefulness of its older sister MONA (Henriksen et al. 1995), which works on strings and FSAs, has been well demonstrated in the computer science literature. In the meantime, for tests, we are using a comparatively simple implementation of our own. Even with very low-power tools, however, we can construct automata for interesting grammatical constraints.</Paragraph>
    <Paragraph position="7"> For example, recall the definition of asymmetric c-command and its associated automaton in Figure 2.</Paragraph>
    <Paragraph position="8"> In linguistic applications, we generally use versions of c-command which are restricted to be local, in the sense that no element of a certain type is allowed to intervene. The general form of such a locality condition LC might then be formalized as follows.</Paragraph>
    <Paragraph position="10"> Here property P is meant to be the property identifying a relevant intervener for the relation meant to hold between x and y. Note that this property could include that some other node with certain properties be the left successor of z; that is, this general scheme fits cases where the intervening item is not itself directly on the path between x and y. This formula was compiled by us and yields the automaton in Figure 3. Here the first bit position indicates membership in P, the second is for x and the third for y.</Paragraph>
    <Paragraph position="12"> This automaton could in turn be implemented itself as Prolog code, and considered to be an optimized implementation of the given specification.</Paragraph>
    <Paragraph position="13"> Note in particular the role of the compiler as an optimizer. It outputs a minimized automaton, and the minimal automaton is a unique (up to isomorphism) definition of the given relation. Consider again the definition of AC-Command in the previous section.</Paragraph>
    <Paragraph position="14"> It is far from the most compact and elegant formula defining that relation. There exist much smaller formulas equivalent to that definition, and indeed some are suggested by the very structure of the automaton. That formula was chosen because it is an extremely straightforward formalization of the prose definition of the relation. Nonetheless, the automaton compiled from a much cleverer formalization would still be essentially the same. So no particular degree of cleverness is assumed on the part of the formalizer; optimization is done by the compiler. 7</Paragraph>
  </Section>
  <Section position="5" start_page="471" end_page="473" type="metho">
    <SectionTitle>
4 MSO Logic and Constraint Logic
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="471" end_page="473" type="sub_section">
      <SectionTitle>
Programming
</SectionTitle>
      <Paragraph position="0"> The automaton for a grammar formula is presumably quite a lot larger than the parse-forest automaton, that is, the automaton for the grammar conjoined with the input description. So it makes sense to search for ways to construct the parse-forest automaton which do not require the prior construction of an entire grammar automaton. In this section we consider how we might do this by embedding the MSO constraint language into a constraint logic programming scheme. (Footnote 7: The structure of the formula does often have an effect on the time required by the compiler; in that sense, writing MSO formalizations is still logic programming.)</Paragraph>
      <Paragraph position="1"> The constraint base is an automaton which represents the incremental accumulation of knowledge about the possible valuations of variables. As discussed before, automata are a way to represent even an infinite number of valuations with finite means, while still allowing for the efficient extraction of individual valuations. We incrementally add information to this constraint base by applying and solving clauses with their associated constraints. That is, we actually use the compiler on line as the constraint solver. Some obvious advantages: we can still use our succinct and flexible constraint language, but gain (a) a more expressive language, since we can now include inductive definitions of relations, and (b) a way of guiding the compilation process by the specification of appropriate programs.</Paragraph>
      <Paragraph position="2"> We define a relational extension R(WS2S) of our constraint language following the Höhfeld and Smolka scheme (Höhfeld and Smolka 1988). From the scheme we get a sound and complete, but now only semi-decidable, operational interpretation of a definite clause-based derivation process. The resulting structure is an extension of the underlying constraint structure with the new relations defined via fixpoints.</Paragraph>
      <Paragraph position="3"> As usual, a definite clause is an implication with an atom as the head and a body consisting of a satisfiable MSO constraint and a (possibly empty) conjunction of atoms. A derivation step consists of two parts: goal reduction, which substitutes the body of a goal for an appropriate head, and constraint solving, which in our case means checking the satisfiability of the constraint associated with the clause in conjunction with the current constraint store. For simplicity we assume a standard left-to-right, depth-first interpreter for the execution of the programs. The solution to a search branch of a program is a satisfiable constraint, represented in "solved form" as an automaton. Note that automata do make appropriate solved forms for systems of constraints: minimized automata are normal forms, and they allow for the direct and efficient recovery of particular solutions.</Paragraph>
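The derivation loop can be caricatured as follows. This is a deliberately toy model of our own: finite sets of candidate valuations stand in for the automata that serve as solved forms, and set intersection stands in for the automaton conjunction.

```python
# Toy model of constraint solving in the CLP scheme: the store is conjoined
# with each clause constraint in turn, and a derivation branch fails as soon
# as the store becomes empty (the emptiness test on the solved form).
# In the real system the store is an automaton, which can represent
# infinitely many valuations; here it is just a finite set.

def solve(store, constraints):
    for c in constraints:
        store = store & c      # conjunction = automaton intersection
        if not store:          # emptiness test
            return None        # this derivation branch fails
    return store               # a satisfiable solved form survives

vals = {"v1", "v2", "v3"}
print(solve(vals, [{"v1", "v2"}, {"v2", "v3"}]))   # {'v2'}
print(solve(vals, [{"v1"}, {"v2"}]))               # None
```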
      <Paragraph position="4"> Intuitively, we have a language which has an operational interpretation similar to Prolog, with the differences that we interpret it not on the Herbrand universe but on N2, that we use MSO constraint solving instead of unification, and that we can use defined (linguistic) primitives directly.</Paragraph>
      <Paragraph position="5"> The resulting system is only semi-decidable, due to the fact that the extension permits monadic second order variables to appear in recursively defined clauses. So if we view the inductively defined relations as part of an augmented signature, this signature contains relations on sets. These allow the specification of undecidable relations; for example, Morawietz (1997) shows how to encode the Post Correspondence Problem. If we limit ourselves to just singleton variables in any directly or indirectly recursive clause, every relation we define stays within the capacity of MSO logic, since, if the relations are first order inductively definable, they are explicitly second order definable (Rogers 1994). Restricted in this way the formalism does not take us beyond the power of MSO logic; but since natural language is known not to be context-free, the extra power of unrestricted R(WS2S) offers a way to get past the context-free boundary.</Paragraph>
      <Paragraph position="6"> To demonstrate how we now split the work between the compiler and the CLP interpreter, we present a simple example. Consider the following naive specification of a lexicon: Lexicon(x) ⇔ (x ∈ Sees ∧ x ∈ V ∧ …)</Paragraph>
      <Paragraph position="8"> We have specified a set called Lexicon via a disjunctive specification of lexical labels, e.g. Sees, and the appropriate combination of features, e.g. V. Naively, at least, every feature we use must have its own bit position, since in the logic we treat features as set variables. So the alphabet, encoded as bit strings, will have size at least 2^n for n such features. It is immediately clear that the compilation of such an automaton is extremely unattractive, if feasible at all.</Paragraph>
      <Paragraph position="9"> We can avoid having to compile the whole lexicon by having separate clauses for each lexical entry in the CLP extension. Notational conventions will be that constraints associated with clauses are written in curly brackets and subgoals in the body are separated by &amp;'s. Note that relations defined in R(WS2S) are written lowercase.</Paragraph>
      <Paragraph position="10"> lexicon(x) t--- {x E Sees A x E V A . . . } lexicon(x) +-- {x E John A x E N A . . . } lexicon(x) e-- {xEMaryAxENA...} This shifts the burden of handling disjunctions to the interpreter. The intuitive point should be clear: it  being stored in a global table so that we do not have to present them in each and every constraint. In particular, without this lexicon would have the additional arguments Sees, V, John, N, Mary and all free variables appearing in the other definitions.</Paragraph>
      <Paragraph position="11">  is not the case that every constraint in the grammar has to be expressed in one single tree automaton.</Paragraph>
      <Paragraph position="12"> We need only compile into the constraint store those which are really needed. Note that this is true even for variables appearing in the global table. In the CLP extension the appearance in the table is not coupled to the appearance in the constraint store.</Paragraph>
      <Paragraph position="13"> Only those variables which are part of the constraint of an applied clause are present in both.</Paragraph>
      <Paragraph position="14"> We can also use offline compiled modules in an R(WS2S) parsing program. As a source of simple examples, we draw on the definitions from the lectures on P&amp;P parsing presented in Johnson (1995).</Paragraph>
      <Paragraph position="15"> In implementing a program such as Johnson's simplified parse relation (see Figure 4), we can in principle define any of the subgoals in the body either via precompiled automata (so that they are essentially treated as facts), or else via more standard definite clause definitions.</Paragraph>
      <Paragraph position="16"> parse(Words, Tree)  In more detail, Words denotes a set of nodes labeled according to the input description. Our initial constraint base, which can be automatically generated from a Prolog list of input words, is the corresponding tree automaton. The associated constraint Tree is easily compilable and serves as the initialization for our parse tree. The yield and ecp predicates can easily be explicitly defined and, if practically compilable (which is certainly the case for yield), could then be treated as facts. The xbar predicate, on the other hand, is a disjunctive specification of licensing conditions depending on different features and configurations, e.g., whether we are faced with a binary-, unary- or non-branching structure, which is better expressed as several separate rules. In fact, since we want the lexicon to be represented as several definite clauses, we cannot have xbar as a simple constraint. This is due to the limitation of the constraints which appear in the definite clauses to (pure) MSO constraints.</Paragraph>
      <Paragraph position="17"> We now have another well-defined way of using the offline compiled modules. This, at least, separates the actual processing issues (e.g., parse) from the linguistically motivated modules (e.g., ecp). One can now see that with the relational extension, we can not only use those modules which are compilable directly, but also guide the compilation procedure. In effect this means interleaving the intersection of the grammar and the input description such that only the minimal amount of information needed to determine the parse is incrementally stored in the constraint base.</Paragraph>
      <Paragraph position="18"> Furthermore, the language of R(WS2S) is sufficiently close to standard Prolog-like programming languages to allow the transfer of techniques and approaches developed in the realm of P&amp;P-based parsing. In other words, it takes only a little effort to translate a Prolog program into an R(WS2S) one.</Paragraph>
    </Section>
  </Section>
</Paper>