File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/91/e91-1037_abstr.xml

Size: 28,629 bytes

Last Modified: 2025-10-06 13:47:09

<?xml version="1.0" standalone="yes"?>
<Paper uid="E91-1037">
  <Title>Computational Aspects of M-grammars</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> In this paper, M-grammars as used in the Rosetta translation system are viewed as the specification of attribute grammars. We show that the attribute evaluation order is such that, instead of the special-purpose parsing and generation algorithms introduced for M-grammars in Appelo et al. (1987), Earley-like context-free parsing and ordinary generation strategies can also be used. Furthermore, we illustrate that the attribute grammar approach gives insight into the weak generative capacity of M-grammars and into the computational complexity of the parsing and generation process. Finally, the attribute grammar approach is used to reformulate the concept of isomorphic grammars.</Paragraph>
    <Paragraph position="1"> M-grammars. In this section we introduce, very globally, the grammars that are used in the Rosetta machine translation system, which is being developed at Philips Research Laboratories in Eindhoven. The original Rosetta grammar formalism, called M-grammars, was a computational variant of Montague grammar. The formalism was introduced in Landsbergen (1981). Whereas rules in Montague grammar operate on strings, M-grammar rules (M-rules) operate on labelled ordered trees, called S-trees. The nodes of S-trees are labelled with syntactic categories and attribute-value pairs. Because of the reversibility of M-rules, it is possible to define two algorithms: M-Parser and M-Generator. The M-Parser algorithm starts with a surface structure in the form of an S-tree and breaks it down into basic expressions by recursive application of reversed M-rules. The result of the M-Parser algorithm is a syntactic derivation tree which reflects the history of the analysis process. The leaves of the derivation tree are names of basic expressions. The M-Generator algorithm generates a set of S-trees by bottom-up application of M-rules, the names of which are mentioned in a syntactic derivation tree.</Paragraph>
    <Paragraph position="2"> Analogous to Montague grammar, each M-rule is associated with a rule which expresses its meaning. This allows for the transformation of a syntactic derivation tree into a semantic derivation tree by replacing the name of each M-rule by the name of the corresponding meaning rule. In Landsbergen (1982) it was shown that the formalism is well suited for use in an interlingual machine translation system in which semantic derivation trees make up the interlingua. In the analysis part of the translation system an S-tree of the source language is mapped onto a set of semantic derivation trees. Next, each semantic derivation tree is mapped onto a set of S-trees of the target language. In order to guarantee that for a sentence which can be analysed by means of the source language grammar a translation can always be generated using the target language grammar, source and target grammars in the Rosetta system are attuned.</Paragraph>
    <Paragraph position="3"> Grammars, attuned in the way described in Landsbergen (1982), are called isomorphic.</Paragraph>
    <Paragraph position="4"> Appelo et al. (1987) introduce some extensions of the formalism, which make it possible to assign more structure to an M-grammar. The new formalism was called controlled M-grammars. In this new approach a grammar consists of a set of subgrammars. Each of the subgrammars contains a set of M-rules and a regular expression over the alphabet of rule names. The set of M-rules is subdivided into meaningful rules and transformations. Transformations have no semantic relevance and will therefore not occur in a derivation tree. The regular expression can be looked at as a prescription of the order in which the rules of the subgrammar have to be applied. Because of these changes in the formalism, new versions of the M-Parser and M-Generator algorithms were introduced which were able to deal with subgrammars. These algorithms, however, are complex and result in a rather cumbersome implementation. In this paper we show that they can be replaced by normal context-free parse and generation algorithms if we interpret an M-grammar as the specification of an attribute grammar (Knuth (1968), Deransart et al. (1988)).</Paragraph>
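    <Paragraph>The role of the control expression can be illustrated with a minimal sketch. The rule names RStart, RMod, RClose and TAgree, the control expression itself, and both helper functions are hypothetical; they are not taken from any Rosetta grammar. The sketch only shows a regular expression over rule names prescribing the order of rule application, and transformations being left out of the derivation tree.
<![CDATA[
```python
import re

# Hypothetical subgrammar: three meaningful rules and one transformation.
MEANINGFUL = {"RStart", "RMod", "RClose"}
TRANSFORMATIONS = {"TAgree"}

# Hypothetical control expression over rule names:
# RStart, then any number of RMod or TAgree, then RClose.
CONTROL = re.compile(r"RStart( (RMod|TAgree))* RClose")


def obeys_control(rule_sequence):
    """True if the rule names occur in an order the control expression allows."""
    return CONTROL.fullmatch(" ".join(rule_sequence)) is not None


def visible_in_derivation(rule_sequence):
    """Transformations have no semantic relevance, so only meaningful rules remain."""
    return [name for name in rule_sequence if name in MEANINGFUL]
```
]]>
With these assumptions, the sequence RStart, RMod, TAgree, RClose obeys the control expression, and only RStart, RMod and RClose would appear in a derivation tree.</Paragraph>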
    <Paragraph position="5"> M-grammars as attribute grammars. The control expression used in the definition of a Rosetta subgrammar specifies a regular language over the alphabet of rule names. Another way to define such a language is by means of a regular grammar. Let control expression $ce_i$ of subgrammar $i$ define the regular language $\mathcal{L}(i)$. Then we can construct a minimal regular grammar $rg_i$ which defines the same language. The grammar $rg_i$ has the following form: * A set of non-terminals $N_i = \{I_i^0, \ldots, I_i^{M'}\}$. * A set of terminals $\Sigma_i$. $\Sigma_i$ is the smallest set such that there is a terminal $\bar{r} \in \Sigma_i$ for each M-rule $r$.</Paragraph>
    <Paragraph position="6"> * Start symbol $I_i^0$. * A set of production rules $P_i$ containing the following types of rules: - $I_i^j \to \bar{r}\, I_i^k$, where $\bar{r} \in \Sigma_i$</Paragraph>
    <Paragraph position="8"> We will use the regular grammar defined above as a starting point for the construction of an attributed subgrammar. An elegant view of attribute grammars can be found in Hemerik (1984). Hemerik defines an attribute grammar as a context-free grammar with parametrized non-terminals and production rules.</Paragraph>
    <Paragraph position="9"> In general, non-terminals may have a number of parameters (attributes) associated with them. Production rules of an attribute grammar are pairs (rule form, rule condition). From a rule form, production rules can be obtained by substituting values for the attribute variables that satisfy the rule condition. In the grammars presented in this paper, non-terminals have only one attribute, of type S-tree. The attribute grammar rules that are used throughout this paper also have a very restricted form.</Paragraph>
    <Paragraph position="10"> A typical attribute grammar rule $r$ with context-free skeleton $A \to BC$ will look like:</Paragraph>
    <Paragraph position="12"> Here, $A\langle o\rangle \to B\langle p\rangle\, C\langle q\rangle$ is the rule form, $o, p, q$ are the attributes and $(o, (p, q)) \in R_r$ is the rule condition. $R_r$ defines a relation between the attributes at the left-hand side and the attributes at the right-hand side of the rule form.</Paragraph>
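    <Paragraph>The pairing of rule form and rule condition can be sketched as follows. The class AttributeRule, its method instantiate, and the use of plain strings in place of S-trees are illustrative assumptions, not part of the formalism as defined by Hemerik or used in Rosetta. The sketch shows only that a production rule is obtained from a rule form by substituting attribute values that satisfy the rule condition.
<![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class AttributeRule:
    """A (rule form, rule condition) pair; attribute values are toy strings here."""
    lhs: str                                  # non-terminal on the left-hand side
    rhs: Tuple[str, ...]                      # non-terminals on the right-hand side
    condition: Callable[[str, tuple], bool]   # holds iff (o, (p, q, ...)) is in the relation

    def instantiate(self, o, ps):
        """Return the production rule for these attribute values, or None."""
        ps = tuple(ps)
        if len(ps) != len(self.rhs) or not self.condition(o, ps):
            return None
        return ((self.lhs, o), tuple(zip(self.rhs, ps)))


# Hypothetical relation: o is p and q joined by a single space.
rule = AttributeRule("A", ("B", "C"), lambda o, ps: o == ps[0] + " " + ps[1])
```
]]>
Under these assumptions, instantiating the rule with the values "the cat", "the" and "cat" yields a production rule, whereas values that violate the condition yield None.</Paragraph>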
    <Paragraph position="13"> For each subgrammar $rg_i$ ($1 \le i \le M$) we construct an attributed subgrammar $ag_i$. Each constructed attributed subgrammar $ag_i$ has a start symbol $I_i^0$. First, however, we define two new attributed subgrammars that have no direct relation with a subgrammar of a given M-grammar: the start subgrammar and the terminal subgrammar. The terminal subgrammar $ag_t$, which has its own start symbol, contains a rule of the form</Paragraph>
    <Paragraph position="15"> for each basic expression $x$ of the M-grammar. The start subgrammar $ag_0$ with start symbol $S$ contains a rule of the form</Paragraph>
    <Paragraph position="17"> for the start symbol of each attributed subgrammar.</Paragraph>
    <Paragraph position="18"> The attribute condition in this rule means that S-trees that are exported by subgrammar $i$ have a syntactic category which is in the set exportcats(i).</Paragraph>
    <Paragraph position="19"> For each subgrammar $rg_i$ specified by the M-grammar we can construct an attributed subgrammar $ag_i$, being the 5-tuple $(N_i \cup \{S\},\ \{\triangleright, \Box\} \cup \Sigma_i,\ P_i,\ I_i^0,\ (T, F_i))$, as follows: * $ag_i$ has 'domain' $(T, F_i)$, where $T$ is the set of possible S-trees and $F_i$ is a collection of relations of type $T^m \times T$, $m &gt; 0$. $F_i$ contains all relations defined by the M-rules of subgrammar $i$.</Paragraph>
    <Paragraph position="20"> * The set of production rules of $ag_i$ can be constructed as follows: - If $rg_i$ contains a rule of the form $I_i^j \to \bar{r}\, I_i^k$, where $\bar{r}$ corresponds with an n-ary meaningful M-rule $r$, $ag_i$ contains the following attribute grammar rule:</Paragraph>
    <Paragraph position="22"> $(o, (p_1, \ldots, p_n)) \in R_r$. Here, $I_i^j$ and $I_i^k$ are non-terminals of the attributed subgrammar $ag_i$, $S$ is the start symbol of the complete grammar, the terminal is the name of the M-rule and $R_r$ is the binary relation between S-trees and tuples of S-trees which is defined by M-rule $r$. The terminal symbol $\triangleright$ marks the end of the scope of the production rule in the strings generated by the grammar. The variables $o, p_1, \ldots, p_n$ are the attributes of the rule. All attributes are of type S-tree.</Paragraph>
    <Paragraph position="23"> One possible interpretation of the attribute grammar rule is that the S-tree $o$ is received from non-terminal $I_i^j$ of the current subgrammar. According to the relation defined by M-rule $r$, the S-tree $o$ corresponds to the S-trees $p_1, \ldots, p_n$. S-tree $p_1$ is passed to another non-terminal of the current subgrammar, whereas $p_2, \ldots, p_n$ are offered to the start symbol of the attribute grammar.</Paragraph>
    <Paragraph position="24"> - If $rg_i$ contains a rule of the form $I_i^j \to \bar{r}\, I_i^k$, where $\bar{r}$ corresponds with a unary transformation $r$, $ag_i$ contains the following attribute grammar rule: $I_i^j\langle o\rangle \to I_i^k\langle p\rangle$, $(o, p) \in R_r$. Notice that an attribute rule corresponding with a transformation $r$ does not produce the terminal $\bar{r}$.</Paragraph>
    <Paragraph position="25"> - If $rg_i$ contains a rule of the form $I_i^j \to I_i^k$, then $ag_i$ contains the following attribute grammar rule: $I_i^j\langle o\rangle \to I_i^k\langle p\rangle$, $o = p$. - If $rg_i$ contains a rule of the form $I_i^j \to \epsilon$, then $ag_i$ contains the following rule:</Paragraph>
    <Paragraph position="27"> Rules of this form mark the beginning of a subgrammar. The terminal symbol $\Box$ is used for this purpose. The attribute relation is a restriction on the kind of S-trees that is allowed to enter the subgrammar. Only S-trees with a syntactic category in the set headcats(i) are accepted.</Paragraph>
    <Paragraph position="28"> The set of all attributed subgrammars can be joined into one single attribute grammar $(N, \Sigma, P, S, (T, F))$ as follows: * The non-terminal set of the attribute grammar is the union of all non-terminals of all subgrammars, i.e. $N = \bigcup_{i=0}^{M} N_i$.</Paragraph>
    <Paragraph position="29"> * The terminal set $\Sigma$ of the attribute grammar is the union of all terminals of all subgrammars (including the terminal subgrammar): $\Sigma = \{\triangleright, \Box\} \cup \bigcup_{i=0}^{M} \Sigma_i$. * The set of production rules is the union of all production rules of the subgrammars, $P = \bigcup_{i=0}^{M} P_i$. * The start symbol of the composed grammar is identical to the start symbol $S$ of the start subgrammar. The attribute of the start symbol of an attribute grammar is called the designated attribute (Engelfriet (1986)) of the attribute grammar. The output set of an attribute grammar is the set of all possible values of its designated attribute.</Paragraph>
    <Paragraph position="30"> * The composed grammar has domain $(T, F)$, where $F = \bigcup_{i=0}^{M} F_i$ and $T$ is the set of all possible S-trees. In the rest of the paper we call an attribute grammar which has been derived from an M-grammar in this way an attributed M-grammar or amg.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Computational Aspects
</SectionTitle>
      <Paragraph position="0"> Because each meaningful attributed rule $r$ produces the terminal symbol $\bar{r}$ and because each terminal rule $x$ produces the terminal symbol $\bar{x}$, the strings of $\mathcal{L}(X)$, the language defined by an amg $X$, will contain the derivational history of the string itself. The history is partial, because the grammar rules for transformations do not produce a terminal. Moreover, the form of the grammar rules is such that each string is a prefix representation of its own derivational history.</Paragraph>
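      <Paragraph>What a prefix representation of a derivational history looks like can be sketched with a toy encoding. The token layout is an assumption made for illustration: the rule name comes first, then the encoded arguments, then an end-of-scope marker, written here as the plain character held in END. The exact layout of the strings generated by an amg may differ, and the names to_prefix and from_prefix are hypothetical.
<![CDATA[
```python
END = ">"  # stands in for the end-of-scope terminal


def to_prefix(tree):
    """Flatten a derivation tree (rule_name, [children]) or a leaf name into tokens."""
    if isinstance(tree, str):          # leaf: name of a basic expression
        return [tree]
    name, children = tree
    tokens = [name]
    for child in children:
        tokens.extend(to_prefix(child))
    tokens.append(END)                 # close the scope of this rule
    return tokens


def from_prefix(tokens, rule_names):
    """Rebuild the derivation tree from its prefix representation."""
    def parse(i):
        tok = tokens[i]
        if tok not in rule_names:      # a basic expression
            return tok, i + 1
        children, i = [], i + 1
        while tokens[i] != END:
            child, i = parse(i)
            children.append(child)
        return (tok, children), i + 1  # skip the end marker

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete derivation")
    return tree
```
]]>
Because every rule name opens a scope that the end marker closes, the string can be decoded back into the tree without any further information.</Paragraph>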
      <Paragraph position="1"> Given an amg $X$ with a set of terminals $\Sigma$, a recognition function of type $\mathcal{L}(X) \to 2^T$ can be defined as: $MGen(d) =_{def} \{t \mid S\langle t\rangle \Rightarrow_X^{*} d \wedge d \in \Sigma^{*}\}$. The reverse of MGen is the generation function MPars, of type $T \to 2^{\mathcal{L}(X)}$.</Paragraph>
      <Paragraph position="3"> These functions can of course be defined for each attribute grammar in this form. However, in the case of amg's the MPars and MGen functions are both computable because each M-rule $r$ defines both a computable function $f_r$ and its reverse $f_r^{-1}$: $(o, (p_1, \ldots, p_n)) \in R_r \Leftrightarrow o \in f_r(p_1, \ldots, p_n) \Leftrightarrow (p_1, \ldots, p_n) \in f_r^{-1}(o)$.</Paragraph>
      <Paragraph position="5"> Because of this property of the M-rules the grammar has two possible interpretations: * one for recognition purposes with only synthesized attributes, in which the rules can be written as:</Paragraph>
      <Paragraph position="7"> This interpretation is to be used by MGen in the generation phase of the Rosetta system.</Paragraph>
      <Paragraph position="8"> * one for generation purposes with only inherited attributes containing the following type of rules:</Paragraph>
      <Paragraph position="10"> The generative interpretation of the rules will be used by MPars in the analysis phase of the Rosetta translation system.</Paragraph>
      <Paragraph position="11"> From the definitions of MPars and MGen the reversibility property of the grammar follows immediately: $d \in MPars(t) \Leftrightarrow t \in MGen(d)$. The reversibility property, which has always been one of the tenets of the Rosetta system (Landsbergen (1982)), has recently received the appreciation of other researchers in the field of M.T. as well (Isabelle (1989), Rohrer (1989), van Noord (1990)).</Paragraph>
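      <Paragraph>The reversibility property can be checked on a toy grammar. Everything in the sketch is an assumption made for illustration: strings stand in for S-trees, there is a single binary rule named R, and the basic expressions x_the, x_cat and x_sleeps are invented. The functions m_gen and m_pars mimic the roles of MGen and MPars; they are not the Rosetta algorithms.
<![CDATA[
```python
from itertools import product

# Hypothetical basic expressions: name -> toy "S-tree" (a plain string).
BASIC = {"x_the": "the", "x_cat": "cat", "x_sleeps": "sleeps"}


def combine(p1, p2):
    """Forward direction of the toy rule R: the set of results for (p1, p2)."""
    return {p1 + " " + p2}


def split(o):
    """Reverse direction of R: all argument pairs that combine into o."""
    words = o.split(" ")
    return {(" ".join(words[:k]), " ".join(words[k:])) for k in range(1, len(words))}


def m_gen(d):
    """Values obtained from derivation tree d by bottom-up rule application."""
    if isinstance(d, str):
        return {BASIC[d]}
    _, (d1, d2) = d
    return {o for p1, p2 in product(m_gen(d1), m_gen(d2)) for o in combine(p1, p2)}


def m_pars(t):
    """Derivation trees obtained from value t by recursive use of the reversed rule."""
    trees = {name for name, value in BASIC.items() if value == t}
    for p1, p2 in split(t):
        trees |= {("R", (d1, d2)) for d1 in m_pars(p1) for d2 in m_pars(p2)}
    return trees
```
]]>
In this toy setting every derivation tree returned by m_pars for a value regenerates that value under m_gen, which is the reversibility property in miniature. The recursion terminates because each split yields strictly shorter strings.</Paragraph>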
      <Paragraph position="12"> In order to give the M-grammar formalism a place in the list of other linguistic formalisms like LFG, FUG, TG, TAG and GPSG [1], we will investigate some computational aspects of amg's in this section. Given an amg $X$, we can calculate the value of the designated attribute for an element of $\mathcal{L}(X)$. For this calculation an ordinary context-free recognition algorithm (Earley (1970), Leermakers (1991)) can be used. Because the grammar may contain cycles of the form $I_i^j\langle o\rangle \to I_i^k\langle p\rangle$, $(o, p) \in R_r$, its context-free backbone is not finitely ambiguous.</Paragraph>
      <Paragraph position="13"> Hence, an amg is not necessarily off-line parsable (Pereira and Warren (1983), Haas (1989)). The term off-line parsable is somewhat misleading because a two-stage parse process for grammars which are infinitely ambiguous is perfectly feasible. In the first stage of the parse process, in which the context-free backbone is used, a finite representation of the infinitely many parse trees, e.g. in the form of a parse matrix, is determined. Next, in the second stage, the attributes are calculated. However, measure conditions on the attributes are necessary to guarantee termination of the parse process. These measure conditions are constraints on the size (according to a certain measure) of the attribute values that occur in each cycle of the underlying context-free grammar.</Paragraph>
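      <Paragraph>How a measure condition guarantees termination on a cycle can be sketched as follows. The function close_under_cycle and its parameters are hypothetical, and the particular condition used, a strict decrease of the measure at every step of the cycle, is only one possible choice; the text above does not fix the precise form of the measure conditions.
<![CDATA[
```python
def close_under_cycle(start, step, measure):
    """Collect every value reachable from `start` by repeatedly applying `step`,
    a unary rule lying on a cycle of the context-free backbone.

    `step` maps a value to a set of values; `measure` maps a value to a
    non-negative integer. A strict decrease is demanded at every step, so the
    loop cannot run forever.
    """
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for nxt in step(current):
            if measure(nxt) >= measure(current):
                raise ValueError("measure condition violated")
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def strip_very(s):
    """Toy transformation: remove one leading 'very ' if present."""
    return {s[5:]} if s.startswith("very ") else set()
```
]]>
With string length as the measure, the toy transformation can be applied around the cycle only finitely often, whereas a step that returns its argument unchanged is rejected.</Paragraph>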
      <Paragraph position="14"> The generative interpretation of amg $X$ can be used in a straightforward language generator which generates all corresponding elements of $\mathcal{L}(X)$ for a given value of the designated attribute. Obviously, it can only be guaranteed that the generation process will always terminate if the grammar satisfies some restrictions. Suggestions for grammar constraints in the form of termination conditions for parsing and generation are given in Appelo et al. (1987).</Paragraph>
      <Paragraph position="15"> [1] Cf. Perrault (1984) for a comparison of the mathematical properties of these formalisms.</Paragraph>
      <Paragraph position="16"> For an insight into the weak generative capacity of the formalism we have to examine the set of yields of the S-trees in the output set of an amg. Let us call this set the output language defined by an amg. It is not possible to characterize exactly the set of output languages that can be defined by an amg without defining what the termination conditions are. The precise form of the termination conditions, however, is not imposed by the M-grammar formalism. The formalism merely demands that some measure on the attribute values is defined which guarantees termination of the recognition and generation process. In order to get an idea of the weak generative capacity of the formalism, we assume, for the moment, the weakest condition that guarantees termination. It can be shown that each deterministic Turing Machine can be implemented by means of an amg such that the language defined by the TM is the output language of that amg. Not all grammars that can be constructed in this way satisfy the termination condition, however. The termination condition is only satisfied by Turing Machines that halt on all inputs, which is exactly the class of machines that define the set of all recursive languages. Consequently, the output languages that can be defined by amg's or M-grammars are, in principle, the languages that can be recognized by deterministic Turing Machines in finite time.</Paragraph>
      <Paragraph position="17"> At this point it is appropriate to mention the bifurcation of grammatical formalisms into two classes: the formalisms designed as linguistic tools (e.g. PATR-II, FUG, DCG) and those intended to be linguistic theories (e.g. LFG, GPSG, GB) (cf. Shieber (1987) for a motivation of this bifurcation). The goals of these formalisms with respect to expressive power are, in general, at odds with each other. While great expressive power is considered to be an advantage of tool-oriented formalisms, it is considered to be an undesirable property of formalisms of the theory type. The M-grammar formalism clearly belongs to the category of linguistic tools.</Paragraph>
      <Paragraph position="18"> By strengthening the termination conditions it is possible to restrict the class of output languages that can be defined by an amg. For instance, the class of output languages can be restricted to the languages that are recognizable by a deterministic TM in $2^{cn}$ time [2] if we assume that the termination conditions imposed on an amg are the weakest conditions that satisfy the constraints formulated in Rounds (1973). A reformulation of these constraints for amg's is as follows: * The time needed by an attribute evaluating function is proportional to some polynomial in the sum of the size of its arguments. [2] The set which is characterized by transformational grammars (as presented in Chomsky (1965)) satisfying the terminal-length non-decreasing condition.</Paragraph>
      <Paragraph position="19"> The power of the formalism with respect to generative capacity has of course its consequences for the computational complexity of the generation and recognition process. Here too, the exact form of the termination condition is important. Obeying the termination conditions that we adhere to in the current Rosetta system, it can be proved that the recognition and the generation problems are NP-hard, which makes them computationally intractable.</Paragraph>
      <Paragraph position="20"> In comparison with other formalisms, M-grammars are no exception with respect to the complexity of these issues. LFG recognition and FUG generation have both been proved to be NP-hard, in Barton et al. (1987) and Ritchie (1986) respectively. Recognition in GPSG has even been proved to be EXP-POLY-hard (Barton et al. 1987). We should keep in mind, however, that the computational complexity analysis is a worst-case analysis. The average-case behaviour of the parse and generation algorithm that we experience in the daily use of the Rosetta system is certainly not exponential.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Isomorphic Grammars
</SectionTitle>
      <Paragraph position="0"> The decidability of the question whether two M-grammars are isomorphic is another computational aspect related to M-grammars. Although this mathematical issue appears not to be very relevant from a practical point of view, it enables us to show what grammar isomorphy means in the context of amg's.</Paragraph>
      <Paragraph position="1"> According to the Rosetta Compositionality Principle (Landsbergen (1987)), to each meaningful M-rule $r$ there corresponds a meaning rule $m_r$ which expresses the semantics of $r$. Furthermore, there is a set of basic meanings for each basic expression of an M-grammar. We can easily express this relation of M-grammar rules and basic expressions with their semantic counterparts in an amg. Instead of incorporating the M-rule name $\bar{r}$ in the attributed production rule as we did in the previous sections, we now include the name of the corresponding meaning rule $m_r$ as follows:</Paragraph>
      <Paragraph position="3"> The terminal subgrammar must be adapted in order to generate basic meanings instead of basic expressions. If basic expression $x$ corresponds with the basic meanings $m_1, \ldots, m_j, \ldots, m_n$, then we replace the original rule in the terminal subgrammar for $x$ by $n$ rules of the form: We will call a grammar that has been derived in this way from an amg a semantic amg, or samg. The strings of the language defined by an samg are prefix representations of semantic derivation trees. The language defined by an samg $X$ is called the set of strings which are well-formed with respect to $X$.</Paragraph>
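      <Paragraph>The step from syntactic to semantic derivation trees can be sketched as follows. The function semantic_trees and the example names R1, m1, x_bank and x_open are hypothetical. The sketch assumes a derivation tree is either the name of a basic expression or a pair of a rule name and a tuple of subtrees; it replaces each rule name by the name of its meaning rule and each basic expression by each of its basic meanings in turn.
<![CDATA[
```python
from itertools import product


def semantic_trees(tree, meaning_rule, basic_meanings):
    """Return all semantic derivation trees for one syntactic derivation tree.

    meaning_rule:   maps an M-rule name to the name of its meaning rule
    basic_meanings: maps a basic expression name to a list of basic meanings
    """
    if isinstance(tree, str):                       # leaf: a basic expression
        return list(basic_meanings[tree])
    name, children = tree
    options = [semantic_trees(c, meaning_rule, basic_meanings) for c in children]
    # One semantic tree per combination of meanings chosen for the subtrees.
    return [(meaning_rule[name], combo) for combo in product(*options)]
```
]]>
A basic expression with several basic meanings therefore gives rise to several semantic derivation trees, which mirrors the replacement of one terminal rule by n rules described above.</Paragraph>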
      <Paragraph position="4"> Let us repeat here what it means for two M-grammars to be isomorphic: "...Two grammars are isomorphic iff each semantic derivation tree which is well-formed with respect to one grammar is also well-formed with respect to the other grammar..." (Landsbergen (1987)). We can reformulate the original definition of isomorphic M-grammars in a very elegant way for samg's: Definition: Two samg's $X_1$ and $X_2$ are isomorphic iff they are equivalent, that is, iff $\mathcal{L}(X_1) = \mathcal{L}(X_2)$. This definition says that writing isomorphic grammars comes down to writing two attribute grammars which define the same language. From formal language theory (e.g. Hopcroft and Ullman (1979)) we know that there is no algorithm that can test an arbitrary pair of context-free grammars $G_1$ and $G_2$ to determine whether $\mathcal{L}(G_1) = \mathcal{L}(G_2)$. It can also be shown that samg's can define any recursive language. Consequently, checking the equivalence of two arbitrary samg's will be an undecidable problem. Rosetta grammars that are used for translation purposes, however, are not arbitrary samg's: they are not created completely independently. The strategy followed in Rosetta to accomplish the definition of equivalent grammars, that is, grammars that define identical languages, is to attune two samg's to each other. This grammar attuning strategy is extensively described in Appelo et al. (1987), Landsbergen (1982) and Landsbergen (1987) for ordinary M-grammars. Here, we will show what the attuning strategy means in the context of samg's, together with a few extensions.</Paragraph>
      <Paragraph position="5"> The attuning measures below must not be looked at as the weakest possible conditions that guarantee isomorphy. The list merely is an enumeration of conditions which together should help to establish isomorphy. If two samg's $X_1$ and $X_2$ have to be isomorphic, the following measures are proposed: * The production rules of both samg's must be consistent.</Paragraph>
      <Paragraph position="6"> If both grammars have a production rule in which the name of the meaning rule $m$ appears, then the right-hand sides of the rules should contain the same number of non-terminals, since $m$ is a function with a fixed number of arguments, independent of the grammar it is used in.</Paragraph>
      <Paragraph position="7"> * The terminal sets of both samg's should be equal. [3]</Paragraph>
      <Paragraph position="8"> In the context of the ordinary M-grammar formalism this condition is formulated as: - for each basic expression in one M-grammar there has to be at least one basic expression in the other M-grammar with the same meaning (which comes down to the condition that the terminal sets of the terminal subgrammars should be identical) - for each meaningful rule in one M-grammar there has to be at least one meaningful rule in the other M-grammar which has the same meaning.</Paragraph>
      <Paragraph position="9"> [3] This condition is equivalent to the attuning measures described in Appelo et al. (1987), Landsbergen (1982) and Landsbergen (1987).</Paragraph>
      <Paragraph position="10"> * The underlying context-free grammars of both samg's should be equivalent.</Paragraph>
      <Paragraph position="11"> Equivalence of the underlying context-free grammars can be established by putting an equivalence condition on the underlying grammars of corresponding subgrammars of the samg's in question.</Paragraph>
      <Paragraph position="12"> Suppose that for each subgrammar of an samg $X_1$ a subgrammar of another samg $X_2$ would exist that performs the same linguistic task and vice versa. Such an ideal situation could be expressed by a relation $g$ on the sets of subgrammars of both samg's. Let $i$ and $j$ be subgrammars of the samg's $X_1$ and $X_2$ respectively, such that $(i, j) \in g$; then the underlying grammars [4] $B_i$ and $B_j$ have to be constructed in such a way that they define the same language. (Notice that $B_i$ and $B_j$ are regular grammars.) More formally: $\forall (i, j) \in g: \mathcal{L}(B_i) = \mathcal{L}(B_j)$. [5] The three attuning conditions above guarantee that the underlying context-free grammars of two attuned samg's are equivalent. However, the language defined by an samg is a subset of the language defined by its underlying grammar. The rule conditions determine which elements are in the subset and which are not. Because of the great expressive power of M-rules, the attuning measures place no effective restrictions on the kind of languages an samg can define. Hence, it can be proved that: Theorem: The question whether two attuned samg's are isomorphic is undecidable.</Paragraph>
      <Paragraph position="13"> Because of the equivalence between samg's and M-grammars this also applies to arbitrary attuned M-grammars. Future research is needed to find extensions of the attuning measures that guarantee isomorphy if grammar writers adhere to the attuning conditions. The extensions will probably include restrictions on the form of the underlying grammar and on the expressive power of M-rules. Formal attuning measures between M-rules or sets of M-rules of different grammars are also conceivable.</Paragraph>
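      <Paragraph>Unlike isomorphy itself, the first two attuning measures are purely syntactic and can be checked mechanically. The sketch below assumes each grammar is summarized as a list of pairs of a meaning rule name and the number of right-hand-side non-terminals, plus a set of terminal names. Both function names and this representation are hypothetical and are not part of the Rosetta tooling.
<![CDATA[
```python
def inconsistent_meaning_rules(rules_1, rules_2):
    """Names of meaning rules, used in both grammars, whose production rules
    do not all have the same number of right-hand-side non-terminals.

    rules_1, rules_2: iterables of (meaning_rule_name, number_of_nonterminals)
    """
    def arities(rules):
        table = {}
        for name, arity in rules:
            table.setdefault(name, set()).add(arity)
        return table

    a1, a2 = arities(rules_1), arities(rules_2)
    shared = a1.keys() & a2.keys()
    return sorted(name for name in shared if len(a1[name] | a2[name]) > 1)


def terminal_set_difference(terminals_1, terminals_2):
    """Terminals that occur in exactly one of the two grammars (empty if equal)."""
    return set(terminals_1) ^ set(terminals_2)
```
]]>
An empty result from both checks means the first two measures are met; it says nothing about the third measure or about isomorphy, which remains undecidable in general.</Paragraph>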
      <Paragraph position="14"> [4] Because we are dealing with a subgrammar, the non-terminal $S$ is discarded from the production rules of the underlying grammar.</Paragraph>
      <Paragraph position="15"> [5] This attuning measure sketches an ideal situation. In practice, for each subgrammar of an samg there is not a corresponding fully isomorphic subgrammar but only a partially isomorphic subgrammar of the other samg. However, the requirement of fully isomorphic subgrammars is not the weakest attuning condition that guarantees the equivalence of the underlying context-free grammars. Equivalence can also be guaranteed if $X_1$ and $X_2$ satisfy the following condition, which expresses partial isomorphy between subgrammars:</Paragraph>
      <Paragraph position="17"> The current Rosetta grammars obey the three previously mentioned attuning measures. In practice these measures provide a good basis to work with. Therefore, the undecidability of the isomorphy question is not an urgent topic at the moment.</Paragraph>
      <Paragraph position="18"> Conclusions. In this paper we presented the interpretation of an M-grammar as a specification of an attribute grammar. We showed that the resulting attribute grammar is reversible and that it can be used in ordinary context-free recognition and generation algorithms. The generation algorithm is to be used in the analysis phase of Rosetta, whereas the recognition algorithm should be used in the generation phase. With respect to the weak generative capacity it has been concluded that the set of languages that can be generated and recognized depends on the termination conditions that are imposed on the grammar. If the weakest termination condition is assumed, the set of languages that can be defined by an M-grammar is equivalent to the set of languages that can be recognized by a deterministic Turing Machine in finite time. Using more realistic termination conditions, the computational complexity of the recognition and generation problem can still be classified as NP-hard and, consequently, as computationally intractable. Finally, it was concluded that the question whether two attuned M-grammars are isomorphic is undecidable.</Paragraph>
    </Section>
  </Section>
</Paper>