File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/c00-1037_metho.xml
Size: 21,039 bytes
Last Modified: 2025-10-06 14:07:08
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1037"> <Title>Formal Syntax and Semantics of Case Stacking Languages</Title> <Section position="3" start_page="0" end_page="253" type="metho"> <SectionTitle> 2 Syntax </SectionTitle> <Paragraph position="0"> In this section a perfectly case marked formal language will be defined and investigated. The definition of this language is based on terms consisting of functors and arguments, and thus cases will be taken to mark arguments.</Paragraph> <Paragraph position="1"> In the following we let $\mathbb{N}$ denote the set of non-negative integers and $^\frown$ the concatenation of strings, which is often omitted. We shall use typewriter font to denote true characters in print for a formal language.</Paragraph> <Section position="1" start_page="0" end_page="250" type="sub_section"> <SectionTitle> 2.1 Basic Definitions </SectionTitle> <Paragraph position="0"> An abstract definition of terms runs as follows.</Paragraph> <Paragraph position="1"> Let $F$ be a set of symbols and $\Omega: F \to \mathbb{N}$ a function. The pair $\langle F, \Omega \rangle$ is called a signature.</Paragraph> <Paragraph position="2"> We shall often write $\Omega$ in place of $\langle F, \Omega \rangle$. An element $f \in F$ is called a functor and $\Omega(f)$ the arity of $f$. We let $\omega := \max\{\Omega(f) \mid f \in F\}$ denote the maximal arity. Terms are denoted here by strings in Polish Notation, for simplicity.</Paragraph> <Paragraph position="3"> Definition 1. Let $\Omega$ be a signature. A term over $\Omega$ is inductively defined as follows.</Paragraph> <Paragraph position="4"> 1. If $\Omega(f) = 0$, then $f$ is a term. 2. If $\Omega(f) > 0$ and $t_i$, $1 \le i \le \Omega(f)$, are terms, so is $f t_1 \ldots t_{\Omega(f)}$.</Paragraph> <Paragraph position="5"> The set $C$ of case markers will be the set $\{1, \ldots, \omega\}$, which we assume to be disjoint from $F$. Given a term, each functor will be case marked according to the argument position it occupies. 
This is achieved through the notion of a unit, which consists of a functor and a sequence of case markers called case stack.</Paragraph> <Paragraph position="6"> Definition 2. Let $t$ be a term over a signature $\Omega$. The corresponding bag, $A(t)$, is inductively defined as follows.</Paragraph> <Paragraph position="7"> 1. If $t = f$, then $A(t) := \{t\}$.</Paragraph> <Paragraph position="8"> 2. If $t = f t_1 \ldots t_{\Omega(f)}$, then $A(t) := \{f\} \cup \bigcup_{i=1}^{\Omega(f)} \{\delta^\frown i \mid \delta \in A(t_i)\}$.</Paragraph> <Paragraph position="9"> An element $f\gamma \in A(t)$ is called a unit and $\gamma \in C^*$ its case stack.</Paragraph> <Paragraph position="10"> For example, if f, g, and x are functors of arity 2, 1 and 0, respectively, the bag $A$(fxgx) is {f, x1, g2, x12}. The meaning of a unit x12 could be described by 'x is the functor of the first argument of the second argument of the term'.</Paragraph> <Paragraph position="11"> Definition 3. Let $t$ be a term over a signature $\Omega$, $A(t)$ the corresponding bag and $A(t) = \{\delta_i \mid 1 \le i \le n\}$ an arbitrary enumeration of its units. Then the string $\delta_1{}^\frown \delta_2{}^\frown \ldots {}^\frown \delta_{n-1}{}^\frown \delta_n$ is said to be an $A(t)$-string.</Paragraph> <Paragraph position="12"> Some of the $A$(fxgx)-strings are e.g.</Paragraph> <Paragraph position="13"> fx1g2x12 and g2x1fx12. We are now prepared to define a formal language over the alphabet $F \cup C$ by collecting all $A(t)$-strings for a given signature: Definition 4. Let $\Omega$ be a signature. The ideal case marking language $\mathcal{ICML}^\Omega$ over this signature consists of all $A(t)$-strings such that $t$ is a term over $\Omega$.</Paragraph> </Section> <Section position="2" start_page="250" end_page="250" type="sub_section"> <SectionTitle> 2.2 Trees and Unique Readability </SectionTitle> <Paragraph position="0"> There is a strong correspondence between bags and labelled trees since case stacks can be identified with tree addresses: Definition 5. 
A nonempty finite set $D \subseteq \mathbb{N}_+^*$ is a tree domain if the following hold: 1. $\varepsilon \in D$.</Paragraph> <Paragraph position="1"> 2. If $d_1 d_2 \in D$ then $d_1 \in D$.</Paragraph> <Paragraph position="2"> 3. If $di \in D$, $i \in \mathbb{N}_+$, then $dj \in D$ for all $j &lt; i$. The elements of a tree domain are called tree addresses. An $\Omega$-labelled tree is a pair $\langle D, \tau \rangle$ such that $D$ is a tree domain and $\tau: D \to F$ a labelling function such that the number of daughters of $d \in D$ is exactly $\Omega(\tau(d))$. To formalize the correspondence we define a function $T$ that assigns every bag $A(t)$ an $\Omega$-labelled tree.</Paragraph> <Paragraph position="4"> The function $T$ reverses the case stacks of all units to get a set of tree addresses. Then the functor of the unit is assigned to the tree address. E.g. if the bag contains a unit g3212, the resulting tree domain will contain a tree address 2123 and the labelling function will assign g to it. Similarly one can define an inverse function assigning a bag to each $\Omega$-labelled tree. Thus there is a bijection between $\Omega$-labelled trees and bags. Therefore different bags correspond to different ordered labelled trees. This shows that we have unique readability for bags, and since every $\mathcal{ICML}^\Omega$ string can be uniquely decomposed into its units we may state the following proposition.</Paragraph> <Paragraph position="6"> Proposition 6. Let $\Omega$ be a signature. Then every $\mathcal{ICML}^\Omega$ string is uniquely readable.</Paragraph> </Section> <Section position="3" start_page="250" end_page="252" type="sub_section"> <SectionTitle> 2.3 Pumpability and Semilinearity </SectionTitle> <Paragraph position="0"> We will first consider the property of being finitely pumpable, as defined in (Groenink, 1997).</Paragraph> <Paragraph position="1"> Definition 7. 
A language $L$ is finitely pumpable iff there is a constant $c$ such that for any $w \in L$ with $|w| > c$, there are a finite number $k$ and strings $u_0, \ldots, u_k$ and $v_1, \ldots, v_k$ such that $w = u_0 v_1 u_1 v_2 u_2 \cdots v_k u_k$ and for each $i$, $1 \le |v_i| &lt; c$, and for any $p \ge 0$ the string $u_0 v_1^p u_1 v_2^p u_2 \cdots v_k^p u_k$ belongs to $L$.</Paragraph> <Paragraph position="3"> Proposition 8. $\mathcal{ICML}^\Omega$ is not finitely pumpable. Proof. It is easy to observe that the pumpable parts cannot contain a functor since that would lead to pumped strings containing the same units more than once. Hence the number of units cannot be increased by pumping and all pumpable parts must consist of case markers solely. But since the length of an $\mathcal{ICML}^\Omega$ string consisting of a fixed number of units is bounded, each pumpable string could be pumped up such that it exceeds this bound. Thus $\mathcal{ICML}^\Omega$ is not finitely pumpable at all. $\Box$ Now we are concerned with semilinearity.</Paragraph> <Paragraph position="4"> Definition 9. Let $M \subseteq \mathbb{N}^n$. Then $M$ is a 1. linear set, if for some $k \in \mathbb{N}$ there are $u_0, \ldots, u_k \in \mathbb{N}^n$ such that $M = \{u_0 + \sum_{i=1}^{k} n_i u_i \mid n_i \in \mathbb{N}\}$, 2. semilinear set, if for some $k \in \mathbb{N}$ there are linear sets $M_1, \ldots, M_k \subseteq \mathbb{N}^n$ such that $M = \bigcup_{i=1}^{k} M_i$. A language $L$ over an alphabet $\Sigma = \{w_i \mid 0 \le i &lt; n\}$ is called a semilinear language if its image under the Parikh mapping is a semilinear set, where the Parikh mapping $\Psi: \Sigma^* \to \mathbb{N}^n$ is defined by $\Psi(\varepsilon) := 0$, $\Psi(w_i) := e^{(i)}$ and $\Psi(uv) := \Psi(u) + \Psi(v)$,</Paragraph> <Paragraph position="6"> where $e^{(i)}$ is the $(i+1)$-th unit vector, which consists of zeros except for the $(i+1)$-th component, which is 1.</Paragraph> <Paragraph position="7"> Note that, given a term $t$, the Parikh image of all $A(t)$-strings is the same since these are just concatenations of different permutations of the units in $A(t)$.</Paragraph> <Paragraph position="8"> In the following we make use of a proof technique used in (Michaelis and Kracht, 1997) to show that Old Georgian is not a semilinear language. 
We cite a special instance of a proposition given therein: Proposition 10.</Paragraph> <Paragraph position="9"> Let $P(k) = \frac{\omega}{2}k^2 + \frac{2-\omega}{2}k$ and let $M$ be a subset of $\mathbb{N}^n$, where $n \ge 2$, which has the properties 1. For any $k \in \mathbb{N}_+$ there are some numbers $l_2^{(k)}, \ldots, l_{n-1}^{(k)} \in \mathbb{N}$ for which the $n$-tuple $\langle k, P(k), l_2^{(k)}, \ldots, l_{n-1}^{(k)} \rangle$ belongs to $M$. 2. For any $k \in \mathbb{N}_+$ the value $P(k)$ provides an upper bound for the second component $l_1$ of any $n$-tuple $\langle k, l_1, \ldots, l_{n-1} \rangle \in M$ (that means $l_1 \le P(k)$ for any such $n$-tuple). Then $M$ is not semilinear.</Paragraph> <Paragraph position="10"> In order to investigate the semilinearity of $\mathcal{ICML}^\Omega$ we choose distinct symbols $f, x \in F$ such that $\Omega(f) = \omega$ and $\Omega(x) = 0$. We shall construct terms $s_i$ by the following inductive definition: 1. $s_0 := x$ 2. $s_n := f(s_{n-1}, x, \ldots, x)$ for $n > 0$. It is easy to observe that by virtue of construction $s_n$ consists of $n$ leading functors $f$ and that in each iteration the number of $x$ increases by $(\omega - 1)$.</Paragraph> <Paragraph position="11"> Lemma 11. Let $F \cup C = \{f, 1, x, 2, \ldots, \omega, f_1, \ldots, f_{|F|-2}\}$ be an enumeration of the alphabet underlying $\mathcal{ICML}^\Omega$, where $f_1, \ldots, f_{|F|-2}$ are the remaining functors in $F - \{f, x\}$. Then the Parikh image of some $A(s_n)$-string $\delta_n$ is $\Psi(\delta_n) = \langle n, P(n), n(\omega-1)+1, n, \ldots, n, 0, \ldots, 0 \rangle$ with $P$ as in Proposition 10; moreover, $P(n)$ is an upper</Paragraph> <Paragraph position="13"> bound on the second component of $\Psi(\delta_n)$.</Paragraph> <Paragraph position="14"> Proof. The first part of the lemma can be proved in a straightforward way by induction on $n$. The claim on the upper bound follows from the observation that the number of occurrences of case marker 1 can be maximized by repeated embedding of terms in the first argument position. $\Box$ Proposition 12. Let $\Omega$ be a signature. Then $\mathcal{ICML}^\Omega$ is not semilinear.</Paragraph> <Paragraph position="16"> Proof. 
Let $n = \omega + |F|$ and consider the linear, and hence semilinear, set $R := \{ e^{(2)} + n_0 e^{(0)} + n_1 e^{(1)} + n_2(\omega-1)e^{(2)} + \sum_{i=3}^{\omega+1} n_i e^{(i)} \mid n_i \in \mathbb{N} \}$.</Paragraph> <Paragraph position="18"> Then the full preimage $L_R$ of $R$ under the Parikh map consists of all strings which contain $n_2(\omega-1)+1$ occurrences of the symbol $x$ (where $n_2$ is any number) and any number of occurrences of the symbols $f, 1, \ldots, \omega$, and no other symbols. We define the language $L_M$ as the set of all strings belonging to both $L_R$ and the ideal case marking language. Then $L_M$ contains all $A(s_n)$-strings.</Paragraph> <Paragraph position="19"> Considering the Parikh image $M$ of $L_M$ we get $M = \Psi(\mathcal{ICML}^\Omega) \cap R$</Paragraph> <Paragraph position="21"> because of the definition of $L_R$ as the full preimage of $R$. But then the set $M$ fulfills the conditions of Proposition 10 due to Lemma 11.</Paragraph> <Paragraph position="22"> Hence $M$ is not semilinear. Since $R$ is semilinear by definition and semilinearity is closed under intersection, $\mathcal{ICML}^\Omega$ is not semilinear. $\Box$</Paragraph> </Section> <Section position="4" start_page="252" end_page="252" type="sub_section"> <SectionTitle> 2.4 Computational Complexity </SectionTitle> <Paragraph position="0"> In this subsection the computational complexity of $\mathcal{ICML}^\Omega$ is considered. The results are achieved by defining a 3-tape Turing machine acceptor (depending on a given signature) that recognizes $\mathcal{ICML}^\Omega$ in $O(n^{3/2} \log n)$ time and $O(n)$ space. Proof: In the following we let $n$ denote the length of the input string. The Turing machine algorithm can be subdivided into three main parts: 1. The input string is segmented into its units: The algorithm steps through the input and adds separation markers in between two units. This can be done in $O(n)$ time.</Paragraph> <Paragraph position="1"> 2. The units are sorted according to their case stacks: More formally a 2-way straight merge sort is performed. This sorting algorithm is known for its worst case optimal complexity: it performs the sort of $k$ keys in $O(k \log k)$ steps. 
In our case the keys are units and thus their number is clearly bounded by $n$. The additional square root factor comes from the comparison step.</Paragraph> <Paragraph position="2"> One can show that the maximal length of a case stack occurring in an $\mathcal{ICML}^\Omega$ string of length $n$ is bounded above by $O(\sqrt{n})$.</Paragraph> <Paragraph position="3"> Hence a comparison of two units takes at most $O(\sqrt{n})$ steps. Thus the overall complexity of the sorting part is $O(n\sqrt{n} \log n)$. 3. The sorted sequence of units is checked: The algorithm successively generates case stacks according to the functors it has read.</Paragraph> <Paragraph position="4"> Each case stack is compared to the unit of the input. If they coincide the algorithm advances to the next unit on the input and generates the next case stack. After all case stacks have been generated the whole input string must have been worked through; in this case the algorithm accepts. This can be done in $O(n)$ time.</Paragraph> <Paragraph position="5"> Summing up the complexities of these three parts shows that the time complexity is as claimed in the proposition. Furthermore, the algorithm uses only the cells needed by the input plus at most $k - 1$ cells for additional separation markers (due to the first part), where $k$ is the number of units the input string consists of. This shows that the space complexity is linear. $\Box$</Paragraph> </Section> <Section position="5" start_page="252" end_page="253" type="sub_section"> <SectionTitle> 2.5 Discussion </SectionTitle> <Paragraph position="0"> A first conclusion we may draw is that cases have the ability to construct the context they appear in. $\mathcal{ICML}^\Omega$ strings encode the same structural information as ordered labelled trees do, thereby allowing unconstrained order of units. Additionally each such string can be read unambiguously. 
This was shown by means of a bijection between bags and ordered labelled trees.</Paragraph> <Paragraph position="1"> The fact that ideal case marking languages are neither finitely pumpable nor semilinear means that they fall out of a lot of hierarchies of formal languages. As (Weir, 1988) shows, multi-component tree adjoining grammars (and hence linear context-free rewrite systems, which are shown to be weakly equivalent to MCTAGs in (Weir, 1988)) generate only semilinear languages. Consequently, ideal case marking languages are not MCTALs. However, (Groenink, 1997) defines a class of grammars, called simple literal movement grammars, which generate all and only the PTIME recognizable languages. Ideal case marking languages should therefore be generated by some simple literal movement grammar.</Paragraph> <Paragraph position="2"> We note furthermore that the (theoretical) time complexity is significantly better than the best known for recognizing context-free languages. In fact, we implemented a practically applicable algorithm which constructs the corresponding tree out of a given $\mathcal{ICML}^\Omega$ string in linear time (on average).</Paragraph> </Section> </Section> <Section position="4" start_page="253" end_page="255" type="metho"> <SectionTitle> 3 Semantics </SectionTitle> <Paragraph position="0"> We are now going to propose a semantics for languages with stacked cases. The basic principle is rather easy: we are going to identify variables by case stacks, thereby making use of referent systems.</Paragraph> <Section position="1" start_page="253" end_page="253" type="sub_section"> <SectionTitle> 3.1 Referent Systems </SectionTitle> <Paragraph position="0"> The semantics uses two levels: a DRS-level, which contains DRSs, and a referent level, which talks about the names of the referents used by the DRS. Referent systems were introduced in (Vermeulen, 1995). 
We keep the idea of a referent system as a device which administrates the variables (or referents) under merge. The technical apparatus is however quite different. In particular, the referent systems we use define explicit global string substitutions over the referent names.</Paragraph> <Paragraph position="1"> There is one additional symbol $\circ$. It is a variable over names of referents. If we assume that a functor g has meaning $g$, a simple lexical entry for g looks like this:</Paragraph> <Paragraph position="3"> Here, the upper part is the referent system, and the lower part an ordinary DRS, with a head section, containing a set of referents, and a body section, containing a set of clauses. This means that the semantics of a functor g is given by the application of $g$ to its arguments. However, instead of variables $x$, $y$, etc. we find $1^\frown\circ$, $2^\frown\circ$, etc. The semantics of a 0-ary functor x and a case marker, say 2, are written /x/ and /2/; the latter carries the referent system $\circ : 2^\frown\circ$. When two such structures come together they will be merged. The merge operation $\oplus$ takes two structures and results in a new one, thereby using the referent systems to substitute the names of referents if necessary and then taking the union of the sets of clauses. E.g. the result of the merge /g/ $\oplus$ /2/ is</Paragraph> <Paragraph position="5"> The meaning of $\circ : 2^\frown\circ$ is as follows. If some structure $A$ is merged with one bearing that referent system, then all occurrences of the variable $\circ$ in $A$ are replaced by $2^\frown\circ$. As the resulting referent system we get $\circ : \circ$. This is exactly what is done in the merge shown above.</Paragraph> <Paragraph position="6"> We shall call a structure with referent system $\circ : \circ$ plain. 
Merge is only defined if at least one structure is plain.</Paragraph> </Section> <Section position="2" start_page="253" end_page="255" type="sub_section"> <SectionTitle> 3.2 Semantics for $\mathcal{ICML}$ </SectionTitle> <Paragraph position="0"> To see how the semantics works we shall reproduce an earlier example and take the $\mathcal{ICML}^\Omega$ string g2x1fx12. Motivated by the definition of the ideal case marking language we shall agree to the conventions that 1. Case markers may only be suffixes. 2. Case markers may only be attached to functors or case marked functors. By these conventions the string under consideration must be parsed as (g2)(x1)(f)((x1)2). They force us to combine the functors with their case stacks first and afterwards combine the units. We shall understand that this is a syntactic restriction and not due to any semantics. The composition of g and 2 was already shown above and is repeated on the left hand side, using that $\Omega$(g) = 1. The result of composing x and 1 is shown to the right.</Paragraph> <Paragraph position="2"> By composing the structures for x and 1 we get the structure /x1/ shown above. We merge this one with that for 2 and obtain the structure for the unit x12. We shall verify that the value of $\circ$ is actually the same as the value of $f(x, g(x))$. Notice first that in the body of the DRS we find that $12^\frown\circ$ and $1^\frown\circ$ have the same value as $x$. We may therefore reduce the body of this structure to</Paragraph> <Paragraph position="4"> Finally we may replace $2^\frown\circ$ by $g(x)$ in the second line. We then get $\circ \doteq f(x, g(x))$, which is the intended result.</Paragraph> <Paragraph position="5"> After the semantics of the units has been computed the order of merge is unimportant. If we choose to merge these semantics in an order different from the one above, we get the same result. 3.3 An example from Warlpiri. To show how this proposal may work for natural languages we give an example from Warlpiri (footnote 4) in which case stacking occurs. 
We have to deal with the four case markers ergative (ERG), past tense (PST), absolutive (ABS), and locative (LOC). The example sentence translates as 'Japanangka shot the kangaroo (while) on the rock'. We extend the proposal by taking into account that cases may not only function as argument markers but have a semantics, too. This actually does not make much of a difference for this calculus. We propose the following semantics for the locative and the past tense case</Paragraph> <Paragraph position="7"> So, when the locative is attached, it says that the thing to which it attaches is located somewhere. Here, $\circ$ represents the thing that is located, while $\mathrm{LOC}^\frown\circ$ is the location. The past tense semantics simply says that the thing which it attaches to happened in the past.</Paragraph> <Paragraph position="8"> We construe the meaning of the ergative as being the actor and the meaning of the absolutive as being the theme (footnote 5).</Paragraph> <Paragraph position="9"> Footnote 4: This example is taken from (Nordlinger, 1997), p. 171. Footnote 5: In fact, ergative and absolutive should mark for grammatical functions, but since linking of grammatical functions and actants is quite a complicated matter (see (Kracht, 1999)) we make this simplification.</Paragraph> <Paragraph position="10"> We proceed using the conventions stated in subsection 3.2. First we have to attach the case markers, to compose the resulting structures afterwards. The semantics of the proper noun Japanangka is taken to be a plain structure with body $\circ \doteq$ japanangka'. The composition of this structure and the ergative semantics yields the corresponding case marked structure. The semantics of the verb is shown on the left hand side and its composition with /PST/ on the right hand side.</Paragraph> <Paragraph position="11"> It says that there was an event of shooting in the past, whose actor is Japanangka and whose theme is something that is a kangaroo, and that there is a rock, such that Japanangka is located on it. 
Note that the only syntactic restrictions were the conventions stated in subsection 3.2 and that we did not make any further assumptions on syntactic structure or word order.</Paragraph> </Section> </Section> </Paper>
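The bag construction of Definition 2, the tree correspondence of subsection 2.2 and the unit-wise parse used in subsection 3.2 can be illustrated by a small sketch. It is not the Turing machine acceptor of subsection 2.4, nor the linear-time implementation mentioned in subsection 2.5; it only assumes, for illustration, that functors are single characters and case markers single digits. All names (ARITY, parse_term, bag, all_strings, segment, to_tree, to_term, recognize) are hypothetical.

```python
from itertools import permutations

# Hypothetical signature taken from the running example: f, g, x have arity 2, 1, 0.
# Assumptions of this sketch: functors are single characters and case markers
# are the single digits 1..9 (so the maximal arity is at most 9).
ARITY = {"f": 2, "g": 1, "x": 0}


def parse_term(term, arity, pos=0):
    """Parse a Polish-notation term into (functor, [subtrees]); also return the next position."""
    functor = term[pos]
    pos += 1
    children = []
    for _ in range(arity[functor]):
        child, pos = parse_term(term, arity, pos)
        children.append(child)
    return (functor, children), pos


def bag(tree):
    """Units of a term: the head functor, plus every unit of the i-th argument with marker i appended."""
    functor, children = tree
    units = [functor]
    for i, child in enumerate(children, start=1):
        units += [unit + str(i) for unit in bag(child)]
    return units


def all_strings(term, arity):
    """All strings obtained by concatenating the units of the term in some order."""
    tree, _ = parse_term(term, arity)
    return {"".join(order) for order in permutations(bag(tree))}


def segment(string, arity):
    """Split a string into units: each functor together with the case markers that follow it."""
    units = []
    for ch in string:
        if ch in arity:
            units.append(ch)
        elif units:
            units[-1] += ch
        else:
            raise ValueError("a unit must start with a functor")
    return units


def to_tree(units):
    """Reverse each case stack to get a tree address and label that address with the functor."""
    return {unit[:0:-1]: unit[0] for unit in units}


def to_term(tree, arity, address=""):
    """Read the term off the labelled tree in Polish notation (KeyError if an address is missing)."""
    functor = tree[address]
    args = (to_term(tree, arity, address + str(i)) for i in range(1, arity[functor] + 1))
    return functor + "".join(args)


def recognize(string, arity):
    """Return the term encoded by the string, or None if the string encodes no term."""
    try:
        units = segment(string, arity)
        tree = to_tree(units)
        term = to_term(tree, arity)
    except (KeyError, ValueError):
        return None
    # Every unit must be used exactly once: no duplicate and no unreachable addresses.
    return term if len(term) == len(units) == len(tree) else None
```

With this signature, bag(parse_term("fxgx", ARITY)[0]) gives the units f, x1, g2, x12 of the example in subsection 2.1, and recognize("g2x1fx12", ARITY) returns "fxgx", the term underlying the string parsed in subsection 3.2. Any other ordering of the same units is mapped to the same term, which is the unique readability property of Proposition 6 in miniature.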