<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2149">
<Title>Context-Free Grammar Rewriting and the Transfer of Packed Linguistic Representations</Title>
<Section position="3" start_page="1017" end_page="1017" type="metho">
<SectionTitle> 4 Formal aspects </SectionTitle>
<Paragraph position="0"> The commutative monoid over an alphabet $A$ is denoted by $C(A^*)$, and its words are represented by vectors of $\mathbb{N}^A$, indexed by $A$ and with entries in $\mathbb{N}$. For each $w \in \mathbb{N}^A$, the component indexed by $a \in A$ is denoted by $w[a]$ and tells how many $a$'s occur in $w$. The product (concatenation) of $w_1$ and $w_2$ in $C(A^*)$ is the vector $w \in \mathbb{N}^A$ s.t. $\forall a \in A$: $w[a] = w_1[a] + w_2[a]$. A language of the commutative monoid is a subset of $C(A^*)$.</Paragraph>
<Paragraph position="1"> The subword relation is denoted by $\prec$. For a language $L$, we write $v \prec L$ iff there exists $w \in L$ s.t. $v \prec w$.</Paragraph>
<Paragraph position="2"> The rewriting is performed from a source language $\mathcal{L}_S$ over an alphabet $\Sigma_S$ to a target language $\mathcal{L}_T$ over an alphabet $\Sigma_T$ (disjoint from $\Sigma_S$) w.r.t. a set of rewriting rules $\mathcal{R} \subseteq \Sigma_S^+ \times \Sigma_T^*$ (rules have the form $\lambda \to \rho$). We assume in the sequel that any $a \in \Sigma_S$ appears at most once in the left-hand side of each rule of $\mathcal{R}$, and also at most once in any word of $\mathcal{L}_S$. This property is preserved by all the rewritings that we are going to introduce.</Paragraph>
<Paragraph position="3"> Let us define $\mathrm{LHS}(\lambda \to \rho) = \lambda$. For $R \subseteq \mathcal{R}$, we define $\Sigma_{\mathrm{LHS}}(R) = \{a \in \Sigma_S \mid \exists r \in R \text{ s.t. } a \prec \mathrm{LHS}(r)\}$.</Paragraph>
<Paragraph position="4"> The rewriting is a function $\phi_{\mathcal{R}}$ taking $\mathcal{L}_S$ and yielding $\mathcal{L}_T$, defined as:</Paragraph>
<Paragraph position="6"> $\lambda_1 \to \rho_1 \in \mathcal{R} \wedge \dots \wedge \lambda_p \to \rho_p \in \mathcal{R}\}$.</Paragraph>
</Section>
<Section position="4" start_page="1017" end_page="1018" type="metho">
<SectionTitle> 5 Algorithm </SectionTitle>
<Paragraph position="0"> In order to implement the function $\phi_{\mathcal{R}}$, it is useful to introduce rewriting functions $\phi_{\lambda \to \rho}$ and $\phi_{\bar{a}}$.
They apply to any language $L$ over $C(\Sigma^*)$, where $\Sigma = \Sigma_S \cup \Sigma_T$.</Paragraph>
<Paragraph position="1"> They are defined as:</Paragraph>
<Paragraph position="3"> The $\phi_{\lambda \to \rho}$ functions are applied so that source symbols are guaranteed to be removed one by one from $\mathcal{L}_S$: we consider that $\Sigma_S$ is totally ordered by $&lt;$ and we write $\Sigma_S = [a_1, a_2, \dots, a_N]$, with $a_i &lt; a_{i+1}$; then consider the partition of $\mathcal{R}$: $\mathcal{R}_1, \mathcal{R}_2, \dots, \mathcal{R}_N$ s.t. $\mathcal{R}_1$ contains all $\mathcal{R}$</Paragraph>
<Paragraph position="4"> rules with $a_1$ in LHS, $\mathcal{R}_2$ contains all $\mathcal{R}$ rules with $a_2$ but not $a_1$ in LHS, etc., $\mathcal{R}_N$ contains all $\mathcal{R}$ rules with only $a_N$ in LHS. Then we define a third rewriting function $\phi_{\mathcal{R}_i}$: $\phi_{\mathcal{R}_i}(L) = \phi_{\bar{a}_i}(L) \cup \bigcup_{r \in \mathcal{R}_i} \phi_r(L)$.</Paragraph>
<Paragraph position="5"> Lemma. $\mathcal{L}_T$ can be obtained from $\mathcal{L}_S$ by applying the $\mathcal{R}_i$ iteratively in the following manner:</Paragraph>
<Paragraph position="7"> $\lambda_1 \cdots \lambda_p\,x,\ \forall k \le p\ \ \lambda_k \to \rho_k \in \bigcup_{i \le j} \mathcal{R}_i,\ \forall i \le j\ \ a_i \not\prec x\}$.</Paragraph>
<Paragraph position="8"> It is clear that $\mathcal{L}_N = \mathcal{L}_T$. Furthermore, we have $\mathcal{L}_1 = \phi_{\mathcal{R}_1}(\mathcal{L}_S)$, and it is easy to show that, for $2 \le n \le N$,</Paragraph>
<Paragraph position="10"> In order to obtain $\mathcal{L}_T$, we will start from $\mathcal{L}_S$ and actually apply the $\phi_{\mathcal{R}_i}$'s not on languages directly but on</Paragraph>
<Paragraph position="11"> the grammars that define them. This computation is performed by the algorithm that we now present. (Footnote, beginning missing: [...] ground rules, as the ones we are considering, a simple preprocessing step is necessary.)</Paragraph>
<Paragraph position="12"> Let $\mathcal{L}_S$ be defined by the CFG $G_0 = (\Sigma, \mathcal{N}_0, \mathcal{P}_0, S_0)$. For $A \in \mathcal{N}_0$, the set of all rules having $A$ as LHS is notated $A \to \sum_{A \to \alpha \in \mathcal{P}_0} \alpha$. This additive notation is a formal representation of $A \to \alpha_1 \mid \alpha_2 \mid \dots$ Hence $A \to 0$ means that no rule defines $A$.</Paragraph>
<Paragraph position="13"> First $\Phi_{\mathcal{R}_1}$ is applied on $G_0$, which builds $G_1 = (\Sigma, \mathcal{N}_1, \mathcal{P}_1, S_1)$, then $\Phi_{\mathcal{R}_2}$ is applied on $G_1$ to produce $G_2$, and so forth.
Each time, new non-terminals are introduced, of the form $(A)_{\mathcal{R}_i}$, $(A)_{\lambda \to \rho}$ or $(A)_{\bar{a}}$, where $A \in \mathcal{N}_{i-1}$, $\lambda \in \Sigma_S^+$, $\rho \in \Sigma_T^*$, and $a \in \Sigma_S$. Each one is defined by a formal sum as we saw above.</Paragraph>
<Paragraph position="14"> The order of symbols in the RHSs of grammar rules is irrelevant since we consider commutative languages.</Paragraph>
<Paragraph position="15"> Hence the RHSs of grammar rules can be denoted by $x\beta$ s.t. $x \in C(\Sigma^*)$ and $\beta \in C(\mathcal{N}^*)$, where $\mathcal{N}$ is the set of all non-terminals considered.</Paragraph>
<Paragraph position="16"> The algorithm consists of the procedure and functions described below and uses an agenda which contains new non-terminals to be defined in $G_i$. The agenda is handled with a table: each non-terminal is treated once.</Paragraph>
<Paragraph position="17"> procedure main is
  [...]
      when $(A)_{\bar{a}}$: add [...] to $\mathcal{P}_i$
    end case;
  until Agenda is empty
  Reduce $G_i$ whose axiom is $S_i = (S_{i-1})_{\mathcal{R}_i}$;
  /* remove non-terminals that are non-productive ($\mathcal{L}(A) = \emptyset$) or inaccessible from $S_i$. */
  end for;
end procedure;

function $\Phi_{\mathcal{R}_i}(x\beta)$ is // $\beta = A_1 \cdots A_k$
  if $\exists j \in \{1, \dots, k\}$ s.t. $\forall a \in \Sigma_{\mathrm{LHS}}(\mathcal{R}_i)$, $a \prec \mathcal{L}(A_j)$
    // if all rewritings in $\mathcal{R}_i$ can only affect $A_j$
  then add $(A_j)_{\mathcal{R}_i}$ to Agenda;
    return $x A_1 \cdots A_{j-1} (A_j)_{\mathcal{R}_i} A_{j+1} \cdots A_k$; (1)
  else return $\Phi_{\bar{a}_i}(x\beta) + \sum_{r \in \mathcal{R}_i} \Phi_r(x\beta)$; (2)
end function;

function $\Phi_{\bar{a}}(x\beta)$ is // $\beta = A_1 \cdots A_k$
  if $\exists j \in \{1, \dots, k\}$ s.t. $a \prec \mathcal{L}(A_j)$
  then /* j is unique, see below */
    add $(A_j)_{\bar{a}}$ to Agenda;
    return $x A_1 \cdots A_{j-1} (A_j)_{\bar{a}} A_{j+1} \cdots A_k$; (3)
  else if $a \prec x$ then return $0$;
  else return $x\beta$;
end function;

function $\Phi_{\lambda \to \rho}(x\beta)$ is // $\beta = A_1 \cdots A_k$
  // $\lambda$ is searched within $x A_1 \cdots A_k$
  if $\exists j \in \{1, \dots, k\}$ s.t. $\forall a \prec \lambda$, $a \prec \mathcal{L}(A_j)$
    // if $\lambda$ falls entirely within $\mathcal{L}(A_j)$ then the rewriting applies only to $A_j$
  then add $(A_j)_{\lambda \to \rho}$ to Agenda;
    return $x A_1 \cdots A_{j-1} (A_j)_{\lambda \to \rho} A_{j+1} \cdots A_k$; (4)
  else // $\lambda$ is searched within several symbols
    Consider $\lambda = y\,w_1 w_2 \cdots$
$w_k$ s.t.</Paragraph>
<Paragraph position="19"> - the longest common subword of $x$ and $\lambda$ is $y$,
- $\forall a \prec w_j$, $a \prec \mathcal{L}(A_j)$ // $w_j$ is $A_j$'s contribution to $\lambda$
if such decomposition of $\lambda$ exists // that is, it is entirely covered by $x$ and some $A_j$'s
then /* it is unique: see below */
  add to Agenda all $(A_j)_{w_j \to \varepsilon}$ s.t. $w_j \neq \varepsilon$; // all those that contribute
  return $\rho \cdot (x \text{ with } y \text{ deleted}) \cdot \prod_{w_j \neq \varepsilon} (A_j)_{w_j \to \varepsilon} \cdot \prod_{w_j = \varepsilon} A_j$; (5)
  // The rewriting is actually applied: $y$ is deleted from $x$;
  // each contributing (i.e. non $\varepsilon$) $w_j$ is to be deleted
  // (i.e. rewritten to $\varepsilon$ in $A_j$); non-contributing $A_j$'s
  // remain untouched; and $\rho$ is inserted.</Paragraph>
<Paragraph position="20"> else // $\lambda$ cannot be produced by $x\beta$</Paragraph>
<Paragraph position="22"> Unicity of $j$ in $\Phi_{\bar{a}}$, and unicity of the sequence in $\Phi_{\lambda \to \rho}$: consider $A \to xXY\gamma \in \mathcal{P}_{i-1}$; as each source symbol occurs at most once in every word of $\mathcal{L}(S_{i-1})$, the same holds for $\mathcal{L}(A)$, hence the sets of source symbols occurring in $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ are disjoint.</Paragraph>
</Section>
<Section position="5" start_page="1018" end_page="1019" type="metho">
<SectionTitle> 6 Example </SectionTitle>
<Paragraph position="0"> Consider $\Sigma_S = [\mathrm{i}_1, \mathrm{green}_{17}, \mathrm{green}_{27}, \mathrm{seed}, \dots]$ so that $\mathcal{R}$</Paragraph>
<Paragraph position="1"> is partitioned in $\mathcal{R}_1 = \{\mathrm{i}_1 \to \mathrm{je}_1\}$, $\mathcal{R}_2 = \{\mathrm{green}_{17} \to \mathrm{vert}_{17},\ \mathrm{green}_{17}\,\mathrm{mod}_{27}\,\mathrm{light}_2 \to \mathrm{feu}_2\,\mathrm{mod}_{27}\,\mathrm{vert}_{17}\}$, etc. Each other $\mathcal{R}_i$ contains a single rule.</Paragraph>
<Paragraph position="2"> The first iteration of the algorithm computes the grammar $G_1 = \Phi_{\mathcal{R}_1}(G_0)$. The result is: [...] We see that the only nonterminals which have been redefined are $S = S_0$ and SAW. The computation of $(S_0)_{\mathcal{R}_1}$ has been done through step (1) in the algorithm. This is because the terminals in left-hand sides of $\mathcal{R}_1$, namely the single terminal $\mathrm{i}_1$, are all &quot;concentrated&quot; on the single nonterminal SAW on the right-hand side of $S_0$.
This leads in turn to a requirement for a definition of $(\mathrm{SAW})_{\mathcal{R}_1}$, which is fulfilled by step (5) in the algorithm, at which time the rewriting of $\mathrm{i}_1$ into $\mathrm{je}_1$ is performed.</Paragraph>
<Paragraph position="3"> For any group of rules $\mathcal{R}_i$, as long as all terminals in the left-hand sides of rules of $\mathcal{R}_i$ are thus concentrated on at most one nonterminal in a right-hand side, no expansion of rules is necessary. It is only when the terminals start to be distributed on several RHS terminals or non-terminals that an expansion is required.</Paragraph>
<Paragraph position="4"> This situation is illustrated by the second iteration, which maps $G_1$ into $G_2 = \Phi_{\mathcal{R}_2}(G_1)$. The result is: [...] This time, the terminals in left-hand sides of $\mathcal{R}_2$ are $\mathrm{green}_{17}$, $\mathrm{mod}_{27}$ and $\mathrm{light}_2$. We first need to compute $((S_0)_{\mathcal{R}_1})_{\mathcal{R}_2}$. Again, our three terminals are all concentrated on $(\mathrm{SAW})_{\mathcal{R}_1}$. We thus only have to define $((\mathrm{SAW})_{\mathcal{R}_1})_{\mathcal{R}_2}$. Once again, the three terminals are concentrated on LIGHT, and we have to define $(\mathrm{LIGHT})_{\mathcal{R}_2}$. At this point, something interesting happens. It is not the case any more that one nonterminal on the right-hand side of the rule defining LIGHT concentrates all our terminals. In fact, GREEN only &quot;touches&quot; $\mathrm{green}_{17}$, but not the other two terminals. The algorithm then has recourse to step (2), which leads it to define three rules for $(\mathrm{LIGHT})_{\mathcal{R}_2}$, involving recursive calls to $\Phi_{\overline{\mathrm{green}_{17}}}$, $\Phi_{\mathrm{green}_{17} \to \mathrm{vert}_{17}}$ and $\Phi_{\mathrm{green}_{17}\,\mathrm{mod}_{27}\,\mathrm{light}_2 \to \mathrm{feu}_2\,\mathrm{mod}_{27}\,\mathrm{vert}_{17}}$. The first of these calls involves step (3), the second, step (4), and the third, step (5), leading to the three expansions shown for $(\mathrm{LIGHT})_{\mathcal{R}_2}$, and eventually to the definitions for the three variants of the nonterminal GREEN.</Paragraph>
<Paragraph position="5"> The remaining iterations of the rewriting procedures are of the same type as the first iteration. They lead finally to a target grammar of the form: [...] which is only slightly less compact than the source grammar.
It can be checked that this grammar enumerates 30 target graphs, the difference of 10 with the source grammar being due to the addition of the French variant &quot;feu vert&quot; along with &quot;lumière verte&quot; for translating &quot;green light&quot;.</Paragraph>
</Section>
</Paper>