File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/w06-1501_metho.xml
Size: 12,457 bytes
Last Modified: 2025-10-06 14:10:42
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1501"> <Title>References</Title>
<Section position="4" start_page="1" end_page="1" type="metho"> <SectionTitle> 3 Linguistic Facts </SectionTitle>
<Paragraph position="0"> We illustrate the differences between LA and MSA using an example (given in Buckwalter transliteration): (1) LA: AlrjAl byHbw$ Al$gl hdA (the-men like-not the-work this); (2) MSA: lA yHb AlrjAl h*A AlEml (not likes the-men this the-work); both mean 'the men do not like this work'. Lexically, we observe that the word for 'work' is Al$gl in LA but AlEml in MSA. In contrast, the word for 'men' is the same in both LA and MSA: AlrjAl. There are typically also differences in function words, in our example $ (LA) and lA (MSA) for 'not'. Morphologically, we see that LA byHbw has the same stem as MSA yHb, but with two additional morphemes: the present aspect marker b-, which does not exist in MSA, and the agreement marker -w, which is used in MSA only in subject-initial sentences, while in LA it is always used.</Paragraph>
<Paragraph position="1"> Syntactically, we observe three differences.</Paragraph>
<Paragraph position="2"> First, the subject precedes the verb in LA (SVO order) but follows it in MSA (VSO order). This is in fact not a strict requirement but a strong preference: both varieties allow both orders, but in the dialects the SVO order is more common, while in MSA the VSO order is more common. Second, we see that the demonstrative determiner follows the noun in LA, but precedes it in MSA. Finally, we see that the negation marker follows the verb in LA, while it precedes the verb in MSA. (Levantine also has other negation markers that precede the verb, as well as the circumfix m- -$.) The two phrase structure trees are shown in Figure 1 in the convention of the Linguistic Data Consortium (Maamouri et al., 2004). Unlike the phrase structure trees, the (unordered) dependency trees for the MSA and LA sentences are isomorphic, as shown in Figure 2. They differ only in the node labels.</Paragraph>
</Section>
<Section position="5" start_page="1" end_page="4" type="metho"> <SectionTitle> 4 Model </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="1" end_page="4" type="sub_section"> <SectionTitle> 4.1 The synchronous TSG+SA formalism </SectionTitle>
<Paragraph position="0"> Our parser (Chiang, 2000) is based on synchronous tree-substitution grammar with sister-adjunction (TSG+SA). Tree-substitution grammar (Schabes, 1990) is TAG without auxiliary trees or adjunction; instead we include a weaker composition operation, sister-adjunction (Rambow et al., 2001), in which an initial tree is inserted between two sister nodes (see Figure 4). We allow multiple sister-adjunctions at the same site, similar to how Schabes and Shieber (1994) allow multiple adjunctions of modifier auxiliary trees.</Paragraph>
<Paragraph position="1"> A synchronous TSG+SA is a set of pairs of elementary trees. In each pair, there is a one-to-one correspondence between the substitution/sister-adjunction sites of the two trees, which we represent using boxed indices (Figure 5). A derivation then starts with a pair of initial trees and proceeds by substituting or sister-adjoining elementary tree pairs at coindexed sites. In this way a set of string pairs ⟨S, S′⟩ is generated.</Paragraph>
<Paragraph position="2"> Sister-adjunction presents a special problem for synchronization: if multiple tree pairs sister-adjoin at the same site, how should their order on the source side relate to the order on the target side? Shieber's solution (Shieber, 1994) is to allow any ordering. We adopt a stricter solution: for each pair of sites, we fix a permutation (either identity or reversal) for the tree pairs that sister-adjoin there. Owing to the way we extract trees from the Treebank, the simplest choice of permutations is: if the two sites are both to the left of the anchor or both to the right of the anchor, then multiple sister-adjoined tree pairs appear in the same order on both sides; otherwise, they appear in the opposite order. In other words, multiple sister-adjunction always adds trees from the anchor outward.</Paragraph>
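To make the fixed-permutation convention concrete, here is a minimal Python sketch (our illustration, not the system described in the paper; all names are invented). Given tree pairs sister-adjoined at one pair of coindexed sites, it returns the target-side order of their partner trees from the source-side order, depending on which side of the anchor each site lies.

```python
# Illustrative sketch of the fixed sister-adjunction permutation; not the
# authors' code.  'L'/'R' say whether a site lies to the left or right of
# its tree's lexical anchor.

def target_side_order(tree_pairs, source_site, target_site):
    """Return the target-side partners of `tree_pairs` (listed in their
    source-side order) in the order they appear on the target side.

    Identity permutation if both sites are on the same side of the anchor;
    reversal otherwise, so both sides grow outward from their anchors.
    """
    targets = [target for (_source, target) in tree_pairs]
    if source_site == target_site:
        return targets              # same side: same order on both sides
    return list(reversed(targets))  # opposite sides: reversed order

# Example: two modifier pairs sister-adjoined at a site to the right of the
# source anchor but to the left of the target anchor.
pairs = [("mod1", "mod1_t"), ("mod2", "mod2_t")]
print(target_side_order(pairs, source_site="R", target_site="L"))
# ['mod2_t', 'mod1_t']
```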
<Paragraph position="3"> A stochastic synchronous TSG+SA adds probabilities to the substitution and sister-adjunction operations: the probability of substituting an elementary tree pair ⟨α, α′⟩ at a substitution site pair ⟨η, η′⟩ is Ps(α, α′ | η, η′), and the probability of sister-adjoining ⟨α, α′⟩ at a sister-adjunction site pair ⟨η, i, η′, i′⟩ is Psa(α, α′ | η, i, η′, i′), where i and i′ indicate that the sister-adjunction occurs between the ith and (i+1)st (or i′th and (i′+1)st) sisters. These parameters must satisfy the normalization conditions</Paragraph>
<Paragraph position="4"> ∑⟨α,α′⟩ Ps(α, α′ | η, η′) = 1 for each site pair ⟨η, η′⟩, and ∑⟨α,α′⟩ Psa(α, α′ | η, i, η′, i′) = 1 for each ⟨η, i, η′, i′⟩.</Paragraph>
</Section>
<Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.2 Parsing by translation </SectionTitle>
<Paragraph position="0"> We intend to apply a stochastic synchronous TSG+SA to input sentences S′. This requires projecting any constraints from the unprimed side of the synchronous grammar over to the primed side, and then parsing the sentences S′ with the projected grammar, using a straightforward generalization of the CKY and Viterbi algorithms. This gives the highest-probability derivation of the synchronous grammar that generates S′ on the primed side, which includes a parse for S′ and, as a byproduct, a parsed translation of S′.</Paragraph>
<Paragraph position="1"> Suppose that S′ is a sentence of LA. For the present task we are not actually interested in the MSA translation of S′, or the parse of the MSA translation; we are only interested in the parse of S′. The purpose of the MSA side of the grammar is to provide reliable statistics. Thus, we approximate the synchronous rewriting probabilities as:</Paragraph>
<Paragraph position="2"> Ps(α, α′ | η, η′) ≈ Ps(α | η) · Pt(α′ | α) and Psa(α, α′ | η, i, η′, i′) ≈ Psa(α | η, i) · Pt(α′ | α).</Paragraph>
<Paragraph position="3"> These factors, as we will see shortly, are much easier to estimate given the available resources.</Paragraph>
<Paragraph position="4"> This factorization is analogous to a hidden Markov model: the primed derivation is the observation and the unprimed derivation is the hidden state sequence (except that it is a branching process instead of a chain); Ps and Psa are like the transition probabilities, and Pt is like the observation probabilities. Hence, we call this model a hidden TAG model.</Paragraph>
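The factorization can be read as a simple scoring procedure, sketched below in Python under assumed data structures (the step format and table layout are ours, not the paper's): Ps and Psa supply the "transition" score over the hidden, unprimed derivation, and Pt supplies the per-pair "emission" score.

```python
import math

# Hedged sketch of hidden TAG scoring.  Each derivation step is
# (op, alpha, alpha_prime, site): alpha is the unprimed (MSA-side)
# elementary tree, alpha_prime its primed (LA-side) partner, and site the
# unprimed substitution site eta or sister-adjunction site (eta, i).
# Ps, Psa, Pt are dictionaries standing in for the model's tables.

def derivation_log_prob(steps, Ps, Psa, Pt):
    """log-probability of a synchronous derivation under
       Ps(a, a' | e, e')         ~ Ps(a | e)    * Pt(a' | a)
       Psa(a, a' | e, i, e', i') ~ Psa(a | e,i) * Pt(a' | a)
    Ps/Psa play the role of HMM transitions over the hidden derivation;
    Pt plays the role of the observation probabilities."""
    logp = 0.0
    for op, alpha, alpha_prime, site in steps:
        table = Ps if op == "substitute" else Psa
        logp += math.log(table[alpha, site])      # hidden-side "transition"
        logp += math.log(Pt[alpha_prime, alpha])  # translation "emission"
    return logp
```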
</Section>
<Section position="3" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.3 Parameter estimation and smoothing </SectionTitle>
<Paragraph position="0"> Ps and Psa are the parameters of a monolingual TSG+SA and can be learned from a monolingual Treebank (Chiang, 2000); the details are not important here.</Paragraph>
<Paragraph position="1"> As for Pt, in order to obtain better probability estimates, we further decompose Pt into Pt1 and Pt2 so that they can be estimated separately (as in the monolingual parsing model):</Paragraph>
<Paragraph position="2"> Pt(α′ | α) ≈ Pt1(ᾱ′ | ᾱ) · Pt2(w′, t′ | w, t)</Paragraph>
<Paragraph position="3"> where w and t are the lexical anchor of α and its POS tag, and ᾱ is the equivalence class of α modulo lexical anchors and their POS tags. Pt2 represents the lexical transfer model, and Pt1 the syntactic transfer model. Pt1 and Pt2 are initially assigned by hand; Pt1 is then reestimated by EM.</Paragraph>
<Paragraph position="4"> Because the full probability table for Pt1 would be too large to write by hand, and because our training data might be too sparse to reestimate it well, we smooth it by approximating it as a linear combination of backoff models:</Paragraph>
<Paragraph position="5"> Pt1(ᾱ′ | ᾱ) ≈ λ1 Pt11 + (1 − λ1) (λ2 Pt12 + (1 − λ2) Pt13), where the three models Pt11, Pt12, Pt13 condition on progressively less specific information,</Paragraph>
<Paragraph position="6"> and where each λi, unlike in the monolingual parser, is simply set to 1 if an estimate is available for that level, so that it completely overrides the further backed-off models.</Paragraph>
<Paragraph position="7"> The initial estimates for the Pt1i are set by hand. The availability of three backoff models makes it easy to specify the initial guesses at an appropriate level of detail: for example, one might give a general probability of some ᾱ mapping to ᾱ′ using Pt13, but then make special exceptions for particular lexical anchors using Pt11 or Pt12.</Paragraph>
<Paragraph position="8"> Finally, Pt1 is reestimated by EM on some held-out unannotated sentences of L′, using the same method as Chiang and Bikel (2002) but on the syntactic transfer probabilities instead of the monolingual parsing model. Another difference is that, following Bikel (2004), we do not recalculate the λi at each iteration, but use the initial values throughout.</Paragraph>
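The deterministic backoff amounts to trying the most specific table first and returning its estimate outright when one exists (λi = 1), falling through to coarser tables otherwise. The Python sketch below is illustrative only; in particular, the conditioning contexts chosen for Pt11, Pt12, and Pt13 (anchor plus POS tag, POS tag only, tree class only) are our assumption about what the three levels of detail look like.

```python
# Hedged sketch of the deterministic backoff for the syntactic transfer
# model Pt1.  Each table maps a conditioning context to a distribution
# over target tree equivalence classes; the contexts used here are an
# assumed reading of "progressively less specific".

def pt1(a_bar_prime, a_bar, anchor, tag, pt11, pt12, pt13):
    """Return Pt1(a_bar_prime | a_bar) under deterministic backoff:
    lambda_i = 1 as soon as level i has an estimate, so a more specific
    table completely overrides the backed-off ones."""
    levels = [
        (pt11, (a_bar, anchor, tag)),  # most specific: per-anchor exceptions
        (pt12, (a_bar, tag)),          # back off to the POS tag
        (pt13, (a_bar,)),              # most general: tree class only
    ]
    for table, context in levels:
        if context in table:           # estimate available: lambda = 1
            return table[context].get(a_bar_prime, 0.0)
    return 0.0                         # no level has an estimate
```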
</Section> </Section>
<Section position="6" start_page="4" end_page="5" type="metho"> <SectionTitle> 5 A Synchronous TSG+SA for Dialectal Arabic </SectionTitle>
<Paragraph position="0"> Just as the probability model discussed in the preceding section factored the rewriting probabilities into three parts, we create a synchronous TSG+SA and the probabilities of a hidden TAG model in three steps: * Ps and Psa are the parameters of a monolingual TSG+SA for MSA. We extract a grammar for the resource-rich language (MSA) from the Penn Arabic Treebank in a process described by Chiang and others (Chiang, 2000; Xia et al., 2000; Chen, 2001).</Paragraph>
<Paragraph position="1"> * For the lexical transfer model Pt2, we create by hand a probabilistic mapping between (word, POS tag) pairs in the two languages.</Paragraph>
<Paragraph position="2"> * For the syntactic transfer model Pt1, we create by hand a grammar for the resource-poor language and a mapping between elementary trees in the two grammars, along with initial guesses for the mapping probabilities.</Paragraph>
<Paragraph position="3"> We discuss the hand-crafted lexicon and synchronous grammar in the following subsections.</Paragraph>
<Section position="1" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 5.1 Lexical Mapping </SectionTitle>
<Paragraph position="0"> We used a small, hand-crafted lexicon of 100 words which maps all LA function words and some of the most common open-class words to MSA. We assigned uniform probabilities to the mapping. All other MSA words were assumed to also be LA words. Unknown LA words were handled using the standard unknown-word mechanism.</Paragraph>
</Section>
<Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 5.2 Syntactic Mapping </SectionTitle>
<Paragraph position="0"> Because of the underlying syntactic similarity between the two varieties of Arabic, we assume that every tree in the MSA grammar extracted from the MSA treebank is also an LA tree. In addition, we define tree transformations in the Tsurgeon package (Levy and Andrew, 2006). These consist of a pattern, which matches MSA elementary trees in the extracted grammar, and a transformation, which produces an LA elementary tree. We perform the following tree transformations on all elementary trees which match the underlying MSA pattern. Thus, each MSA tree corresponds to at least two LA trees: the original one and the transformed one. If several transformations apply, we obtain multiple transformed trees.</Paragraph>
<Paragraph position="1"> * Negation (NEG): we insert a $ negation marker immediately following each verb. The preverbal marker is generated by a lexical translation of an MSA elementary tree.</Paragraph>
<Paragraph position="2"> * VSO-SVO Ordering (SVO): both orders occur in the MSA and LA treebanks, but pure VSO constructions (without pro-drop) occur in the LA corpus only 10% of the time, while VSO is the dominant ordering in MSA. Hence, the goal is to skew the distribution of the SVO constructions in the MSA data. Therefore, VSO constructions are replicated and converted to SVO constructions. One possible resulting pair of trees is shown in Figure 5.</Paragraph>
<Paragraph position="3"> * The bd construction (BD): bd is an LA noun that means 'want'. It acts like a verb in verbal constructions, yielding VP constructions headed by NN. It is typically followed by an enclitic possessive pronoun. Accordingly, we defined a transformation that translates all verbs meaning 'want'/'need' into the noun bd and changes their POS tag to NN. The subject clitic is transformed into a possessive pronoun clitic. Note that this construction is a combined lexical and syntactic transformation, and thus specifically exploits the extended domain of locality of TAG-like formalisms. One possible resulting pair of trees is shown in Figure 6.</Paragraph>
</Section> </Section> </Paper>