<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1024">
  <Title>Knowledge-Free Induction of Inflectional Morphologies</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Previous Approaches
</SectionTitle>
    <Paragraph position="0"> Previous morphology induction approaches have fallen into three categories. These categories differ depending on whether human input is provided and on whether the goal is to obtain affixes or complete morphological analysis. We here briefly describe work in each category.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Using a Knowledge Source to Bootstrap
</SectionTitle>
      <Paragraph position="0"> Some researchers begin with some initial human-labeled source from which they induce other morphological components. In particular, Xu and Croft (1998) use word context derived from a corpus to refine Porter stemmer output. Gaussier (1999) induces derivational morphology using an inflectional lexicon which includes part of speech information. Grabar and Zweigenbaum (1999) use the SNOMED corpus of semantically-arranged medical terms to find semantically-motivated morphological relationships. Also, Yarowsky and Wicentowski (2000) obtained outstanding results at inducing English past tense after beginning with a list of the open class roots in the language, a table of a language's inflectional parts of speech, and the canonical suffixes for each part of speech.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Affix Inventories
</SectionTitle>
      <Paragraph position="0"> A second, knowledge-free category of research has focused on obtaining affix inventories. Brent, et al.</Paragraph>
      <Paragraph position="1"> (1995) used minimum description length (MDL) to find the most data-compressing suffixes. Kazakov (1997) does something akin to this using MDL as a fitness metric for evolutionary computing. DeJean (1998) uses a strategy similar to that of Harris (1951). He declares that a stem has ended when the number of characters following it exceed some given threshold and identifies any residual following semantic relations, we identified those word pairs the stems as suffixes. that have strong semantic correlations as being</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Complete morphological analysis
</SectionTitle>
      <Paragraph position="0"> Due to the existence of morphological ambiguity (such as with the word &amp;quot;caring&amp;quot; whose stem is &amp;quot;care&amp;quot; rather than &amp;quot;car&amp;quot;), finding affixes alone does not constitute a complete morphological analysis.</Paragraph>
      <Paragraph position="1"> Hence, the last category of research is also knowledge-free but attempts to induce, for each word of a corpus, a complete analysis. Since our Most of the existing algorithms described focus on approach falls into this category (expanding upon suffixing in inflectional languages (though our earlier approach (Schone and Jurafsky, 2000)), Jacquemin and DeJean describe work on prefixes). we describe work in this area in more detail. None of these algorithms consider the general 2.3.1 Jacquemin's multiword approach Jacquemin (1997) deems pairs of word n-grams as morphologically related if two words in the first n-gram have the same first few letters (or stem) as two words in the second n-gram and if there is a suffix for each stem whose length is less than k. He also clusters groups of words having the same kinds of word endings, which gives an added performance boost. He applies his algorithm to a French term list and scores based on sampled, by-hand evaluation.</Paragraph>
      <Paragraph position="2"> 2.3.2. Goldsmith: EM and MDLs Goldsmith (1997/2000) tries to automatically sever each word in exactly one place in order to establish a potential set of stems and suffixes. He uses the expectation-maximization algorithm (EM) and MDL as well as some triage procedures to help eliminate inappropriate parses for every word in a corpus. He collects the possible suffixes for each stem and calls these signatures which give clues about word classes. With the exceptions of capitalization removal and some word segmentation, Goldsmith's algorithm is otherwise knowledge-free. His algorithm, Linguistica, is freely available on the Internet. Goldsmith applies his algorithm to various languages but evaluates in English and French.</Paragraph>
      <Paragraph position="3"> 2.3.3 Schone and Jurafsky: induced semantics In our earlier work, we (Schone and Jurafsky (2000)) generated a list of N candidate suffixes and used this list to identify word pairs which share the same stem but conclude with distinct candidate suffixes. We then applied Latent Semantic Analysis (Deerwester, et al., 1990) as a method of automatically determining semantic relatedness between word pairs. Using statistics from the morphological variants of each other. With the exception of word segmentation, we provided no human information to our system. We applied our system to an English corpus and evaluated by comparing each word's conflation set as produced by our algorithm to those derivable from CELEX.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Problems with earlier approaches
</SectionTitle>
      <Paragraph position="0"> conditions of circumfixing or infixing, nor are they applicable to other language types such as agglutinative languages (Sproat, 1992).</Paragraph>
      <Paragraph position="1"> Additionally, most approaches have centered around statistics of orthographic properties. We had noted previously (Schone and Jurafsky, 2000), however, that errors can arise from strictly orthographic systems. We had observed in other systems such errors as inappropriate removal of valid affixes (&amp;quot;ally&amp;quot;G3C&amp;quot;all&amp;quot;), failure to resolve morphological ambiguities (&amp;quot;hated&amp;quot;G3C&amp;quot;hat&amp;quot;), and pruning of semi-productive affixes (&amp;quot;dirty&amp;quot;G68&amp;quot;dirt&amp;quot;). Yet we illustrated that induced semantics can help overcome some of these errors.</Paragraph>
      <Paragraph position="2"> However, we have since observed that induced semantics can give rise to different kinds of problems. For instance, morphological variants may be semantically opaque such that the meaning of one variant cannot be readily determined by the other (&amp;quot;reusability&amp;quot;G68&amp;quot;use&amp;quot;). Additionally, high-frequency function words may be conflated due to having weak semantic information (&amp;quot;as&amp;quot;G3C&amp;quot;a&amp;quot;). Coupling semantic and orthographic statistics, as well as introducing induced syntactic information and relational transitivity can help in overcoming these problems. Therefore, we begin with an approach similar to our previous algorithm. Yet we build upon this algorithm in several ways in that we: [1] consider circumfixes, [2] automatically identify capitalizations by treating them similar to prefixes [3] incorporate frequency information, [4] use distributional information to help identify syntactic properties, and [5] use transitive closure to help find variants that may not have been found to be semantically related but which are related to mutual variants. We then apply these strategies to English,  German, and Dutch. We evaluate our algorithm Figure 2). Yet using this approach, there may be against the human-labeled CELEX lexicon in all circumfixes whose endings will be overlooked in three languages and compare our results to those the search for suffixes unless we first remove all that the Goldsmith and Schone/Jurafsky algorithms candidate prefixes. Therefore, we build a lexicon would have obtained on our same data. We show consisting of all words in our corpus and identify all how each of our additions result in progressively word beginnings with frequencies in excess of some better overall solutions. threshold (T ). We call these pseudo-prefixes. We</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="222" type="metho">
    <SectionTitle>
3 Current Approach
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="123" type="sub_section">
      <SectionTitle>
3.1 Finding Candidate Circumfix Pairings
</SectionTitle>
      <Paragraph position="0"> As in our earlier approach (Schone and Jurafsky, 2000), we begin by generating, from an untagged corpus, a list of word pairs that might be morphological variants. Our algorithm has changed somewhat, though, since we previously sought word pairs that vary only by a prefix or a suffix, yet we now wish to generalize to those with circumfixing differences. We use &amp;quot;circumfix&amp;quot; to mean true circumfixes like the German ge-/-t as well as combinations of prefixes and suffixes. It should be mentioned also that we assume the existence of languages having valid circumfixes that are not composed merely of a prefix and a suffix that appear independently elsewhere.</Paragraph>
      <Paragraph position="1"> To find potential morphological variants, our first goal is to find word endings which could serve as suffixes. We had shown in our earlier work how one might do this using a character tree, or trie (as in  strip all pseudo-prefixes from each word in our lexicon and add the word residuals back into the lexicon as if they were also words. Using this final lexicon, we can now seek for suffixes in a manner equivalent to what we had done before (Schone and Jurafsky, 2000).</Paragraph>
      <Paragraph position="2"> To demonstrate how this is done, suppose our initial lexicon G2F contained the words &amp;quot;align,&amp;quot; &amp;quot;real,&amp;quot; &amp;quot;aligns,&amp;quot; &amp;quot;realign&amp;quot;, &amp;quot;realigned&amp;quot;, &amp;quot;react&amp;quot;, &amp;quot;reacts,&amp;quot; and &amp;quot;reacted.&amp;quot; Due to the high frequency occurrence of &amp;quot;re-&amp;quot; suppose it is identified as a pseudo-prefix. If we strip off &amp;quot;re-&amp;quot; from all words, and add all residuals to a trie, the branch of the trie of words beginning with &amp;quot;a&amp;quot; is depicted in Figure 2. In our earlier work, we showed that a majority of the regular suffixes in the corpus can be found by identifying trie branches that appear repetitively.</Paragraph>
      <Paragraph position="3"> By &amp;quot;branch&amp;quot; we mean those places in the trie where some splitting occurs. In the case of Figure 2, for example, the branches NULL (empty circle), &amp;quot;-s&amp;quot; and &amp;quot;-ed&amp;quot; each appear twice. We assemble a list of all trie branches that occur some minimum number of times (T ) and refer to such as potential suffixes.</Paragraph>
      <Paragraph position="4">  Given this list, we can now find potential prefixes using a similar strategy. Using our original lexicon, we can now strip off all potential suffixes from each word and form a new augmented lexicon. Then, (as we had proposed before) if we reverse the ordering on the words and insert them into a trie, the branches that are formed will be potential prefixes (in reverse order).</Paragraph>
      <Paragraph position="5"> Before describing the last steps of this procedure, it is beneficial to define a few terms (some of which appeared in our previous work): [a] potential circumfix: A pair B/E where B and E occur respectively in potential prefix and suffix lists [b] pseudo-stem: the residue of a word after its potential circumfix is removed [c] candidate circumfix: a potential circumfix which appears affixed to at least T pseudo-stems that are  shared by other potential circumfixes [d] rule: a pair of candidate circumfixes sharing at least T pseudo-stems  [e] pair of potential morphological variants (PPMV): two words sharing the same rule but distinct candidate circumfixes [f] ruleset: the set of all PPMVs for a common rule Our final goal in this first stage of induction is to find all of the possible rules and their corresponding rulesets. We therefore re-evaluate each word in the original lexicon to identify all potential circumfixes that could have been valid for the word. For example, suppose that the lists of potential suffixes and prefixes contained &amp;quot;-ed&amp;quot; and &amp;quot;re-&amp;quot; respectively. Note also that NULL exists by default in both lists as well. If we consider the word &amp;quot;realigned&amp;quot; from our lexicon G2F, we would find that its potential circumfixes would be NULL/ed, re/NULL, and re/ed and the corresponding pseudo-stems would be &amp;quot;realign,&amp;quot; &amp;quot;aligned,&amp;quot; and &amp;quot;align,&amp;quot; respectively, From G2F, we also note that circumfixes re/ed and NULL/ing share the pseudo-stems &amp;quot;us,&amp;quot; &amp;quot;align,&amp;quot; and &amp;quot;view&amp;quot; so a rule could be created: re/edG3CNULL/ing. This means that word pairs such as &amp;quot;reused/using&amp;quot; and &amp;quot;realigned/aligning&amp;quot; would be deemed PPMVs. Although the choices in T through T is  somewhat arbitrary, we chose T =T =T =10 and  T =3. In English, for example, this yielded 30535  possible rules. Table 1 gives a sampling of these potential rules in each of the three languages in terms of frequency-sorted rank. Notice that several &amp;quot;rules&amp;quot; are quite valid, such as the indication of an English suffix -s. There are also valid circumfixes like the ge-/-t circumfix of German. Capitalization also appears (as a 'prefix'), such as CG3C c in English, DG3Cd in German, and VG3Cv in Dutch. Likewise,there are also some rules that may only be true in certain circumstances, such as -dG3C-r in English (such as worked/worker, but certainly not for steed/steer.) However, there are some rules that are</Paragraph>
      <Paragraph position="7"> wrong: the potential 's-' prefix of English is never valid although word combinations like stick/tick spark/park, and slap/lap happen frequently in English. Incorporating semantics can help determine the validity of each rule.</Paragraph>
    </Section>
    <Section position="2" start_page="123" end_page="123" type="sub_section">
      <SectionTitle>
3.2 Computing Semantics
</SectionTitle>
      <Paragraph position="0"> Deerwester, et al. (1990) introduced an algorithm called Latent Semantic Analysis (LSA) which showed that valid semantic relationships between words and documents in a corpus can be induced with virtually no human intervention. To do this, one typically begins by applying singular value decomposition (SVD) to a matrix, M, whose entries M(i,j) contains the frequency of word i as seen in document j of the corpus. The SVD decomposes M into the product of three matrices, U, D, and V such</Paragraph>
      <Paragraph position="2"> diagonal matrix whose entries are the singular values of M. The LSA approach then zeros out all but the top k singular values of the SVD, which has the effect of projecting vectors into an optimal k-dimensional subspace. This methodology is well-described in the literature (Landauer, et al., 1998; Manning and Schutze, 1999).</Paragraph>
      <Paragraph position="3"> In order to obtain semantic representations of each word, we apply our previous strategy (Schone and Jurafsky (2000)). Rather than using a term-document matrix, we had followed an approach akin to that of Schutze (1993), who performed SVD on a Nx2N term-term matrix. The N here represents the N-1 most-frequent words as well as a glob position to account for all other words not in the top N-1. The matrix is structured such that for a given word w's row, the first N columns denote words that</Paragraph>
      <Paragraph position="5"> precede w by up to 50 words, and the second N columns represent those words that follow by up to 50 words. Since SVDs are more designed to work then, if there were n items in the ruleset, the with normally-distributed data (Manning and probability that a NCS is non-random is Schutze, 1999, p. 565), we fill each entry with a normalized count (or Z-score) rather than straight frequency. We then compute the SVD and keep the top 300 singular values to form semantic vectors for We define Pr (w G3Cw )=Pr(NCS(w ,w )). We each word. Word w would be assigned the semantic choose to accept as valid relationships only those vector G0D UD, where U represents the row of W=wk w U corresponding to w and D indicates that only the k top k diagonal entries of D have been preserved.</Paragraph>
      <Paragraph position="6"> As a last comment, one would like to be able to obtain a separate semantic vector for every word (not just those in the top N). SVD computations can be expensive and impractical for large values of N.</Paragraph>
      <Paragraph position="7"> Yet due to the fact that U and V are orthogonal T matrices, we can start with a matrix of reasonablesized N and &amp;quot;fold in&amp;quot; the remaining terms, which is the approach we have followed. For details about folding in terms, the reader is referred to Manning and Schutze (1999, p. 563).</Paragraph>
    </Section>
    <Section position="3" start_page="123" end_page="123" type="sub_section">
      <SectionTitle>
3.3 Correlating Semantic Vectors
</SectionTitle>
      <Paragraph position="0"> To correlate these semantic vectors, we use normalized cosine scores (NCSs) as we had illustrated before (Schone and Jurafsky (2000)).</Paragraph>
      <Paragraph position="1"> The normalized cosine score between two words w  for each word. The NCS is given to be We had previously illustrated NCS values on various PPMVs and showed that this type of score seems to be appropriately identifying semantic relationships. (For example, the PPMVs of car/cars and ally/allies had NCS values of 5.6 and 6.5 respectively, whereas car/cares and ally/all had scored only -0.14 and -1.3.) Further, we showed that by performing this normalizing process, one can estimate the probability that an NCS is random or not. We expect that random NCSs will be approximately normally distributed according to N(0,1). We can also estimate the distribution</Paragraph>
      <Paragraph position="3"> threshold. We showed in our earlier work that T =85% affords high overall precision while still  identifying most valid morphological relationships.</Paragraph>
    </Section>
    <Section position="4" start_page="123" end_page="222" type="sub_section">
      <SectionTitle>
3.4 Augmenting with Affix Frequencies
</SectionTitle>
      <Paragraph position="0"> The first major change to our previous algorithm is an attempt to overcome some of the weaknesses of purely semantic-based morphology induction by incorporating information about affix frequencies.</Paragraph>
      <Paragraph position="1"> As validated by Kazakov (1997), high frequency word endings and beginnings in inflectional languages are very likely to be legitimate affixes. In English, for example, the highest frequency rule is -sG3CG4C. CELEX suggests that 99.7% of our PPMVs for this rule would be true. However, since the purely semantic-based approach tends to select only relationships with contextually similar meanings, only 92% of the PPMVs are retained. This suggests that one might improve the analysis by supplementing semantic probabilities with orthographic-based probabilities (Pr ).</Paragraph>
      <Paragraph position="2"> orth Our approach to obtaining Pr is motivated by orth an appeal to minimum edit distance (MED). MED has been applied to the morphology induction problem by other researchers (such as Yarowsky and Wicentowski, 2000). MED determines the minimum-weighted set of insertions, substitutions, and deletions required to transform one word into another. For example, only a single deletion is required to transform &amp;quot;rates&amp;quot; into &amp;quot;rate&amp;quot; whereas two substitutions and an insertion are required to transform it into &amp;quot;rating.&amp;quot; Effectively, if Cost(G26) is transforming cost, Cost(ratesG3Crate) = Cost(sG3CG4C) whereas Cost(ratesG3Crating)=Cost(esG3Cing). More generally, suppose word X has circumfix C =B /E  and pseudo-stem -S-, and word Y has circumfix C =B /E also with pseudo-stem -S-. Then,</Paragraph>
      <Paragraph position="4"> Since we are free to choose whatever cost function we desire, we can equally choose one whose range  lies in the interval of [0,1]. Hence, we can assign Consider Table 2 which is a sample of PPMVs Pr (XG3CY) = 1-Cost(XG3CY). This calculation implies from the ruleset for &amp;quot;-sG3CG4C&amp;quot; along with their orth that the orthographic probability that X and Y are probabilities of validity. A validity threshold (T ) of morphological variants is directly derivable from the 85% would mean that the four bottom PPMVs cost of transforming C into C . would be deemed invalid. Yet if we find that the  The only question remaining is how to determine local contexts of these low-scoring word pairs Cost(C G3CC ). This cost should depend on a number match the contexts of other PPMVs having high  of factors: the frequency of the rule f(C G3CC ), the scores (i.e., those whose scores exceed T ), then  reliability of the metric in comparison to that of their probabilities of validity should increase. If we semantics (G2E, where G2E G13 [0,1]), and the frequencies could compute a syntax-based probability for these of other rules involving C and C . We define the words, namely Pr , then assuming independence  orthographic probability of validity as we would have: Figure 3 describes the pseudo-code for an We suppose that orthographic information is less (L) and right-hand (R) sides of each valid PPMV of reliable than semantic information, so we arbitrarily a given ruleset, try to find a collection of words set G2E=0.5. Now since Pr (XG3CY)=1-Cost(C G3CC ), from the corpus that are collocated with L and R but orth 1 2 we can readily combine it with Pr if we assume which occur statistically too many or too few times sem independence using the &amp;quot;noisy or&amp;quot; formulation: in these collocations. Such word sets form</Paragraph>
      <Paragraph position="6"> s-o sem orth sem orth By using this formula, we obtain 3% (absolute) more of the correct PPMVs than semantics alone had provided for the -sG3CG4C rule and, as will be shown later, gives reasonable improvements overall.</Paragraph>
    </Section>
    <Section position="5" start_page="222" end_page="222" type="sub_section">
      <SectionTitle>
3.5 Local Syntactic Context
</SectionTitle>
      <Paragraph position="0"> Since a primary role of morphology -- inflectional morphology in particular -- is to convey syntactic information, there is no guarantee that two words that are morphological variants need to share similar semantic properties. This suggests that performance could improve if the induction process took advantage of local, syntactic contexts around words in addition to the more global, large-window</Paragraph>
      <Paragraph position="2"> s-o syntax s-o syntax algorithm to compute Pr . Essentially, the syntax algorithm has two major components. First, for left a randomly-chosen set of words from the corpus as well as for each of the PPMVs of the ruleset that are not yet validated. Lastly, compute the NCS and their corresponding probabilities (see equation 1) between the ruleset's signatures and those of the tobe-validated PPMVs to see if they can be validated. Table 3 gives an example of the kinds of contextual words one might expect for the &amp;quot;-sG3CG4C&amp;quot; rule. In fact, the syntactic signature for &amp;quot;-sG3CG4C&amp;quot; does indeed include such words as are, other, these, two, were, and have as indicators of words that occur on the left-hand side of the ruleset, and a, an, this, is, has, and A as indicators of the right-hand side.</Paragraph>
      <Paragraph position="3"> These terms help distinguish plurals from singulars.</Paragraph>
      <Paragraph position="4">  Context for L Context for R agendas are seas were a legend this formula two red pads pleas have militia is an area these ideas other areas railroad has A guerrilla There is an added benefit from following this approach: it can also be used to find rules that, though different, seem to convey similar information . Table 4 illustrates a number of such agreements. We have yet to take advantage of this feature, but it clearly could be of use for part-of-speech induction.</Paragraph>
    </Section>
    <Section position="6" start_page="222" end_page="222" type="sub_section">
      <SectionTitle>
3.6 Branching Transitive Closure
</SectionTitle>
      <Paragraph position="0"> Despite the semantic, orthographic, and syntactic components of the algorithm, there are still valid PPMVs, (XG3CY), that may seem unrelated due to corpus choice or weak distributional properties.</Paragraph>
      <Paragraph position="1"> However, X and Y may appear as members of other valid PPMVs such as (XG3CZ) and (ZG3CY) containing variants (Z, in this case) which are either semantically or syntactically related to both of the other words. Figure 4 demonstrates this property in greater detail. The words conveyed in Figure 4 are all words from the corpus that have potential relationships between variants of the word &amp;quot;abuse.&amp;quot; Links between two words, such as &amp;quot;abuse&amp;quot; and &amp;quot;Abuse,&amp;quot; are labeled with a weight which is the semantic correlation derived by LSA. Solid lines represent valid relationships with Pr G070.85 and sem dashed lines indicate relationships with lower-thanthreshold scores. The absence of a link suggests that either the potential relationship was never identified or discarded at an earlier stage. Self loops are assumed for each node since clearly each word should be related morphologically to itself. Since there are seven words that are valid morphological relationships of &amp;quot;abuse,&amp;quot; we would like to see a complete graph containing 21 solid edges. Yet, only eight connections can be found by semantics alone (AbuseG3Cabuse, abusersG3Cabusing, etc.).</Paragraph>
      <Paragraph position="2"> However, note that there is a path that can be followed along solid edges from every correct word to every other correct variant. This suggests that taking into consideration link transitivity (i.e., if XG3CY, YG3CY, YG3CY ,... and YG3CZ, then XG3CZ) 11 22 3 t may drastically reduce the number of deletions.</Paragraph>
      <Paragraph position="3"> There are two caveats that need to be considered for transitivity to be properly pursued. The first caveat: if no rule exists that would transform X into Z, we will assume that despite the fact that there may be a probabilistic path between the two, we  will disregard such a path. The second caveat is that the algorithms we test against. Furthermore, since we will say that paths can only consist of solid CELEX has limited coverage, many of these loweredges, namely each Pr(YG3CY ) on every path must frequency words could not be scored anyway. This i i+1 exceed the specified threshold. cut-off also helps each of the algorithms to obtain Given these constraints, suppose now there is a stronger statistical information on the words they do transitive relation from X to Z by way of some process which means that any observed failures intermediate path G8C={Y Y Y }. That is, assume cannot be attributed to weak statistics. i 1, 2,.. t there is a path XG3CY YG3CY ,...,YG3CZ. Suppose Morphological relationships can be represented as 1, 1 2 t also that the probabilities of these relationships are directed graphs. Figure 6, for instance, illustrates respectively p , p , p ,...,p . If G15 is a decay factor in the directed graph, according to CELEX, of words 0 12 t the unit interval accounting for the number of link associated with &amp;quot;conduct.&amp;quot; We will call the words separations, then we will say that the Pr(XG3CZ) of such a directed graph the conflation set for any of along path G8C has probability . We the words in the graph. Due to the difficulty in i combine the probabilities of all independent paths developing a scoring algorithm to compare directed between X and Z according to Figure 5: graphs, we will follow our earlier approach and only If the returned probability exceeds T , we declare X  and Z to be morphological variants of each other.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML