<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2130">
  <Title>Learning Part-of-Speech Guessing Rules from Lexicon: Extension to Non-Concatenative Operations*</Title>
  <Section position="4" start_page="770" end_page="771" type="metho">
    <SectionTitle>
2 The Learning Paradigm
</SectionTitle>
    <Paragraph position="0"> The major topic in the development of word POS guessers is the strategy which is to be used for the acquisition of the guessing rules.</Paragraph>
    <Paragraph position="1"> Brill (Brill, 1995) outlines a transformation-based learner which learns guessing rules from a pre-tagged training corpus. A statistical-based suffix learner is presented in (Schmid, 1994). From a pre-tagged training corpus it constructs the suffix tree where every suffix is associated with its information measure.</Paragraph>
    <Paragraph position="2"> The learning technique employed in the induction of the rules of the cascading guesser (Mikheev, 1996) does not require specially prepared training data and employs fully unsupervised statistical learning from the lexicon supplied with the tagger and word-frequencies obtained from a raw corpus. The learning is implemented as a two-staged process with feedback. First, setting certain parameters, a set of guessing rules is acquired; then it is evaluated and the results of the evaluation are used for re-acquisition of a better tuned rule-set. As has already been said, this learning technique proved to be very successful, but it did not attempt the acquisition of word-guessing rules which do not obey simple concatenation of a main word with some affix. Here we present an extension to accommodate such cases.</Paragraph>
    <Section position="1" start_page="770" end_page="771" type="sub_section">
      <SectionTitle>
2.1 Rule Extraction Phase
</SectionTitle>
      <Paragraph position="0"> In the initial learning technique (Mikheev, 1996), which accounted only for simple concatenative regularities, a guessing rule was seen as a triple: A = (S, I, R) where S is the affix itself; I is the POS-class of words which should be looked up in the lexicon as main forms; R is the POS-class which is assigned to unknown words if the rule is satisfied.</Paragraph>
      <Paragraph position="1"> Here we extend this structure to handle cases of mutation in the last n letters of the main word (words of I-class), as, for instance, in the case of try - tries, when the letter "y" is changed to "i" before the suffix. To accommodate such alterations we included an additional mutation element (M) into the rule structure. This element keeps the segment to be added to the main word. So the application of a guessing rule can be described as: unknown-word - S + M : I -> R, i.e. from an unknown word we strip the affix S, add the mutative segment M, look up the produced string in the lexicon, and if it is of class I we conclude that the unknown word is of class R. For example: the suffix rule A^s: [S=ied I=(NN VB) R=(JJ VBD VBN) M=y], or in short [ied (NN VB) (JJ VBD VBN) y], says that if there is an unknown word which ends with "ied", we should strip this ending and append to the remaining part the string "y". If we then find this word in the lexicon as (NN VB) (noun/verb), we conclude that the guessed word is of category (JJ VBD VBN) (adjective, past verb or participle). This rule, for example, will work for word pairs like specify - specified or deny - denied. Next, we modified the V operator which was used for the extraction of morphological guessing rules. We augmented this operator with an index n which specifies the length of the mutative ending of the main word. Thus when the index n is 0 the result of applying the V0 operator will be a morphological rule without alterations.</Paragraph>
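As a sketch, the application of an extended rule A = (S, I, R, M) can be expressed in a few lines of Python. The lexicon contents here are a toy illustration and the function and variable names are ours:

```python
# Apply a POS-guessing rule A = (S, I, R, M) to an unknown word:
# strip affix S, append mutative segment M, look the result up in
# the lexicon; if its POS-class equals I, guess class R.

def apply_rule(word, rule, lexicon):
    S, I, R, M = rule
    if not word.endswith(S) or len(word) <= len(S):
        return None                      # rule not applicable to this word
    stem = word[:-len(S)] + M            # unknown-word - S + M
    if lexicon.get(stem) == I:
        return R                         # guessed POS-class
    return None

# Illustrative lexicon and the paper's example rule [ied (NN VB) (JJ VBD VBN) y].
lexicon = {"specify": ("NN", "VB"), "deny": ("NN", "VB")}
rule = ("ied", ("NN", "VB"), ("JJ", "VBD", "VBN"), "y")

print(apply_rule("specified", rule, lexicon))  # ('JJ', 'VBD', 'VBN')
print(apply_rule("walked", rule, lexicon))     # None: suffix does not match
```

For "specified" the stem "specif" plus the mutation "y" yields "specify", which is listed as (NN VB), so the guess (JJ VBD VBN) fires.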
      <Paragraph position="2"> The V1 operator will extract rules with alterations in the last letter of the main word, as in the example above. The V operator is applied to a pair of words from the lexicon. First it segments the last n characters of the shorter word and stores them in the M element of the rule. Then it tries to segment an affix by subtracting the shorter word without the mutative ending from the longer word. If the subtraction results in a non-empty string it creates a morphological rule by storing the POS-class of the shorter word as the I-class, the POS-class of the longer word as the R-class, and the segmented affix itself. For example: [booked (JJ VBD VBN)] V0 [book (NN VB)] -> A^s: [ed (NN VB) (JJ VBD VBN) ""] [advisable (JJ VBD VBN)] V1 [advise (NN VB)] -> A^s: [able (NN VB) (JJ VBD VBN) "e"] The V operator is applied to all possible lexicon-entry pairs, and if a rule produced by such an application has already been extracted from another pair, its frequency count (f) is incremented. Thus sets of morphological guessing rules together with their calculated frequencies are produced. Next, from these sets of guessing rules we need to cut out infrequent rules which might bias the further learning process. To do that we eliminate all rules with frequency f less than a certain threshold (we usually set this threshold quite low: 2-4). Such filtering reduces the rule-sets more than tenfold and does not leave clearly coincidental cases among the rules.</Paragraph>
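The extraction step above can be sketched as follows. This is a minimal Python illustration of the V_n operator applied to one pair of lexicon entries; the enumeration over all pairs and the frequency filtering are only indicated in comments, and all names are ours:

```python
# The V_n operator: given a longer and a shorter lexicon entry, take the
# last n letters of the shorter word as the mutation M, then subtract the
# remaining stem from the longer word to obtain the affix S.

def extract_rule(longer, shorter, n):
    (w_long, r_class) = longer
    (w_short, i_class) = shorter
    mutation = w_short[-n:] if n > 0 else ""
    stem = w_short[:-n] if n > 0 else w_short
    if not w_long.startswith(stem):
        return None
    suffix = w_long[len(stem):]
    if not suffix:
        return None                       # subtraction must yield a non-empty affix
    return (suffix, i_class, r_class, mutation)

# V0: no alteration; V1: alteration in the last letter of the main word.
print(extract_rule(("booked", ("JJ", "VBD", "VBN")), ("book", ("NN", "VB")), 0))
print(extract_rule(("advisable", ("JJ", "VBD", "VBN")), ("advise", ("NN", "VB")), 1))

# Over all lexicon-entry pairs one would tally each extracted rule in a
# Counter and drop rules whose frequency f falls below the small threshold.
```

The first call yields [ed (NN VB) (JJ VBD VBN) ""] and the second [able (NN VB) (JJ VBD VBN) "e"], matching the examples in the text.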
    </Section>
    <Section position="2" start_page="771" end_page="771" type="sub_section">
      <SectionTitle>
2.2 Rule Scoring Phase
</SectionTitle>
      <Paragraph position="0"> Of course, not all acquired rules are equally good as plausible guesses about word-classes. So, for every acquired rule we need to estimate whether it is an effective rule which is worth retaining in the final rule-set. To perform such estimation we take one-by-one each rule from the rule-sets produced at the rule extraction phase, take each word-token from the corpus, and guess its POS-set using the rule if the rule is applicable to the word.</Paragraph>
      <Paragraph position="1"> For example, if a guessing rule strips a particular suffix and a current word from the corpus does not have such a suffix, we classify this word and rule as incompatible and the rule as not applicable to that word. If the rule is applicable to the word we perform a lookup in the lexicon and then compare the result of the guess with the information listed in the lexicon. If the guessed POS-set is the same as the POS-set stated in the lexicon, we count it as a success, otherwise as a failure. Then for each rule</Paragraph>
      <Paragraph position="2"> we calculate its score as explained in (Mikheev, 1996) using the scoring function as follows: score_i = p_i - 1.65 * sqrt( p_i * (1 - p_i) / (n_i + log(|S_i|)) ) where p_i is the proportion of all positive outcomes (x) of the rule application to the total number of words compatible with the rule (n), and |S| is the length of the affix. We also smooth p_i so as not to have zeros in positive or negative outcomes.</Paragraph>
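A small Python sketch of one plausible reading of this scoring function. The formula is damaged in this copy, so the exact placement of the log(|S|) term and the add-half smoothing of p are assumptions on our part, not a verified transcription:

```python
# One-sided confidence discount (1.65, roughly a 90% lower bound) on the
# smoothed success proportion of a rule, with the affix length |S| entering
# through a log term. The smoothing p = (x + 0.5) / (n + 1) is an assumed
# add-half scheme that keeps p away from 0 and 1.

import math

def score(x, n, affix):
    """x: successful guesses; n: words compatible with the rule."""
    p = (x + 0.5) / (n + 1)                     # smoothed success proportion
    return p - 1.65 * math.sqrt(p * (1 - p) / (n + math.log(len(affix))))

# The same success proportion is discounted far more heavily when it is
# supported by only 10 compatible words than by 100.
print(round(score(95, 100, "ied"), 3))
print(round(score(9, 10, "ied"), 3))
```

This captures the intent stated in the text: a rule is retained not on its raw success rate but on a pessimistic estimate of it, so low-frequency flukes score poorly.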
      <Paragraph position="4"> Setting the threshold theta_s at a certain level allows only the rules whose score is higher than the threshold to be included into the final rule-sets.</Paragraph>
      <Paragraph position="5"> The method for setting up the threshold is based on empirical evaluations of the rule-sets and is described in Section 2.3.</Paragraph>
    </Section>
    <Section position="3" start_page="771" end_page="771" type="sub_section">
      <SectionTitle>
2.3 Setting the Threshold
</SectionTitle>
      <Paragraph position="0"> The task of assigning a set of POS-tags to a particular word is actually quite similar to the task of document categorisation, where a document should be assigned a set of descriptors which represent its contents. The performance of such assignment can be measured in: recall - the percentage of POS-tags which the guesser assigned correctly to a word; precision - the percentage of POS-tags the guesser assigned correctly over the total number of POS-tags it assigned to the word; coverage - the proportion of words which the guesser was able to classify, but not necessarily correctly.</Paragraph>
      <Paragraph position="1"> There are two types of test-data in use at this stage. First, we measure the performance of a guessing rule-set against the actual lexicon: every word from the lexicon, except for closed-class words and words shorter than five characters, is guessed by the rule-sets and the results are compared with the information the word has in the lexicon. In the second experiment we measure the performance of the guessing rule-sets against the training corpus. For every word we measure its metrics exactly as in the previous experiment. Then we multiply these measures by the corpus frequency of this particular word and average them. Thus the most frequent words have the greatest influence on the final measures.</Paragraph>
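The two evaluation regimes share the same metric computation, with an optional corpus-frequency weight per word in the second regime. A sketch in Python; the toy guesser and data below are our illustration, not the paper's:

```python
# Recall, precision, and coverage of a guesser over a lexicon, optionally
# weighting every word by its corpus frequency (freq), so that frequent
# words dominate the averages, as in the second experiment.

def evaluate(guesser, lexicon, freq=None):
    recalls, precisions, covered, total_w = [], [], 0.0, 0.0
    for word, true_tags in lexicon.items():
        w = freq.get(word, 0) if freq else 1   # corpus-frequency weight
        total_w += w
        guessed = guesser(word)
        if guessed is None:
            continue                           # word not covered by the rules
        covered += w
        true_tags, guessed = set(true_tags), set(guessed)
        hit = len(true_tags & guessed)
        recalls.append(w * hit / len(true_tags))
        precisions.append(w * hit / len(guessed))
    return (sum(recalls) / covered if covered else 0.0,
            sum(precisions) / covered if covered else 0.0,
            covered / total_w if total_w else 0.0)

# Toy guesser: any word ending in "ed" is guessed as (JJ VBD VBN).
guesser = lambda w: {"JJ", "VBD", "VBN"} if w.endswith("ed") else None
lexicon = {"booked": {"JJ", "VBD", "VBN"}, "walked": {"VBD", "VBN"}, "book": {"NN", "VB"}}
recall, precision, coverage = evaluate(guesser, lexicon)
print(recall, precision, coverage)
```

With these toy data the guesser covers two of the three words, recalls every true tag of the covered words, and over-assigns one tag to "walked", which lowers precision.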
      <Paragraph position="2"> To extract the best-scoring rule-sets, for each acquired set of rules we produce several final rule-sets, setting the threshold theta_s at different values. For each produced rule-set we record the three metrics (precision, recall and coverage) and choose the sets with the best aggregate measures.</Paragraph>
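The sweep over candidate thresholds can be sketched as follows. Taking the plain sum of the three metrics as the aggregate measure is our arbitrary choice for illustration; the paper does not specify the aggregation:

```python
# Produce a rule-set at each candidate score threshold, measure it, and
# keep the threshold/rule-set pair with the best aggregate of the metrics.

def best_ruleset(scored_rules, thresholds, measure):
    """scored_rules: list of (rule, score); measure(rule_set) -> (p, r, c)."""
    best, best_agg = None, float("-inf")
    for theta in thresholds:
        rule_set = [rule for rule, s in scored_rules if s > theta]
        p, r, c = measure(rule_set)
        agg = p + r + c                       # illustrative aggregate measure
        if agg > best_agg:
            best, best_agg = (theta, rule_set), agg
    return best

# Toy demonstration: higher thresholds keep fewer rules (higher precision,
# lower coverage), and the aggregate trades the two off.
scored = [("rule_a", 0.9), ("rule_b", 0.75), ("rule_c", 0.55)]
measure = lambda rs: (0.7 + 0.1 * (3 - len(rs)), 0.9, len(rs) / 3)  # made-up
theta, kept = best_ruleset(scored, [0.5, 0.6, 0.7, 0.8], measure)
print(theta, kept)
```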
    </Section>
  </Section>
  <Section position="5" start_page="771" end_page="773" type="metho">
    <SectionTitle>
3 Learning Experiment
</SectionTitle>
    <Paragraph position="0"> One of the most important issues in the induction of guessing rule-sets is the choice of the right data for training. In our approach, guessing rules are extracted from the lexicon, and the actual corpus frequencies of word-usage then allow for discrimination between rules which are no longer productive (but have left their imprint on the basic lexicon) and rules that are productive in real-life texts.</Paragraph>
    [Table 1 caption: A80 - suffixes with alterations scored over 80 points; S60 - suffixes without alterations scored over 60 points; E75 - ending-guessing rule-set scored over 75 points.]
    <Paragraph position="1"> Thus the major factor in the learning process is the lexicon - it should be as general as possible (list all possible POSs for a word) and as large as possible, since guessing rules are meant to capture general language regularities. The corresponding corpus should include most of the words from the lexicon and be large enough to obtain reliable estimates of the word-frequency distribution.</Paragraph>
    <Paragraph position="2"> We performed a rule-induction experiment using the lexicon and word-frequencies derived from the Brown Corpus (Francis &amp; Kucera, 1982).</Paragraph>
    <Paragraph position="3"> There are a number of reasons for choosing the Brown Corpus data for training. The most important ones are, first, that the Brown Corpus provides a model of general multi-domain language use, so general language regularities can be induced from it; and second, many taggers come with data trained on the Brown Corpus, which is useful for comparison and evaluation. This, however, by no means restricts the described technique to that or any other tag-set, lexicon or corpus. Moreover, despite the fact that the training is performed on a particular lexicon and a particular corpus, the obtained guessing rules are supposed to be domain- and corpus-independent, and the only training-dependent feature is the tag-set in use.</Paragraph>
    <Paragraph position="4"> Using the technique described above and the lexicon derived from the Brown Corpus, we extracted prefix morphological rules (no alterations), suffix morphological rules without alterations, and ending-guessing rules, exactly as it was done in (Mikheev, 1996). Then we extracted suffix morphological rules with alterations in the last letter (V1), which was a new rule-set for the cascading guesser. Quite interestingly, apart from the expected suffix rules with alterations such as [S=ied I=(NN VB) R=(JJ VBD VBN) M=y], which can handle pairs like deny - denied, this rule-set was populated with "second-order" rules which describe dependencies between secondary forms of words. For instance, the rule</Paragraph>
    <Paragraph position="6"> [S=ion I=(NNS VBZ) R=(NN) M=s] says that if by deleting the suffix "ion" from a word and adding "s" to the end of the result of this deletion we produce a word which is listed in the lexicon as a plural noun and 3rd form of a verb (NNS VBZ), the unknown word is a noun (NN).</Paragraph>
    <Paragraph position="7"> This rule, for instance, is applicable to word pairs: affects - affection, asserts - assertion, etc.</Paragraph>
    <Paragraph position="8"> Table 1 presents some results of a comparative study of the cascading application of the new rule-set against the standard rule-sets of the cascading guesser. The first part of Table 1 shows the best obtained scores for the standard suffix rules (S) and suffix rules with alterations in the last letter (A). When we applied the two suffix rule-sets cascadingly, their joint lexical coverage increased by about 7-8% (from 37% to 45% on the lexicon and from 30% to 37% on the corpus) while precision and recall remained at the same high level.</Paragraph>
    <Paragraph position="9"> This was quite an encouraging result which actually agreed with our prediction. Then we measured whether suffix rules with alterations (A) add any improvement if they are used in conjunction with the ending-guessing rules. As in the previous experiment, we measured the precision, recall and coverage both on the lexicon and on the corpus. The second part of Table 1 shows that simple concatenative suffix rules (S60) improved the precision of the guessing when they were applied before the ending-guessing rules (E75) by about 5%. Then we cascadingly applied the suffix rules with alterations (A80), which caused a further improvement in precision of about 1%.</Paragraph>
    <Paragraph position="10"> After obtaining the optimal rule-sets we performed the same experiments on a word-sample which was not included in the training lexicon and corpus. We gathered about three thousand words from the lexicon developed for the Wall Street Journal corpus and collected frequencies of these words in this corpus. At this test-sample evaluation we obtained similar metrics, apart from the coverage, which dropped by about 7% for both kinds of suffix rules. This, actually, did not come as a surprise, since many main forms required by the suffix rules were missing in the lexicon.</Paragraph>
    [Table 2 caption: ... guesser with the additional rule-set of suffixes-with-alterations. For each of these cascading guessers two tagging experiments were performed: the tagger was equipped with the full Brown Corpus lexicon and with the small lexicon of closed-class and short words (5,465 entries).]
  </Section>
</Paper>