<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2014">
  <Title>Augmenting a Hidden Markov Model for Phrase-Dependent Word Tagging</Title>
  <Section position="3" start_page="94" end_page="95" type="metho">
    <SectionTitle>
Augmenting the Model by Use of Networks
</SectionTitle>
    [Figure 1 labels: AUGMENTED NETWORK; BASIC NETWORK; FULLY-CONNECTED NETWORK CONTAINING ALL STATES EXCEPT DETERMINER]
    <Paragraph position="0"> Augmenting the Model by Use of Networks The basic model consists of a first-order fully connected network. The lexical context available for modeling a word's category is solely the category of the preceding word (expressed via the transition probabilities P(Ci \[ Ci-1). Such limited context does not adequately model the constraint present in local word context. A straightforward method of extending the context is to use second-order conditioning which takes account of the previous two word categories. Transition probabilities are then of the form P(Ci \[ Ci-1, Ci-2). For an n category model this requires n 3 transition probabilities. Increasing the order of the conditioning requires exponentially more parameters. In practice, models have been limited to second-order, and smoothing methods are normally required to deal with the problem of estimation with limited data. The conditioning just described is uniform- all possible two-category contexts are modeled. Many of these neither contribute to the performance of the model, nor occur frequently enough to be estimated properly: e.g. P(Ci = determiner \[ el-1 -~ determiner, Ci-2 = determiner).</Paragraph>
    <Paragraph position="1"> An alternative to uniformly increasing the order of the conditioning is to extend it selectively. Mixed higher-order context can be modeled by introducing explicit state sequences. In the arrangement the basic first-order network remains, permitting all possible category sequences, and modeling first-order dependency. The basic network is then augmented with the extra state sequences which model certain category sequences in more detail. The design of the augmented network has been based on linguistic considerations and also upon an analysis of tagging errors made by the basic network.</Paragraph>
    <Paragraph position="2"> As an example, we may consider a systematic error made by the basic model. It concerns the disambiguation of the equivalence class adjective-or-noun following a determiner. The error is exemplified by the sentence fragment &amp;quot;The period of...&amp;quot;, where &amp;quot;period&amp;quot; is tagged as an adjective. To model the context necessary to correct the error, two extra states are used, as shown in Figure 1. The &amp;quot;augmented network&amp;quot; uniquely models all second-order dependencies of the type determiner - noun - X, and determiner - adjective - X (X ranges over {cl...cn}). Training a hidden Markov model having this topology corrected all nine instances of the error in the test data. An important point to note is that improving the model detail in this manner does not forcibly correct the error. The actual patterns of category usage must be distinct in the language.</Paragraph>
    <Paragraph position="3">  To complete the description of the augmented model it is necessary to mention tying of the model states (Jelinek and Mercer, 1980). Whenever a transition is made to a state, the state-dependent probability distribution P(Eqvi I Ci) is used to obtain the probability of the observed equivalence class. A state is generally used in several places (E.g. in Figure 1. there are two noun states, and two adjective states: one of each in the augmented network, and in the basic network). The distributions P(Eqvi I Ci) are considered to be the same for every instance of the same state. Their estimates are pooled and re-assigned identically after each iteration of the Baum-Welch algorithm.</Paragraph>
    <Section position="1" start_page="95" end_page="95" type="sub_section">
      <SectionTitle>
Modeling Dependencies across Phrases
</SectionTitle>
      <Paragraph position="0"> Linguistic considerations can be used to correct errors made by the model. In this section two illustrations are given, concerning simple subject/verb agreement across an intermediate prepositional phrase. These are exemplified by the following sentence fragments:  1. &amp;quot;Temperatures in the upper mantle range apparently from....&amp;quot;.</Paragraph>
      <Paragraph position="1"> 2. &amp;quot;The velocity of the seismic waves rises to...&amp;quot;.</Paragraph>
      <Paragraph position="2">  The basic model tagged these sentences correctly, except for- &amp;quot;range&amp;quot; and &amp;quot;rises&amp;quot; which were tagged as noun and plural-noun respectively 1. The basic network cannot model the dependency of the number of the verb on its subject, which precedes it by a prepositional phrase. To model such dependency across the phrase, the networks shown in Figure 2 can be used. It can be seen that only simple forms of prepositional phrase are modeled in the networks; a single noun may be optionally preceded by a single adjective and/or determiner. The final transitions in the networks serve to discriminate between the correct and incorrect category assignment given the selected preceding context. As in the previous section, the corrections are not programmed into the model. Only context has been supplied to aid the training procedure, and the latter is responsible for deciding which alternative is more likely, based on the training data. (Approximately 19,000 sentences were used to train the networks used in this example).</Paragraph>
      <Paragraph position="3">  In Figure 2, the two copies of the prepositional phrase are trained in separate contexts (preceding singulax/plural nouns). This has the disadvantage that they cannot share training data. This problem could be resolved by tying corresponding transitions together. Alternatively, investigation of a trainable grammar (Baker, 1979; Fujisaki et al., 1989) may be a fruitful way to further develop the model in terms of grammatical components.</Paragraph>
      <Paragraph position="4"> A model containing all of the refinements described, was tested using a magazine article containing 146 sentences (3,822 words). A 30,000 word dictionary was used, supplemented by inflectional analysis for words not found directly in the dictionary. In the document, 142 words were tagged as unknown (their possible categories were not known). A total of 1,526 words had ambiguous categories (i.e. 40% of the document). Critical examination of the tagging provided by the augmented model showed 168 word tagging errors, whereas the basic model gave 215 erroneous word tags. The former represents 95.6% correct word tagging on the text as a whole (ignoring unknown words), and 89% on the ambiguous words. The performance of a tagging program depends on the choice and number of categories used, and the correct tag assignment for words is not always obvious. In cases where the choice of tag was unclear (as often occurs in idioms), the tag was ruled as incorrect. For example, 9 errors are from 3 instances of &amp;quot;... as well as ...&amp;quot; that arise in the text. It would be appropriate to deal with idioms separately, as done by Gaxside, Leech and Sampson (1987). Typical errors beyond the scope of the model described here are exemplified by incorrect adverbial and prepositional assignment.</Paragraph>
      <Paragraph position="5">  For example, consider the word &amp;quot;up&amp;quot; in the following sentences: &amp;quot;He ran up a big bill&amp;quot;.</Paragraph>
      <Paragraph position="6"> &amp;quot;He ran up a big hill&amp;quot;.</Paragraph>
      <Paragraph position="7"> Extra information is required to assign the correct tagging. In these examples it is worth noting that even if a model was based on individual words, and trained on a pre-tagged corpus, the association of &amp;quot;up&amp;quot; (as adverb) with &amp;quot;bill&amp;quot; would not be captured by trigrams. (Work on phrasal verbs, using mutual information estimates (Church et ai., 1989b) is directly relevant to this problem). The tagger could be extended by further category refinements (e.g. inclusion of a gerund category), and the single pronoun category currently causes erroneous tags for adjacent words. With respect to the problem of unknown words, alternative category assignments for them could be made by using the context embodied in transition probabilities.</Paragraph>
      <Paragraph position="8"> Conclusions A stochastic method for assigning part-of-speech categories to unrestricted English text has been described. It minimizes the resources required for high performance automatic tagging. A pre-tagged training corpus is not required, and the tagger can cope with words not found in the training text. It can be trained reliably on moderate amounts of training text, and through the use of selectively augmented networks it can model high-order dependencies without requiring an excessive number of parameters.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>