<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0312">
  <Title>Learning to Tag Multilingual Texts Through Observation</Title>
  <Section position="3" start_page="0" end_page="112" type="metho">
    <SectionTitle>
2 RoboTag Architecture
</SectionTitle>
    <Paragraph position="0"> RoboTag's design was motivated by our goal of developing an interactive learning system. The system had to process a large number of texts, provide the ability to visualize learning results, and allow feedback to the learning system. To this end, RoboTag was designed as a client/server architecture. The client interface is an enhancement of a manual annotation tool. The interface works with multiple languages and includes support for both single- and double-byte coding schemes. We focus on English and Japanese in this paper. The server portion of the system performs all the document management, text preprocessing, and machine learning functions. Because it was important to facilitate interaction between the user and the learning system, it was essential to show learned results rapidly. By separating the client interface from the server which performs the learning functionality, it was possible to use fast machines for the CPU-intensive learning operations rather than relying on the user's desktop machine.</Paragraph>
    <Section position="1" start_page="109" end_page="109" type="sub_section">
      <SectionTitle>
2.1 Client Interface
</SectionTitle>
      <Paragraph position="0"> The client consists of a tagging tool interface written in Tk/Tcl, a cross-platform GUI scripting language.</Paragraph>
      <Paragraph position="1"> The interface, shown in Figure 1, is designed primarily to function as a tagging tool. It makes it easy for a user to mark and edit tags within multilingual texts. The tool reads and writes texts in SGML format. What distinguishes this tagging tool is that the manually tagged documents are passed back through the RoboTag server to build a tagging procedure in line with what the user is tagging.</Paragraph>
      <Paragraph position="2"> RoboTag can thus suggest what should be tagged after having received some training through observation of the user. The interface has been augmented with several displays that allow for a thorough investigation of the learned tagging procedure. These include graphical displays of the induced logic for tagging (cf. Figure 2), graphical displays of tagging accuracy (i.e. precision and recall), and the ability to inspect the examples from the texts that justify the induced tagging procedure.</Paragraph>
    </Section>
    <Section position="2" start_page="109" end_page="109" type="sub_section">
      <SectionTitle>
2.2 Server
</SectionTitle>
      <Paragraph position="0"> The RoboTag server performs the tag learning functions. It manages the training and testing files, extracts features, learns tagging procedures from tagged training texts, and applies them to unseen test texts. Each RoboTag client invokes its own instance of the server to handle its learning tasks. There can be multiple servers running on the same machine, each independently handling a single client's tasks. The RoboTag server receives commands from the client and returns learning results to it. During this dialogue, the server maintains intermediate results such as learned tagging procedures, texts that have been preprocessed for learning or evaluation, and state information for the current task. This includes the parameter settings for the learning algorithm, feature usage statistics, and preprocessor output. The client connects to the RoboTag server on a network using TCP/IP. There is a well-defined interface to the server so it can act as a learning engine for other text handling applications as well.</Paragraph>
      <Paragraph position="1"> Examples of server commands include:
 1. Process a text for training or testing
 2. Learn a classifier 1 for a tag
 3. Evaluate a learned classifier on a text
 4. Load a previously learned classifier or save one for future use
 5. Change a learning parameter
 6. Enable or disable a lexical feature

 3 Learning to Tag

 RoboTag must learn to place tags of varying types within the text. This means placing an appropriate SGML begin tag like &lt;PERSON&gt; prior to a person's name in the text and following the person's name with an SGML end tag like &lt;/PERSON&gt;. In this paper, in order to compare with other name tagging system results as reported in the Message Understanding Conference 6 (MUC-6) (Adv, 1995) and the Multilingual Entity Task (MET) (Def, 1996), we will be tagging people, places, and organizations. RoboTag provides for learning other types of tags as well.</Paragraph>
      <Paragraph position="2"> For each tag learning task, RoboTag builds two decision trees - one to predict begin tags and one to predict end tags. The results of these classifiers are then combined using a tag matching algorithm to yield complete tags of each type. A tag post-processing step resolves overlapping tags of different types using a prioritization scheme. Altogether, these make up the learned tagging procedure.</Paragraph>
      <Paragraph position="3"> In this section we describe RoboTag's decision tree learning, learning representation, learning parameters, the tag matching algorithm, and postprocessing.</Paragraph>
    </Section>
    <Section position="3" start_page="109" end_page="110" type="sub_section">
      <SectionTitle>
3.1 Decision Tree Learning
</SectionTitle>
      <Paragraph position="0"> RoboTag learns decision tree classifiers that predict where tags of each type should begin and end in the text. The decision trees are trained from texts which have already been tagged manually.</Paragraph>
      <Paragraph position="1"> For learning the tag begin/end classifiers, we build decision trees using C4.5 (Quinlan, 1993). 2 C4.5 is used to learn decision tree classifiers which distinguish items of one class from another based on attributes of the training examples. These attributes are referred to as features. In using a decision tree for classification, each node indicates a feature test to be performed. The result of the test indicates which branch of the tree to take next. Ultimately, a leaf node of the tree is reached which specifies the classification result. To produce our decision trees, an information-theoretic criterion called information gain ratio is used to measure, at each step of tree construction, which feature test would best distinguish the examples on the basis of their class. The simplest classification problem involves learning to distinguish positive and negative examples of some concept. In our case, this means distinguishing text positions where a tag should begin or end from text positions where it should not.</Paragraph>
      <Paragraph position="2"> 1 RoboTag uses decision tree classifiers as part of the learned tagging procedure. They will be discussed in the next section.</Paragraph>
      <Paragraph position="3"> 2 C4.5 has been specially adapted to work directly on our preprocessor-produced data structures for more efficient operation, rather than through data files, which is the normal mode of operation.</Paragraph>
      <Paragraph position="4"> In order to extract learning features, the RoboTag server employs a preprocessor plug-in for each language it operates with. This preprocessor performs tokenization, word segmentation, morphological analysis, and lexical lookup as necessary for each language. The preprocessor produces output in a well-defined format across languages which the server uses in carrying out the learning. For instance, in processing Japanese, RoboTag may use features which are uniquely Japanese but may not be present in English, or vice versa. Table 1 shows some of the features used by RoboTag for learning.</Paragraph>
      <Paragraph position="5"> Figure 2 shows a screen shot of a portion of a decision tree trained to produce begin tags. One of the leaf nodes of the tree has been selected, producing a window which shows person names in context as classified at the leaf. The last test in the branch prior to the shown window tests to see if the word prior to the current word is a person title (like &quot;President,&quot; &quot;Secretary,&quot; or &quot;Judge&quot; when a decision is being made about whether to start a name with &quot;Reagan,&quot; &quot;Robert,&quot; or &quot;Galloway&quot; respectively). The screen shot goes on to show that if the previous word is not a person title, the system consults the 2nd word prior to the candidate begin tag to see if it is an organization noun prefix (such as &quot;bank&quot;, &quot;board&quot;, or &quot;court&quot;).</Paragraph>
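      <Paragraph position="6"> The gain ratio criterion used during tree construction can be illustrated with a minimal Python sketch. It follows the usual C4.5 definition (information gain divided by split information); the function names, the (feature_value, class_label) input format, and the toy data are assumptions made for illustration and are not taken from RoboTag's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain_ratio(examples):
    """Gain ratio of a discrete feature test.

    examples: list of (feature_value, class_label) pairs (hypothetical format).
    """
    labels = [label for _, label in examples]
    total = len(examples)
    # Partition the class labels by the outcome of the feature test.
    partitions = {}
    for value, label in examples:
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes tests that scatter examples over many outcomes.
    split_info = entropy([value for value, _ in examples])
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data: (previous word is a person title?, token begins a PERSON tag?)
informative = [(True, True), (True, True), (False, False), (False, False)]
uninformative = [(True, True), (True, False), (False, True), (False, False)]
```

 In this toy data the first feature separates the classes perfectly and the second not at all, so a tree builder following this criterion would test the first feature.</Paragraph>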
    </Section>
    <Section position="4" start_page="110" end_page="111" type="sub_section">
      <SectionTitle>
3.2 Learning Representation
</SectionTitle>
      <Paragraph position="0"> C4.5 represents training examples as fixed-length feature vectors with class labels. The goal is to learn to predict the class label from the other features in the vector. In our case this means learning to label tokens as begin or end tags from the token's lexical features. When RoboTag processes a tagged training text, it creates labeled feature vectors (called tuples) from the preprocessor data. One tuple is created for each token in the text, with the label TRUE or FALSE. If we are learning a tree to predict begin tags, the label is TRUE if the token is the first token inside an SGML tag we are trying to learn, and FALSE otherwise. Similarly for end tags, the tuple is labeled TRUE if the token is the last token in a training tag and FALSE otherwise.</Paragraph>
      <Paragraph position="1"> A single token usually does not contain enough information to decide whether it makes a good tag begin or end. Features from the surrounding tokens must be used as well. To create a tuple from a token, RoboTag collects the preprocessor features for the token as well as its immediate neighbors. How many neighboring tokens to use is determined by a radius parameter, as will be discussed in Section 3.3. A radius of 1 means the current token and both the previous and next tokens will be part of the tuple (1 token in each direction).</Paragraph>
      <Paragraph position="2"> To fill in the tuple values, RoboTag calls on the preprocessor as a feature extractor. Each position in the tuple's feature vector holds a value from a preprocessor field. RoboTag can use whatever lexical and token-type features the preprocessor provides. In this way the preprocessor forms the background knowledge for the target language. Once the training texts have been represented as tuples, the learning process can begin.</Paragraph>
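      <Paragraph position="3"> The tuple construction described in this section can be sketched as follows. This is a minimal illustration, not RoboTag's code: the function name, the per-token feature lists, and the use of a padding value for neighbors that fall outside the text are all assumptions, since the paper does not say how text edges are handled.

```python
def build_tuples(token_features, begin_positions, radius=1, pad=None):
    """Create one labeled feature vector (tuple) per token.

    token_features: list of per-token feature lists from a hypothetical preprocessor.
    begin_positions: set of token indices that start a training tag.
    """
    count = len(token_features)
    width = len(token_features[0])
    padding = [pad] * width  # assumed stand-in for neighbors beyond the text edges
    tuples = []
    for i in range(count):
        vector = []
        # Collect features for the token and `radius` neighbors on each side.
        for j in range(i - radius, i + radius + 1):
            inside = j >= 0 and count > j
            vector.extend(token_features[j] if inside else padding)
        # Label is True when this token is the first token of a training tag.
        tuples.append((vector, i in begin_positions))
    return tuples

# Toy features per token: [is_person_title, is_capitalized] (hypothetical fields)
# for the three tokens of "President Reagan spoke".
features = [[True, True], [False, True], [False, False]]
tuples = build_tuples(features, begin_positions={1}, radius=1)
```

 With a radius of 1, each vector holds the features of three tokens, so the tuple for &quot;Reagan&quot; carries the person-title feature of the preceding word. The same routine would produce end-tag tuples if given the set of tag-final token positions instead.</Paragraph>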
    </Section>
    <Section position="5" start_page="111" end_page="111" type="sub_section">
      <SectionTitle>
3.3 Learning Parameters
</SectionTitle>
      <Paragraph position="0"> There are several parameters to RoboTag that affect tagging performance. Below are descriptions of some of the parameters. The Experiments section discusses the settings that produced the best results for each task.</Paragraph>
      <Paragraph position="1"> * Radius: This controls the number of tokens used to make each training tuple. A higher radius gives the decision tree algorithm more contextual information in deciding whether a token makes a good begin or end tag.</Paragraph>
      <Paragraph position="2"> * Sampling Ratio: Creating one tuple from each token in a text leads to many more negative training examples than positive, since only the tokens at the beginning (or end) of a tag generate positive training tuples. Every other token forms a negative example; a place where a tag did not begin or end. Too many negative examples can hurt learning accuracy by making the system too conservative. In some extreme cases, this can lead to decision trees that never predict a tag begin or end no matter what the input. The sampling ratio is the ratio of negative to positive examples to use for training. All of the positive examples are used, and negative examples are chosen randomly in accordance with this parameter. What is interesting about the Sampling Ratio is that it allows recall to be traded off for precision directly. Increasing the sampling ratio gives the learning system more examples of things that should not be tagged, reducing the number of false positives which increases precision. Making the decision trees more conservative in this way can also lower recall. Finding a balance of precision and recall by tuning this parameter is essential for best results.</Paragraph>
      <Paragraph position="3"> * Certainty Factor: This parameter affects decision tree pruning, a process used to simplify learned decision trees. Pruning helps reduce over-fitting of training data and improves classification accuracy on unseen examples. This parameter takes values between 0 and 1, with lower values meaning more pruning.</Paragraph>
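      <Paragraph position="4"> The sampling ratio parameter described above can be sketched in a few lines of Python. The function name, the (feature_vector, label) tuple format, and the fixed random seed are assumptions for illustration only.

```python
import random

def sample_training_tuples(tuples, sampling_ratio, seed=0):
    """Keep all positive tuples and a random subset of the negative ones.

    tuples: list of (feature_vector, label) pairs with boolean labels.
    sampling_ratio: desired number of negative examples per positive example.
    """
    positives = [t for t in tuples if t[1]]
    negatives = [t for t in tuples if not t[1]]
    # Never ask for more negatives than the text actually provides.
    wanted = min(len(negatives), int(sampling_ratio * len(positives)))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return positives + rng.sample(negatives, wanted)
```

 For example, with 2 positive and 98 negative tuples, a sampling ratio of 10 would keep both positives and 20 randomly chosen negatives.</Paragraph>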
    </Section>
    <Section position="6" start_page="111" end_page="112" type="sub_section">
      <SectionTitle>
3.4 The Matching Algorithm
</SectionTitle>
      <Paragraph position="0"> When tagging a text, RoboTag evaluates the learned decision tree classifiers on the new text to produce a list of potential begin and end tags for each tag type. These lists are produced independently, and there may be many ways to pair begin and end tags together. For each begin tag found there may be several plausible end tags that could pair with it (and vice versa). The matching algorithm must decide the best possible pairing of the begin and end tags for each tag type.</Paragraph>
      <Paragraph position="1"> Each potential begin and end tag produced by the decision tree also has a confidence rating, a number between 0 and 1 estimating the chance of correct classification. A scoring function is used to evaluate the relative merits of different sets of pairings. In addition to the confidence ratings for the tags, the scoring function makes use of statistical measures like the mean and standard deviation of the tag length in the training examples. The matcher can be biased to prefer longer tags, shorter tags, or tags closest to this mean length.</Paragraph>
      <Paragraph position="2"> Considering all possible begin/end tag pairings quickly becomes intractable as the number of potentially interacting tags increases. Therefore, the first step in the matching process seeks to divide the text up into a set of non-interacting sections.</Paragraph>
      <Paragraph position="3"> Each time a begin/end pair is made, any begin or end tags between the pair cannot be used (or the resulting tags would overlap). This means that each pair could preclude other possible matches. The text is divided into sections by observing which tags can possibly affect other tags. The mean distance, standard deviation, and match threshold determine the distance interval within which the matcher searches for tag pairs. If two tags are far enough apart, they can be matched independently without fear of one pairing precluding another. These boundary points in the text are found first. Then each independent section is searched separately for tag pairings. The best pairing set for a section maximizes the sum of the scores for each pair in the section.</Paragraph>
      <Paragraph position="4"> There are three parts to the scoring function for a pair. The first is the confidence with which the begin tag tree classifies the token as a good begin tag. The second component is the end tag tree confidence. The last part is a distance score, which is calculated from the tag length, mean distance, and match length preference. Each of the three length preferences (longest, shortest, or closest to mean distance) applies an appropriate bias to the way in which these inputs are combined.</Paragraph>
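      <Paragraph position="5"> The search for the best pairing set within one independent section can be sketched as below. This is a simplified illustration under stated assumptions: the paper does not give the exact scoring formula, so the distance score and the plain sum of the three components are hypothetical, and the division of the text into non-interacting sections (using the standard deviation and match threshold) is omitted.

```python
def distance_score(length, mean_length, preference="closest"):
    """Hypothetical length bias; the paper does not give the exact formula."""
    if preference == "longest":
        return length / (length + mean_length)
    if preference == "shortest":
        return mean_length / (length + mean_length)
    return 1.0 / (1.0 + abs(length - mean_length))

def best_pairing(begins, ends, mean_length, preference="closest"):
    """Exhaustively search one independent section for the best pairing set.

    begins, ends: lists of (token_index, confidence) sorted by token_index.
    Returns (total_score, [(begin_index, end_index), ...]).
    """
    def search(b, min_pos):
        # b: index of the next begin candidate to consider.
        # min_pos: first token index not covered by an already chosen pair.
        if b == len(begins):
            return 0.0, []
        best = search(b + 1, min_pos)  # option: leave this begin unmatched
        begin_pos, begin_conf = begins[b]
        if begin_pos >= min_pos:
            for end_pos, end_conf in ends:
                if end_pos >= begin_pos:
                    length = end_pos - begin_pos + 1
                    # Assumed combination: sum of the three score components.
                    score = begin_conf + end_conf + distance_score(
                        length, mean_length, preference)
                    # Later pairs must start after this pair ends (no overlap).
                    rest_score, rest_pairs = search(b + 1, end_pos + 1)
                    if score + rest_score > best[0]:
                        best = (score + rest_score,
                                [(begin_pos, end_pos)] + rest_pairs)
        return best
    return search(0, 0)

begins = [(0, 0.9), (5, 0.8)]
ends = [(1, 0.9), (2, 0.3), (6, 0.7)]
total, pairs = best_pairing(begins, ends, mean_length=2)
```

 In this toy section the search pairs tokens 0-1 and 5-6, because those pairings combine high confidences with lengths equal to the assumed mean. The exhaustive search is exponential in the worst case, which is why the text first splits the document into small independent sections.</Paragraph>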
    </Section>
    <Section position="7" start_page="112" end_page="112" type="sub_section">
      <SectionTitle>
3.5 Tag Overlap Resolution
</SectionTitle>
      <Paragraph position="0"> Because the tag matching algorithm only ensures non-overlapping tags within each tag type, it is possible to have cases of embedded tags of different types (like tagging &quot;Boston&quot; as a location within the tag for &quot;Boston Edison Company&quot;). To resolve these cases, RoboTag uses a static tag priority scheme. For proper noun tagging the priority order from highest to lowest is person, entity, place. We do not currently learn the tag priorities although this is a logical extension to the learning technique.</Paragraph>
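      <Paragraph position="1"> A static priority scheme of this kind can be sketched as follows. The priority order is the one stated above; the function name, the (begin, end, type) tag format, and the choice to discard the lower-priority tag outright are assumptions made for illustration.

```python
PRIORITY = {"person": 3, "entity": 2, "place": 1}  # order stated in the text

def resolve_overlaps(tags):
    """Drop lower-priority tags that overlap a higher-priority tag.

    tags: list of (begin_index, end_index, tag_type) with inclusive token indices.
    """
    kept = []
    # Visit tags from highest to lowest priority so winners are placed first.
    for tag in sorted(tags, key=lambda t: -PRIORITY[t[2]]):
        begin, end, _ = tag
        overlaps = any(end >= kb and ke >= begin for kb, ke, _ in kept)
        if not overlaps:
            kept.append(tag)
    return sorted(kept)

# "Boston Edison Company" as an entity (tokens 0-2) with "Boston" as a place (token 0).
resolved = resolve_overlaps([(0, 0, "place"), (0, 2, "entity"), (5, 6, "person")])
```

 Here the embedded place tag is discarded in favor of the enclosing entity tag, while the unrelated person tag is kept.</Paragraph>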
    </Section>
  </Section>
  <Section position="4" start_page="112" end_page="112" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We set up experiments on English and Japanese name tagging using the same texts that were used for the named entity task of the MUC-6 and MET competitions. In this way, we can most easily compare RoboTag performance against a variety of other name tagging systems.</Paragraph>
    <Section position="1" start_page="112" end_page="112" type="sub_section">
      <SectionTitle>
4.1 English Results
</SectionTitle>
      <Paragraph position="0"> For English, the MUC-6 Wall Street Journal corpus was used. RoboTag was trained with 300 training texts and proceeded to automatically tag the 30 blind test texts. The scores on the test set are shown in Table 2. For each tag type, the table gives the total number of tags of that type present in the training and testing sets and the recall, precision, and F-Measure 3 as measured on the test set.</Paragraph>
      <Paragraph position="1"> Overall totals are given at the bottom of the table.</Paragraph>
      <Paragraph position="2"> The best system in the MUC-6 named entity task, using hand-coded rules, returned F-Measures of 98.50 for person, 96.96 for place, and 92.48 for entity as shown in Table 3 (Krupka, 1995).</Paragraph>
      <Paragraph position="3"> We found that RoboTag's best English results were obtained with a sampling ratio of 10, a radius of 2, and certainty factors of 0.75 for pruning for all the tag types.</Paragraph>
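      <Paragraph position="4"> For readers who want to reproduce the F-Measure column from precision and recall, the following minimal Python sketch assumes the standard MUC-style definition, F = ((beta^2 + 1) P R) / (beta^2 P + R), with beta = 1 as used in MUC-6 and MET. The function name and the sample values are illustrative and are not results from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Combine precision and recall: ((beta^2 + 1) * P * R) / (beta^2 * P + R)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return (beta ** 2 + 1) * precision * recall / denominator

# Illustrative values only (not taken from the paper's tables).
example = f_measure(90.0, 80.0)
```

 With beta = 1 the measure reduces to the harmonic mean of precision and recall, so the illustrative pair of 90 and 80 gives a value of about 84.7.</Paragraph>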
    </Section>
    <Section position="2" start_page="112" end_page="112" type="sub_section">
      <SectionTitle>
4.2 Japanese Results
</SectionTitle>
      <Paragraph position="0"> In Japanese, the MET corpus of press-conference related texts from Kyodo News Agency was used in the experiment. A training set of 300 texts was used with a blind test set of 99. RoboTag scores on the test set are reported in Table 4.</Paragraph>
      <Paragraph position="1"> The best system on the MET task, utilizing hand-coded rules, produced F-Measures of 95.37 for person, 93.43 for place, and 86.90 for entity (cf. Table 5), while the second place system posted 78.54 for person, 84.00 for place, and 79.25 for entity. RoboTag would have ranked 2nd among the MET systems on the Japanese entity task.</Paragraph>
      <Paragraph position="2"> Sampling ratios for our best Japanese results were 35, 15 and 10 for person, place, and entity. For all three tags we used a radius of 2 and certainty factors of 0.65 for pruning.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="112" end_page="114" type="metho">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> Vilain and Day (Vilain and Day, 1996) report on an approach which learns and applies rule sequences for the name tagging task (based on Eric Brill's rule sequence work (Brill, 1993)). It uses a greedy algorithm to generate and apply rules, incrementally refining the target concept. They report their best precision/recall results for machine-learned rules on the MUC-6 task with equivalent F-Measures 4 of 78.50 for person, 74.35 for place, and 82.81 for entity. Our English score is significantly better, especially for the person and place tasks. Because their Japanese results were not reported, we cannot compare our Japanese performance.</Paragraph>
    <Paragraph position="2"> 3 F-measure is calculated by: $F = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$, where P is precision, R is recall, and $\beta$ is the relative importance given to recall over precision. In this case, $\beta = 1.0$, as used in MUC-6 and MET.</Paragraph>
    <Paragraph position="3"> 4 The F-Measure formula they report seems to be in error and they reported with a $\beta$ of 0.8. For comparison, we used the standard F-Measure formula with a $\beta$ of 1 as reported above.</Paragraph>
    <Paragraph position="5"> Gallippi (Gallippi, 1996) presents an approach to tag classification using decision trees. Hand-coded rules are employed to delimit proper nouns within the text. Each proper noun is then classified into an appropriate type (e.g., person, entity, place) using decision trees (ID3), an easier task than also learning to place tags. It is also less general to rely on hand-coded rules for a significant part of the tagging task. Bikel et al. (Bikel et al., 1997) report on Nymble, an HMM-based name tagging system operating in English and Spanish. Nymble performs well, turning in F-measures of 90 and 93 respectively in Spanish and English on the MUC-6 task. These scores were achieved using 450,000 words of tagged text, three times the size of the 150,000 word training set used for the RoboTag experimental results reported here.</Paragraph>
    <Paragraph position="6"> Bikel reports that moving from 100,000 to 450,000 training texts yielded a 1-2% improvement. A direct comparison with Nymble on particular tag types is not possible because only the overall F-measure is reported for the MUC-6 task. In these experiments we only trained and tested on person, place, and entity tags. If we use RoboTag with our hand-coded rules for dates and numbers, the overall F-measure on the MUC-6 English task is 90.1.</Paragraph>
  </Section>
  <Section position="6" start_page="114" end_page="114" type="metho">
    <SectionTitle>
6 Future Directions
</SectionTitle>
    <Paragraph position="0"> There are a number of ways in which RoboTag performance could be improved. Perhaps the most obvious enhancement to our representation involves giving the learning system the actual text of the token in the feature vector. Currently, each tuple contains the preprocessor information for a window of tokens in the text, but the actual token text is not available to the learning. The decision trees can refer to classes of words by their lexicon features, but not individual words themselves. Adding this capability would allow performance improvement especially in cases where lexicon data is sparse. Using words as features is related to the idea of automatic word list modification. This would allow RoboTag to actually reconfigure its knowledge base of word lists and propose new features. This is one way that RoboTag could adapt to new extraction domains.</Paragraph>
    <Paragraph position="1"> Unlike some of the name tagging systems RoboTag is being compared to, RoboTag has no alias generation facility. By generating an alias from a recognized name, a system can scan for that alias (e.g., a company's acronym or an individual's first name) in order to improve the likelihood of identifying it. It would be straightforward to add such an alias capability to RoboTag.</Paragraph>
    <Paragraph position="2"> Another accuracy enhancement is to improve the tag matching algorithm. RoboTag does not currently use the lexical features of the tokens during the match process. The scoring function takes into account tag length and decision tree confidence values only. Many of the errors RoboTag makes come from the matching algorithm where the decision trees correctly predict tag begins and ends but the wrong tag pairings are chosen. Making the matching algorithm sensitive to lexical features should help correct this.</Paragraph>
    <Paragraph position="3"> Although, for comparison with other systems, we have presented traditional batch-mode learning results here, one of RoboTag's strengths is in its interactivity. We believe that allowing the user to give direct feedback to the learning system is key to rapidly addressing new extraction tasks. We plan to do further experiments which address how the use of this directed feedback can result in rapidly learned tagging procedures utilizing fewer tagged texts.</Paragraph>
    <Paragraph position="4"> Finally, our experiments have focused on proper name tagging, but RoboTag is not limited to this.</Paragraph>
    <Paragraph position="5"> We are planning to explore additional tagging tasks besides names in multiple languages such as Chinese, Thai, and Spanish, as well as English and Japanese.</Paragraph>
  </Section>
</Paper>