<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0312">
  <Title>Learning to Tag Multilingual Texts Through Observation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The ability to tag proper names such as organization, person, and place names in multilingual texts has great value for tasks like information extraction, information retrieval, and machine translation (Aone, Charocopos, and Gorlinsky, 1997).</Paragraph>
    <Paragraph position="1"> The most successful systems currently rely on hand-coded patterns to identify the desired names in texts (Adv, 1995; Def, 1996). This approach achieves its best performance when a different hand-coded rule set is used for each language/domain pair. Several of these systems have become easier to use, particularly by speeding up the write-pattern/evaluate-performance/refine-pattern loop, which plays the central role in the development process.</Paragraph>
    <Paragraph position="2"> One approach in name tagging is to assist in the creation of hand-coded rules by making it easier for the developer to mark parts of the name and its surrounding context to include in the pattern. This boosts productivity in hand-coding rules but still requires a significant amount of effort by the developer to identify key parts of the pattern. A step up from this is to determine how to generalize the rule so that it is more broadly applicable, or to suggest to the developer which parts of the context have high value for inclusion in the pattern. Nevertheless, a skilled developer with a thorough knowledge of the particular pattern language is still essential.</Paragraph>
    <Paragraph position="3"> Our goal in developing RoboTag was to make it possible for an end-user to build a tagging system simply by giving examples of what should be tagged, rather than requiring the user to understand a pattern language. RoboTag uses a machine learning algorithm to discover features that the training examples have in common. This knowledge is used to construct a tagging procedure that can find additional, previously unseen examples for extraction.</Paragraph>
    <Paragraph position="4"> It was important (for the confidence of our users) that the tagging procedure induced by the system be easily explained in terms of how it makes its decisions. This was one of the factors that led us to consider using decision trees (Quinlan, 1993) as a key component of the system. Other potential learning or statistical approaches for a problem like this (e.g., Neural Nets or Hidden Markov Models) did not offer this advantage. The RoboTag system is particularly well instrumented for exploration of different learning system parameters and inspection of the induced tagging procedures.</Paragraph>
    <Paragraph position="5"> First, we discuss the overall architecture for the RoboTag system. Next, we focus on the machine learning algorithm employed for tag learning. We then present experimental results which compare RoboTag to both human-tagged keys and to the best hand-coded rule systems. Lastly, related work and future directions are discussed.</Paragraph>
  </Section>
</Paper>