<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0722">
  <Title>Minimal Commitment and Full Lexical Disambiguation: Balancing Rules and Hidden Markov Models</Title>
  <Section position="3" start_page="0" end_page="111" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> Before starting to develop our own MS tagger, we conducted some preliminary studies on generally available systems; although these studies go far beyond the scope of this paper, we would like to report the main conclusions. Both statistical taggers (HMM) and constraint-based systems were assessed. Two guidelines framed the study: performance and minimal commitment. We call minimal commitment the property of a system that does not attempt to resolve an ambiguity when it is unlikely to resolve it well. This property seems important for IR purposes, where we might prefer noise to silence in the recall process. However, it must remain optional, as some other tasks (such as NP extraction or phrase chunking (Abney, 1991)) may need full disambiguation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Data-driven tools
</SectionTitle>
      <Paragraph position="0"> We adapted the output of our morphological analyser for tagging purposes (Bouillon et al., 1999). We trained an HMM tagger and wrote manual biases for it, but accuracy never rose far above 97% (i.e. an error rate of about 3%); with an average ambiguity level of around 16%, this means that almost 20% of the ambiguities were assigned a wrong tag. We attempted to set a confidence threshold, so that for similarly weighted transitions the system would keep the ambiguity, as in (Weischedel et al., 1993), but the results were not satisfactory.</Paragraph>
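      <Paragraph position="1"> The figure of almost 20% can be checked with a rough calculation, under the assumption (ours for this illustration, not a measured result) that every tagging error falls on an ambiguous token, so that unambiguous tokens are always tagged correctly. Writing e for the overall error rate and a for the ambiguity level, the share of ambiguities receiving a wrong tag is then approximately \[ \frac{e}{a} \approx \frac{0.03}{0.16} \approx 0.19, \] that is, close to one ambiguity in five.</Paragraph>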
    </Section>
    <Section position="2" start_page="0" end_page="111" type="sub_section">
      <SectionTitle>
2.2 Constraint-based systems
</SectionTitle>
      <Paragraph position="0"> We also looked at more powerful principle-based parsers, and tests were conducted on FIPSTAG (a Government and Binding chart-parser (Wehrli, 1992)). Although this system performed well on general texts, with an error rate of about 0.7%, its results on medical texts were about the same as those of the stochastic taggers. As we could not adapt our medical morphological analyser to this tightly integrated system, it had to cope with several unknown words.</Paragraph>
    </Section>
  </Section>
</Paper>