<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1082">
  <Title>A flexible distributed architecture for NLP system development and use</Title>
  <Section position="4" start_page="0" end_page="615" type="intro">
    <SectionTitle>
2 Architecture
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> The central component of TEA is a frame-based data model (F) (see Fig.2). In this model, a document is a list of frames (Rich and Knight, 1991) for recording the properties about each token in the text (example in Fig.2). A typical TE system converts a document into F with an input plug-in. The information required at the output determines the set of process plug-ins to activate. These use the information in F to add annotations to F. Their dependencies are automatically resolved by TEA. System behavior is controlled by adjusting the configurable param- null based data model  This type of architecture has been implemented, classically, as a 'blackboard' system such as Hearsay-II (Erman, 1980), where inter-module communication takes place through a shared knowledge structure; or as a 'messagepassing' system where the modules communicate directly. Our architecture is similar to blackboard systems. However, the purpose of F (the shared knowledge structure in TEA) is to provide a single extendable data structure for annotating text. It also defines a standard interface for inter-module communication, thus, improves system integration and ease of software reuse.</Paragraph>
    <Section position="1" start_page="615" end_page="615" type="sub_section">
      <SectionTitle>
2.1 Voting mechanism
</SectionTitle>
      <Paragraph position="0"> A feature that distinguishes TEA from similar systems is its use of voting mechanisms for system integration. Our approach has two distinct but uniformly treated applications. First, for any type of language analysis, different techniques ti will return successful results P(r) on different subsets of the problem space. Thus combining the outputs P(rlti) from several ti should give a result more accurate than any one in isolation. This has been demonstrated in several systems (e.g. Choi (1999a); van Halteren et al. (1998); Brill and Wu (1998); Veronis and Ide (1991)). Our architecture currently offers two types of voting mechanisms: weighted average (Eq.1) and weighted maximum (Eq.2). A Bayesian classifier (Weiss and Kulikowski, 1991) based weight estimation algorithm (Eq.3) is included for constructing adaptive voting mechanisms. null</Paragraph>
      <Paragraph position="2"> Second, different types of analysis a/ will provide different information about a problem, hence, a solution is improved by combining several ai. For telegraphic text compression, we estimate E(w), the information value of a word, based on a wide range of different information sources (Fig.2.1 shows a subset of our working system). The output of each ai are combined by a voting mechanism to form a single measure.</Paragraph>
      <Paragraph position="4"> telegraphic text compression.</Paragraph>
      <Paragraph position="5"> Thus, for example, if our system encounters the phrase 'President Clinton', both lexical lookup and automatic tagging will agree that 'President' is a noun. Nouns are generally informative, so should be retained in the compressed output text. However, grammar-based syntactic analysis gives a lower weighting to the first noun of a noun-noun construction, and bigram analysis tells us that 'President Clinton' is a common word pair. These two modules overrule the simple POS value, and 'President Clinton' is reduced to 'Clinton'.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>