<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1082">
  <Title>A flexible distributed architecture for NLP system development and use</Title>
  <Section position="5" start_page="615" end_page="616" type="metho">
    <SectionTitle>
3 Related work
</SectionTitle>
    <Paragraph position="0"> Current trends in the development of reusable TE tools are best represented by the Edinburgh tools (LTGT) 2 (LTG, 1999) and GATE 3 (Cunningham et al., 1995). Like TEA, both LTGT and GATE are frameworks for TE.</Paragraph>
    <Paragraph position="1"> LTGT adopts a pipeline architecture for module integration. For processing, a text document is converted into SGML format. Processing modules are then applied to the SGML file sequentially, and annotations accumulate as mark-up tags in the text. The architecture is simple to understand, robust and future-proof.</Paragraph>
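The pipeline style described for LTGT can be sketched as follows. This is a minimal illustration under stated assumptions, not LTGT's actual interface. Real LTGT modules read and write SGML files, whereas here a per-token attribute dictionary stands in for the mark-up tags. The module names (`tag_case`, `tag_length`) are hypothetical.

```python
# Minimal sketch of a sequential (pipeline) architecture in the style
# described for LTGT. Assumption: a per-token attribute dictionary stands
# in for SGML mark-up; real LTGT modules exchange SGML files.

def tokenize(text):
    # Convert raw text into the shared document format: one record per token.
    return [{"word": w} for w in text.split()]

def tag_case(doc):
    # Hypothetical module: adds a "case" annotation, leaving earlier ones intact.
    for tok in doc:
        tok["case"] = "upper" if tok["word"][:1].isupper() else "lower"
    return doc

def tag_length(doc):
    # Hypothetical module: adds a "len" annotation.
    for tok in doc:
        tok["len"] = len(tok["word"])
    return doc

def run_pipeline(text, modules):
    # Apply each module in turn; annotations accumulate in the document.
    doc = tokenize(text)
    for module in modules:
        doc = module(doc)
    return doc

doc = run_pipeline("HAL reads lips", [tag_case, tag_length])
print(doc[0])  # {'word': 'HAL', 'case': 'upper', 'len': 3}
```

Each module sees only the shared document, so modules can be added or reordered without changing the others. This is the simplicity the pipeline design trades against tighter integration.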
    <Paragraph position="2"> The SGML/XML standard is well developed and widely supported by the community, which improves the reusability of the tools. However, the architecture encourages the development of new tools rather than the reuse of existing TE components.</Paragraph>
    <Paragraph position="3"> GATE is based on an object-oriented data model (similar to the TIPSTER architecture (Grishman, 1997)). Modules communicate by reading and writing information to and from a central database. Unlike LTGT, both GATE and TEA are designed to encourage software reuse: existing TE tools are easily incorporated, using Tcl wrapper scripts in GATE and Java interfaces in TEA. The features that distinguish LTGT, GATE and TEA are their configuration methods, portability and motivation. Users of LTGT write shell scripts to define a system (as a chain of LTGT components). With GATE, a system is constructed manually by wiring TE components together using the graphical interface. TEA assumes the user knows nothing but the available input and the required output; the appropriate set of plug-ins is then activated automatically. Module selection can also be configured manually by adjusting the parameters of the voting mechanisms. This ensures that a TE system is accessible to complete novices, yet offers sufficient control for developers.</Paragraph>
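The paper does not spell out how TEA chooses plug-ins from the available input and the required output, beyond mentioning voting mechanisms. The sketch below rests on an assumed reading. Each plug-in declares the data type it consumes and the type it produces. A chain is then found by breadth-first search over those types. The registry, the type labels and the `plan` function are all hypothetical.

```python
from collections import deque

# Hypothetical plug-in registry: name -> (input data type, output data type).
PLUGINS = {
    "tokenizer": ("text", "tokens"),
    "sentence_splitter": ("text", "sentences"),
    "pos_tagger": ("tokens", "pos_tags"),
    "compressor": ("pos_tags", "telegraphic_text"),
}

def plan(available, required, plugins=PLUGINS):
    # Breadth-first search over data types. Returns the shortest list of
    # plug-in names turning `available` into `required`, or None if no
    # chain of registered plug-ins can do it.
    queue = deque([(available, [])])
    seen = {available}
    while queue:
        dtype, chain = queue.popleft()
        if dtype == required:
            return chain
        for name, (src, dst) in plugins.items():
            if src == dtype and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [name]))
    return None

print(plan("text", "telegraphic_text"))
# ['tokenizer', 'pos_tagger', 'compressor']
```

Several plug-ins may produce the same output type. In that case this sketch simply keeps the first one registered; in TEA, that choice is presumably where the voting mechanisms come in.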
    <Paragraph position="4"> LTGT and GATE are both open-source C applications. They can be recompiled for many platforms. TEA is a Java application. It can run directly (without compilation) on any system that supports Java. However, applications constructed with the current release of GATE and TEA are less portable than those produced with LTGT. GATE and TEA encourage reuse of existing components, not all of which are platform independent (this is not a problem for LTGT, since its architecture does not encourage component reuse). We believe this is a worthwhile trade-off, since it allows developers to construct prototypes with components that are only available as separate applications. Native tools can be developed incrementally.</Paragraph>
  </Section>
  <Section position="6" start_page="616" end_page="616" type="metho">
    <SectionTitle>
4 An example
</SectionTitle>
    <Paragraph position="0"> Our application is telegraphic text compression.</Paragraph>
    <Paragraph position="1"> The examples were generated with a subset of our working system, using a section of the book HAL's Legacy (Stork, 1997) as test data. First, we used different compression techniques to generate the examples in Fig. 4. This was done by simply adjusting a parameter of an output plug-in.</Paragraph>
    <Paragraph position="2"> It is clear that the output is inadequate for rapid text skimming. To improve the system, the three measures were combined with an unweighted voting mechanism. Fig. 4 presents two levels of compression using the new measure.</Paragraph>
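The unweighted voting step can be illustrated as follows. This section does not name the paper's three measures, so the sketch substitutes three hypothetical word-level tests. Each test casts one equal vote to keep a word. The vote threshold `min_votes` plays the role of the compression level. Using the threshold this way is an assumption about how the two levels in Fig. 4 might be obtained, and the example sentence is made up.

```python
# Hypothetical stand-ins for the three measures: each returns True to vote
# for keeping a word. The paper's actual measures are not named here.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "was", "and", "in", "on"}

def long_word(w):
    return len(w) >= 5

def capitalised(w):
    return w[:1].isupper()

def content_word(w):
    return w.lower() not in STOPWORDS

MEASURES = [long_word, capitalised, content_word]

def compress(text, min_votes):
    # Unweighted voting: every measure counts equally, and a word survives
    # if at least `min_votes` measures vote to keep it. Raising
    # `min_votes` yields a higher level of compression.
    words = text.split()
    return " ".join(w for w in words
                    if sum(m(w) for m in MEASURES) >= min_votes)

sentence = "The computer HAL was designed to read the lips of the crew"
print(compress(sentence, 1))  # The computer HAL designed read lips crew
print(compress(sentence, 2))  # computer HAL designed
```

Because the votes are unweighted, no single measure can keep or drop a word on its own once `min_votes` exceeds one. A weighted variant would be the natural next step if some measures prove more reliable.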
  </Section>
  <Section position="7" start_page="616" end_page="617" type="metho">
    <SectionTitle>
5 Conclusions and future directions
</SectionTitle>
    <Paragraph position="0"> We have described an interesting architecture (TEA) for developing platform-independent text engineering applications. Product delivery, configuration and development are made simple by the self-organizing architecture and variable interface. The use of voting mechanisms for integrating discrete modules is original, and its motivation is well supported.</Paragraph>
    <Paragraph position="1"> The current implementation of TEA is geared towards token analysis. We plan to extend the data model to cater for structural annotations. The tool set for TEA is constantly being extended; recent additions include a prototype symbolic classifier, a shallow parser (Choi, Forthcoming), a sentence segmentation algorithm (Reynar and Ratnaparkhi, 1997) and a POS tagger (Ratnaparkhi, 1996). Other adaptive voting mechanisms are to be investigated. Future releases of TEA will support concurrent execution (distributed processing) over a network. Finally, we plan to investigate means of improving system integration and module organization, e.g. annotation, module and tag set compatibility.</Paragraph>
  </Section>
</Paper>