<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0105">
  <Title>Web-based Interfaces for Natural Language Processing Tools</Title>
  <Section position="2" start_page="0" end_page="30" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The Problem: Natural language processing (NLP) technology is relevant to non-computer scientists: our classes are populated by students from neuroscience, speech pathology, linguistics, teaching of foreign languages, health informatics, etc. To use NLP technology effectively, it is helpful to understand, at some level, how it works. Hands-on experimentation is an effective method for gaining such understanding. Unfortunately, to be able to experiment, non-computer scientists often need to acquire some programming skills and knowledge of the Unix operating system. This can be time-consuming and tedious and can distract students from their central goal of understanding how a technology works and how best to employ it for their interests.</Paragraph>
    <Paragraph position="1"> In addition, getting a technology to run on a set of lab machines can be problematic: the programs may have been developed for a different platform, e.g., a program was developed for Linux but the lab machines run MS Windows. Another hurdle is that machine administrators are often loath to install applications that they perceive as non-standard. Finally, lab times can be restrictive and thus it is preferable to enable students to use computers to which they have easy access.</Paragraph>
    <Paragraph position="2"> Our Solution: We built web interfaces to many core NLP modules. These interfaces not only allow students to use a technology but also allow students to modify and extend the technology. This enables experimentation. We used server-side scripting languages to build such web interfaces. These programs take input from a web browser, feed it to the technology in question, gather the output from the technology and send it back to the browser for display to the student. Access to web browsers is nearly ubiquitous and thus the issue of lab access is side-stepped. Finally, the core technology need only run on the web server platform. Many instructors have access to web servers running on different platforms and, in general, administering a web server is easier than maintaining lab machines.</Paragraph>
    <Paragraph position="3"> An Example: Finite state transduction is a core NLP technology and one that students need to understand. The Cass partial parsing system (Abney, 1997) makes use of a cascade of FSTs. To use this system, a student creates a grammar. This grammar is compiled and then applied to sentences provided by the student. Prior to our work, the only interface to Cass involved the Unix command line shell. Figure 3 shows an example session with the command line interface. It exemplifies the sort of interface that users must master in order to work with current human language technology.</Paragraph>
    <Paragraph position="4"> A web-based interface hides many of these details (see Figure 1 and Figure 2). For example, the use of an ASCII-based text editor such as emacs becomes unnecessary. In addition, the student does not need to remember flags such as -v -g and does not need to know how to use Unix pipes, |, and output redirection, &gt;. None of this knowledge is terribly difficult, but the amount accumulates quickly and such information does not help the student un- null 1999).</Paragraph>
    <Paragraph position="5"> These interfaces have been used in an introduction to computational linguistics course and an introduction to creating and using corpora course. Prior to the interface construction, no hands-on lab assignments were given; instead all assignments were pencil and paper. The NLP technologies listed above were chosen because they fit into the material of the course and because of their availability.</Paragraph>
    <Section position="1" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
2.1 Allowing the student to process input
</SectionTitle>
      <Paragraph position="0"> The simplest type of interface allows students to provide input and displays the corresponding output. All the interfaces above provide this ability. They all start with HTML forms to collect input. In the simplest case, PHP scripts process the forms, place the input into files, and then make system calls to run the NLP technology. Finally, the output files are wrapped in HTML and displayed to the user. The basic PHP program remains largely unchanged from one NLP technology to the next. In most cases, it suffices to use the server file system to pass data back and forth to the NLP program; PHP provides primitives for creating and removing unique temporary files. In only one case was it necessary to use a semaphore on a hard-coded filename. We also experimented with JavaServer Pages and Perl CGI scripts instead of PHP.</Paragraph>
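      <Paragraph position="1"> The flow described above (form input, unique temporary file, system call, HTML-wrapped output) can be sketched as follows. This is a minimal illustration in Python rather than the PHP scripts themselves, which are not reproduced here. The names run_tool_on_input and STAND_IN_TOOL are hypothetical, and the stand-in command simply upper-cases its input file in place of a real NLP program.

```python
import os
import subprocess
import sys
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an NLP tool (an assumption, not a real system):
# it upper-cases the contents of the file named on its command line.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())",
]


def run_tool_on_input(form_text, tool_command=STAND_IN_TOOL):
    """Write form input to a unique temp file, run the tool, return HTML."""
    # A unique temporary file keeps concurrent requests from colliding.
    handle, path = tempfile.mkstemp(suffix=".txt", text=True)
    try:
        with os.fdopen(handle, "w", encoding="utf-8") as tmp:
            tmp.write(form_text)
        # System call: the tool receives the temp file path as its argument.
        result = subprocess.run(
            tool_command + [path],
            capture_output=True,
            text=True,
            check=False,
        )
    finally:
        # Remove the temp file whether or not the tool succeeded.
        os.remove(path)
    # Wrap the output in a pre element; ElementTree escapes markup characters.
    pre = ET.Element("pre")
    pre.text = result.stdout
    return ET.tostring(pre, encoding="unicode")
```

Only the command list would need to change from one tool to the next in a sketch like this, which mirrors the observation that the basic program stays largely the same across technologies.</Paragraph>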
    </Section>
    <Section position="2" start_page="29" end_page="30" type="sub_section">
      <SectionTitle>
2.2 Allowing the student to modify knowledge resources
</SectionTitle>
      <Paragraph position="0"> The web interfaces to the Cass parser, Gsearch, and MontyTagger allow the student to provide the corresponding knowledge base. For Cass and Gsearch, an additional text box is provided for the grammars they require. The rule sequence and lexicon that the MontyTagger uses can be large and thus unwieldy for a textarea form input element. We solved the problem by preloading the textareas with a &quot;standard&quot; rule sequence and lexicon, which the student can then modify. We also provided the ability to upload the rule sequences and lexicon as files. One problem with the file upload method is that it assumes that the students can generate ASCII-only files with the appropriate line break character. This assumption is often false.</Paragraph>
      <Paragraph position="1"> An additional problem with allowing students to modify knowledge resources is providing useful feedback when these student-provided resources contain syntax or other types of errors. At this point we simply capture the stderr output of the program and display it.</Paragraph>
      <Paragraph position="2"> Finally, with some systems, such as Spew (Schwartz, 1999) and The Dada Engine (Bulhak, 1996), allowing web-based specification of knowledge bases amounts to allowing the student to execute arbitrary code on the server machine, an obvious security problem.</Paragraph>
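      <Paragraph position="3"> The error feedback mentioned earlier in this subsection, capturing the stderr output of the program and displaying it, might look roughly like the sketch below. It is written in Python for illustration and is not the authors' PHP code. The names run_and_report and FAILING_TOOL are hypothetical, and the stand-in command only imitates a tool that rejects a student-provided grammar. The error message text is invented.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a tool that rejects its input (an assumption):
# it writes an invented message to stderr and exits with status 1.
FAILING_TOOL = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('grammar error: line 3\\n'); sys.exit(1)",
]


def run_and_report(tool_command):
    """Run a tool; return (succeeded, html), showing stderr on failure."""
    result = subprocess.run(
        tool_command,
        capture_output=True,  # collect stdout and stderr separately
        text=True,
        check=False,          # a non-zero exit is reported, not raised
    )
    succeeded = result.returncode == 0
    pre = ET.Element("pre")
    # Mark the element so a stylesheet could highlight error output.
    pre.set("class", "output" if succeeded else "error")
    pre.text = result.stdout if succeeded else result.stderr
    # ElementTree escapes markup characters in the captured text.
    return succeeded, ET.tostring(pre, encoding="unicode")
```

Escaping the captured text before display matters here because error messages often echo parts of the student's input.</Paragraph>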
    </Section>
    <Section position="3" start_page="30" end_page="30" type="sub_section">
      <SectionTitle>
2.3 Allowing the student to examine internal system processing
</SectionTitle>
      <Paragraph position="0"> Displaying system output with a web interface is relatively easy; however, showing the internal workings of a system is more challenging with a web interface. At this point, we have only displayed traces of steps of an algorithm. For example, the NLTK context-free grammar parser interface provides a trace of the steps of the parsing algorithm. One possible solution would be to generate Flash code to animate a system's processing.</Paragraph>
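      <Paragraph position="1"> To give a sense of what a trace of parsing steps looks like, the sketch below records each step of a greedy shift-reduce recognizer. It is a toy written for illustration, not NLTK's parser and not the interface described here. The grammar, the lexicon, and the name shift_reduce_trace are all invented, and a greedy strategy like this one can fail on grammars that need backtracking.

```python
# Toy grammar and lexicon, invented for illustration only.
RULES = [
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("S", ("NP", "VP")),
]
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "chased": "V"}


def shift_reduce_trace(words):
    """Greedy shift-reduce recognition; returns (accepted, trace lines)."""
    stack, buffer, trace = [], list(words), []
    while True:
        for lhs, rhs in RULES:
            # Reduce when the top of the stack matches a rule's right side.
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append("reduce to " + lhs + " | stack: " + " ".join(stack))
                break
        else:
            # No rule applied: shift the next word, or stop if none remain.
            if not buffer:
                break
            word = buffer.pop(0)
            # Words missing from the toy lexicon get the category UNK.
            stack.append(LEXICON.get(word, "UNK"))
            trace.append("shift " + word + " | stack: " + " ".join(stack))
    # Accept only if the whole input was reduced to a single S.
    return stack == ["S"], trace
```

Calling shift_reduce_trace("the dog chased the cat".split()) returns the acceptance flag together with one trace line per shift or reduce step, which a web interface could then display line by line.</Paragraph>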
    </Section>
    <Section position="4" start_page="30" end_page="30" type="sub_section">
      <SectionTitle>
2.4 Availability
</SectionTitle>
      <Paragraph position="0"> The web pages are currently available at que.infoscience.uiowa.edu/~light/classes/compLing/. However, our intent is not to provide server cycles for the community but rather to release the PHP scripts as open source so that others can run the interfaces on their own servers. An instructor at another university has already made use of our code.</Paragraph>
    </Section>
  </Section>
</Paper>