<?xml version="1.0" standalone="yes"?>
<Paper uid="P86-1020">
  <Title>BULK PROCESSING OF TEXT ON A MASSIVELY PARALLEL COMPUTER</Title>
  <Section position="11" start_page="134" end_page="134" type="concl">
    <SectionTitle>
9 Further Applications of Scan to Bulk Processing of Text
</SectionTitle>
    <Paragraph position="0"> to Bulk Processing of Text The scan algorithm has many other applications in text processing. For example, it can be used to lexically parse text in the form of 1 character per processor into the form of 1 word per processor. Syntactic rules could rapidly determine which characters begin and end words. Scan could then be used to enumeral:e how many words there are, and what position each character occupies within its word. The processors could then use this information to send their characters to the word-processor at which they belong. Each word-processor would receive the characters making up its word and would assemble them into a string.</Paragraph>
    <Paragraph position="1"> Another application of scan, suggested by Guy L.</Paragraph>
    <Paragraph position="2"> Steele, Jr., would be as a regular expression parser, or lexer. Each word in the CM is viewed as a transition matrix from one set of finite automata states to another set. Scan is used, along with an F which would have the effect of composing transition matrices, to apply a finite automata to many sentences in parallel. After this application of scan, the last word in each sentence contains the state that a finite automata parsing the string would reach. The lexer's state transition function F would be associative, since string concatenation is associative, and the purpose of a lexer is to discover which particular strings/tokens were concatenated to create a given string/file.</Paragraph>
    <Paragraph position="3"> The experience of actually implementing parallel natural language programs on real hardware has clarified which operations and programming techniques are the most efficient and useful. Programs that build upon general algorithms such as sort and scan are far, easier to debug than programs that attempt a direct assault on a problem (i.e. the hashing scheme discussed earlier; or a slow, hand-coded regular expression parser that I implemented). Despite their ease of implementation, programs based upon generally useful submodules often are more efficient than specialized, hand-coded programs.</Paragraph>
  </Section>
class="xml-element"></Paper>