<?xml version="1.0" standalone="yes"?> <Paper uid="M91-1031"> <Title>HEAD &quot;SHINING PATH&quot; ; DETERMINER &quot;A&quot; ; NUMBER SINGULAR ; PERSON THIRD ;</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> SYNCHRONETICS: DESCRIPTION OF THE SYNCHRONETICS SYSTEM USED FOR MUC-3 </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="207" type="metho"> <SectionTitle> PROJECT BACKGROUND </SectionTitle> <Paragraph position="0"> Synchronetics, Inc., is a startup company in Baltimore founded to develop text processing software products for the commercial and Government sectors. The company, consisting of 7 people, was founded in 1989.</Paragraph> <Paragraph position="1"> Synchronetics had two natural language processing software development projects prior to participation in MUC-3: an off-the-shelf parsing utility called NL-Builder, and a text retrieval system prototype called Text-SR, which was developed under an SBIR contract for Wright Patterson Air Force Base.</Paragraph> <Paragraph position="2"> Neither of these projects alone was sufficient to handle the MUC-3 problem. Synchronetics was therefore prompted to look elsewhere for additional support. Members who participated on the 'Synchronetics Team' on a volunteer basis [1] were James Mayfield of the University of Maryland, Baltimore County (technical lead and template generation software), Kenneth Litkowski of CL Research of Gaithersburg, Md. (software for building the lexicon from a machine-readable dictionary), and Mark Wilson, Roy Cutts, and Bonnie Blades (implementation of the semantic net and phrase and sentence interpretation).</Paragraph> <Paragraph position="3"> The system was not integrated at the February meeting. At that time static cases were being passed by hand from one processing stage to another. 
The complete system was fully integrated and running on 100 texts only three weeks before the final submission was due. Because of the relative youth of the system, little time was spent fine-tuning the algorithms and knowledge bases with the 1300-text development corpus.</Paragraph> <Paragraph position="4"> Therefore, we feel that the final results demonstrate the feasibility, but not the potential performance, of our approach.</Paragraph> <Paragraph position="5"> We estimate that we spent 9 person-months on the development of our MUC-3 system, and that we made use of about 9 person-months of work that was done before we initiated the project. The bulk of the latter time was spent in the development of the NL-Builder product, and in the development of a previous LISP-based version of the KODIAK semantic net representation language.</Paragraph> <Paragraph position="6"> [1] Synchronetics' participation was funded for travel and incidental expenses only; all other labor was voluntary.</Paragraph> </Section> <Section position="3" start_page="207" end_page="208" type="metho"> <SectionTitle> ARCHITECTURE </SectionTitle> <Paragraph position="0"> The Synchronetics system architecture has been strongly influenced by the composition of the Synchronetics team. With team members located at six different sites spread across Maryland, we needed an architecture comprising components that could be developed separately and tested individually.</Paragraph> <Paragraph position="1"> The Synchronetics system consists of five separate modules that communicate via a semantic net representation language in a pipelined fashion. Each module is a stand-alone program that is written in C and operates on a variety of platforms. Figure 1 depicts this architecture. The five modules are: 1. A phrase parser 2. A phrase interpreter 3. A sentence parser 4. A sentence interpreter 5. 
A template generator. A semantic net representation language (a variant of the KODIAK language) was developed for use with this project. World knowledge is represented as a single net that is made available to each of the components. In addition, each component passes on to its successor a network description of the text, including all inferences that have been made about the text.</Paragraph> <Paragraph position="2"> Parsers. It was important to us both to maintain the pipelined architecture (to facilitate the development of different parts of the system at different sites), and to allow feedback from the semantic components of the system to the syntactic components. Therefore, we split the syntactic analysis component into two pieces: a phrase parser and a sentence parser. The phrase parser is responsible for breaking a text up into words, looking those words up in the dictionary, grouping the words into phrases, and constructing parse trees for those phrases. The sentence parser is a second parser that is responsible for constructing a single parse tree for each sentence in the message. The input to the sentence parser is a sequence of tokens representing the phrases of a sentence as produced by the phrase interpreter. These processes are all performed by the Synchronetics NL-Builder product.</Paragraph> <Paragraph position="3"> NL-Builder is a 'programmable' parser. That is, the user may enter and modify the grammar, semantic interpretation rules and morphology, as well as import a dictionary. NL-Builder was used to provide both the dictionary tools and the two parsers. The significant components of NL-Builder are: * DICTIONARY - The NL-Builder dictionary utilities include morphology rules that are modifiable by the user, a B-tree compiler, and user-specifiable features on the lexical categories. Our initial dictionary was an available NL-Builder dictionary with 4000 words in it. 
It was not matched to the domain, but it contained many common English words. This initial dictionary also included morphological rules, which were left largely unchanged. The dictionary was extended using utilities for dictionary building that are packaged with NL-Builder; these utilities were run on the MUC-3 development corpus. This extension added many domain-specific terms and many slot-fill terms and their synonyms. Ken Litkowski then built a system to extract information from the Proximity Linguistic System and enter it into the dictionary by comparing the dictionary with the words in the MUC-3 test corpus. The linking of relevant word senses in the dictionary to the appropriate nodes of the semantic network was done manually.</Paragraph> <Paragraph position="4"> The final dictionary consisted of approximately 10,000 word senses and about 30 morphological and tokenization rules. The dictionary was compiled into a B-tree for fast access.</Paragraph> <Paragraph position="5"> * TOKENIZER - A tokenizer module (which comes as part of the NL-Builder system) is used for splitting text into tokens and identifying patterns that may not be in the dictionary (numbers, proper nouns, etc.).</Paragraph> <Paragraph position="6"> * PARSER - The parser is an extended ATN. It allows a user-specified recursive network state definition with augmented conditions and actions on arcs. In addition, it allows look-ahead tests to prune search paths. Here is an example of a portion of the ATN that handles passive verbs:</Paragraph> </Section> <Section position="4" start_page="208" end_page="211" type="metho"> <SectionTitle> ARC S.PASSIVE FROM A TO END MATCH VERB CONDITIONS VERB: FORM .* == VERB:PAST_PARTICIPLE ; </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> The parser produces a 'syntactic net' that is stored in the same format as the semantic net. 
Here is a portion of the syntactic net that is produced by the phrase parser for the sentence (from message 99 of the tst1 corpus): 'Some 3 years ago two Marines died following a Shining Path bombing of a market used by</Paragraph> <Section position="1" start_page="209" end_page="211" type="sub_section"> <SectionTitle> Semantic Interpreters </SectionTitle> <Paragraph position="0"> The phrase interpreter is responsible for building a semantic interpretation of each of the phrases discovered by the phrase parser. This process entails mapping from the words in the phrases to the corresponding nodes in the semantic net, then attaching these nodes to each other according to the meaning of the phrase. The sentence interpreter is responsible for building a semantic interpretation of the entire sentence. It uses both the output of the phrase interpreter and the output of the sentence parser.</Paragraph> <Paragraph position="1"> Our aim with the semantic interpreters was to make them robust enough to find appropriate connections between the selected nodes in the semantic net even if no explicit semantic interpretation rules are available for the syntactic structure being interpreted. Thus the basis for semantic interpretation is a spreading activation process. If there is a semantic interpretation rule for a given phrase, then that rule is used to connect the nodes in the semantic net representing the components of the phrase. 
If, however, there is no semantic interpretation rule, spreading activation is used to find plausible connections between concepts.</Paragraph> <Paragraph position="2"> To continue our example, here is a portion of the phrase interpreter's output for the bombing sentence.</Paragraph> <Paragraph position="3"> Notice that the phrase interpreter has established mappings (via 'SI', or Semantic Interpretation, links) between the syntactic nodes produced by the phrase parser and concept nodes in the semantic net. The template generator is responsible for determining which actions that have been represented in the semantic net should lead to the generation of a template, and for the creation of those templates. It begins by examining each potentially reportable action in the semantic net (such as the children of KIDNAP_ACTION, the children of BOMB_ACTION, etc.). For each such action, it tries to determine whether the action falls within the parameters of a reportable action as laid out in the MUC-3 specifications. Since the long-term knowledge stored in the semantic net is currently quite limited, the system usually defaults to reporting the action. Once an action to report has been selected, a template is created for the action, and its slots are filled one at a time. In most cases, slots are filled by starting from the node representing the action being reported, and following a path through the semantic net to another node that stands in the desired relation to the action node. Links are maintained from the syntactic world to the semantic world, so that the system can trace back from a node in the semantic net to the words that caused the creation of that node. For the MUC-3 final test, we attempted to fill only slots 0-7 and slot 11.</Paragraph> <Paragraph position="4"> Here is the template that is generated for the bombing sentence:</Paragraph> </Section> </Section> </Paper>
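<!--
Illustrative sketch only; it is not the template shown in the original paper.
The slot-filling procedure described for the template generator (start at the
node for the reported action, follow a path of links through the semantic net
to the filler node, then trace back to the source words) might look roughly
like the Python below. The Synchronetics modules were written in C; the node
names, link labels, slot names, and slot paths here are all hypothetical and
were chosen only to echo the bombing example.

```python
# Hypothetical toy data: none of these node names, link labels, or slot
# names are taken from the Synchronetics system or the MUC-3 template.

# A tiny semantic net: node name maps to {link label: target node name}.
NET = {
    "BOMB_ACTION.1": {
        "isa": "BOMB_ACTION",
        "actor": "SHINING_PATH.1",
        "target": "MARKET.1",
    },
    "SHINING_PATH.1": {"isa": "ORGANIZATION"},
    "MARKET.1": {"isa": "BUILDING"},
}

# Links from semantic nodes back to the words that caused their creation.
SOURCE_TEXT = {
    "SHINING_PATH.1": "Shining Path",
    "MARKET.1": "market",
}

# Each slot is filled by following a fixed path of links from the action node.
SLOT_PATHS = {
    "perpetrator": ["actor"],
    "physical_target": ["target"],
    "location": ["target", "location"],
}


def follow_path(net, start, path):
    """Follow link labels from start; return the end node, or None if a link is missing."""
    node = start
    for label in path:
        node = net.get(node, {}).get(label)
        if node is None:
            return None
    return node


def fill_template(net, action):
    """Build a template for one action node, filling its slots one at a time."""
    template = {"incident_type": net[action]["isa"]}
    for slot, path in SLOT_PATHS.items():
        node = follow_path(net, action, path)
        # Trace back to the source words when a filler node was found.
        template[slot] = SOURCE_TEXT.get(node, node) if node is not None else None
    return template


if __name__ == "__main__":
    print(fill_template(NET, "BOMB_ACTION.1"))
```

On this toy net, fill_template(NET, "BOMB_ACTION.1") fills the perpetrator and
physical_target slots with the source words "Shining Path" and "market", and
leaves the location slot empty because no location link exists on the target.
-->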