SYNCHRONETICS :
DESCRIPTION OF THE SYNCHRONETICS SYSTE M
USED FOR MUC- 3
James Mayfield
Computer Science Dept .
University of Maryland, Baltimore Count y
Baltimore, MD 21228-539 8
mayfield0umbc3.umbc.edu
(301) 455-3099
Edwin Addison
Synchronetics, Inc .
3700 I{oppers St ., Suite 131
Baltimore MD 2122 7
76366.1115©compuserve. com
(301) 644-2400
PROJECT BACKGROUN D
Synchronetics, Inc ., is a startup company in Baltimore founded to develop text processing software product s
for the commercial and Government sectors . The company, consisting of 7 people, was founded in 1989.
Synchronetics had two natural language processing software development projects prior to participatio n
in MUC-3: an off-the-shelf parsing utility called NL-Builder ; and a text retrieval system prototype calle d
Text-SR, which was developed under an SBIR contract for Wright Patterson Air Force Base .
Neither of these projects alone was sufficient to handle the MUC-3 problem . Synchronetics was therefore
prompted to look elsewhere for additional support . Members that participated on the `Synchronetics Team '
on a volunteer basis' were James Mayfield of the University of Maryland, Baltimore County (technical lea d
and template generation software), Kenneth Litkowski of CL Research of Gaithersburg Md . (software for
building the lexicon from a machine-readable dictionary), and Mark Wilson, Roy Cutts, and Bonnie Blade s
(implementation of the semantic net and phrase and sentence interpretation) .
The system was not integrated at the February meeting . At that time static cases were being passed by
hand from one processing stage to another . The complete system was fully integrated and running on 10 0
texts only three weeks before the final submission was due . Because of the relative youth of the system ,
little time was spent fine-tuning the algorithms and knowledge bases with the 1300 text development corpus .
Therefore, we feel that the final results demonstrate the feasibility, but not the potential performance, o f
our approach .
We estimate that we spent 9 person-months on the development of our MUC-3 system, and that w e
made use of about 9 person-months of work that was done before we initiated the project . The bulk of the
latter time was spent in the development of the NL-Builder product, and in the development of a previou s
LISP-based version of the KODIAK semantic net representation language .
'Synchronetics participation was funded for travel and incidental expenses only—all other labor was voluntary.
207
semantic
net
texts phraseparser phraseinterpreter
sentenceparser sentence
interpreter
template
generator =templates
Figure 1 : System Architecture
ARCHITECTURE
The Synchronetics system architecture has been strongly influenced by the composition of the Synchronetic s
team . With team members located at six different sites spread across Maryland, we needed an architectur e
comprising components that could be developed separately and tested individually .
The Synchronetics system consists of five 2 separate modules that communicate via a semantic net rep-
resentation language in a pipelined fashion . Each module is a stand-alone program that is written in C an d
operates on a variety of platforms . Figure 1 depicts this architecture . The five modules are :
1. A phrase parse r
2. A phrase interpreter
3. A sentence parser
4. A sentence interpreter
5. A template generator
A semantic net representation language (a variant of the KODIAK language) was developed for use with thi s
project. World knowledge is represented as a single net that is made available to each of the components . In
addition, each component passes on to its successor a network description of the text, including all inference s
that have been made about the text .
Parsers
It was important to us both to maintain the pipelined architecture (to facilitate the development of differen t
parts of the system at different sites), and to allow feedback from the semantic components of the syste m
to the syntactic components . Therefore, we split the syntactic analysis component into two pieces : a phras e
parser and a sentence parser. The phrase parser is responsible for breaking a text up into words, looking
those words up in the dictionary, grouping the words into phrases, and constructing parse trees for thos e
phrases . The sentence parser is a second parser that is responsible for constructing a single parse tree for eac h
sentence in the message . The input to the sentence parser is a sequence of tokens representing the phrase s
of a sentence as produced by the phrase interpreter . These processes are all performed by the Synchronetics
2A number of other components have been implemented or are under development, but were not included in the Phase 2
test.
208
NL-Builder product .
NL-Builder is a `programmable' parser. That is, the user may enter and modify the grammar, semantic
interpretation rules and morphology, as well as import a dictionary . NL-Builder was used to provide bot h
dictionary tools, and the two parsers. The significant components of NL-Builder are:
• DICTIONARY – The NL-Builder dictionary utilities include morphology rules that are modifiable b y
the user, a B-tree compiler, and user-specifiable features on the lexical categories .
Our initial dictionary was an available NL-Builder dictionary with 4000 words in it . It was not matche d
to the domain, but it contained many common English words . This initial dictionary also included
morphological rules, which were left largely unchanged . The dictionary was extended using utilities
for dictionary building that are packaged with NL-Builder ; these utilities were run on the MUC-
3 development corpus . This extension added many domain-specific terms and many slot fill terms
and their synonyms . Ken Litkowski then built a system to extract information from the Proximity
Linguistic System and enter it into the dictionary by comparing the dictionary with the words in th e
MUC-3 test corpus . The linking of relevant word senses in the dictionary to the appropriate nodes o f
the semantic network was done manually .
The final dictionary consisted of approximately 10,000 word senses and about 30 morphological an d
tokenization rules . The dictionary was compiled into a b-tree for fast access .
• TOKENIZER – A tokenizer module (which comes as part of the NL-Builder system) is used for markin g
text into tokens and identifying patterns that may not be in the dictionary (numbers, proper nouns ,
etc.) .
• PARSER – The parser is an extended ATN . It allows a user-specified recursive network state definition
with augmented conditions and actions on arcs. In addition, it allows look-ahead tests to prune search
paths. Here is an example of a portion of the ATN that handles passive verbs :
ARC S.PASSIVE FROM A TO END MATCH VERB
CONDITIONS
VERB: FORM .* == VERB:PASTYARTICIPLE ;
VERB:TYPE.LAST_VERB == VERB:BE;
ACTIONS
VOICE = PASSIVE;
VERB APPEND *;
The parser produces a `syntactic net' that is stored in the same format as the semantic net . Here is a
portion of the syntactic net that is produced by the phrase parser for the sentence (from message 9 9
of the tstl corpus) :
`Some 3 years ago two Marines died following a Shining Path bombing of a market used b y
Soviet Marines .'
Notice that the phrase parser has made a number of errors here, most notably the assumption tha t
`bombing' is a verb :
NP1110063B20
ISA NP ;
HEAD "SHINING PATH" ;
DETERMINER "A" ;
NUMBER SINGULAR;
PERSON THIRD ;
VP1110063B40
ISA VP;
VERB "BOMBING" ;
209
Semantic Interpreters
The phrase interpreter is responsible for building a semantic interpretation of each of the phrases discovere d
by the phrase parser . This process entails mapping from the words in the phrases to the corresponding node s
in the semantic net, then attaching these nodes to each other according to the meaning of the phrase . The
sentence interpreter is responsible for building a semantic interpretation of the entire sentence . It uses both
the output of the phrase interpreter, and the output of the sentence parser .
Our aim with the semantic interpreters was to make them robust enough to find appropriate connection s
between the selected nodes in the semantic net even if no explicit semantic interpretation rules are availabl e
for the syntactic structure being interpreted. Thus the basis for semantic interpretation is a spreading
activation process. If there is a semantic interpretation rule for a given phrase, then that rule is used t o
connect the nodes in the semantic net representing the components of the phrase . If, however, there is no
semantic interpretation rule, spreading activation is used to find plausible connections between concepts .
To continue our example, here is a portion of the phrase interpreter's output for the bombing sentence .
Notice that the phrase interpreter has established mappings (via 'SI,' or Semantic Interpretation, links )
between the syntactic nodes produced by the phrase parser, and concept nodes in the semantic net :
NP1110063B20
ISA NP;
HEAD "SHINING PATH" ;
DETERMINER "A" ;
NUMBER SINGULAR ;
PERSON THIRD ;
SI ORGANIZATION35 ;
VP1110063B40
ISA VP;
VERB "BOMBING" ;
SI BOMB.ACTION_56 ;
ORGANIZATION55
ISA ORGANIZATION;
BOMB ACTION .56
ISA BOMB.ACTION;
The sentence interpreter must put together an interpretation for the entire sentence . Here is a portion of it s
output from this sentence :
BOMBJ~CTION.56
ISA BOMBJ~CTION;
HASJ'ERPETRATOR ORGANIZATION-55 (S : SUBJECT) ;
Notice that the sentence interpreter has identified the Shining Path organization as the perpetrator of th e
bombing action .
210
Template Generator
The template generator is responsible for determining which actions that have been represented in th e
semantic net should lead to the generation of a template, and for the creation of those templates . It begins b y
examining each potentially reportable action in the semantic net (such as the children of KIDNAP_ACTION ,
the children of BOMB_ACTION, etc .) . For each such action, it tries to determine whether the action fall s
within the parameters of a reportable action as laid out in the MUC-3 specifications . Since the long-term
knowledge stored in the semantic net is currently quite limited, the system usually defaults to reporting th e
action . Once an action to report has been selected, a template is created for the action, and its slots ar e
filled one at a time . In most cases, slots are filled by starting from the node representing the action bein g
reported, and following a path through the semantic net to another node that stands in the desired relatio n
to the action node . Links are maintained from the syntactic world to the semantic world, so that the syste m
can trace back from a node in the semantic net to the words that caused the creation of that node. For the
MUC-3 final test, we attempted to fill only slots 0-7 and slot 11 .
Here is the template that is generated for the bombing sentence :
0 . MESSAGE ID
	
TST2-MUC-3-0099
1. TEMPLATE ID
	
2
2. DATE OF INCIDENT
	
25 OCT 89
3. TYPE OF INCIDENT
	
BOMBING
4. CATEGORY OF INCIDENT
	
TERRORIST ACT
5. PERPETRATOR: ID OF INDIV(S)
	
-
6. PERPETRATOR: ID OF ORG(S)
	
"SHINING PATH"
7. PERPETRATOR: CONFIDENCE
	
REPORTED AS FACT : "SHINING PATH"
8. PHYSICAL TARGET: ID(S)
	
-
9. PHYSICAL TARGET: TOTAL NUM
	
-
10. PHYSICAL TARGET: TYPE(S)
	
-
11. HUMAN TARGET: ID(S)
	
-
12. HUMAN TARGET: TOTAL NUM
	
-
13. HUMAN TARGET: TYPE(S)
	
-
14. TARGET: FOREIGN NATION(S)
	
-
15. INSTRUMENT: TYPE(S)
	
*
16. LOCATION OF INCIDENT
	
-
17. EFFECT ON PHYSICAL TARGET(S)
	
-
18. EFFECT ON HUMAN TARGET(S)
	
-
The date of the incident was not extracted from the sentence, so an incorrect default (the date of the article )
was entered . Consequently, the bombing action met the date test, and the template was generated .
211
