HUGHES RESEARCH LABORATORIES
TRAINABLE TEXT SKIMMER:
MUC-4 TEST RESULTS AND ANALYSIS
Stephanie E. August
Hughes Aircraft Company
Electro-Optical and Data Systems Group
P.O. Box 902 -- EO E52 C235
El Segundo, CA 90245-0902
august@sedl70.hac.com
(310) 616-6491
Charles P. Dolan
Hughes Research Laboratories
3011 Malibu Canyon Road, M/S RL96
Malibu, CA 90265
cpd@aic.hrl.hac.com
(310) 317-5675
SUMMARY OF MUC-4 PERFORMANCE
Table 1 shows the official template-by-template score results for the Hughes Trainable Text Skimmer used
for MUC-4 (TTS-MUC4) on TST3. TTS is a largely statistical system, using a set of Bayesian classifiers with the
output of a shallow parser as features. (See the System Summary section of this volume for a detailed description of
TTS-MUC4.)
SLOT                POS  ACT| COR PAR INC| ICR IPA|  SPU MIS  NON| REC PRE OVG FAL
template-id         112  106|  63   0   0|   0   0|   43  49    0|  56  59  40
inc-date            109  101|  22  15  24|  22  15|   40  48    6|  27  29  40
inc-loc             112   87|  11  39   4|   0  17|   33  58   10|  27  35  38
inc-type            112  106|  55   8   0|   0   0|   43  49    0|  53  56  40   4
inc-stage           112  106|  59   0   4|   0   0|   43  49    0|  53  56  40  13
inc-instr-id         33   14|   5   1   0|   1   1|    8  27  127|  17  39  57
inc-instr-type       52   14|   4   0   2|   0   0|    8  46  109|   8  28  57   0
perp-inc-cat         69  101|  28   0  10|   0   0|   63  31   23|  40  28  62  30
perp-ind-id          85   87|  12   5  19|   2   5|   51  49   35|  17  17  59
perp-org-id          52   52|  12   0   7|   1   0|   33  33   72|  23  23  63
perp-org-conf        52   52|   4   2  13|   0   2|   33  33   72|  10  10  63   5
phys-tgt-id          66  112|  13   2  10|   0   2|   87  41   74|  21  12  78
phys-tgt-type        66  112|  10   4  11|   0   3|   87  41   74|  18  11  78   4
phys-tgt-num         67  122|  13   7   5|   0   7|   97  42   74|  25  14  80
phys-tgt-nation       2    0|   0   0   0|   0   0|    0   2  154|   0   *   *   0
phys-tgt-effect      39  112|   6   6   2|   0   5|   98  25   82|  23   8  88  10
phys-tgt-total-num    0   39|   0   0   0|   0   0|   39   0  116|   *   0 100
hum-tgt-name         57  173|  22   5   9|   1   5|  137  21   68|  43  14  79
hum-tgt-desc        132  222|  29  24  17|   1  24|  152  62   35|  31  18  68
hum-tgt-type        146  371|  35  16  32|   1  13|  288  63   23|  29  12  78  17
hum-tgt-num         146  389|  35  32  16|   1  26|  306  63   23|  35  13  79
hum-tgt-nation       16    0|   0   0   0|   0   0|    0  16  143|   0   *   *   0
hum-tgt-effect      124  386|  35  20  11|   1  18|  320  58   26|  36  12  83  20
hum-tgt-total-num     1   33|   0   0   0|   0   0|   33   1  121|   0   0 100
inc-total           530  428| 156  63  34|  23  33|  175 277  252|  35  44  41
perp-total          258  292|  56   7  49|   3   7|  180 146  202|  23  20  62
phys-tgt-total      240  497|  42  19  28|   0  17|  408 151  574|  21  10  82
hum-tgt-total       622 1574| 156  97  85|   5  86| 1236 284  439|  33  13  78
MATCHED/MISSING    1650 1818| 410 186 196|  31 143| 1026 858 1017|  30  28  56
MATCHED/SPURIOUS    919 2791| 410 186 196|  31 143| 1999 127  936|  55  18  72
MATCHED ONLY        919 1818| 410 186 196|  31 143| 1026 127  486|  55  28  56
ALL TEMPLATES      1650 2791| 410 186 196|  31 143| 1999 858 1467|  30  18  72
SET FILLS ONLY      790  879| 236  56  85|   2  41|  502 413  491|  33  30  57   2
STRING FILLS ONLY   425  434|  93  37  62|   6  37|  242 233  283|  26  26  56
TEXT FILTERING       69   99|  68   *   *|   *   *|   31   1    0|  98  69  31 100
F-MEASURES          P&R 22.5    2P&R 19.57    P&2R 26.47
Table 1: Official TST3 score report.
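The REC, PRE and OVG columns follow from the count columns. The Python sketch below recomputes them for two rows of Table 1, assuming the usual MUC-4 convention that a partially correct fill (PAR) earns half credit: recall is (COR + 0.5 PAR)/POS, precision is (COR + 0.5 PAR)/ACT, and overgeneration is SPU/ACT. The `SlotCounts` and `muc_scores` names are illustrative only and are not part of the official scoring software.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SlotCounts:
    """Count columns from one row of a MUC-4 score report (illustrative name)."""
    pos: int  # POS: fills in the answer key
    act: int  # ACT: fills produced by the system
    cor: int  # COR: correct fills
    par: int  # PAR: partially correct fills
    spu: int  # SPU: spurious fills


def muc_scores(row: SlotCounts) -> Tuple[float, float, float]:
    """Return (recall, precision, overgeneration) as percentages."""
    credit = row.cor + 0.5 * row.par  # a partial match earns half credit
    recall = 100.0 * credit / row.pos
    precision = 100.0 * credit / row.act
    overgeneration = 100.0 * row.spu / row.act
    return recall, precision, overgeneration


# Two rows copied from Table 1 (TST3).
rows = {
    "ALL TEMPLATES": SlotCounts(pos=1650, act=2791, cor=410, par=186, spu=1999),
    "inc-date": SlotCounts(pos=109, act=101, cor=22, par=15, spu=40),
}

for name, counts in rows.items():
    rec, pre, ovg = muc_scores(counts)
    print(f"{name}: REC {rec:.0f}  PRE {pre:.0f}  OVG {ovg:.0f}")
# ALL TEMPLATES: REC 30  PRE 18  OVG 72
# inc-date: REC 27  PRE 29  OVG 40
```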
The performance on a slot-by-slot basis is, therefore, what one might expect: the pure set fills such as
INCIDENT: TYPE and INCIDENT: STAGE OF EXECUTION show much better performance than the string fills
such as HUM TGT: NAME.
Table 2 shows the summary rows of the official template-by-template results on TST4. The complete
official score report for TTS-MUC4 on TST4 can be found in Appendix G: Final Test Score Summaries.
Performance was comparable on both sets of texts (P&R F-measure of 22.5 on TST3 and 24.0 on TST4).
SLOT                POS  ACT| COR PAR INC| ICR IPA|  SPU MIS  NON| REC PRE OVG FAL
MATCHED/MISSING    1157 1260| 340 146 157|  34  89|  617 514  645|  36  33  49
MATCHED/SPURIOUS    803 2273| 340 146 157|  34  89| 1630 160  955|  51  18  72
MATCHED ONLY        803 1260| 340 146 157|  34  89|  617 160  404|  51  33  49
ALL TEMPLATES      1157 2273| 340 146 157|  34  89| 1630 514 1196|  36  18  72
SET FILLS ONLY      561  612| 195  48  77|   0  31|  292 241  314|  39  36  48   2
STRING FILLS ONLY   302  293|  80  22  47|   2  22|  144 153  179|  30  31  49
TEXT FILTERING       56   98|  56   *   *|   *   *|   42   0    2| 100  57  43  95
F-MEASURES          P&R 24.0    2P&R 20.0    P&2R 30.0
Table 2: Summary rows of the official TST4 score report.
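The F-MEASURES rows combine precision $P$ and recall $R$ using the weighted F-measure $F_\beta = (\beta^2 + 1)PR / (\beta^2 P + R)$, with $\beta = 1$ for P&R, $\beta = 0.5$ for 2P&R (precision weighted more heavily) and $\beta = 2$ for P&2R (recall weighted more heavily). The short Python check below applies this formula to the rounded ALL TEMPLATES precision and recall of Tables 1 and 2 and reproduces the reported F-measures; the function name `f_measure` is our own.

```python
def f_measure(precision: float, recall: float, beta: float) -> float:
    """Weighted F-measure; beta < 1 favors precision, beta > 1 favors recall."""
    b2 = beta * beta
    return (b2 + 1.0) * precision * recall / (b2 * precision + recall)


# (precision, recall) from the ALL TEMPLATES rows of Tables 1 and 2.
runs = {"TST3": (18, 30), "TST4": (18, 36)}

for test_set, (p, r) in runs.items():
    print(
        test_set,
        f"P&R {f_measure(p, r, 1.0):.2f}",
        f"2P&R {f_measure(p, r, 0.5):.2f}",
        f"P&2R {f_measure(p, r, 2.0):.2f}",
    )
# TST3 P&R 22.50 2P&R 19.57 P&2R 26.47
# TST4 P&R 24.00 2P&R 20.00 P&2R 30.00
```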
MUC-4 TEST SETTINGS
TTS-MUC4 uses Bayesian classifiers for each of the template slots. The general form for Bayesian
classifiers is to compute
$$\Pr(c_i \mid f_1 \wedge f_2 \wedge \cdots \wedge f_n)$$
where the $f_i$ are textual features. For set fill slots, the $c_i$ are the possible values (e.g., DEATH, SOME DAMAGE,
etc.). For the string fill slots, the $c_i$ are yes or no answers to whether a particular item fills a slot (e.g., HUMAN-TGT-NAME
versus HUMAN-TGT-NAME-NOT). For typical Bayesian classifiers, the tunable parameters are the
prior probabilities of the $c_i$. In TTS-MUC4 we have two different settings, EQUI-PROB and REL-FREQ,
respectively for probabilities that are equal for all classes and probabilities that reflect the relative frequency of classes
in the training data. EQUI-PROB favors recall, and REL-FREQ favors precision.
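To illustrate how the prior setting shifts decisions, here is a minimal Python sketch of a Bayesian classifier with the two prior settings. It assumes, as a simplification of our own, that features are conditionally independent given the class (naive Bayes); the feature name `:killed-w` and all probabilities are invented for illustration. For a string fill slot whose yes class is rare in training, EQUI-PROB gives that class a much higher posterior than REL-FREQ, so more candidates are accepted, which is consistent with EQUI-PROB favoring recall.

```python
import math
from collections import Counter
from typing import Dict, List


def class_priors(train_labels: List[str], setting: str) -> Dict[str, float]:
    """EQUI-PROB: equal priors; REL-FREQ: relative class frequency in training."""
    counts = Counter(train_labels)
    if setting == "EQUI-PROB":
        return {c: 1.0 / len(counts) for c in counts}
    if setting == "REL-FREQ":
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()}
    raise ValueError(f"unknown prior setting: {setting}")


def posterior(
    priors: Dict[str, float],
    likelihoods: Dict[str, Dict[str, float]],
    features: List[str],
) -> Dict[str, float]:
    """Pr(c | f1 ^ ... ^ fn) via Bayes' rule, assuming feature independence."""
    log_scores = {}
    for c, prior in priors.items():
        log_scores[c] = math.log(prior) + sum(
            math.log(likelihoods[c][f]) for f in features
        )
    # Normalize so the posteriors sum to 1 (shift by the max for stability).
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


# Hypothetical string-fill slot: 1 in 10 training candidates is a real fill.
labels = ["HUMAN-TGT-NAME"] + ["HUMAN-TGT-NAME-NOT"] * 9
likelihoods = {  # hypothetical Pr(f | c) values
    "HUMAN-TGT-NAME": {":killed-w": 0.6},
    "HUMAN-TGT-NAME-NOT": {":killed-w": 0.2},
}

for setting in ("EQUI-PROB", "REL-FREQ"):
    post = posterior(class_priors(labels, setting), likelihoods, [":killed-w"])
    print(setting, round(post["HUMAN-TGT-NAME"], 2))
# EQUI-PROB 0.75
# REL-FREQ 0.25
```

With a decision threshold of 0.5, the same candidate is accepted under EQUI-PROB and rejected under REL-FREQ.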
In addition, for text applications, there is an issue as to whether one includes only those features present in
the text, or also those that are absent. In TTS-MUC4 we used two different settings, PRESENT and
PRESENT&FREQUENT, where PRESENT&FREQUENT considers all those features which are present and also
those that are absent but occur very frequently in the texts. The threshold for whether a feature was
considered frequent was set so that, for each slot, approximately 30 features were considered frequent. In the
TTS-MUC4 conceptual hierarchy there are over 400 potential features.
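The difference between the two settings lies in which features enter the product of conditional probabilities. The Python sketch below is one reading of the description above, not the TTS-MUC4 code: the frequency threshold is approximated by keeping the k features that occur in the most training sentences for the slot (k = 30 would match the figure quoted above), and an absent frequent feature is assumed to contribute 1 - Pr(f | c). All feature names and probabilities are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def frequent_features(training_sentences: List[Set[str]], k: int = 30) -> Set[str]:
    """The k features that occur in the most training sentences for a slot."""
    sentence_freq = Counter(f for sent in training_sentences for f in sent)
    return {f for f, _ in sentence_freq.most_common(k)}


def evidence(
    sentence: Set[str], frequent: Set[str], setting: str
) -> List[Tuple[str, bool]]:
    """(feature, is_present) pairs that the classifier conditions on."""
    present = [(f, True) for f in sorted(sentence)]
    if setting == "PRESENT":
        return present
    if setting == "PRESENT&FREQUENT":
        absent = [(f, False) for f in sorted(frequent - sentence)]
        return present + absent
    raise ValueError(f"unknown feature setting: {setting}")


def likelihood(cond_prob: Dict[str, float], pairs: List[Tuple[str, bool]]) -> float:
    """Pr(evidence | c): Pr(f | c) if f is present, 1 - Pr(f | c) if absent."""
    result = 1.0
    for f, is_present in pairs:
        result *= cond_prob[f] if is_present else 1.0 - cond_prob[f]
    return result


# Hypothetical training sentences (as feature sets) and one class's Pr(f | c).
train = [
    {":explosion-w", ":report-w"},
    {":explosion-w", ":bomb-w"},
    {":explosion-w", ":report-w"},
]
frequent = frequent_features(train, k=2)
cond_prob = {":bomb-w": 0.5, ":explosion-w": 0.8, ":report-w": 0.4}

for setting in ("PRESENT", "PRESENT&FREQUENT"):
    pairs = evidence({":bomb-w"}, frequent, setting)
    print(setting, pairs, round(likelihood(cond_prob, pairs), 2))
# PRESENT [(':bomb-w', True)] 0.5
# PRESENT&FREQUENT [(':bomb-w', True), (':explosion-w', False), (':report-w', False)] 0.06
```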
For each slot, the parameter settings were optimized to balance recall and precision. The optimization was
done using TST1 and TST2. Table 3 gives the parameter settings for each slot. Balancing precision and recall for
string fill slots is difficult in TTS-MUC4. For example, in the training corpus, TTS-MUC4 detects over 4,000
potential HUMAN-TARGET-NAMES, but fewer than 10% of these are actual string fills.
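The per-slot tuning amounts to a small search over the four combinations of prior and feature settings. The Python sketch below assumes a hypothetical criterion for "balanced": pick the combination whose recall and precision on the tuning texts (TST1 and TST2) are closest, breaking ties by the higher P&R F-measure. The scores in the example are made up for illustration; they are not TTS-MUC4 measurements.

```python
from itertools import product
from typing import Dict, Tuple

PRIORS = ("EQUI-PROB", "REL-FREQ")
TESTS = ("PRESENT", "PRESENT&FREQUENT")
Setting = Tuple[str, str]


def choose_setting(scores: Dict[Setting, Tuple[float, float]]) -> Setting:
    """Return the (priors, tests) pair with the best recall/precision balance.

    scores maps each (priors, tests) pair to (recall, precision) on tuning texts.
    The balance criterion here is a hypothetical one, not the paper's.
    """
    def balance_key(setting: Setting) -> Tuple[float, float]:
        recall, precision = scores[setting]
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return (abs(recall - precision), -f1)  # smallest gap first, then best F

    return min(product(PRIORS, TESTS), key=balance_key)


# Made-up tuning scores for one string fill slot (not real results).
example_scores = {
    ("EQUI-PROB", "PRESENT"): (45.0, 12.0),
    ("EQUI-PROB", "PRESENT&FREQUENT"): (38.0, 15.0),
    ("REL-FREQ", "PRESENT"): (20.0, 22.0),
    ("REL-FREQ", "PRESENT&FREQUENT"): (15.0, 25.0),
}
print(choose_setting(example_scores))  # ('REL-FREQ', 'PRESENT')
```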
TRAINING METHODOLOGY
To compute the conditional probabilities, the MUC-3 development (DEV) corpus and the associated
templates were used. Each sentence in the DEV corpus that contained a string fill for some template was used as a
training sample. TTS detects features for important domain words (e.g., explosion, report), and also for
phrases that may map into string fills. For each training sample, the presence or absence of each feature was
examined to compute, for example,
$$\Pr_{rf}(\text{:explosion-w} \mid \text{:PHYS-TGT-TYPE} = \text{:COMMERCIAL})$$
The relative-frequency probability estimates, $\Pr_{rf}$, are then combined using Bayes' rule on a new
sentence to compute
$$\Pr(c_i \mid f_1 \wedge f_2 \wedge \cdots \wedge f_n)$$
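The relative-frequency estimates are simple ratios of counts over the training samples. A minimal Python sketch, assuming each training sample is reduced to the set of features detected in a sentence plus the slot value from its template; the sample data and the `:OTHER` class are hypothetical, and no smoothing is applied because the text does not say how zero counts were handled.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def relative_frequency(
    samples: Iterable[Tuple[Set[str], str]],
) -> Dict[Tuple[str, str], float]:
    """Pr_rf(f | c) = (# samples of class c containing f) / (# samples of class c)."""
    rows = list(samples)
    class_count = Counter(label for _, label in rows)
    joint_count = Counter((f, label) for feats, label in rows for f in feats)
    # No smoothing: unseen (feature, class) pairs simply have no entry.
    return {(f, c): n / class_count[c] for (f, c), n in joint_count.items()}


# Hypothetical PHYS-TGT-TYPE training samples: (features in sentence, slot value).
samples = [
    ({":explosion-w", ":bank-w"}, ":COMMERCIAL"),
    ({":explosion-w"}, ":COMMERCIAL"),
    ({":explosion-w", ":report-w"}, ":COMMERCIAL"),
    ({":report-w"}, ":COMMERCIAL"),
    ({":report-w"}, ":OTHER"),
]
pr_rf = relative_frequency(samples)
print(pr_rf[(":explosion-w", ":COMMERCIAL")])  # 0.75
```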
SLOT             Priors      Tests
INCIDENT-TYPE    REL-FREQ    PRESENT
STAGE-OF-EXEC    REL-FREQ    PRESENT
INSTRUMENT-ID    EQUI-PROB   PRESENT&FREQUENT
INSTRUMENT-TYPE  REL-FREQ    PRESENT&FREQUENT
PERP-INDIV       EQUI-PROB   PRESENT
PERP-ORG         EQUI-PROB   PRESENT
PERP-CAT         EQUI-PROB   PRESENT
PERP-CONF        EQUI-PROB   PRESENT&FREQUENT
HUM-TGT-NAME     EQUI-PROB   PRESENT
HUM-TGT-DESCR    EQUI-PROB   PRESENT
HUM-TGT-TYPE     REL-FREQ    PRESENT
HUM-TGT-EFFECT   REL-FREQ    PRESENT
PHYS-TGT-ID      EQUI-PROB   PRESENT&FREQUENT
PHYS-TGT-TYPE    REL-FREQ    PRESENT
PHYS-TGT-EFFECT  REL-FREQ    PRESENT
Table 3: Test run settings for the Bayesian classifiers.
In addition to training the Bayesian classifiers, the DEV corpus was used, exactly as in TTS-MUC3, to
derive phrase patterns for potential string fills. For example, "SIX JESUITS" would drive the creation of the
phrase (:NUMBER-W :RELIGIOUS-ORDER-W). The type of the string fill served as the semantic feature for
the phrase; in this example, it is :CIVILIAN-DESCR.
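The pattern derivation can be pictured as replacing each word of a training string fill by its conceptual class from the lexicon and recording the fill type as the pattern's semantic feature. The Python sketch below is a hypothetical illustration: the lexicon entries beyond those in the example above are invented, and skipping fills that contain unclassified words is our own assumption.

```python
from typing import Dict, List, Optional, Tuple

Pattern = Tuple[str, ...]

# Hypothetical lexicon fragment: word -> conceptual class.
LEXICON: Dict[str, str] = {
    "SIX": ":NUMBER-W",
    "THREE": ":NUMBER-W",
    "JESUITS": ":RELIGIOUS-ORDER-W",
    "FRANCISCANS": ":RELIGIOUS-ORDER-W",
}


def derive_pattern(phrase: str, lexicon: Dict[str, str]) -> Optional[Pattern]:
    """Map each word to its conceptual class; None if any word is unclassified."""
    classes = []
    for word in phrase.upper().split():
        if word not in lexicon:
            return None
        classes.append(lexicon[word])
    return tuple(classes)


def learn_patterns(
    fills: List[Tuple[str, str]], lexicon: Dict[str, str]
) -> Dict[Pattern, str]:
    """Build pattern -> semantic feature (fill type) from training string fills."""
    patterns: Dict[Pattern, str] = {}
    for text, fill_type in fills:
        pattern = derive_pattern(text, lexicon)
        if pattern is not None:
            patterns[pattern] = fill_type
    return patterns


patterns = learn_patterns([("SIX JESUITS", ":CIVILIAN-DESCR")], LEXICON)
# A new phrase with the same class sequence picks up the same semantic feature.
print(patterns.get(derive_pattern("three Franciscans", LEXICON)))  # :CIVILIAN-DESCR
```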
The improvement that occurred over time in TTS-MUC4 is attributable to two factors: the introduction of the
Bayesian classifiers to replace the K-Nearest Neighbor technique from TTS-MUC3, and the tuning of the parameters of the
Bayesian classifiers for each slot.
All of the training for TTS-MUC4 is automated. As with TTS-MUC3, the only manual portion of the
process is choosing the conceptual classes for the lexicon.
ALLOCATION OF EFFORT
Two calendar months and approximately 2.5 person-months were spent on enhancing the TTS-MUC3
system to create TTS-MUC4.
The TTS-MUC4 effort falls roughly into three categories: classifier evaluation, system training, and filter
development. Approximately 20% of our time was spent on developing and evaluating the performance of the
Bayesian classifier and tuning its parameters. This classifier replaced the K-Nearest Neighbor
classifier previously employed in TTS-MUC3. Another 10% of the development effort focused on tuning other system
parameters, such as the *fill-strength-threshold*, which provides a means for filtering out unlikely slot fillers.
About 40% of our time was devoted to developing filters to improve the precision of the template
fillers and evaluating their effects. Retraining the system to take advantage of a modified lexicon and to
accommodate the revised templates took up about 10% of the time. The remaining 20% of the effort was spent on
developing code to extract information to fill the new and revised slots of the MUC-4 templates.
LIMITING FACTORS
One limiting factor for the Hughes TTS-MUC4 system was time. The Bayesian classifier is effective for
filling most slots, but the K-Nearest Neighbor classifier might provide better fills for others. However, time did not
permit us to experiment enough to identify the best classifier to use for each slot. Another aspect of TTS to which
we would like to have devoted more attention is dynamically weighting features retrieved from the knowledge
base depending upon their relevance to the slot being processed. Our algorithm for grouping sentences into topics
was responsible for many of our errors. Improving the slot-dependent weighting portion of the system would take a
considerable amount of additional time, and would require that domain knowledge be added into the processing.
FUTURE WORK
The following enhancements are most relevant to the current MUC-oriented software: (1) filters for string
fills based on linguistic knowledge, (2) reference resolution, and (3) better learning/pattern classification algorithms.
TTS-MUC4 currently has a very limited amount of processing that is specialized for language. One of the features
that we would have liked to detect in the MUC-4 corpus was the source of information in a story. Individuals who
are the source of a report occurred frequently, and erroneously, as human targets. Another "language-specific" portion
we would like to add is reference resolution for string fills. TTS-MUC4 currently suffers in its precision score
because, without reference resolution, it lists the same filler several times, once for each reference to it.
Additional changes would make a more usable "real system", although they are not essential for the MUC
task as it now stands. These include (1) the development of a user interface for corpus marking, and (2) integration
with on-line data sources, such as map databases, to eliminate the burden of creating special data files for natural
language processing.
TRANSFERABILITY TO OTHER TASKS
Currently, TTS requires only a lexicon and a training corpus with templates. Therefore, extension to
terrorism in another locale or to a completely different domain would be easy. However, once features are added to
improve performance, as noted under FUTURE WORK above, handling a new domain will be more difficult.
LESSONS LEARNED
TTS-MUC4 represents a small increase in performance beyond TTS-MUC3. TTS currently has very little
processing specific to language; most of the processing is simple feature detection followed by pattern recognition
algorithms. We believe that TTS-MUC4 represents a plateau in performance, and that more linguistic
knowledge will be required to increase performance further. The goal for TTS, then, is to significantly increase performance without
increasing development time for new applications.
