<?xml version="1.0" standalone="yes"?> <Paper uid="M91-1019"> <Title>Incident Training Training Training</Title> <Section position="3" start_page="0" end_page="120" type="metho"> <SectionTitle> 2. The Training Set 3 (306 messages) contains only those messages that are relevant and generate one and only one template. </SectionTitle>
<Paragraph position="0"> The occurrence distribution of the various incident types for the three training sets is presented in Table 1. The starred rows indicate which of the incident types have enough occurrences in the training set to be learnable. Similarly, not all fills associated with the other slots are learnable.</Paragraph>
<Paragraph position="1"> In any run, only two of the three training sets are used. One of these two training sets is used to develop a rule vector (termed the optimal_query) that can identify a message as being relevant to the MUC-3 task. The other training set is used to develop concept rule vectors that can identify which among the various possible slot fills are actually applicable to a message. Since our system mainly deals with slots for which fills come from a predefined set of fills (i.e., these are identified with concepts to be learned), the number of concept rule vectors that pertain to each slot is not too large.</Paragraph>
<Paragraph position="2"> The activation value for a slot fill with respect to a test message is computed as the dot product of the concept rule vector and the message representation (as a vector). For the required and some of the optional runs (options 2, 4, and 5), the system decides that a slot fill applies if its activation value with respect to the rule vector for the slot fill is greater than a dynamically generated threshold T1. This threshold for a given slot fill is based on the percentage of messages in the training set to which the slot fill is applicable and the histogram depicting the distribution of the activation values in the test set of messages for this slot fill. A second option is to use a zero threshold, implying that the slot fill is applicable if the activation value of the corresponding rule vector with respect to the message representation is positive.</Paragraph>
<Paragraph position="3"> [Notes accompanying Table 2: The optimal_query needs non-relevant messages in the training set. When non-relevant messages are present in the training set for concepts, they are treated as negative examples for every concept.]</Paragraph>
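<Paragraph> The activation scoring and dynamic thresholding described above can be made concrete with a small sketch. This is an illustrative simplification, not the system's actual code: the dense vector representation, the percentile-style rule for choosing T1, and all function names are assumptions introduced here for exposition.

# Minimal sketch (assumptions noted above) of slot-fill activation scoring
# and a percentile-style choice of the dynamic threshold T1.

def activation(rule_vector, message_vector):
    """Dot product of a concept rule vector and a message representation."""
    return sum(r * m for r, m in zip(rule_vector, message_vector))

def dynamic_threshold(rule_vector, training_vectors, applicable_fraction):
    """Choose T1 from the histogram of activations over training messages.

    applicable_fraction: fraction of training messages to which this slot
    fill actually applies; roughly that fraction of activations is kept
    above the threshold.
    """
    values = sorted(activation(rule_vector, v) for v in training_vectors)
    cutoff_index = int(len(values) * (1.0 - applicable_fraction))
    cutoff_index = min(max(cutoff_index, 0), len(values) - 1)
    return values[cutoff_index]

def slot_fill_applies(rule_vector, message_vector, threshold=0.0):
    """Zero threshold by default; pass a T1 value for the stricter option."""
    return activation(rule_vector, message_vector) > threshold
</Paragraph>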
<Paragraph position="5"> The training sets and the threshold settings used for the official and optional test runs are presented in Table 2. The numbers of templates generated are compared in Table 3, and detailed results in terms of precision, recall, and overgeneration for Option 4 are presented in Table 4. Since our system placed an emphasis on set-list type slot fills, our system's performance with respect to set fills only, for the various test runs, is summarized in Table 5. The results for Options 2, 3, and 5, shown in Tables 3 and 5, were scored at our site rather than by the official scorers. Consequently, these figures are not completely consistent with those of the other options. In our assessment, the recall and precision values of our scoring are lower in Table 3 than what they would have been if scored by the official scorers. In contrast, the same options in Table 5 are somewhat inflated compared to what the official scoring would have yielded. With this disparity in mind, the following observations are made on the results from the different runs: Required Run (Official-1) The official run generated a large number of templates. This run resulted in moderate recall and moderate precision.</Paragraph>
<Paragraph position="6"> Option 1 (Official-2) The run for option 1 does not generate many templates. This option uses a stricter (higher) threshold for concepts compared to the Official-1 run. Therefore, this method achieves a low recall with a reasonable level of precision; this option sacrifices recall for precision. Option 2 This option interchanges the two training sets used for the optimal_query versus the other concepts. The effects on recall and precision are insignificant.</Paragraph>
<Paragraph position="7"> Option 3 Option 3 used the same training sets as option 2, but the threshold was set to the default. This led to a sharp drop in the total number of templates, resulting in a much smaller value for recall. But, as in option 1, there is a significant improvement in precision. Comparing option 2 to option 3 serves the same purpose as comparing option 1 to the required run.</Paragraph>
<Paragraph position="8"> Option 4 (Official-3) This option used Training Set 3 to develop the rule vectors for slot fills. When compared to option 2, this provides an assessment of the effectiveness of replacing Training Set 1 by Training Set 3 for learning rule vectors for the various possible slot fills. This option results in a large number of templates compared to the other options. The use of Training Set 3 makes the examples in the training set relatively cleaner. The method retains recall roughly at the same level as the Official-1 run and option 2 while improving precision.</Paragraph>
<Paragraph position="9"> Option 5 Option 5 is meant to test the effect of the use of phrases on the development of rule vectors. The results from this test are similar to those from option 4. Since we have obtained other results (not reported) where the use of phrases is helpful, we feel that the results concerning phrases are not yet conclusive.</Paragraph> </Section> <Section position="4" start_page="120" end_page="120" type="metho"> <SectionTitle> EXPLANATION OF TEST SETTINGS </SectionTitle>
<Paragraph position="0"> In each experiment, two different training sets are employed. A new (test) message is processed against the optimal_query computed from the first training set. The message is also processed with respect to the rule vectors corresponding to all the possible slot fills, as computed from the second training set. A message is deemed relevant to the MUC-3 database either if it is sufficiently similar to the optimal_query or if, based on the concepts that are applicable and the rules in the rulebase, the inference engine evaluates the root concept to be true. Since the second training set is used to develop rule vectors for the different slot fills, it is desirable that the training set messages contain instances of all the slot fills. If there are no examples corresponding to a slot fill, the system is unable to develop a rule vector for that fill and, consequently, cannot recognize the occurrence of that slot fill in a new message. The way in which the training sets are used enables us to test the effects of not only the size of the training set, but also its quality in terms of the training messages being non-ambiguous and noise-free.</Paragraph>
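<Paragraph> The two-part relevance decision just described can be illustrated with a small sketch. This is a hedged illustration rather than the system's implementation: the cosine similarity measure, the propositional encoding of the rulebase, the similarity cutoff, and all identifiers are assumptions made for exposition.

# Illustrative sketch of the relevance test: a message is treated as
# relevant if it is sufficiently similar to the optimal_query vector, or
# if forward chaining over the rulebase derives the root concept from the
# concepts found applicable to the message.

import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def root_concept_holds(applicable_concepts, rules, root="relevant_incident"):
    """Forward-chain over rules of the form (antecedent_concepts, consequent)."""
    known = set(applicable_concepts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and all(a in known for a in antecedents):
                known.add(consequent)
                changed = True
    return root in known

def is_relevant(message_vector, optimal_query, applicable_concepts,
                rules, similarity_cutoff=0.5):
    return (cosine_similarity(message_vector, optimal_query) >= similarity_cutoff
            or root_concept_holds(applicable_concepts, rules))
</Paragraph>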
<Paragraph position="1"> The second variable in the optional testing, the threshold activation value, is used to select or ignore a slot fill. The system compares the representation of each message against the rule vectors and computes an activation value for the corresponding slot fill. The default activation value is taken to be 0; that is, a slot fill is deemed applicable if the activation value of its rule vector with respect to the message is positive. A precision-recall tradeoff can be achieved by changing the value of the threshold activation value. If this threshold is lowered, a concept becomes applicable to more messages, resulting in an improvement in recall at the cost of precision.</Paragraph> </Section> <Section position="5" start_page="120" end_page="123" type="metho"> <SectionTitle> EFFORT </SectionTitle>
<Paragraph position="0"> The team for the MUC-3 project consisted of two professors, three graduate research assistants, and four part-time programmers. The following graduate students made significant contributions to this project: V. K. Elayavalli and Y. Zhang of USL, and S. K. Bhatia of UNL. The bulk of the effort was spent on the process for phrase extraction, followed by the selection of the training set and the development of the inference engine for the rulebase and the template filler. Learning and using the scoring program also took a considerable amount of time.</Paragraph> </Section> <Section position="6" start_page="123" end_page="123" type="metho"> <SectionTitle> LIMITING FACTORS </SectionTitle>
<Paragraph position="0"> The biggest limiting factor was time. Our estimation of the time and manpower for the project was also affected, in part, by a lack of participation in MUC-1 and MUC-2. The project often competed, usually unsuccessfully, for the time of graduate students because of their classes and examinations.</Paragraph>
<Paragraph position="1"> A lot of effort was spent on the extraction and use of phrases. However, this effort did not contribute much, as the phrase information was not exploited to its limits. Towards the end, we succeeded in developing interesting techniques for phrase extraction and usage but could not realize the benefits due to time constraints. In retrospect, we feel that we should have spent more time on the template filler module than on the indexing module.</Paragraph> </Section> <Section position="7" start_page="123" end_page="123" type="metho"> <SectionTitle> TRAINING </SectionTitle>
<Paragraph position="0"> The quality and size of the training sets have a tremendous effect on the performance of the system. A limitation of hardware affected the size of the training set that could be selected. A large training set required more main memory and computer time for the different modules in the project than could be afforded by the limited computing resources at the two campuses. Again, the computing resources had to be shared with instructional and other research users, which had an adverse effect on the resources available to the project. Limited hardware resources were responsible for our search for a good training set. In this project, we were limited to at most 300 messages in the training set due to memory and time constraints. Our initial approach was to manually select some messages to develop the training set. Later, we developed the training set through a program by selecting only those messages that addressed exactly one incident type.</Paragraph>
<Paragraph position="1"> Ideally, the training set should contain enough messages such that all possible set fills are sufficiently represented. If the training set does not contain any message addressing a certain slot fill, the system is incapable of recognizing that slot fill. We also developed a module which could be used to select a training set by computing the representational similarity of messages in the test set to those in the development set. Unfortunately, the module was not tested well enough to be used for the MUC-3 official testing.</Paragraph> </Section>
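<Paragraph> A training-set selection step of the kind mentioned above can be sketched as follows. This is not the untested module itself: the cosine similarity, the vectorization interface, and the 300-message limit (which mirrors the memory and time constraint noted above) are assumptions used only to illustrate the idea of ranking development messages by their representational similarity to the test set.

# Illustrative sketch: keep up to `limit` development messages whose
# vector representations are most similar to some test-set message.

import math

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_training_set(dev_messages, test_vectors, vectorize, limit=300):
    """dev_messages: iterable of (message_id, text); vectorize: text to vector."""
    scored = []
    for message_id, text in dev_messages:
        vec = vectorize(text)
        # Score each development message by its best match in the test set.
        score = max(_cosine(vec, t) for t in test_vectors)
        scored.append((score, message_id))
    scored.sort(reverse=True)
    return [message_id for _, message_id in scored[:limit]]
</Paragraph>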
<Section position="8" start_page="123" end_page="123" type="metho"> <SectionTitle> DOMAIN INDEPENDENCE </SectionTitle>
<Paragraph position="0"> Nearly all of the system can be used independently of the domain of application. The system automatically learns the rule vectors corresponding to the different slot fills and uses these rule vectors to identify the slot fills in new messages. The only domain-dependent part of the system is the rulebase that is used to decide the relevance of a message, depending on whether certain concepts are applicable to the message in a desired combination.</Paragraph> </Section> <Section position="9" start_page="123" end_page="125" type="metho"> <SectionTitle> CONCLUSION </SectionTitle>
<Paragraph position="0"> MUC-3 provided us with a unique opportunity to test our ideas on conceptual classification of documents in the area of message understanding. Our approach is based on the recognition of message contents rather than actual understanding of the messages. The recognition of certain patterns in a message allows the system to conclude whether certain subjects are addressed in the message. The system is, however, highly sensitive to the selection of a good training set.</Paragraph>
<Paragraph position="1"> In the context of a MUC-like task, the system is capable of recognizing the presence or absence of different concepts that correspond to fills drawn from a set of values. Specifically, our system can efficiently identify the domain of a message and which among certain salient concepts are addressed.</Paragraph>
<Paragraph position="2"> For example, in the case of the Option 4 run, the recall and precision associated with the optimal_query vector are 0.78 and 0.88, respectively (i.e., if the question is whether at least one template should be generated). The performance, in terms of precision, is also impressive for certain slots such as INCIDENT TYPE(S) and effects on PHYSICAL OR HUMAN TARGETS. Furthermore, the concept rule vectors are found to be successful in identifying relevant paragraphs within messages. In many application environments, such capabilities may be adequate. Furthermore, our system may be used as a front-end to a comprehensive message understanding system.</Paragraph>
<Paragraph position="3"> The system has only a limited ability to identify fills of string type appearing in the messages. Phrase extraction, combined with the locality of information, was particularly useful in filling certain slots that require string fills. The process to extract and use phrase information can be exploited to a greater extent than has been done in the present system.</Paragraph>
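<Paragraph> To make the notion of phrase extraction concrete, the following is a deliberately simplified sketch, not the extraction procedure actually implemented in the system: it merely collects frequent adjacent-word pairs as candidate phrases, and the tokenization, stopword list, and frequency cutoff are all assumptions introduced for illustration.

# Illustrative sketch only: candidate phrases as frequent adjacent-word pairs.

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "was", "were"}

def extract_candidate_phrases(texts, min_count=3):
    counts = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOPWORDS]
        # Count each pair of adjacent non-stopword tokens.
        for first, second in zip(words, words[1:]):
            counts[(first, second)] += 1
    return [" ".join(pair) for pair, c in counts.items() if c >= min_count]
</Paragraph>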
<Paragraph position="4"> The system can be improved by the following enhancements. From a domain-independent viewpoint, the system can benefit from a more robust procedure for training set selection. Moreover, the process to extract and use phrase information can be exploited to a greater extent. For improving the performance in the current domain, the template filler module can be modified by taking into account the information regarding dependencies between different slot fills. In general, much more effort is needed in designing the template filler module.</Paragraph> </Section> </Paper>