<?xml version="1.0" standalone="yes"?> <Paper uid="M92-1024"> <Title>BOMBINGACCOMPLISHED 'THE BOMB &quot;&quot;EXPLOSIVES &quot; BOMB: 'THE BOMB &quot;EXPLOSIVE: &quot;EXPLOSIVES&quot; TERRORIST ACT&quot;GUERRILLAS&quot; &quot;FMLN &quot; REPORTED AS FACT: &quot;FMLN&quot;&quot;MERINO'S HOME&quot;</Title> <Section position="4" start_page="169" end_page="171" type="metho"> <SectionTitle> (&quot;SALVADORAN PRESIDENT-ELECT ALFREDO CRISTIANI CONDEMNED THE TERRORIST KILLING OF ATTORNEY GENERAL ROBERTO GARCIA ALVARADO&quot; </SectionTitle> <Paragraph position="0"> This number is inflated because sentence-final punctuation always appears as a separate fragment, and commas frequently appear as isolated fragments.</Paragraph> <Paragraph position="1"> The semantic interpreter operates on each fragment produced by FPP in a bottom-up, compositional fashion. Throughout the system, defaults are provided so that missing semantic information or rules do not produce errors, but simply mark semantic elements or relationships as unknown. This is consistent with our belief that partial understanding has to be a key element of text processing systems, and missing data has to be regarded as a normal event. The semantic component encompasses both lexical semantics and semantic rules. The semantic lexicon is separate from the parser's lexicon and has much less coverage.</Paragraph> <Paragraph position="2"> We used an automatic case frame induction procedure to construct an initial version of the lexicon [2]. Word senses of the semantic lexicon have probability assignments, which we plan to derive automatically from corpora. For MUC-4, probabilities were assigned so each word sense is more probable than the next sense of the word as entered in the lexicon.</Paragraph> <Paragraph position="3"> Lexical semantic entries indicate the word's semantic type (a domain model concept), as well as predicates pertaining to it. 
For example, here is the lexical semantics for the verb BOMB: (defverb &quot;BOMB&quot; (BOMB-V-1 BOMBING (:case (subject PEOPLE TI-PERP-OF) (object ANYTYPE OBJECT-OF)))) This entry indicates that the type is BOMBING, that a subject argument whose type is PEOPLE should be given the role TI-PERP-OF, and that an object argument of any type should be given the role OBJECT-OF. BOMB-V-1 is the unique identifier of this (only) word sense.</Paragraph> <Paragraph position="4"> The semantic rules are based on general syntactic patterns, using wildcards and similar mechanisms to provide an extra measure of robustness. The basic elements of our semantic representation are &quot;semantic forms&quot;, each of which introduces a variable (e.g. ?13) with a type and a collection of predicates pertaining to that variable. There are three basic types of semantic forms: entities of the domain, events, and states of affairs. Each of these three can be further categorized as known, unknown, and referential. Entities correspond to the people, places, things, and time intervals of the domain. These are related in important ways, such as through events (who did what to whom) and states of affairs (properties of the entities). Entity descriptions typically arise from noun phrases; events and states of affairs may be described in clauses.</Paragraph> <Paragraph position="5"> Not everything that is represented in the semantics has actually been understood. For example, the predicate PP-MODIFIER indicates that two entities (expressed as noun phrases) are connected via a certain preposition. In this way, we have a &quot;placeholder&quot; for the information that a certain structural relation holds between these two items, even though we do not know what the actual semantic relation is. Sometimes understanding the relation more fully is of no consequence, since the information does not contribute to the template-filling task. 
The information is maintained, however, so that later expectation-driven processing can use it if necessary.</Paragraph> <Paragraph position="6"> Here is a semantic rule which handles, for example, &quot;group of businessmen&quot;, &quot;murder of a man&quot;, and &quot;terrorists of the FMLN&quot;:
For an NP dominating an NP1, and a PP whose PREP is &quot;OF&quot; and which dominates NP2:
If NP1 is in (&quot;GROUP&quot;, &quot;BAND&quot;); return semantics of NP2
If NP1 is an EVENT of type TERRORIST; make NP2 the OBJECT-OF NP1; return new NP1 sem
If type of NP1 is PEOPLE and type of NP2 is ORGANIZATION, merge semantics, showing that NP1 BELONGS-TO NP2;
otherwise use a more general NP => NP PP rule
An important consequence of the fragmentation produced by FPP is that top-level constituents are typically more shallow and less varied than full sentence parses. As a result, more semantic coverage was obtained early in the development process, with few semantic rules, than would have been expected if the system had had to cover widely varied syntactic structures before producing any semantic structures. In this way, semantic coverage can be added gradually, while the rest of the system is progressing in parallel.</Paragraph> <Paragraph position="7"> After having assigned semantic representations to the fragments produced by FPP, it is often possible to make some of the attachment decisions which had been deferred. For example, it is possible to combine two NPs of compatible semantic types that are conjoined, or attach prepositional phrases preferentially, using information automatically derived from a corpus [7]. Our basic system uses fragment combination for certain proper name constructions, while some of our submitted optional runs used more extensive patterns for fragment combination. Figure 2 shows a graphical version of the semantics generated for the first fragment of S1 in TST2-MUC4-0048. 
In this example, note the UNKNOWN-EVENT created for the main verb &quot;CONDEMNED&quot;, which has no lexical semantics in our system, but still generates a useful semantic representation.</Paragraph> </Section> <Section position="5" start_page="171" end_page="172" type="metho"> <SectionTitle> CONDEMNED THE TERRORIST KILLING OF ATTORNEY GENERAL ROBERTO GARCIA ALVARADO&quot; </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="172" end_page="172" type="sub_section"> <SectionTitle> Discourse Processing </SectionTitle> <Paragraph position="0"> The discourse component of PLUM performs the operations necessary to construct event objects corresponding to relevant events in the message. Each event object in the discourse event structure is similar in principle to the notion of a &quot;frame&quot;, with its corresponding &quot;slots&quot; or fields. The semantic representation of an event in the text only includes information contained locally in a fragment (after fragment combination); in creating corresponding event objects, the discourse module must infer other long-distance or indirect relations not explicitly found by the interpreter, and resolve any references in the text. The template generator then uses the structures created by the discourse component to generate the final templates. Currently only terrorist incidents (and &quot;possible terrorist incidents&quot;) generate discourse events, since these are the core events for MUC-4 template generation. The discourse component was further discussed in [1]. Two primary structures are created by the discourse processor which are used by the template generator: the discourse predicate database and the event structure. The database contains all the predicates mentioned in the semantic representation of the message. When references are resolved, corresponding semantic variables are unified in the database. 
Any other inferences done by the discourse component also get added to the database.</Paragraph> <Paragraph position="1"> To create the discourse event structure, the discourse component processes each semantic form produced by the interpreter, adding its information to the database and performing reference resolution when needed. Pronouns and person-type anaphoric definite NPs may be resolved. In addition, set- and member-type reference is also treated in other simpler domains. Some intra-sentential structural constraints on reference are enforced. When a semantic form for an event of interest is encountered, a discourse event is generated, and any slots already found by the interpreter are filled in the event. This event is then merged with a previous event if they are compatible. This heuristic assumes that the events were possibly derived from repeated references to a single real event.</Paragraph> <Paragraph position="2"> Once all the semantic forms have been processed, heuristic rules are applied to fill in any unfilled slots by looking at text surrounding the forms which triggered a given event. Each filler found is assigned a score based on where it was found in relation to an event trigger, indicating a higher confidence for fillers found closer to a trigger.</Paragraph> <Paragraph position="3"> Following is the discourse event structure for the first event in TST2-MUC4-0048:</Paragraph> </Section> </Section> <Section position="6" start_page="172" end_page="173" type="metho"> <SectionTitle> &quot;ATTORNEY GENERAL ROBERTO GARCIA ALVARADO&quot; (score = 0) &quot;PRESIDENT OF THE LEGISLATIVE&quot; (score = 2) </SectionTitle> <Paragraph position="0"> &quot;AN ARENA LEADER&quot; (score = 2) Each trigger fragment contains one or more words whose semantics triggered this event. 
In the example above, a score of 0 indicates the filler was found directly by the semantics; 1 that it was found in the same fragment as a trigger semantic form; 2 in the same sentence; 4 in the same paragraph; and 6 in an adjacent paragraph.</Paragraph> <Section position="1" start_page="172" end_page="173" type="sub_section"> <SectionTitle> Template Generation </SectionTitle> <Paragraph position="0"> The template generator takes the event structure produced by discourse processing and fills out the application-specific templates. Clearly much of this process is governed by the specific requirements of the application, considerations which have little to do with linguistic processing. The template generator must address any arbitrary constraints, as well as deal with the basic details of formatting.</Paragraph> <Paragraph position="1"> The template generator uses a combination of data-driven and expectation-driven strategies. First the information in the event structure is used to produce initial values. At this point, values which should be filled in but are not available in the event structure are supplied from defaults, either from the header (e.g., date and location information) or from reasonable guesses (e.g. that the object of a murder is usually a suitable filler for the human target slot when the semantic type of the object is unknown).</Paragraph> <Paragraph position="2"> We expect to eventually use a classifier at this stage of processing. This is especially appropriate for template slots with a set list of possible fillers, e.g. perpetrator confidence, category of incident, etc.</Paragraph> </Section> <Section position="2" start_page="173" end_page="173" type="sub_section"> <SectionTitle> Text Relevance </SectionTitle> <Paragraph position="0"> A new classifier for determining text relevance is now a component of PLUM. 
It may be used by our system to filter out a discourse event object when none of the phrases that gave rise to it is found in a paragraph classified as relevant. Since the event objects are the input to the template generator, it serves effectively as a filter on templates.</Paragraph> <Paragraph position="1"> The text classifier uses a probabilistic model to perform a binary classification. The features used by the model are stemmed words. The text classifier is trained automatically from two sets of text representing the categories of the classifier (i.e. relevant and irrelevant). A chi-square test is used to determine which words are good indicators of membership in one category but not the other. These words become the features of the probabilistic model. A log probability representing the likelihood of the word occurring in text of one type or the other is assigned to each word. It is this probability that is used in the classification process.</Paragraph> <Paragraph position="2"> When a piece of text is to be classified, it is scanned for occurrences of the word features selected during training. Summing the log probabilities of all the evidence found in the text gives a measure of the likelihood that the text is a member of a particular category. The sum is then compared to a user-selected threshold to determine the classification. Different thresholds produce different recall and precision values, allowing the user to tune the classifier for high recall, high precision, or something in between. Several of our optional runs showing a wide range of recall-precision tradeoffs were obtained by varying the classifier threshold.</Paragraph> <Paragraph position="3"> Parameters in PLUM An important feature of PLUM is that many aspects of its behavior can be controlled by simply varying the values of system parameters. 
An important goal has been to make our system as &quot;parameterizable&quot; as possible, so that the same software can meet different demands for recall, precision, and overgeneration. PLUM has parameters to control, for example, some aspects of fragment combination, event merging and slot filling by discourse, and relevance assignment by the classifier. In order to pick which system configuration to use for our required MUC-4 run, we tested more than 25 configurations on two test sets and one training set. Maximal F-scores were obtained with settings for aggressively merging events, conservatively looking for slot fillers, and a classifier threshold on relevance of 1.</Paragraph> </Section> </Section> <Section position="7" start_page="173" end_page="174" type="metho"> <SectionTitle> TEMPLATES FOR EXAMPLE MESSAGE </SectionTitle> <Paragraph position="0"> 0. MESSAGE: ID 1. MESSAGE: TEMPLATE 2. INCIDENT: DATE 3. INCIDENT: LOCATION 4. INCIDENT: TYPE 5. INCIDENT: STAGE OF EXECUTION 6. INCIDENT: INSTRUMENT ID 7. INCIDENT: INSTRUMENT TYPE 8. PERP: INCIDENT CATEGORY 9. PERP: INDIVIDUAL ID 10. PERP: ORG ID 11. PERP: ORG CONFIDENCE</Paragraph> </Section> <Section position="8" start_page="174" end_page="174" type="metho"> <SectionTitle> BOMBINGACCOMPLISHED 'THE BOMB &quot;&quot;EXPLOSIVES &quot; BOMB: 'THE BOMB &quot;EXPLOSIVE: &quot;EXPLOSIVES&quot; TERRORIST ACT&quot;GUERRILLAS&quot; &quot;FMLN &quot; REPORTED AS FACT: &quot;FMLN&quot;&quot;MERINO'S HOME&quot; CIVILIAN RESIDENCE : &quot;MERINO'S HOME &quot;1: &quot;MERINO'S HOME&quot; </SectionTitle> <Paragraph position="0"/> </Section> </Paper>