<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1043">
  <Title>Test Sets</Title>
  <Section position="4" start_page="0" end_page="224" type="metho">
    <SectionTitle>
RELEVANCY DISCRIMINATIONS
</SectionTitle>
    <Paragraph position="0"> Terrorism is a complex domain, especially when it is combined with a complicated set of domain relevancy guidelines. Relevancy judgements in this domain are often difficult even for human readers. Many news articles go beyond the scope of the guidelines or fall into grey areas no matter how carefully the guidelines are constructed. Even so, human readers can reliably identify some subset of relevant texts in the terrorism domain with 100% precision, and often without reading these texts in their entirety. Text skimming techniques are therefore a promising strategy for text classification as long as lower levels of recall 1 are acceptable. Although it might be 1Recall refers to the percentage of relevant texts that are correctly classified as relevant. Precision is the percentage of texts classified as relevant that actually are relevant. To illustrate the difference, imagine that you answer 3 out of 4 questions correctly on a true-or-false exam. Your recall rate is then 75%. Your precision, however, depends on how many of the questions you  unrealistic to try to classify all of the news articles in a corpus with a high degree of precision using anything less than a complete, in-depth .natural language processing system, it is realistic to try to identify a subset of texts that can be accurately classified using relatively simple techniques. 2 Intuitively, certain phrases seem to be very strong indicators of relevance for the terrorism domain. &amp;quot;X was assassinated&amp;quot; is very likely to be a reference to a terrorist event in which a civilian (politician, government leader, etc.) was killed. &amp;quot;X died&amp;quot; is a much weaker indicator of relevance because people often die in many ways that have nothing to do with terrorism. Linguistic expressions that predict relevance for a domain can be used to recognize and extract relevant texts from a large corpus. Identifying a reliable set of such expressions is an interesting problem and one that is addressed by relevancy feedback algorithms in information retrieval (Salton 1989).</Paragraph>
  </Section>
  <Section position="5" start_page="224" end_page="224" type="metho">
    <SectionTitle>
SELECTIVE CONCEPT EXTRACTION
USING CIRCUS
</SectionTitle>
    <Paragraph position="0"> Selective concept extraction is a sentence analysis technique that simulates the human ability to skim text and extract information in a selective manner. CIRCUS (Lehnert 1990) is a sentence analyzer designed to perform selective concept extraction in a robust manner. CIRCUS does not presume complete dictionary coverage for a particular domain, and does not rely on the application of a formal grammar for syntactic analysis. CIRCUS was the heart of the text analyzer underlying the UMass/MUC-3 system (Lehnert at al. 1991a, 1991b, 1991c), and it provided us with the sentence analysis capabilities used in the experiments we are about to describe. 3 The most important dictionary entries for CIRCUS are those thin contain a concept node definition. Concept nodes provide the case frames that are used to structure CIRCUS output. If a sentence contains no concept node triggers, CIRCUS will produce no output for that sentence. One of the research goals stimulated by our participation in MUC-3 was to gain a better understanding of these concept nodes and the vocabulary items associated with them.</Paragraph>
    <Paragraph position="1"> actually answered. If you only answered 3 of them, then your precision is 100%. But if you answered all 4 then our precision is only 75%.</Paragraph>
    <Paragraph position="2"> Many information retrieval tasks and message understanding applications are considered to be successful if low levels of recall are attained with high degrees of precision.</Paragraph>
    <Paragraph position="1"> Our UMass/MUC-3 dictionary was hand-crafted specifically for MUC-3. A preliminary analysis of our MUC-3 dictionary indicated that we had roughly equal numbers of verbs and nouns operating as concept node triggers (131 verbs and 125 nouns). Other parts of speech also acted as concept node triggers, but to a lesser extent than verbs and nouns. Out of roughly 6000 dictionary entries, a total of 286 lexical items were associated with concept node definitions.</Paragraph>
    <Paragraph position="2"> All concept node definitions contain a set of enablement conditions that must be met before the concept node can be considered valid. For example, if the lexical item &amp;quot;kill&amp;quot; is encountered in a sentence, a case frame associated with that item may be valid only if this instance of &amp;quot;kill&amp;quot; is operating as a verb in the sentence. Expectations for an agent and object will be useful for the verb &amp;quot;to kill&amp;quot; but not for a head noun as in &amp;quot;went in for the kill&amp;quot;. Enablements are typically organized as conjunctions of conditions, and there is no restriction on what types of enablements can be used.</Paragraph>
    <Paragraph position="3"> The enablement conditions for concept nodes effectively operate as filters that block further analysis when crucial sentence structures are not detected. If a filter is too strong, relevant information may be missed. If a filter is too weak, information may be extracted that is not valid. When sentence analysis fails due to poorly crafted enablement conditions, no other mechanisms can step in to override the consequences of that failure.</Paragraph>
  </Section>
  <Section position="7" start_page="224" end_page="225" type="metho">
    <SectionTitle>
RELEVANCY SIGNATURES
</SectionTitle>
    <Paragraph position="0"> It is often the case that a single phrase will make a text relevant. For instance, a single reference to a kidnapping anywhere in a text generally signals relevance in the terrorism domain regardless of what else is said in the remainder of the article. 4 One implication of this fact is that it is not always necessary to analyze an entire text in order to accurately assess relevance. This property makes the technique of selective concept extraction particularly well-suited for text classification tasks.</Paragraph>
    <Paragraph position="1"> We claim that specific linguistic expressions are reliable indicators of relevance for a particular domain. These expressions must be general enough to have broad applicability but specific enough to be consistently reliable 4In fact, there can be exceptions to any statement of this type. For example, an event that happened over 2 months ago was not considered to be relevant for MUC-3. Our approach assumes that these special cases are relatively infrequent and that key phrases can indicate relevance most of the time. Our technique will therefore produce weaker results under relevancy guidelines that detail special cases and exceptions if those conditions appear frequently in the target texts.</Paragraph>
    <Paragraph position="2">  over large numbers of texts. For example, the word &amp;quot;dead&amp;quot; often appears in a variety of linguistic contexts such as &amp;quot;he was found dead&amp;quot;, &amp;quot;leaving him dead&amp;quot;, &amp;quot;left him dead&amp;quot;, &amp;quot;they counted 15 dead&amp;quot;, etc. Some of these expressions may provide stronger relevancy cues than others. For example, &amp;quot;&lt;person&gt; was found dead&amp;quot; is a strong relevancy cue since there is a good chance that the person was the victim of a terrorist crime, whereas &amp;quot;&lt;number&gt; dead&amp;quot; is a much weaker cue since it is often used in articles describing military episodes that are not terrorist in nature. Similarly, the word &amp;quot;casualties&amp;quot; by itself is not a strong relevancy cue since many articles discuss casualties in the context of military acts. But the expression &amp;quot;no casualties&amp;quot;/s highly correlated with relevance since it often refers to civilians. We will refer to linguistic expressions that are strong relevancy cues as relevancy signatures.</Paragraph>
    <Paragraph position="3"> In our system, these linguistic expressions are represented by ordered pairs of lexical items and concept nodes where the lexical item acts as a trigger for the concept node. For example, the pattern &amp;quot;was found dead&amp;quot; is represented by the pair (&amp;quot;dead&amp;quot;, Sfound-dead-pass$) where dead is the key word that triggers the concept node Sfound-dead-pass$ which in turn activates enabling conditions that expect the passive form of the verb &amp;quot;found&amp;quot; to precede the word dead.</Paragraph>
    <Paragraph position="4"> By taking advantage of the text corpus and answer keys used in MUC-3, we can automatically derive a set of relevancy signatures that will reliably predict the relevance of new texts. The following section describes the algorithm that derives a set of relevancy signatures from a training corpus and then uses those signatures to classify new texts.</Paragraph>
  </Section>
  <Section position="8" start_page="225" end_page="226" type="metho">
    <SectionTitle>
THE RELEVANCY SIGNATURES
ALGORITHM
</SectionTitle>
    <Paragraph position="0"> MUC-3 provided its participants with a corpus of 1300 news articles for development purposes and two additional sets of 100 texts each that were made available for test runs (the TST1 and TST2 texts). All of the MUC-3 texts were supplied by the Foreign Broadcast Information Service and they were drawn from a variety of news sources including wire stories, transcripts of speeches, radio broadcasts, terrorist communiques, and interviews. The MUC-3 text corpus was supplemented by hand-coded case frame instanfiations (answer keys) for each text in the corpus.</Paragraph>
    <Paragraph position="1"> The MUC-3 text corpus and answer keys therefore gave us access to 1500 texts and their correct relevancy classifications. For our experiments, we set aside a small portion of this corpus for testing purposes and dedicated the remaining texts to the training set. The training set was then used to derive a set of relevancy signatures.</Paragraph>
    <Paragraph position="2"> The Relevancy Signatures Algorithm is fairly simple.</Paragraph>
    <Paragraph position="3"> Given a set of training texts, we parse each text using CIRCUS and save the concept nodes that are produced during the parse along with the lexical items that triggered those concept nodes. As we parse the training texts, we update two statistics for each word/concept node pair: \[1\] the number of times that the pair occurred in the training set (N), and \[2\] the number of times that it occurred in a relevant text (NR). The ratio of NR over N gives us a &amp;quot;reliability&amp;quot; measure. For example, .75 means that 75% of the instances (for that pair) appeared in relevant texts.</Paragraph>
    <Paragraph position="4"> Using these statistics, we then extract a set of &amp;quot;reliable&amp;quot; lexical item/concept node pairs by choosing two values: a reliability threshold (R) and a minimum number of occurrences (M). The reliability threshold specifies the minimum reliability measure that is acceptable. For example, R=90 dictates that a pair must have a reliability measure greater than 90% in order to be considered reliable.</Paragraph>
    <Paragraph position="5"> The minimum number of occurrences parameter specifies a minimum number of times that the pair must have occurred in the training set. For example, M=4 dictates that there must be more than 4 occurrences of a pair for it to be considered reliable. This parameter is used to eliminate pairs that may have a very high reliability measure but have dubious statistical merit because they appeared only a few times in the entire training set. Once these parameters have been selected, we then identify all pairs that meet the above criteria. We will refer to these reliable word/concept node pairs as our set of relevancy signatures.</Paragraph>
    <Paragraph position="6"> To illustrate, here are some relevancy signatures that were derived from the corpus using the parameter values, R=90 and M=10 along with some text samples that are recognized by these signatures:  To classify a new text, we parse the text and save the concept nodes that are produced during the parse, along with the lexical items that triggered them. The text is therefore represented as a set of these lexical item/concept  node pairs. We then consult our list of relevancy signatures to see if any of them are present in the current text If we find one, the text is deemed to be relevant If not, then the text is deemed to be irrelevant It is important to note that it only takes one relevancy signature to classify a text as relevant.</Paragraph>
  </Section>
class="xml-element"></Paper>