<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2034">
  <Title>To Parse or Not to Parse: Relation-Driven Text Skimming</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Relation-Driven Skimming
</SectionTitle>
    <Paragraph position="0"> t{elation-driven skimming takes advantage of the theory that most conceptual information derives front linguistic relations that do not depend on a complete surface structure. Such relations, like subjeclpredicale and verb-complemenl, carry constraints such as agreement or selectional restrictions. Since the bulk of the complexity of most language analyzers comes from the combinatorics of parsing, finding relations without a complete syntactic analysis helps p (? r for Irl a, li ce.</Paragraph>
    <Paragraph position="1"> The relation-driven skimming algorithm has three components: (r) 73e coT~ccpl aclivalion component makes a first pass through the text and selects candidate concepts that may contribute to its semantic interpretation. null The scgmc~zlalion compone~t tells the program what to parse, what to skip, and where to use seinantic inlbrmation for attachment.</Paragraph>
    <Paragraph position="2"> , The allachmer, l cornpo'acnZ identifies linguistic relations in the input text that contribute to its semantic interpretation, even where segments have been skipped.</Paragraph>
    <Paragraph position="3"> The trick to relation-driven skimming is to perform attachment as accurately as possible with as little grammatical analysis as possible. This is no simple task, because phrases with no relevant semantic content can always affect the attachment of relevant phrases. In the sections that follow, we will give for each of the components above an observation of why it works, its main activity, and an example or two of its operation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Concept Activation
</SectionTitle>
      <Paragraph position="0"> Concept activation uses lexical analysis of words, combinations, and ()ther features to determine</Paragraph>
      <Paragraph position="2"> whether a portion of text is likely to be relevant.</Paragraph>
      <Paragraph position="3"> The concept activation component makes a single pass through the input text, producing a sequence of conceptual categories that may contribute to the conceptual interpretation.</Paragraph>
      <Paragraph position="4"> Observation: The density of relevant content words in a section of text generally determines the degree of processing required for semantic analysis. Activity: Divide content words into two categories: &amp;quot;triggers&amp;quot;, or relation heads, and role fillers. Scan the text using domain knowledge for words or combinations that might be triggers, and for words or combinations that might be fillers.</Paragraph>
      <Paragraph position="5"> Example: The following is the input text and output of concept activation for the Revere example: Input: Revere said it had received an offer from an investor group to be acquired for $16 a share, or about $127 million.</Paragraph>
      <Paragraph position="7"> This process is more than a lexicM lookup. Some words, like rumor or target in the corporate takeover stories, are indeed &amp;quot;triggers&amp;quot; directly associated with important concepts. However, considering all words that might contribute to an important concept is inefficient; words such as make, take, iss~te or increase require more analysis of the surrounding context. In these cases, the skimmer looks for combinations of words or concepts (such as received and offer above).</Paragraph>
      <Paragraph position="8"> This prevents the parser from doing a lot of processing around low-content words.</Paragraph>
      <Paragraph position="9"> Whether a word is contentful or not depends on context. Some words, like plan, do not themselves carry much information but must be understood because they distinguish the agent of anot, her action.</Paragraph>
      <Paragraph position="10"> &amp;quot;Acme rejects an offer&amp;quot; and &amp;quot;Acme plans an offer&amp;quot; place Acme in different roles (i.e. the target and suitor, respectively). A concept like plan, therefore, appears in the list. of activated concepts only when there are takeover events in the local context.</Paragraph>
      <Paragraph position="11"> Concept activation eliminates processing of unimportant sentences and clauses and helps efficiency in contentful sections, mainly by determining relations (such as role-filler) that help to guide syntactic analysis. The next phase, text segmentation, uses the results of concept activation to control parsing.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Text Segmentation
</SectionTitle>
      <Paragraph position="0"> The text segmentation phase groups the text around words that are concept activators, identifying noun groups and complement structures after verbs, and finding punctuation or words that separate segments of text. This phase determines (1) where to skip and (2) where to limit parsing.</Paragraph>
      <Paragraph position="1"> Observation: It is generally possible to reconstucL the important relations of a text in spite of skipping over intervening words and phrases.</Paragraph>
      <Paragraph position="2"> Aetivity: Skip over empty sentences and phrases, and break the combinatorics of parsing where a single parse will do.</Paragraph>
      <Paragraph position="3"> Example: In the Revere example, the segmented (and marked) text is as follows: Revere *skip* it }tad received an offer from a.n investor group *break* to be acquired for $16 a share *break* *skip* $127 million.</Paragraph>
      <Paragraph position="4"> The *skip* token indicates to the parser that there is intervening information, while the *break* token indicates that it should &amp;quot;reduce&amp;quot; or complete all active linguistic structures. Both help to limit complexity-skipping tends to avoid wasted parsing ~m well as the combinatorics of attachment, while the breaks help to avoid considering nmltiple attachments where syntax contributes little or no information.</Paragraph>
      <Paragraph position="5"> The segmentation algorithm includes most important noun phrases, even when separation information prevents them from being attached. This is because, as in the above example, these noun phrases implicitly play a role in anaphoric references or infinitive phrases.</Paragraph>
      <Paragraph position="6"> A side effect of text segmentation is to mark the original text, highlighting sections that are considered relevant. This has a dual effect: (1) It helps to debug the skimming algorithm by showing visually what sections of the text the program has read, and (2) It allows the users of the program quickly to spot key information.</Paragraph>
      <Paragraph position="7"> For example, a typical merger &amp; acquisition story from the Dow Jones examples will appear with relevant sections in boldface, as shown below:</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Mayfair Gets Buyout Proposal
Mayfair Super Markets Inc. said
</SectionTitle>
      <Paragraph position="0"> that Stanley P. Kaufelt, its chairman, president and chief executive, has proposed a business combination with Mayfair in which the holders of Mayfair's outstanding common stock would reeelve $23.50 a share in cash.</Paragraph>
      <Paragraph position="1"> Text segnlentation confines processing to sections of text that contain important information. The correct semantic interpretation of these text sections often depends on correct syntactic attachment, as described below.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Attachment
</SectionTitle>
      <Paragraph position="0"> The attachment phase produces linguistic relations from the segrnented text. This phase is part of the bottom-up parsing process; the nmin difference between attachment and full parsing is that the parser nmst attempt to form linguistic relations where it has skipped sections of text. Attachment in tile absence of a complete parse relies on rules that combine linguistic and conceptuM information; for example, &amp;quot;a.ttach a verb phrase or infinitive to the most recent clause-level semanticly valid noun phrase&amp;quot;.</Paragraph>
      <Paragraph position="1"> Observation: Attachments by default are much less costly computationally than attachments by exhaustive consideration of possibilities.</Paragraph>
      <Paragraph position="2"> Activity: Prefer attachments within boundaries separated by breaks, and use semantics and recency to guide attachment otherwise.</Paragraph>
      <Paragraph position="3"> Example: In the Revere example, the following relations guide the interpretation process: Revere received an offer... (NP-VP) an offer from an investment group (NP-PP) 196 3 Revere to be acquired... (NP-INFPHR.) acquired for $16 a share (VP-PP) $160 million (NP) The combination of these simple relations permits the correct semantic interpretation without a complete parse (see \[Rau and Jacobs, 1988\] for a discussion of the use of these relations for analysis). In this example, the skimming algorithm reduces the number of parses considered by a factor of six. This is in spite of the fact that the R.evere sentence contains a fair amount of useful intbrmation; in less dense text the skimming program can sldp sentences entirely or extract only two or three relations from a complex sentence (as in the Fidelity exmnple given in the introduction). null The Revere example is more complex than most of the c~es that occur in these texts because it illustrates a number of interacting rules and preferences. It is, however, unusual in that flfll syntactic parsing of this example could be misleading because it would tend to attach &amp;quot;to be acquired&amp;quot; to &amp;quot;investor group&amp;quot; rather than &amp;quot;Revere&amp;quot;. The point of this example is 7~0t, however, that flflI syntactic processing is bad, but rather that skimming can make the necessary attachments without full parsing.</Paragraph>
      <Paragraph position="4"> Complications of Limited Attachment The attachment mechanism makes use of several heuristics for constructing relations fi'om the text, such as the infinitive phrase rule given earlier, resolving re.ferences before attaching pronouns, reconstructing sentences fl'om verb phrases in incomplete sentences, and determining voice before attaching conjunctive verb phrases. These rules have derived from the analysis of fairly large bodies of text. The following are some observations about the sorts of examples where &amp;quot;limited attachment&amp;quot; is necessary: o Dangling Phrases. In many longer texts, prepositional phrases, infinitives, and other adjunct information &amp;quot;hang off&amp;quot; the ends of sentences. Typically, such phrases can be attached syntacticly to rnnltiple heads. The effect of the skimming algorithm is to give more weight to the semantic attachment of these phrases. Since many such examples contain temporal, spatial, or other information associated with events, this semantic attachment seems to provide an advantage over syntactic preferences.</Paragraph>
      <Paragraph position="5"> o Conlunctive Clauses. Conjunctions introduce linguistic complexity. If only part of a conjunctive clause contains useful information, the program (:an identify a relation involving one portion of the coordinated clause without parsing the whole sentence. For example, one news story reads &amp;quot;Investor William Farley...said he plans to seek a special meeting to discuss his proposal and to wage a proxy fight tbr control of the board&amp;quot;.</Paragraph>
      <Paragraph position="6"> Only the second clause contains useful information, although the first clause can help to attach the second.</Paragraph>
      <Paragraph position="7"> Negative h~.formalion. The skimming algorithm, in its application of linguistic relations, must use both positive and negative intbrmation in determining where to attach phrases. Lack of agreement, for example, can override the attachment of a verb phrase to a sernanticly valid subject.</Paragraph>
      <Paragraph position="8"> Case constraints often guide the analysis of pronouns. Semantic infbrrnation tends to provide positive information in these cases, while syntactic information provides negative information.</Paragraph>
      <Paragraph position="9"> Oddly, this is tile reverse of the more typical parsing strategy of using semantics to filter out invalid interpretations.</Paragraph>
      <Paragraph position="10"> It might seem that these complications prcsent enough problems that it would be easier to perform full parsing than to try to derive new heuristics for attachment in all these examples. This is true in some cases, but the vast majority of examples we have encountered require only a few simple attachment preferences. In these &amp;quot;easier&amp;quot; examples, the performance payoff has been enough to keep us from degrading to :full parsing whenever possible.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Comparison with Other
Approaches
</SectionTitle>
    <Paragraph position="0"> Most work in skimming or partial parsing \[Deaong, 1979; Lebowitz, 1983; l,ytinen and Gershman, 1986; Young and Itayes, 1985\] uses template-based or memory-based strategies, effectively using conceptual information in place of linguistic constraints. This approach seems to work in highly constrained texts where conceptual knowledge is sufficient tbr determining role relationships. In the domains that we have tested, the pure template-based approach fails because some role relationships are determined almost entirely from linguistic intbrmat.ion such as complernent structure or agreement, l'br example, the target and suitor of corporate mergers are both companies; thus there is little conceptual information (other than the size of the companies) that helps to determine role-filling. In many classes of tactical operations reports, the agent and object are both military forces, thus correct linguistic attachmen~ is essential in this domain as well.</Paragraph>
    <Paragraph position="1"> Although the overall parsing style of our system integrates template-based and language-based strategies \[Rau and Jacobs, 1988\], the skimming algorithm is actually more bottom-up or language-based.</Paragraph>
    <Paragraph position="2"> Like some of the other major text processing systems such as PI{OTEUS, PUNDIT, and qACH'US \[tIobbs, 1986; Grishman and Ilirschmau, 1986\], tile skimming program applies linguistic constraints and maps linguistic structures into conceptual roles. In these other systems, however, the bottom-up approach may cause the program to waste time on irrelevant sections of text. The difference is that these programs do not really use conceplual infof mation until after the parser has generated its candidate structures. The relation-driven sldmming process shortcuts bottom-up analysis by firs.t using conceptual knowledge to block some fi'uitless paths. As we have had the benefit of comparing our system in some detail with these other programs after operation on a common task, we believe that many such systems could achieve an order of magnitude improvement in processing speed by incorporating a similar method.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 System Status and Current
Directions
</SectionTitle>
    <Paragraph position="0"> The SC\[SOR system \[Ran and Jacobs, 1988\] was the initial testbed for this algorithm, is a completed prototype that reads news stories at the rate of about 500 per hour. It extracts certain key information from stories about corporate takeovers (typically about 10% of the texts), identifying target, suitor, purchase price, and other information with about 90% accuracy. null The generic text processing components of SCISOR, known as the GE NLToolset \[Jacobs and Rau, 1990\], are used in applications in the operations of GE. Our group applied this core of text processing tools, including the skimming procedures described here, to the MUCK-II task, which consisted of generating database templates from naval operations messages, during a period of several weeks before the conference. The skimming algorithm of the NLToolset was the key to producing good results so rapidly. The same text processing system has since applied to a number of message sets in other domains.</Paragraph>
    <Paragraph position="1"> The improvements in speed from skimming have so far come without a degradation in accuracy. This does not, however, mean that the attachment heuristics are infallible. Clearly, examples can occur where text that has been skipped influences the attachement of key phra~ses, especially when texts contain ellipsis, anaphorie references, and complex coordinated structures. Future enhancements to our algorithm must refine the attachment rules for these cases, and degrade to full parsing where necessary.</Paragraph>
  </Section>
class="xml-element"></Paper>