<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1024">
  <Title>Tagged Sentences Tagger Lisp-readable sentences CompletedChart Template Generator, Target Templates Input Text System Knowledge Bases</Title>
  <Section position="3" start_page="295" end_page="295" type="metho">
    <SectionTitle>
SYSTEM WALKTHROUGH
</SectionTitle>
    <Paragraph position="0"> We now describe our system's processing of the walkthrough article, 2789568: In the second quarter of 1991, Nikon Corp. (7731) plans to market the &amp;quot;NSR-1755EX8A,&amp;quot; a new stepper intended for use in the production of 64-Mbit DRAMs. The stepper will use an 248-nm excimer laser as a light source and will have a resolution of 0.45 micron, compared to the 0.5 micron of the company's latest stepper.</Paragraph>
    <Paragraph position="1"> Nikon will price the excimer laser stepper at 300-350 million yen, and the company expects to sell 50 systems during the initial year of marketing.</Paragraph>
    <Paragraph position="2"> The response generated by LINK for this article and the answer key are shown in figures 2 and 3.</Paragraph>
    <Paragraph position="3"> We will describe the behavior of each module on the example article. The tokenized walk-through file is shown below: (In the second quarter of 1991 |,| &amp;quot;Nikon Corp&amp;quot; |(| 7731 |)| plans to market the &amp;quot;NSR-1755EX8A&amp;quot; |,| a new stepper intended for use in the production of 64 &amp;quot;Mbit DRAMs&amp;quot; |.|) (The stepper will use an 248 nm excimer laser as a light source and will have a resolution of 0.45 micron |,| compared to the 0.5 micron of the company |'S| latest stepper |.|) (Nikon will price the excimer laser stepper at 300 to 350 million yen |,| and the company expects to sell 50 systems during the initial year of marketing</Paragraph>
    <Paragraph position="5"> All three of the sentences from the walkthrough example are passed through the filter for further processing. The first sentence mentions &amp;quot;Nikon Corp&amp;quot; and has other meaningful words; the second sentence has the word &amp;quot;use&amp;quot; and other meaningful words; and the third sentence has the word &amp;quot;company&amp;quot; along with other meaningful words.</Paragraph>
    <Paragraph position="6"> Quoted strings are further analyzed by the tagger to determine what type of object they are likely to be. The completely tagged walkthrough file is shown below:</Paragraph>
  </Section>
  <Section position="4" start_page="295" end_page="299" type="metho">
    <SectionTitle>
(IN THE SECOND QUARTER OF 1991 |,| (:COMP-NAME NIKON CORP) |(| 7731 |)| PLANS
TO MARKET THE (:NAME NSR-1755EX8A) |,| A NEW STEPPER INTENDED FOR USE IN THE
PRODUCTION OF 64 MBIT DRAMS \.)
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="5" start_page="299" end_page="300" type="metho">
    <SectionTitle>
(THE STEPPER WILL USE AN 248 NM EXCIMER LASER AS A LIGHT SOURCE AND WILL HAVE
A RESOLUTION OF 0.45 MICRON |,| COMPARED TO THE 0.5 MICRON OF THE COMPANY
|'S| LATEST STEPPER \.)
(NIKON WILL PRICE THE EXCIMER LASER STEPPER AT 300 TO 350 MILLION YEN |,| AND
THE COMPANY EXPECTS TO SELL 50 SYSTEMS DURING THE INITIAL YEAR OF MARKETING
</SectionTitle>
    <Paragraph position="0"> \.) The tagger has used the company indicator &amp;quot;Corp&amp;quot; to identify &amp;quot;Nikon Corp&amp;quot; as a company name. &amp;quot;NSR-1755EX8A&amp;quot; was not in the lexicon, nor did it have any additional indicators, so it was assumed (correctly) to be a proper name. The string &amp;quot;Mbit DRAMs&amp;quot; was not tagged because each word is known to the tagger to be an acronym or abbreviation. These words are simply passed along, and the lexicon provides the appropriate information for them.</Paragraph>
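    <Paragraph> The tagging decisions just described can be illustrated with a minimal sketch. It is not the LINK tagger itself: the function name, the stand-in lexicon, and the order of the checks are assumptions made for illustration, and the indicator list contains only indicators named in this paper.

```python
# Hypothetical stand-ins: the indicator set and the tiny lexicon are
# illustrative assumptions, not the system's actual knowledge base.
COMPANY_INDICATORS = {"Corp", "Co", "Inc"}
KNOWN_WORDS = {"mbit", "drams"}  # words assumed to be in the lexicon


def tag_quoted_string(text):
    """Return (tag, words); tag is None when the string is left untagged."""
    words = text.split()
    # Every word is already known (e.g. acronyms): pass the words along untagged.
    if all(w.lower() in KNOWN_WORDS for w in words):
        return (None, words)
    # A trailing company indicator marks the string as a company name.
    if words[-1].rstrip(".") in COMPANY_INDICATORS:
        return (":COMP-NAME", words)
    # No lexicon entry and no indicator: assume a proper name.
    return (":NAME", words)
```

    Under these assumptions, the three quoted strings of the walkthrough article receive the tags :COMP-NAME, :NAME, and no tag, respectively.</Paragraph>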
    <Paragraph position="1"> Before parsing, the chart for the parser (as described below) is built by adding constituents for each word or tagged item. When the parser reads a tagged item from the input sentence, it simply makes an entry in the chart at that position with the semantic type corresponding to the tag and the words contained in the item. For example, (:COMP-NAME NIKON CORP) turns into an entry with type Company and name &amp;quot;Nikon Corp&amp;quot;.</Paragraph>
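    <Paragraph> A minimal sketch of this chart initialization follows. The ChartEntry fields, the "Word" and "Name" type labels, and the build_chart function are hypothetical; only the mapping from :COMP-NAME to the type Company is stated in the text. The sketch keeps the words of a tagged item in their input case.

```python
from dataclasses import dataclass

# Hypothetical tag-to-type table; only :COMP-NAME -> Company comes from the text.
TAG_TO_TYPE = {":COMP-NAME": "Company", ":NAME": "Name"}


@dataclass
class ChartEntry:
    start: int      # chart position where the constituent begins
    end: int        # chart position just past the constituent
    sem_type: str   # semantic type of the constituent
    name: str       # the words the constituent covers


def build_chart(items):
    """items holds plain words (str) and tagged items (tag, [words])."""
    chart = []
    for position, item in enumerate(items):
        if isinstance(item, tuple):
            tag, words = item
            # A tagged item becomes a single entry typed by its tag.
            entry = ChartEntry(position, position + 1, TAG_TO_TYPE[tag], " ".join(words))
        else:
            entry = ChartEntry(position, position + 1, "Word", item)
        chart.append(entry)
    return chart
```

    Each tagged item occupies one chart position, so a multi-word name is treated as a single constituent from the start.</Paragraph>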
    <Paragraph position="2"> The parser is not successful at completely parsing any of these sentences. This is primarily because the grammar and lexicon are lacking several necessary pieces of information. In the first sentence, &amp;quot;plan&amp;quot; is not marked in the lexicon as taking an infinitival complement. Thus, the construction cannot be parsed. There is also no grammar rule for parsing a determiner followed by a name as a noun phrase (&amp;quot;the NSR-1755EX8A&amp;quot;). Had this sentence read, &amp;quot;... market the NSR-1755EX8A stepper,&amp;quot; the partial parse would have been more complete. As it is, only the following information can be extracted from this sentence: Except for &amp;quot;market&amp;quot;, none of the verbs in this sentence were defined in our lexicon as interesting; thus, none of them are included in the partial parses sent on to the postprocessor. Because &amp;quot;Nikon Corp&amp;quot; and the name of the stepper are not attached to anything, the postprocessor does not know where in the final template these should be placed. Thus, they are discarded. STEPPER, however, results in the production of a LITHOGRAPHY template, and the DRAM is attached as the DEVICE, resulting in the response shown in figure 2.</Paragraph>
    <Paragraph position="3"> No additional information is extracted from sentences 2 and 3. In sentence 2, the text &amp;quot;will have a resolution of 0.45 micron, compared to the 0.5 micron of the company's latest stepper&amp;quot; was not parsed well enough for the system to realize that two different steppers are being described. Granularity specifications were not handled well by the postprocessing rules. Had the granularities been successfully attached to the representations of the two steppers, our system would have produced two different LITHOGRAPHY templates, because the different granularities would have caused unification of the two steppers to fail. Thus, the response would have contained two separate templates. However, the granularities were not successfully incorporated into the templates, resulting in the steppers being merged into a single template.</Paragraph>
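    <Paragraph> The effect of a failed unification on the number of templates can be sketched as follows. This is a simplified illustration over flat slot dictionaries, not the unification procedure used by LINK; the slot names EQUIPMENT and GRANULARITY and both function names are assumptions.

```python
def unify(a, b):
    """Merge two slot dictionaries; return None if a shared slot conflicts."""
    merged = dict(a)
    for slot, value in b.items():
        if slot in merged and merged[slot] != value:
            return None  # conflicting fillers: unification fails
        merged[slot] = value
    return merged


def merge_templates(templates):
    """Fold each template into the first existing one it unifies with."""
    result = []
    for template in templates:
        for i, existing in enumerate(result):
            unified = unify(existing, template)
            if unified is not None:
                result[i] = unified
                break
        else:
            # No existing template unified with this one: keep it separate.
            result.append(template)
    return result
```

    With no granularity slot filled, two stepper descriptions unify into one template; with granularities of 0.45 micron and 0.5 micron attached, unification fails and two templates remain.</Paragraph>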
    <Paragraph position="4"> The final sentence provides another opportunity to identify &amp;quot;Nikon Corp&amp;quot; as being the MANUFACTURER and DISTRIBUTOR of the LITHOGRAPHY technique. However, again, the word &amp;quot;price&amp;quot; was not defined in our lexicon as a verb relevant to the domain, so the information was ignored.</Paragraph>
  </Section>
  <Section position="6" start_page="300" end_page="301" type="metho">
    <SectionTitle>
ANALYSIS OF PERFORMANCE
</SectionTitle>
    <Paragraph position="0"> The LINK system's performance on the MUC-5 English microelectronics test set is shown in figure 4. Our system's performance is relatively precision-oriented. We suspect that this is because our approach attempts complete analyses of each sentence. Thus, information which is extracted is relatively reliable, while additional information may be missed.</Paragraph>
    <Paragraph position="1"> Our system was tunable in its use of the partial parses that were used to generate templates. In its most conservative setting, only partial parses whose semantic interpretations involved important actions (e.g., DEVELOP, SELL, etc.) were used in postprocessing. The system could be made less conservative by expanding the types of partial parses that were used in template generation. In its least conservative setting, even single words might be chosen as interesting partial parses, resulting in the generation of a template. For example, the appearance of the word &amp;quot;CVD&amp;quot; could result in the generation of a LAYERING template with TYPE field CVD.</Paragraph>
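    <Paragraph> The two extremes of this tuning can be sketched as a filter over partial parses. The dictionary representation of a partial parse, the boolean switch, and the function name are assumptions for illustration; the actual system supported intermediate settings that are not modeled here, and the action list contains only the two actions named above.

```python
# Only DEVELOP and SELL are named in the text; the full list is not given.
IMPORTANT_ACTIONS = {"DEVELOP", "SELL"}
# Single words that trigger a template, following the CVD example.
KEYWORD_TEMPLATES = {"CVD": ("LAYERING", "CVD")}


def select_partial_parses(parses, least_conservative):
    """Keep the partial parses that may be passed on to template generation."""
    selected = []
    for parse in parses:
        if parse.get("action") in IMPORTANT_ACTIONS:
            # Important actions are used at every setting.
            selected.append(parse)
        elif least_conservative and parse.get("word") in KEYWORD_TEMPLATES:
            # Single-word partial parses are used only at the least conservative setting.
            selected.append(parse)
    return selected
```

    In this sketch, a partial parse consisting only of the word CVD is dropped at the conservative setting and kept at the least conservative one.</Paragraph>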
    <Paragraph position="2"> For the test run, we used the system in its least conservative setting. During development testing, we found that this setting resulted in an approximately 50% improvement in recall rates without adversely affecting precision. We believe that this reflects the English microelectronics domain. Since the vocabulary used in articles in this domain consisted of a large number of technical terms not normally used in most English texts, the extraction of information based on the occurrence of these words, without analysis of their surrounding context, was relatively safe. In other domains, it is likely that the use of single-word partial parses would result in a significant reduction in precision.</Paragraph>
    <Paragraph position="3"> Our system's precision results did suffer from the fact that templates were sometimes produced that contained so little information that they could not be matched by the scorer to answer key templates. These templates were counted by the scorer as spurious, reducing our precision score. We plan to analyze our results further to calculate the system's precision had it not produced these unmatchable templates.</Paragraph>
    <Paragraph position="4"> Of interest is our system's performance on text filtration. The 99% recall, 75% precision performance is much higher than what might be expected given LINK's overall recall/precision rates. We suspect that these results are due to our system's full-analysis approach.</Paragraph>
    <Paragraph position="5"> Our system is far from mature. Due to lack of resources this year, development time for the system totaled only about 6 person-months. This represents about 1/3 of the development time of our MUC-4 system. Thus, the knowledge base of the system is still quite incomplete. This resulted in the low recall performance of the system. Further development of the knowledge base is likely to greatly improve system performance.</Paragraph>
    <Section position="1" start_page="301" end_page="301" type="sub_section">
      <SectionTitle>
System Training
</SectionTitle>
      <Paragraph position="0"> We used two specialized techniques to aid in the development of the system knowledge bases. The first was to use the development keys as a sort of pocket dictionary for some of the important and often-used words. We did this by extracting all the slot fillers and their types from the templates. For all the string fills, we added the string directly to the dictionary with the semantic type that was derived from the slot that it filled. Many of the set fills were also added verbatim to the lexicon, since in this domain set fills were often technical terms (e.g., CVD). Other lexicon entries were simply created by either expanding the set-fill abbreviations or abbreviating the full-text set-fills.</Paragraph>
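      <Paragraph> The first technique, for string fills, can be sketched as below. The template representation, the slot-to-type table passed in as string_slots, and the function name are assumptions; the paper does not specify how semantic types were derived from slots.

```python
def lexicon_from_keys(templates, string_slots):
    """Build string -> semantic type entries from string fills in key templates.

    templates: list of dicts mapping slot name to filler.
    string_slots: hypothetical table mapping a string-fill slot to the
    semantic type assigned to its fillers.
    """
    lexicon = {}
    for template in templates:
        for slot, filler in template.items():
            if slot in string_slots:
                # Add the string verbatim; keep the first type seen for it.
                lexicon.setdefault(filler, string_slots[slot])
    return lexicon
```

      Fillers of slots that are not listed in string_slots are ignored here; set fills would need a separate pass, since the text describes adding some verbatim and expanding or abbreviating others.</Paragraph>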
      <Paragraph position="1"> The other main training source came as a result of the tagger. Since the tagger made it possible to recognize proper names that were not in the lexicon by analyzing strings of capitalized words, we used the tagged items to hypothesize new lexicon entries. This was only done for items that the tagger was sure of, like company names (strings that ended with &amp;quot;Corp&amp;quot;, &amp;quot;Co&amp;quot;, &amp;quot;Inc&amp;quot;, etc.) and person names that started with &amp;quot;Mrs&amp;quot;, &amp;quot;Dr&amp;quot;, &amp;quot;VP&amp;quot;, etc. These definitions were not entered directly into the lexicon, but were put into a separate file so that they could be reviewed by a knowledge engineer.</Paragraph>
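      <Paragraph> A minimal sketch of this second technique follows, assuming the capitalized strings have already been collected. The function name and the "Company" and "Person" labels are hypothetical; the suffix and title sets contain only the examples given above. Candidates are returned as a separate list, mirroring the review step, and the lexicon itself is never modified.

```python
COMPANY_SUFFIXES = {"Corp", "Co", "Inc"}
PERSON_TITLES = {"Mrs", "Dr", "VP"}


def hypothesize_entries(strings, lexicon):
    """Return candidate (string, type) pairs for review by a knowledge engineer."""
    candidates = []
    for s in strings:
        words = s.replace(".", "").split()
        # Skip empty strings and strings that already have a lexicon entry.
        if not words or s in lexicon:
            continue
        if words[-1] in COMPANY_SUFFIXES:
            candidates.append((s, "Company"))
        elif words[0] in PERSON_TITLES:
            candidates.append((s, "Person"))
        # Strings without a reliable indicator produce no candidate.
    return candidates
```

      Strings with no indicator are deliberately left out, reflecting the restriction to items the tagger was sure of.</Paragraph>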
    </Section>
  </Section>
</Paper>