<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1010">
  <Title>F1: "BRIDGESTONE SPORTS CO. SAID FRIDAY IT HAS SET UP A JOINT VENTURE" (S (NP (N (NAME "BRIDGESTONE SPORTS CO."))) (VP (AUX )</Title>
  <Section position="4" start_page="0" end_page="95" type="metho">
    <SectionTitle>
"BRIDGESTONE SPORTS CO. SAID FRIDAY IT HAS SET UP A JOINT VENTURE IN TAIWAN WITH A LOCAL CONCERN
AND A JAPANESE TRADING HOUSE TO PRODUCE GOLF CLUBS TO BE SHIPPED TO JAPAN."
</SectionTitle>
    <Paragraph position="0"> assigned by the pattern . In both joint ventures and microelectronics, patterns were used to group proper nouns int o company names, organization names, and person names . Continuing with the example sentence discussed above, a pattern recognized the sequence (BRIDGESTONE NP) (SPORTS NPS) (CO . NP) as a company; the pattern' s action substituted the single token (BRIDGESTONE SPORTS CO . CORP), with semantics of corporation .</Paragraph>
    <Section position="1" start_page="94" end_page="95" type="sub_section">
      <SectionTitle>
Fast Partial Parser (FPP)
</SectionTitle>
      <Paragraph position="0"> The FPP is a near-deterministic parser which generates one or more non-overlapping parse fragments spanning th e input sentence, deferring any difficult decisions on attachment ambiguities . When cases of permanent, predictabl e ambiguity arise, the parser finishes the analysis of the current phrase, and begins the analysis of a new phrase .</Paragraph>
      <Paragraph position="1"> Therefore, the entities mentioned and some relations between them are processed in every sentence, whether syntactically ill-formed, complex, novel, or straightforward . Furthermore, this parsing is done using essentiall y domain-independent syntactic information .</Paragraph>
      <Paragraph position="2"> FPP averages about 6 fragments for sentences as complex as in the EJV corpus ; this number is inflated since punctuation usually results in an isolated fragment. Continuing with the same example sentence, Figure 2 show s nine parse fragments as generated by FPP. The Japanese grammar produces smaller fragments by design.</Paragraph>
      <Paragraph position="3"> Semantic Interpreter The semantic interpreter contains two sub-components : a rule-based fragment interpreter and a pattern-base d sentence interpreter. The first was used in MUC-3 and MUC-4 . The rule-based fragment interpreter applies semanti c rules to each fragment produced by FPP in a bottom-up, compositional fashion . Semantic rules are matched based on general syntactic patterns, using wildcards and similar mechanisms to provide robustness . A semantic rule creates a semantic representation of the phrase as an annotation on the syntactic parse . A semantic formula includes a variable (e.g., ?l3), its type, and a collection of predicates pertaining to that variable . There are three basic types of semantic forms: entities in the domain, events, and states of affairs . Each of these can be further categorized as known, unknown, and referential . Entities correspond to the people, places, things, and time intervals of the domain . These are related in various ways, such as through events (who did what to whom) and states of affairs (properties o f the entities). Entity descriptions typically arise from noun phrases ; events and states of affairs are often described in clauses.</Paragraph>
      <Paragraph position="4"> The rule-based fragment interpreter encodes defaults so that missing semantic information does not produce errors , but simply marks elements or relationships as unknown. Partial understanding is critical to text processin g systems; missing data is normal. For example, the generic predicate PP-MODIFIER indicates that two entities are connected via a certain preposition . In this way, the system has a &amp;quot;placeholder&amp;quot; for the information that a certai n structural relation holds, even though it does not know what the actual semantic relation is . Sometimes understanding the relation more fully is of no consequence, since the information does not contribute to the template filling task . The information is maintained, however, so that later expectation-driven processing can use it i f  An important consequence of the fragmentation produced by FPP is that top-level constituents are typically mor e shallow and less varied than full sentence parses . As a result, a fairly high level of semantics coverage can be obtained quite quickly when the system is moved to a new domain . This would not be possible if the semantic rule s were required to cover a wider variety of syntactic structures before it could achieve reasonable performance . In this way, semantic coverage can be added gradually, while the rest of the system is progressing in parallel .</Paragraph>
      <Paragraph position="5"> The second sub-component of the semantic interpreter module is a pattern-based sentence interpreter which applies semantic pattern-action rules to the semantics of each fragment of the sentence . This replaces the fragment combining component used in MUC-4 . The semantic pattern matching component employs the same core engine as the concept-based pattern matcher . These semantic rules can add additional long-distance relations between semanti c entities in different fragments within a sentence. For example, in the English joint-venture domain, we have define d a rule which looks for a sequence of [&lt;ENTITY&gt; &amp;quot;capitalized at&amp;quot; &lt;MONETARY-AMOUNT&gt;] . This rule's actio n creates an OWNERSHIP semantic form, where &lt;ENTITY&gt; is related via the OWNERSHIP-OWNED role an d &lt;MONETARY-AMOUNT&gt; via the OWNERSHIP-CAPITALIZATION role .</Paragraph>
      <Paragraph position="6"> The semantic lexicon is separate from the parser's lexicon and has much less coverage . Lexical semantic entries indicate the word's semantic type (a domain model concept), as well as predicates pertaining to it . For example, here is the lexical semantics for the noun collocation &amp;quot;joint venture&amp;quot; . This entry indicates that the semantic type i s JOINT-VENTURE, and that a &amp;quot;with&amp;quot; or &amp;quot;between&amp;quot; PP argument whose type is ENTITY should be given the rol e PARENT-OF, and a &amp;quot;for&amp;quot; PP argument of type ACTIVITY should be given the role ACTIVITY-OF .</Paragraph>
      <Paragraph position="7"> (defnoun &amp;quot;joint venture &amp;quot; (JOINT-VENTURE ( :CASE ((&amp;quot;with&amp;quot; &amp;quot;between&amp;quot;) ENTITY PARENT-OF) (&amp;quot;for&amp;quot; ACTIVITY ACTIVITY-OF))) ) We used an automatic case frame induction procedure to construct an initial version of the lexicon [4] . Word senses in the semantic lexicon have probability assignments . For MUC-5 probabilities were (automatically) assigne d so that each word sense is more probable than the next sense, as entered in the lexicon .</Paragraph>
      <Paragraph position="8">  In Figure 3 we show the semantic representation that is built for the phrase &amp;quot;THE JOINT VENTURE ,</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="95" end_page="96" type="metho">
    <SectionTitle>
BRIDGESTONE SPORTS TAIWAN CO., CAPITALIZED AT 20 MILLION NEW TAIWAN DOLLARS" in
    <Paragraph position="0"> EJV walkthrough article 0592 (this phrase is parsed within a single fragment by FPP). Notice that the JOINT-VENTURE is linked to the OWNERSHIP information via an unknown role, because the interpreter was unable t o determine a specific relationship between the NP &amp;quot;THE JOINT VENTURE, BRIDGESTONE SPORTS TAIWA N CO.,&amp;quot; and the participial modifier &amp;quot;CAPITALIZED AT .. .&amp;quot; The discourse component will further refine the relationship between these two semantic objects to the JV-OWNERSHIP-OF relation .</Paragraph>
    <Section position="1" start_page="95" end_page="96" type="sub_section">
      <SectionTitle>
Discourse Processing
</SectionTitle>
      <Paragraph position="0"> PLUM's discourse component [2] performs the operations necessary to create a meaning for the whole messag e from the meaning of each sentence. The message level representation is a list of discourse domain objects (DDOs) for the top-level events of interest in the message (e .g., JOINT-VENTURE events in the joint-venture domain or CAPABILITY events in the microelectronics domain). The semantic representation of a phrase in the text onl y includes information contained nearby in a sentence ; in creating a DDO, the discourse module must infer other long-distance or indirect relations not explicitly found by the semantic interpreter, and resolve any references in the text .  The discourse component creates two primary structures : a discourse predicate database and the DDOs . The database contains all the predicates mentioned in the semantic representation of the message . When references are resolved , corresponding semantic variables are unified. Any other inferences are also added to the database .</Paragraph>
      <Paragraph position="1"> To create the DDOs, the discourse component processes each semantic form produced by the interpreter, adding it s information to the database and performing reference resolution for pronouns and anaphoric definite NPs . Set- and member-type references may be treated . When a semantic form for an event of interest is encountered, a DDO i s generated, and any slots already found by the interpreter are filled in . The discourse processor then tries to merge th e new DDO with a previous DDO, in order to account for the possibility that the new DDO might be a repeate d reference to an earlier one .</Paragraph>
      <Paragraph position="2"> Once all the semantic forms have been processed, heuristic rules are applied to fill any empty slots by looking a t the text surrounding the forms that triggered a given DDO . Each filler found in the text is assigned a confidence scor e based on distance from trigger . Fillers found nearby are of high confidence, while those farther away receive wors e scores (low numbers represent high confidence ; high numbers low confidence ; thus 0 is the &amp;quot;highest&amp;quot; confidenc e score).</Paragraph>
      <Paragraph position="3"> Following is the DDO for the first JOINT-VENTURE in EJV walkthrough article 0592 :</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="96" end_page="97" type="metho">
    <SectionTitle>
DDO: JOINT-VENTURE
</SectionTitle>
    <Paragraph position="0"> Trigger fragments: &amp;quot;BRIDGESTONE SPORTS CO. SAID FRIDAY IT HAS SET UP A JOINT VENTURE&amp;quot; &amp;quot;THE JOINT VENTURE, BRIDGESTONE SPORTS TAIWAN CO ., CAPITALIZED AT 20 MILLION NEW TAIWAN DOLLARS, WILL START PRODUCTION IN JANUARY 1990 &amp;quot; --------------------------------------------------------------------------------------------------------------------------------JOINT-VENTURE-CO-OF : &amp;quot;BRIDGESTONE SPORTS TAIWAN CO .&amp;quot; (score = 0) JV-PARENT-OF: &amp;quot;BRIDGESTONE SPORTS CO.&amp;quot; (score =1 ) &amp;quot;A LOCAL CONCERN&amp;quot; (score = 2) &amp;quot;A JAPANESE TRADING HOUSE&amp;quot; (score = 2) &amp;quot;GOLF CLUBS&amp;quot; (score = 2) &amp;quot;CLUBS&amp;quot; (score = 2 ) JV-ACTIVITY-OF: &amp;quot;start production&amp;quot; (score = 1 ) &amp;quot;produce golf clubs&amp;quot; (score = 2)  &amp;quot;be shipped to Japan&amp;quot; (score = 2) &amp;quot;with production of 20,000 iron&amp;quot; (score = 2) JV-OWNERSHIP-OF: &amp;quot;capitalized at 20 million new Taiwan dollars&amp;quot; (score =1 ) Each trigger fragment contains one or more words whose semantics triggered this DDO . A DDO can have multipl e trigger fragments if the discourse component determines that the triggers corefer . In this example, a &amp;quot;joint venture &amp;quot; in the first fragment co-refers with &amp;quot;the joint venture&amp;quot; in the second fragment. A score of 0 indicates the filler was found directly by the semantics ; 1 that it was found in the same fragment as a trigger form ; and 2 in the same sentence.</Paragraph>
    <Section position="1" start_page="96" end_page="97" type="sub_section">
      <SectionTitle>
Template Generation
</SectionTitle>
      <Paragraph position="0"> The template generator takes the DDOs produced by discourse processing and fills out the application-specifi c templates. Clearly, much of this process is governed by the specific requirements of the application, considerations which have little to do with linguistic processing. The template generator must address any arbitrary constraints, a s well as deal with the basic details of formatting .</Paragraph>
      <Paragraph position="1"> The template generator uses a combination of data-driven and expectation-driven strategies . First the DDOs foun d by the discourse module are used to produce template objects . Next, the slots in those objects are filled usin g information in the DDO, the discourse predicate database, or other sources of information such as the message heade r (e.g., document number, document source, and date information), statistical models of slot filling (e .g., as in the microelectronics domain to choose among the slots: purchaser/user, developer, distributor, and manufacturer), or from heuristics (e .g., the status of an equipment object is most likely to be IN_USE, or the status of a joint ventur e object is most likely to be EXISTING).</Paragraph>
      <Paragraph position="2">  Parameters in PLU M Many aspects of PLUM's behavior can be controlled by simply varying the values of system parameters . For example, PLUM has parameters to control aspects of tagging, parsing, pattern matching, event merging and slo t filling by discourse, and template filling . An important goal has been to make our system as &amp;quot;parameterizable&amp;quot; a s possible, so that the same software can meet different demands for recall, precision, and overgeneration. TRAINING DATA AND TECHNIQUE S The entire development corpus was used in various ways as training data . PLUM was run over all messages t o detect, debug, and correct any causes of system breaks . The entity name slot for all messages was used to quickl y add names to the domain-dependent lexicon . For both microelectronics applications, statistics on the co-occurrence of particular entities in various roles (developer, manufacturer, etc .) were used as a fall-back model for low-confidenc e relationships detected in the texts.</Paragraph>
      <Paragraph position="3"> The TIPS 1 and TIPS2 sets for all applications were used as blind test sets to measure our progress at least once a week. Throughout, we used the summary output from the scoring procedure to guide our development, rather tha n adding to the lexicon or debugging the system based on particular messages.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="97" end_page="99" type="metho">
    <SectionTitle>
DEALING WITH MULTIPLE LANGUAGES AND MULTIPLE DOMAINS
</SectionTitle>
    <Paragraph position="0"> Any system that participated in more than one domain in MUC-5 and/or in more than one language ha s demonstrated domain independence and language independence. In PLUM, the text zoner, morphological processing , parsing, and semantic interpretation employ language-independent and domain-independent algorithms driven by dat a (knowledge) bases . Similarly, the discourse algorithms and template generation algorithms are domain- an d language-independent, and are driven by knowledge that is predominantly declarative .</Paragraph>
    <Paragraph position="1"> The issue (or the goal) that all systems must address further is greater automation of the porting process. Our approach has been to rely on probabilistic learning algorithms . Based on our experience in the last two years, several conclusions have emerged :  1. Porting PLUM to a new domain, even in multiple languages, takes much less effort now . Table 1 shows  the effort expended in porting PLUM to the microelectronics domain. In 52 person-days, PLUM was processin g microelectronics articles in both English and Japanese, obtaining reasonable performance. Had we run PLUM at that time on the TIPS3 test sets, scores would already have been impressive in English (an ERR of 74) . For Japanese , performance was 73 on test set TIPS2 . (We quote the score for TIPS2, because it covered only the capabilities fo r which there was data at the time of the TIPS2 version of PLUM .)  was largely balanced ; the performance of the system across languages and domains was also remarkably balanced, a s shown graphically in Figure 5 .</Paragraph>
    <Paragraph position="2"> 2. Annotating data for PLUM's probabilistic model of a new language, even with little language-specific resources, proved easier than anticipated . The only resource available to us at the start was the JUMA N system from Kyoto University, which hypothesizes word segmentation and part of speech for Japanese text .. Our Japanese speakers were able to annotate part of speech and word boundaries at about 1,000 words per hour, and wer e able to annotate syntactic structure at about 750 words per hour . Initial annotation and testing were performed usin g only I6,(XX) words plus the JUMAN lexicon ; therefore, the initial port to Japanese required only about a personweek of annotation effort .</Paragraph>
    <Paragraph position="3">  3. Building lexical resources for a new language or a new domain took only a few person days usin g heuristics. In Japanese, a three step process for hypothesizing proper names reduced the labor involved . First, we ran JUMAN + POST over the training corpus to find the sequence of words and their most likely part of speech i n  context. Then, a finite-state process with a handful of language-specific patterns was run on the result t o hypothesize (previously unknown) proper nouns in the corpus . The patterns were designed for high recall of names, at the expense of low precision ; we measured the effectiveness of the technique as 90% recall at 20% precision . Lastly, a person ran through the hypothesized proper names using KWIC as a resource to quickly eliminate ba d hypotheses. The resulting list of names was made available to all the participants in JJV .</Paragraph>
    <Paragraph position="4"> A simple manual technique also enabled fast semantic categorization of the nouns and verbs of each domain in both languages. Using a KWIC index and the frequency of each noun and each verb in the corpus, we could define abou t 125 words per hour into categories such as HUMAN, CORPORATION, OFFICER, GOVERNMENT-ORGANIZATION, etc. The process could go so quickly by organizing the categories into small menus of at most 12 items, so that a perso n need only make simple discriminations in any pass through a list of words .</Paragraph>
    <Paragraph position="5"> 4. Training new staff to use PLUM effectively proved easier than anticipated. Our team faced trainin g new staff two months before the MUC-5 test, as our single Japanese programmer needed to reduce his involvemen t substantially. Starting at the beginning of June, two Japanese computer science majors, who had just complete d their junior year at college came to BBN . They had had no training in computational linguistics, but had had on e course in artificial intelligence and one in LISP . In June, they learned about data extraction, the joint venture and microelectronics tasks, and how to use PLUM . Since the Japanese articles on packaging and lithography had arrive d much later than the other data, and since we had not touched that data, they focussed on those two capabilitie s starting July 1 . Initially, of course, PLUM had near 100 as an ERR on sets composed primarily of thos e  microelectronics capabilities . As evident in Figure 6, the progress was rapid and dramatic, as the error rate droppe d by 25% in all cases and by almost 50% in some cases.</Paragraph>
    <Paragraph position="6"> Figure 6 : Progress in JME : For development messages involving packaging and lithography, progress o f new staff with minimal training was rapid and dramatic .</Paragraph>
  </Section>
  <Section position="8" start_page="99" end_page="100" type="metho">
    <SectionTitle>
CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> We began our research agenda approximately three years ago when we build PLUM for MUC-3 . During the past two years, we have focused much of our effort on techniques to facilitate porting our data extraction system (PLUM ) to new languages (Japanese) and to two new domains (joint ventures and microelectronics), as well as infrastructur e development.</Paragraph>
    <Paragraph position="1"> Some of the lessons we learned during our work include the following : Automatic training and acquisition of knowledge bases can yield relatively good performance at reduced labor, a s evidenced, for example, by a quick port to the microelectronics domain (in 2 languages) in 2 person-month s (after which further refinements were made) .</Paragraph>
    <Paragraph position="2"> Domains dominated by jargon (sub-language) may be easier than domains of normal vocabulary because there i s less ambiguity and more predictability . For TIPSTER this means that the microelectronics domain was easier than joint ventures .</Paragraph>
    <Paragraph position="3"> Japanese was easier to process than English because of strong clues provided by case-markers, and a less varie d linguistic structure in the articles.</Paragraph>
    <Paragraph position="4"> * Availability of a large text corpus was invaluable for quick knowledge acquisition . A smaller number of filled templates should still be adequate.</Paragraph>
    <Paragraph position="5"> * Our algorithms were already largely language- and domain-independent ; an important goal remains to further automate the porting process.</Paragraph>
    <Paragraph position="6"> * Finite-state pattern matching is a useful complement to linguistic processing, offering a good fall-back strategy for addressing language constructions that are hard to treat via general linguistically-based approaches . * Continued work on discourse processing is important to improving performance . Reliably determining whe n different descriptions of events or objects in fact refer to the same thing remains one of the hardest problems i n data extraction.</Paragraph>
    <Paragraph position="7"> * Improving syntactic coverage is a priority . Increased coverage normally leads to greater perceived ambiguity i n the system; we hope to counter this through the use of probabilistic models .</Paragraph>
    <Paragraph position="8">  We plan to continue our research agenda emphasizing the use of probabilistic modeling and learning algorithms fo r data extraction in order to continue improving robustness and portability .</Paragraph>
  </Section>
  <Section position="9" start_page="100" end_page="101" type="metho">
    <SectionTitle>
SYSTEM WALKTHROUGHS
</SectionTitle>
    <Paragraph position="0"> the &amp;quot;typo&amp;quot; alias, or the Japanese sport goods maker coreferences .</Paragraph>
    <Paragraph position="1"> * Of those, which could it have gotten 6 months ago (at the previous evaluation)?  [a] &amp; [b] : PLUM could not recognize these 6 months ago .</Paragraph>
    <Paragraph position="2"> [c]: 6 months ago, PLUM did find coreference between &amp;quot;a joint venture&amp;quot; and &amp;quot;the joint venture&amp;quot; . [d]: PLUM could not get any of these 6 months ago .</Paragraph>
    <Paragraph position="3"> * How can you improve the system to get the rest ? [a]: The phrase &amp;quot;local concern&amp;quot; is assigned a semantic type that is a superconcept of CORPORATION. If the discourse module allowed merging of a subconcept event into a superconcept event (something which i s allowed in the microelectronics domain but not currently in the joint venture domain), then PLUM could potentially find this coreference via discourse event merging . However, PLUM's company name recognizer would need to be adapted so that it would not misanalyze the company name &amp;quot;Union Precision Casting Co . &amp;quot; [b]: This is a harder case . In order to find this coreference, PLUM would probably need to recognize that bot h mentions are involved in trading .</Paragraph>
    <Paragraph position="4"> [c]: A more explicit treatment of definite references would help with these cases . Also, better recognition of locations would aid in establishing coreference between the two mentions of the Taiwan company . [d]: In order to recognize the other Bridgestone references, PLUM would need to try to treat misspellings, as wel l as treat the definite reference explicitly .</Paragraph>
    <Paragraph position="5"> (2) Did your system get the OWNERSHIPs, in particular from &amp;quot;. .. THE REMAINDER BY TAGA CO.&amp;quot;? , PLUM did produce an ownership object with 75 and 15 % ownership percentages ; however, the system filled in the owning entities incorrectly . PLUM did not attempt to handle phrases like &amp;quot;the remainder . ..&amp;quot;. PLUM also missed the capitalization information in this example .</Paragraph>
    <Paragraph position="6"> Other comments on walkthrough performance : The PLUM system found 3 tie-up objects instead of 1 . One of the spurious tie-ups resulted from the discours e event triggered by &amp;quot;the new company&amp;quot; not being correctly merged with the earlier mention of the joint ventur e company. The reason for the second spurious tie-up stems from PLUM having identified &amp;quot;Taiwan&amp;quot; (in the phrase &amp;quot;i n Taiwan &amp;quot;) as a corporation, and more precisely, as a joint venture company .</Paragraph>
    <Paragraph position="7"> The lexical entry for &amp;quot;Taiwan&amp;quot; incorrectly lists it as a corporation, as well as a country . Once &amp;quot;Taiwan&amp;quot; wa s identified as a corporation, the pattern [&amp;quot;set up&amp;quot; .. . &lt;company&gt; with &lt;company&gt;] matched the text &amp;quot;set up a join t venture in Taiwan with a local concern and ...&amp;quot;, and &amp;quot;Taiwan&amp;quot; was identified as the joint venture company . Since this joint venture company was found to be different from &amp;quot;Bridgestone Sports Taiwan Co .,&amp;quot; which was also identified as a joint venture company by the system, 2 separate tie-ups were generated .</Paragraph>
    <Paragraph position="8"> After the test was run, we removed the definition of &amp;quot;Taiwan&amp;quot; as a corporation . With this change, the system generated I less tie-up object, and it correctly found the reference between &amp;quot;a joint venture&amp;quot; and &amp;quot;the joint venture .&amp;quot; This correction is reflected in the sample event given in the discourse component description section .</Paragraph>
  </Section>
  <Section position="10" start_page="101" end_page="106" type="metho">
    <SectionTitle>
"UNION PRECISION CASTING CO" was missed because it was not recognized as a possible company name:
</SectionTitle>
    <Paragraph position="0"> capitalization information was not available to help with name recognition (the article was fully capitalized), and the tagging component tagged &amp;quot;casting&amp;quot; as a V, a category which is not allowed to be taken as part of a company name.</Paragraph>
    <Paragraph position="1">  (1) What information triggers the instantiation ofeach ofthe two LITHOGRAPHY objects?  The PLUM system generated 3 lithography objects, all of type UNKNOWN (the key contains 1 LASE R lithography and 1 UNKNOWN lithograpy) . The three triggering phrases are : &amp;quot;a new stepper,&amp;quot; &amp;quot;the stepper,&amp;quot; and &amp;quot;latest stepper&amp;quot; (2) What information indicates the role of Nikon Corp. for each Microelectronics Capability ? The PLUM system initially finds Nikon Corp . as the manufacturer of each of the 3 capabilities (in the key, Niko n is the manufacturer of the LASER lithography and the manufacturer and distributor of the UNKNOWN lithography) . Nikon was associated with each of the three capabilities because it occurred in the same sentence . Our statistical model of entity&lt;-&gt;capability relationships indicated that Nikon was most likely to be a manufacturer, so it wa s placed in this role.</Paragraph>
    <Paragraph position="2"> We actually found Nikon as a distributor of all 3 capabilities, but we removed this relation as it was determined t o be unlikely by our statistical model . Nikon was thought to be a distributor because of the trigger verbs &amp;quot;market&amp;quot; an d &amp;quot;sell .&amp;quot; The discourse rule then picked up Nikon Corp . (at score I) as the agent of this verb . (3) Explain how your system captured the GRANULARITY information for &amp;quot;The company's latest stepper . &amp;quot; The granularity phrase &amp;quot;a resolution of 0 .45 micron&amp;quot; was correctly understood by the semantics component an d was associated with the appropriate lithography object via a discourse rule. However, the granularity filler was ruled out by the template generator because its confidence score fell outside the threshold set for this slot (the threshol d setting is tailored to provide the best overall system performance) . Consequently, the granularity information did not appear in the response template .</Paragraph>
    <Paragraph position="3">  (4) How does your system determine EQUIPMENT TYPE for &amp;quot;the new stepper&amp;quot;? &amp;quot;the company's latest stepper&amp;quot; ? Equipment types are defined hierarchically in PLUM 's domain model . The word &amp;quot;stepper&amp;quot; is linked to the concep t STEPPER in the domain model and triggers a STEPPER discourse event . The template generator translate s STEPPER events into equipment objects of type STEPPER . So the equipment_type is based on the domain mode l concept that is associated with the trigger phrase.</Paragraph>
    <Paragraph position="4"> (5) How does your system determine the STATUS of each equipment object ? The equipment status is defaulted to IN_USE.</Paragraph>
    <Paragraph position="5"> (6) Why is the DEVICE object only instantiated for LITHOGRPAHY-I ? PLUM's discourse heuristic for finding a process's device only looks within the same sentence . In this article, the 64-mbit DRAM device is in the same sentence as the first lithography object, but no other .</Paragraph>
    <Paragraph position="6"> Other comments on walkthrough performance: The PLUM system found 3 microelectronics capabilities instead of the 2 in the answer key . The spurious capability results from a discourse referencing problem : the lithography object triggered by the definite phrase &amp;quot;th e stepper&amp;quot; was not found to be coreferential with the lithography object triggered by &amp;quot;a new stepper&amp;quot; in the previous sentence . PLUM's definite referencing mechanism is controlled by a parameter When this parameter is turned on , PLUM correctly resolves the definite reference in this example, and only 2 lithography capabilities are generated . However, turning the parameter on negatively affects scores overall, so it was off for the MUC-5 test . The walkthrough article exemplifies our use of the entity relation statistical model . The PLUM system, throug h discourse processing, had hypothesized Nikon Corp as both the distributor and manufacturer of &lt;LITHOGRAPHY 2789568-1&gt;, as the distributor of &lt;LITHOGRAPHY-2789568-2&gt;, and as the distributor and purchaser/user o f &lt;LITHOGRAPHY-2789568-3&gt;. However, these relations were found at a fairly low confidence by a discourse search rule looking around within the sentence . On the other hand, the statistical model (derived from training data ) indicated that Nikon is most likely to be a manufacturer . So the template generator removed the unlikely relation s (distributor and purchaser/user) and entered the likely relation of manufacturer . Compared against the key, th e statistical model was correct in removing the purchaser/user relation but incorrect in removing one of the distributo r relations.</Paragraph>
    <Paragraph position="7"> The PLUM system incorrectly generated only 1 stepper equipment object . This was because the discourse even t triggered by &amp;quot;the company's latest stepper&amp;quot; was incorrectly merged with the earlier stepper events . If PLUM could have recognized the two granularities and associated them with the 2 different stepper objects (at a high level o f confidence), this over-merging error could have been prevented.</Paragraph>
    <Paragraph position="8"> The device size information was missed because PLUM failed to correctly analyze the sequence &amp;quot;64- mbit dram .&amp;quot; Since running the walkthrough message for the MUC-5 test, this problem has been fixed, so the device siz e information in this article is now reported .</Paragraph>
    <Paragraph position="9">  (3) How does your system determine the entities in a tie-up ?  Some entities are picked up directly in the semantics when parsed within a fragment, or via lexical clues an d syntactic/semantics contexts within phrases . Others are picked up via discourse rules .</Paragraph>
    <Paragraph position="10"> (4) How many discourse entities were identified anywhere in the text, and how did the system determine which o f these were reportable? The template generator's parameters were set to only output objects directly related to a tie-up .</Paragraph>
    <Paragraph position="12"> (5) Explain any difficulties you had in identifying thefollowing: a) the correct number of reportable entities Since the system doesn't handle conjunction of company names well, it missed one company . b) the correct number of tie-ups (correct, for the sake of this walk-through allows BOTH interpretation s described in b) above, even though the key template does not.) There was some overgeneration due to under-merging by the discourse component .</Paragraph>
    <Paragraph position="13"> c) the correct links between reportable entities and reportable tie-ups.</Paragraph>
    <Paragraph position="14"> Since entities are only hypothesized through tie-up patterns, this is not a problem . (6) How does your system determine aliases for entities ? The system tests all noun phrases and their parts for concatenations of substrings of a company name . (7) What problems were there in detecting the alias for the ENTITY named Toukyou Kaijou Kasai Hoken ?  There was no problem .</Paragraph>
    <Paragraph position="15"> (8) Sentence 2 ends with a general statement about products developed in tie-ups between insurance companies an d securities companies. How would your system determine that this is a generic, not a specific reference ? No tie-up is generated when no proper names of companies are mentioned.</Paragraph>
    <Paragraph position="16">  Questions to Address : (1) How does the system determine the existence of a reportable microelectronics capability? If a sentence includes equipment and a verb expressing an ME activity, and it matches a sentence pattern, then aME capability object is created .</Paragraph>
    <Paragraph position="17"> (2) Three entities are mentioned in this article. How did your system determine which were involved in the ME capability? (If the joint venture company was not selected, was it rejected because its activity was in die .future, or some other basis?) Only company names which fit the ME capability patterns are considered . Our system did not select the joint venture company, since it did not match any company name patterns .</Paragraph>
    <Paragraph position="18"> (3) How does the system identify company names? How does it associate locations with entities? Locations are associated with entities by using patterns lik e &amp;quot;...-maker, XXXX ( headquarters YYYY ...&amp;quot; and &amp;quot; ...America's biggest . .. company, XXXX&amp;quot; where XXXX is a company name and YYYY is a location .</Paragraph>
    <Paragraph position="19"> (4) How does your system associate film type with each ME capability? (In this article &amp;quot;CVD&amp;quot; is immediatel y preceded by &amp;quot;metal film.&amp;quot; Will your present strategy allow more remote references? ) First, film names are extracted by means of clue words . Then, if these names match the sentence patterns, they ar e matched with film types according to the domain model . The order of &lt;film&gt;, &lt;equipment&gt;, and &lt;verb&gt; is not fixe d in the system, but currently must be within the same sentence .</Paragraph>
    <Paragraph position="20"> (5) How does you system determine the existence of reportable equipment? How is equipment type determined? (Would the determination of a new equipment type generate a new ME capability?) Equipment names are extracted by means of clue words . Their types are decided according to a hierarchy o f equipment types. A new equipment type would not by itself generate a new capability . Equipment objects are onl y reported if some slot besides STATUS is filled .</Paragraph>
  </Section>
  <Section position="11" start_page="106" end_page="106" type="metho">
    <SectionTitle>
ACKNOWLEDGMENT S
</SectionTitle>
    <Paragraph position="0"> The work reported here was supported in part by the Defense Advanced Research Projects Agency and was monitored by the Rome Air Development Center under Contract No . F30602-91-C-0051 . The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing th e official policies, either expressed or implied, of the Advanced Research Projects Agency or the United State s</Paragraph>
  </Section>
class="xml-element"></Paper>