<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1005">
  <Title>DOMAIN AND LANGUAGE EVALUATION RESULTS</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DOMAIN PERFORMANCE
</SectionTitle>
    <Paragraph position="0"> Summary scores by domain for error per response fill (official All-objects scoring), averaged for MUC-5 sites in Table 1, indicate slightly better performance in microelectronics than in joint ventures for both languages. This performance characteristic is also reflected in the individual languages within domains in the summary of language performance in Table 4.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="45" type="metho">
    <SectionTitle>
LANGUAGE JOINT VENTURE MICROELECTRONICS
AVERAGE RANGE AVERAGE RANGE
</SectionTitle>
    <Paragraph position="0"> TABLE 1: ERROR AVERAGE/RANGE BY DOMAINS FOR MUC-5 SITES. ENGLISH/JAPANESE: JOINT VENTURE average 74, range 54-84; MICROELECTRONICS average 70, range 58-86. This performance difference by domain can be interpreted by examining differences between the domains in the information defined for extraction. Domain differences for the only object type, ENTITY, that was defined for both domains will be presented first, followed by differences for unattempted slots, i.e., those slots that some or all systems left unfilled.</Paragraph>
    <Paragraph position="1"> Shared Slots in ENTITY Object. Defining information extraction tasks entails identifying (1) the pieces of information to be extracted, (2) how the pieces are related, and (3) how those pieces are to be represented in a database. The two domains in MUC-5 define different tasks and so vary along those three parameters, which are collectively called the &quot;reporting conditions.&quot; This variation in reporting conditions must be taken into account when examining results for a shared object extracted in two different domains. For example, in scoring performance for the ENTITY object in the JV and ME domains, what is being evaluated is not just how systems extract entities, but how systems extract entities given the reporting conditions of the domain. Whereas in the JV domain systems mainly extract principals in tie-ups or newly formed companies, in the ME domain they extract entities in terms of their relation to processes as developers, manufacturers, distributors, purchasers, or users of microelectronics technology.</Paragraph>
    <Paragraph position="2"> Table 2 presents the error per response fill for the shared slots for the ENTITY object for the three TIPSTER sites that participated in both languages for both domains. These scores have been averaged across the two languages. In general, sites perform slightly better on the four slots in the JV domain than in the ME domain. The effect of reporting conditions for the two domains may be evident here. In the JV domain, the identification of the tie-up event is the central task. There are only two role distinctions to be made for the entities involved: either a principal in the tie-up or a newly formed joint venture company. In contrast, in the ME domain, the identification of the process and its attributes is the central task. Entity recognition, though prerequisite to instantiating an ME capability, is actually in many ways auxiliary to the ME process itself. In addition, the entity must be assigned one of four different roles (developer, manufacturer, distributor, or purchaser/user), where no one slot dominates in terms of the number of expected fills. Thus, the entity recognition task in the ME domain is in some ways harder than in the JV domain.</Paragraph>
  </Section>
  <Section position="5" start_page="45" end_page="46" type="metho">
    <SectionTitle>
BBN GE/CMU/MM NMSU/BRANDEIS
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="45" end_page="46" type="sub_section">
      <SectionTitle>
Unattempted Slots
</SectionTitle>
      <Paragraph position="0"> The effect of differences in the information defined for extraction on performance differences between the two domains can also be examined by reviewing unattempted slots (see footnote 2). Here, unattempted slots are defined as slots where actual is 0 and possible is greater than 0.</Paragraph>
      <Paragraph position="1"> Although a wide range of factors affect whether a site attempts a particular slot (e.g., its difficulty, fill frequency, clarity of definition in the fill rules, or stability of definition across fill rules versions), for this discussion I will take the position that for each application a task is defined in terms of a certain number of objects with slots. A task requires a certain amount of work and each slot receives development effort. In the JV domain, there are ten objects with a total of 44 scored slots; in the ME domain there are nine objects with a total of 44 scored slots.</Paragraph>
      <Paragraph position="2"> Footnote 2. This approach views a task independent of evaluation and its effect upon system development strategies. It ignores the fact that some objects appear more often and therefore contribute more to evaluation scores, which may shape the development efforts of sites seeking to maximize their scores by concentrating on high pay-off slots while ignoring slots with little pay-off in scoring.</Paragraph>
      <Paragraph position="3"> Review of unattempted slots in each domain allows us to determine how sites redefine the task in each domain by eliminating some subset of objects or slots from the task. Table 3 indicates the task reduction averaged for MUC-5 sites for the two domains, calculated for each site by dividing the number of unattempted slots by the total number of slots. Even though performance differences for the ENTITY object indicate somewhat better entity recognition for the JV domain (given reporting conditions) than for the ME domain, sites in the JV domain are clearly more likely to reduce the task definition regardless of language. In both languages for JV, sites mainly redefine the task either by not filling slots in the Activity, Facility, Revenue, and Time objects, or by not instantiating the object at all. In both languages for the ME domain, sites redefine the task mainly by not filling a subset of slots in the Etching, Packaging, and Equipment objects. There are no cases in ME where an entire object is not attempted.</Paragraph>
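      <Paragraph> The task reduction calculation described above can be sketched as follows. This is a minimal illustration, not the MUC-5 scoring software: the SlotScore record, its field names, and the function names are assumptions introduced here, and only the definitions (a slot is unattempted when actual is 0 and possible is greater than 0; task reduction is unattempted slots divided by total slots) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SlotScore:
    """Score-report counts for one slot (hypothetical record, not the MUC-5 format)."""
    name: str
    possible: int  # number of fills in the answer key
    actual: int    # number of fills the system produced


def is_unattempted(slot: SlotScore) -> bool:
    # Unattempted: the key expects fills (possible > 0) but the system produced none.
    return slot.actual == 0 and slot.possible > 0


def task_reduction(slots: list[SlotScore]) -> float:
    """Fraction of a task's slots that one site left unattempted."""
    if not slots:
        raise ValueError("no slots to score")
    unattempted = sum(1 for slot in slots if is_unattempted(slot))
    return unattempted / len(slots)
```

For instance, a site whose report lists four slots, two of which have possible fills but no actual fills, would have a task reduction of 0.5 under this sketch. A slot with no possible fills does not count as unattempted, which follows the definition given earlier in this section.</Paragraph>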
      <Paragraph position="4"> This discrepancy in the extent of task redefinition between domains offers evidence of differences in task complexity that help us interpret the performance differences between domains. The greater likelihood for sites in the JV domain to eliminate slots and/or objects supports the view that the task is more complex for the JV domain than for the ME domain. The JV template design is a more complex structure, with a deeper set of embedded objects. Most of the unattempted JV slots are in the more deeply nested objects. The exception to this, the Activity object, is not part of the core template task for JV.</Paragraph>
      <Paragraph position="5"> Discrepancies in development effort between domains for the TIPSTER sites further support the apparent greater complexity of the JV task. Notwithstanding the later start date for the ME domain and the more drastic revision process for the JV domain, all of the TIPSTER sites reduced the JV task more than the ME task. Moreover, the fact that three of the four sites working in both domains estimate that they expended considerably more development effort on JV than on ME may further support that view.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="46" end_page="49" type="metho">
    <SectionTitle>
DOMAIN
ENG/JPN ENG JPN
AVG AVG RANGE AVG RANGE
</SectionTitle>
    <Paragraph position="0"> This performance difference by language can be interpreted by analyzing how information availability (i.e., the amount of data of a given kind in the text that can be extracted) and information presentation (i.e., the manner in which different kinds of information are expressed) reflect the similarities and differences in evaluation results between the two languages. Language differences by object and slot will be presented first for the ME domain and then for the JV domain.</Paragraph>
    <Paragraph position="1"> Microelectronics Domain: Impact of Information Availability. Evaluation results from the ME domain illustrate how the amount of information available in the corpus affects performance. In the ME domain, the application is directed at tracking microelectronics capabilities as evidenced in advances in four specific chip fabrication processes (lithography, layering, etching, and packaging). Identifying one of these processes associated with some entity triggers the tracking. Each of the four process objects is composed of a set of process-specific slots as well as a set of process-general slots shared by all four objects: the Type, Device, and Equipment slots. Error per response fill averaged by language for MUC-5 sites for the four process objects and their slots is presented in Tables 5-8.</Paragraph>
    <Paragraph position="2">  For both languages, error per response fill is considerably lower for most of the process-general slots than for the process-specific slots. This discrepancy can be traced to the fact that similar types of information for the process objects are available in both languages and similar development strategies are employed: emphasize high-frequency slots and de-emphasize low-frequency slots. Process-specific slots contribute significantly less to the total object scores than do process-general slots. In EME, process-specific slots in lithography, layering, and etching comprise only around 25% of the total object, and in JME less than 20% for the same objects. The same frequency pattern occurs in the development data; process-specific slots have a lower frequency of occurrence than the process-general slots. Note also that the DEVICE and EQUIPMENT slots are pointers to other objects (with more slots) and the TYPE slot is a required slot that indicates the actual existence of a particular process within a text. Since information is more likely to be available for the process-general slots in both languages, more effort is directed at these higher pay-off slots than at the process-specific slots. This accounts for the better performance on process-general slots in both languages.</Paragraph>
    <Paragraph position="3"> But what accounts for the better Japanese than English performance in the ME domain? The Packaging object provides the first clue: differences in the amount of information between Japanese and English. Table 8 indicates that no test data are available for three of the process-specific slots for Japanese. In comparison to Japanese, the number of possible fills in the test set for English is considerably higher on all slots, not just these three; even factoring in the ratio of Japanese to English articles cannot account for this discrepancy. The development data also reflect this difference. Even though the task remained constant for these two languages in this domain, the type of data available for extraction for the Packaging object obviously differed for the two languages. There were simply fewer extractable items in Japanese and thus fewer opportunities for error.</Paragraph>
    <Paragraph position="4"> The number of extractable items within a text affects how difficult those items are to manage. In a single text, managing all the data elements associated with multiple processes (i.e., multiple ME capabilities) is more difficult than managing only the data elements associated with a single process. The English test set contained a higher proportion of texts with multiple processes (44%) than the Japanese test set (31%). The test sets also differed in the distribution of the types of multiple processes occurring within a text, e.g., whether a single text contains multiple processes of the same type or a combination of different process types. For the subset of texts containing multiple processes, Table 9 compares the percent distribution for the types of multiple processes. While the Japanese test set is more likely to contain a text with multiple layering or multiple lithography processes, the English test set is more likely to contain multiple packaging processes and combinations of process types. The average number of processes within a single text for layering, lithography, and etching types differs little across languages, but texts with multiple packaging processes contain twice as many processes in English as in Japanese.</Paragraph>
  </Section>
  <Section position="7" start_page="49" end_page="50" type="metho">
    <SectionTitle>
LANGUAGE
PROCESS TYPE
MULTIPLE PROCESS TYPES
MULTIPLE LAYERING MULTIPLE LITHOGRAPHY MULTIPLE ETCHING MULTIPLE PACKAGING
</SectionTitle>
    <Paragraph position="0"> Table 10 compares the average error per response fill for texts containing multiple processes with texts containing a single process for TIPSTER sites. With the exception of the GE/CMU/MM performance in Japanese, performance for all sites on texts with multiple processes was lower than on texts with a single process. The higher percentage of English test set texts containing multiple processes therefore negatively affected performance. Even though the Japanese test set contained more texts with multiple processes of the same type (and, in fact, showed lower average performance on those texts than on texts with multiple process types), the effect was ameliorated by the lower proportion of texts containing multiple processes in the Japanese test set. In other words, the greater likelihood of multiple processes within a single text in English (i.e., a greater number of extractable items) and the accompanying data management problems contributed to the weaker performance in English.</Paragraph>
    <Paragraph position="1">  Evaluation results from the JV domain illustrate the impact of information presentation. The way in which information is expressed in a single domain for two languages may differ. If texts in one language are more or less formulaic in structure and represent domain concepts in more or less standardized ways, then the texts in that language are more homogeneous in terms of discourse structure and terminology. As a result, texts in that language may be more easily exploited for information extraction than a more heterogeneous text corpus in a different language, even though the domain and application are the same. This appears to be the case for the JV domain for Japanese and English.</Paragraph>
    <Paragraph position="2"> In the JV domain, the application is directed at tracking tie-ups among entities. Identifying entities engaged in some business activity in a tie-up relationship triggers the tracking. Error per response fill data by language averaged  In general, the performance characteristic of lower Japanese error per response fill is consistent across the slots in Table 11. Systems perform better in Japanese in identifying the tie-up itself, the participants in the tie-up, their relationship, and the industry of the tie-up activity. This performance characteristic appears to be the result of the way in which information is presented in the Japanese text.</Paragraph>
    <Paragraph position="3"> Preliminary analysis of the Japanese test set indicates that 60 percent or more of the articles have a prototypical text structure and that this structure lends itself to a proficient extraction methodology. Typically, an article contains one tie-up, and the relevant tie-up occurs in the first few sentences. Moreover, the tie-up signal is characterized by a stereotypical pattern as defined below: X wa Y to ... teikei shita to ... happyo shita. In this pattern, X and Y are tie-up principals, with the verb phrase &quot;teikei shita&quot; indicating a tie-up relationship. The key element is the topic marker &quot;wa.&quot; That marker sets the stage for the entity to be the protagonist throughout the text and, in fact, for any other tie-ups mentioned in the article where only one of the entities is named. This prototypical structure gives the Japanese systems a head start by providing a pattern into which missing or seemingly irrelevant information may later be inserted. In short, the presentation of the information in Japanese may facilitate extraction fills throughout the template and therefore may lead to better overall performance.</Paragraph>
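    <Paragraph> To illustrate why such a stereotypical pattern is easy to exploit, the following toy sketch matches the romanized form of the pattern with a regular expression. It is an illustration only and does not describe how any MUC-5 system worked: the function name and the example sentence are hypothetical, X and Y are assumed to be single whitespace-delimited tokens, and real systems operate on Japanese script with proper entity recognition rather than on romanized text.

```python
import re

# Toy pattern over romanized text: "X wa Y to ... teikei shita to ... happyo shita".
# Assumption: X and Y are single whitespace-delimited tokens.
TIE_UP_PATTERN = re.compile(
    r"(\S+) wa (\S+) to\b.*?\bteikei shita to\b.*?\bhappyo shita"
)


def find_tie_up_principals(sentence: str):
    """Return (X, Y) if the sentence follows the stereotypical pattern, else None."""
    match = TIE_UP_PATTERN.search(sentence)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Applied to the made-up sentence "A-sha wa B-sha to gijutsu teikei shita to kinou happyo shita", the function returns the pair ("A-sha", "B-sha"); a sentence that lacks the "teikei shita" signal yields None.</Paragraph>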
  </Section>
</Paper>