<?xml version="1.0" standalone="yes"?>
<Paper uid="M91-1039">
  <Title>APPENDIX C: GUIDELINES FOR INTERACTIVE SCORING</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
APPENDIX C:
GUIDELINES FOR INTERACTIVE SCORING
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> This document, although fairly extensive, is not intended to give you an exhaustive list of &quot;do's&quot; and &quot;don'ts&quot; for the interactive scoring of the templates. Instead, it presents guidelines and some examples in order to imbue you with the spirit of the enterprise. It is up to you to consider your reasons carefully before judging mismatching responses to be &quot;completely&quot; or &quot;partially&quot; correct.</Paragraph>
    <Paragraph position="1"> Thus, you should attempt to set aside a substantial amount of time to do the interactive scoring and should plan to do it when you are rested and can be as objective as humanly possible about your system's performance. Please refer to the file key-tst2-notes for examples of decisions NOSC made in preparing the answer key.</Paragraph>
    <Paragraph position="2"> If you have any doubt whether a given system response deserves to be judged completely or partially correct, count it incorrect.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. SETTING UP THE SCORING PROGRAM IN INTERACTIVE MODE
</SectionTitle>
    <Paragraph position="0"> You must use the latest official version of the scoring program together with the latest slotconfig.el file. You are not permitted to make any modifications of your own to the scoring software or the files it uses, except to define the pathnames in the config.el file for the files that it reads in.</Paragraph>
    <Paragraph position="1"> The configuration (config.el) files supplied with the test package set the :queryverbose option on, which places the scoring program in interactive mode. (See MUC Scoring System User's Manual, section 5.2.) The only feature of the interactive scoring that you are *not* permitted to take advantage of is the option to change a key or response template! This feature is controlled by the :disable-edit option, which is set on in the config.el files supplied in the test package and should not be modified. Although there may be errors in the key templates, you are not permitted to fix them, as we do not have sufficient time to make the corrections known to all sites.</Paragraph>
    <Paragraph position="2"> Score your system under the assumption that the answer key is correct, make note of any perceived errors in the key, and email them to NOSC along with your results. If there is sufficient evidence that errors were made that affect the scores obtained, a new key will be prepared after the conference, and sites will be given the opportunity to rescore their system responses. The new scores will replace the old ones as the official results.</Paragraph>
    <Paragraph position="3"> Included among your options for interactive scoring is the manual realignment of response templates with key templates (see section 3.2.1 below and section 4.7 of the User's Manual). If you are not already comfortable using the interactive scoring features of the scoring program, take some time to practice on some texts in the training set before you attempt to do the scoring for the test set. Also be sure to read the document on test procedures carefully regarding saving your history buffer to a file for use in other scoring sessions required for completing the test procedure. Referring to key-tst2-notes while you are doing the interactive scoring might help you understand the key better and give you ideas on cases when alternative fillers might be justified.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. SCORING MISMATCHED SLOT FILLERS
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 BY TYPE OF FILL
</SectionTitle>
      <Paragraph position="0"> These subsections deal in turn with string fills, set fills, and other types of fills.</Paragraph>
      <Paragraph position="1"> Following that is a section concerning cross-reference tags.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.1.1 STRING FILLS
</SectionTitle>
    <Paragraph position="0"> Slots requiring string fills are slots 5, 6, 8, and 11. In the case of a mismatch on fillers for these slots, the scoring program will permit you to score the response as fully correct, partially correct, or incorrect. NOSC has attempted to provide a choice of good string options for each string slot. If you get a mismatch, before you score a filler fully correct you should consider carefully whether your system's filler is both complete enough and precise enough to show that the system found exactly the right information.</Paragraph>
    <Paragraph position="1"> The most likely situation where &quot;fully correct&quot; would be justified is a case where the system or the key includes &quot;nonessential modifiers&quot; such as articles, quantifiers, and adjectivals for nationalities (e.g., SALVADORAN).</Paragraph>
    <Paragraph position="2"> EXAMPLE (slot 11): RESPONSE &quot;THE 3 PEASANTS&quot;</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
KEY
&quot;PEASANTS&quot;
</SectionTitle>
    <Paragraph position="0"> In filling the key templates, such nonessential modifiers were generally included in slot 5 (since there are no slots specifically for the number and nationality of the perpetrators). They were generally excluded from fillers for the other string slots, unless they seemed to be part of a proper name (e.g., THE EXTRADITABLES).</Paragraph>
    <Paragraph position="1"> &quot;Fully correct&quot; is also warranted if the system response contains more modifying words and phrases than the answer key, as long as all the modifiers are modifiers of the noun phrase. However, in most cases the answer key should already contain options such as these.</Paragraph>
    <Paragraph position="2"> EXAMPLE (slot 11): RESPONSE &quot;OLD PEASANTS WHO WERE WITNESSES&quot;</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
KEY
&quot;PEASANTS&quot; / &quot;OLD PEASANTS&quot;
</SectionTitle>
    <Paragraph position="0"> Finally, if your system does not generate an escape (backslash) character in front of the inner double quote marks of a filler that is itself surrounded by double quotes, you may score the system response as completely correct if it would otherwise match the key.</Paragraph>
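The string-fill guidelines above leave the final call to the scorer, but a rough pre-check can flag responses that differ from a key option only by leading nonessential modifiers or by a missing escape character. The sketch below is one possible way to do that in Python; it is not part of the official scoring program. The word lists and the function names (strip_nonessential, unescape_quotes, candidate_for_full_credit) are hypothetical, since the guidelines name the categories of modifier but give no lists.

```python
# Hypothetical word lists, for illustration only: the guidelines name the
# categories (articles, quantifiers, nationality adjectivals) but give no lists.
ARTICLES = {"THE", "A", "AN"}
QUANTIFIERS = {"SOME", "SEVERAL", "MANY"}
NATIONALITIES = {"SALVADORAN"}


def strip_nonessential(filler):
    """Drop leading articles, quantifiers, numerals, and nationality adjectivals."""
    words = filler.upper().split()
    # Always keep at least one word so the head noun is never stripped away.
    while len(words) > 1 and (
        words[0] in ARTICLES
        or words[0] in QUANTIFIERS
        or words[0] in NATIONALITIES
        or words[0].isdigit()
    ):
        words.pop(0)
    return " ".join(words)


def unescape_quotes(filler):
    """Treat an escaped inner quote (backslash plus quote) like a bare quote."""
    return filler.replace('\\"', '"')


def candidate_for_full_credit(response, key_options):
    """True if the response equals some key option after both normalizations."""
    norm = strip_nonessential(unescape_quotes(response))
    return any(
        norm == strip_nonessential(unescape_quotes(option))
        for option in key_options
    )
```

A True result only marks a mismatch as worth considering for "fully correct"; a False result, as with a response that adds a relative clause to the key's noun phrase, still needs the human judgment described above.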
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1.2 SET FILLS
</SectionTitle>
      <Paragraph position="0"> Slots requiring set fills are slots 3, 4, 7, 10, 13, 14, 15, 17, and 18. (Slot 16, the LOCATION slot, is not treated by the scoring program as having set fills.) In the case of a mismatch on fillers for these slots, the scoring program will not permit you to score them as fully correct. (But see section 3.1.4 below for an exception. Also, see 3.2.7 and 3.2.15 for information concerning automatic assignment of partial credit by the scoring program.) NOSC has attempted to offer all the possible alternative correct fillers as options in the key; however, scoring a filler partially correct may be justified in certain cases. See the appropriate subsections of section 3.2 below.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.1.3 OTHER TYPES OF FILLS
</SectionTitle>
    <Paragraph position="0"> Slots requiring other types of fills are slots 1, 2, 9, 12, and 16. In the case of a mismatch on fillers for these slots, the scoring program will permit you to score the fillers as fully correct, partially correct, or incorrect.</Paragraph>
    <Paragraph position="1"> (But see section 3.1.4 below for an exception.</Paragraph>
    <Paragraph position="2"> Also, see 3.2.16 for information concerning automatic assignment of partial credit by the scoring program.) NOSC has attempted to offer all the possible alternative correct fillers as options in the key; however, scoring a filler completely or partially correct may be justified in certain cases.</Paragraph>
    <Paragraph position="3"> See the appropriate subsections of section 3.2 below.</Paragraph>
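The slot groupings in sections 3.1.1 to 3.1.3 can be summarized as a small lookup. The following Python sketch is illustrative only: the slot numbers come from these guidelines, while the names (fill_type, permitted_judgments, only_tag_mismatch) are assumptions made for the example and are not part of the scoring program.

```python
# Slot numbers are taken from sections 3.1.1 to 3.1.3 of these guidelines.
STRING_SLOTS = {5, 6, 8, 11}
SET_SLOTS = {3, 4, 7, 10, 13, 14, 15, 17, 18}
OTHER_SLOTS = {1, 2, 9, 12, 16}


def fill_type(slot):
    """Return 'string', 'set', or 'other' for a slot number."""
    if slot in STRING_SLOTS:
        return "string"
    if slot in SET_SLOTS:
        return "set"
    if slot in OTHER_SLOTS:
        return "other"
    raise ValueError(f"unknown slot number: {slot}")


def permitted_judgments(slot, only_tag_mismatch=False):
    """Judgments a scorer may select for a mismatched filler in this slot.

    only_tag_mismatch models the exception in section 3.1.4: when the only
    mismatch is on a cross-reference tag, 'full' is selectable for set fills too.
    """
    kind = fill_type(slot)
    if kind == "set" and not only_tag_mismatch:
        return ["partial", "incorrect"]
    return ["full", "partial", "incorrect"]
```

For example, permitted_judgments(7) omits "full", whereas permitted_judgments(11) includes it. The lookup says only what the program lets you select; whether a selection is justified is governed by the rest of these guidelines.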
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.1.4 FILLS THAT INCLUDE CROSS-REFERENCE TAGS
3.1.4.1 FULLY CORRECT
</SectionTitle>
    <Paragraph position="0"> The scoring program permits you to score a slot as fully correct in the case of a mismatch on the slots listed in 3.1.2 and 3.1.3 above where the only mismatch is on a cross-reference tag. In such cases, you may score the entire filler as fully correct only if the filler of the slot indicated by the cross-reference tag was also scored as fully correct.</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.1.4.2 PARTIALLY CORRECT
</SectionTitle>
    <Paragraph position="0"> If the non-tag portion of the filler is not judged completely correct (by the criteria found in other sections of this set of guidelines), the best you can do is to judge the entire filler partially correct. If the non-tag portion is *completely* correct and the tag is either missing or incorrect, it is appropriate to score the entire filler partially correct.</Paragraph>
    <Paragraph position="1"> Scoring the entire filler partially correct may also be done if the non-tag portion of the filler is judged *partially* correct and the tag is either missing or incorrect. In this case, however, you must re-read the text and judge the partial correctness of the non-tag portion with respect to the way the text refers to the *KEY'S* tag, not the system response tag. In other words, you must be able to show that the system got the non-tag portion partially correct for the right reason. (Note that this guideline is based on the assumption that some systems might intentionally, not accidentally, generate a correct filler and, for independent reasons, give it an incorrect tag.) EXAMPLE (slot 7): RESPONSE SUSPECTED OR ACCUSED: &quot;RIGHT-WINGERS&quot;</Paragraph>
  </Section>
  <Section position="10" start_page="0" end_page="0" type="metho">
    <SectionTitle>
KEY
REPORTED AS FACT: &quot;LEFT-WINGERS&quot;
</SectionTitle>
    <Paragraph position="0"> (where SUSPECTED OR ACCUSED has been judged partially correct with respect to its *CORRECT* intended referent, &quot;LEFT-WINGERS&quot;, i.e., on the basis of presuming that the whole system response was SUSPECTED OR ACCUSED: &quot;LEFT-WINGERS&quot; rather than SUSPECTED OR ACCUSED: &quot;RIGHT-WINGERS&quot;)</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1.4.3 INCORRECT
</SectionTitle>
      <Paragraph position="0"> If the non-tag portion of the filler is judged incorrect, then the entire filler must be judged incorrect, even if the tag portion is correct or partially correct.</Paragraph>
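Read together, sections 3.1.4.1 to 3.1.4.3 set an upper bound on the judgment for a filler that carries a cross-reference tag. The Python sketch below is one reading of those rules, not the scoring program's own logic. The function name best_permitted and its arguments are hypothetical, and treating a tag as acceptable exactly when the filler it points to was itself scored fully correct is an interpretation of section 3.1.4.1.

```python
JUDGMENTS = ("full", "partial", "incorrect")


def best_permitted(non_tag, referenced=None):
    """Upper bound on the judgment for a filler with a cross-reference tag.

    non_tag:    judgment of the non-tag portion of the filler.
    referenced: judgment already given to the filler that the response tag
                points to, or None when the response has no tag at all.
    """
    if non_tag not in JUDGMENTS:
        raise ValueError(f"bad non-tag judgment: {non_tag}")
    if referenced is not None and referenced not in JUDGMENTS:
        raise ValueError(f"bad referenced judgment: {referenced}")
    # 3.1.4.3: an incorrect non-tag portion makes the whole filler incorrect.
    if non_tag == "incorrect":
        return "incorrect"
    # 3.1.4.2: a partially correct non-tag portion caps the filler at partial.
    if non_tag == "partial":
        return "partial"
    # 3.1.4.1: full credit only if the referenced filler was scored fully correct.
    if referenced == "full":
        return "full"
    # 3.1.4.2: correct non-tag portion, but the tag is missing or incorrect.
    return "partial"
```

The function returns a ceiling rather than a verdict. In the partial case the scorer must still re-read the text and judge the non-tag portion against the key's tag, as required above.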
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 BY INDIVIDUAL SLOT
</SectionTitle>
      <Paragraph position="0"> The guidelines here concern the manual realignment of templates in the case where the automatic template mapping facility provided by the scoring program fails to identify the optimal mapping between the set of response templates for a message and the set of key templates for that message. Guidelines are needed because it is possible for the user to elect not to map a response template to any key template at all, i.e., to map a response template to NIL and a key template to NIL rather than mapping the templates to each other. The user may wish to do this in cases where the match between the response and the key is so poor and the number of mismatching fillers so large that the user would rather penalize the system's recall and overgeneration (by mapping to NIL) than penalize the system's precision.</Paragraph>
      <Paragraph position="1"> However, to ensure the validity of the performance measures and to ensure comparability among the systems being evaluated, it is important that this option not be overused. The basic rule is that the user must permit a mapping between a response template and a key template if there is a full or partial match on the incident type. (The condition concerning a partial match covers the two basic situations described in section 3.2.3 below.) If there is no match on the incident type, manually mapping to NIL is allowed, at the discretion of the user.</Paragraph>
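The basic rule for manual realignment reduces to a single check on the incident type. A minimal Python sketch follows, assuming the match on incident type has already been classified as "full", "partial", or "none"; the function name nil_mapping_allowed is hypothetical and does not correspond to anything in the scoring program.

```python
def nil_mapping_allowed(incident_type_match):
    """True if the scorer may map a response template and a key template to NIL.

    incident_type_match is 'full', 'partial', or 'none'. A full or partial
    match on the incident type obliges the scorer to permit the mapping.
    """
    if incident_type_match not in ("full", "partial", "none"):
        raise ValueError(f"bad match value: {incident_type_match}")
    return incident_type_match == "none"
```

Even when the function returns True, mapping to NIL remains a discretionary choice and should not be overused.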
    </Section>
  </Section>
  <Section position="11" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FULLY CORRECT OR PARTIALLY CORRECT:
</SectionTitle>
    <Paragraph position="0"> System response is close to the key's date or range of dates (if the date is difficult to calculate). In the example below, the system's response may be judged fully correct, since the system has calculated a more precise date than what was expected by the key.</Paragraph>
    <Paragraph position="1"> In general, the guidelines in section 3.1.1.1 do not apply to this slot, since this slot is intended to be filled only with proper names. However, the term &quot;proper names&quot; is not completely defined, especially with respect to the expected fillers in the case of STATE-SPONSORED TERRORISM. You have more leeway to score fillers as fully correct in such cases.</Paragraph>
    <Paragraph position="2"> The number of cases where it is justifiable to score this slot partially correct should be limited, especially for situations other than the following: System determines a lesser confidence than actually warranted: POSSIBLE (system response) instead of CLAIMED OR ADMITTED, SUSPECTED OR ACCUSED, or SUSPECTED OR ACCUSED BY AUTHORITIES (key). Even in these cases, there has to be some strong justification, based for example on a difference of opinion as to how a human would interpret the text, in order to justify partial correctness.</Paragraph>
    <Paragraph position="3"> NOTE: The scoring program will automatically score the system response partially correct in the case where the system generates SUSPECTED OR ACCUSED instead of  1. See section 3.1.1.2.</Paragraph>
    <Paragraph position="4"> 2. Response string is good enough to corroborate categorization made in TYPE slot (assuming system response for TYPE slot is correct). Note that the string in the key may sometimes not be good enough by this criterion; in such cases you must decide for yourself whether the system response is as good as the filler in the key is.</Paragraph>
  </Section>
  <Section position="12" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.2.9 Slot 9 -- PHYSICAL TARGET: TOTAL NUM
PARTIALLY CORRECT:
</SectionTitle>
    <Paragraph position="0"> System response is PLURAL instead of a specific number in the key, in cases where filler had to be summed up, especially where approximate numbers are given, e.g., &quot;some 20 power stations and over 30 banks&quot;.</Paragraph>
    <Paragraph position="1">  where filler had to be summed up, especially where approximate numbers are given, e.g., &quot;some 20 employees and over 30 other people&quot;.</Paragraph>
  </Section>
  <Section position="13" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3.2.13 Slot 13 -- HUMAN TARGET: TYPE(S)
FULLY CORRECT:
</SectionTitle>
    <Paragraph position="0"> Mismatch not allowed to be scored fully correct.</Paragraph>
  </Section>
  <Section position="14" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PARTIALLY CORRECT:
</SectionTitle>
    <Paragraph position="0"> The number of cases where it is justifiable to score this slot partially correct should be limited, especially for situations other than the following, where &quot;partially correct&quot; may be justified if the text is particularly unclear:  The number of cases where it is justifiable to score this slot partially correct should be extremely limited, except in those cases that are handled automatically by the scoring program, i.e., where the system response is a set list item that is a superset of the filler in the key, as determined by the shallow hierarchy of instrument types provided in the task documentation.</Paragraph>
  </Section>
  <Section position="15" start_page="0" end_page="0" type="metho">
    <SectionTitle>
UNITED STATES
NOTE:
</SectionTitle>
    <Paragraph position="0"> The scoring program will automatically score a response partially correct when it contains correct country but no specific place or an incorrect  The number of cases where it is justifiable to score this slot partially correct should be limited, especially for situations other than the following: System response correctly indicates that damage was done but under- or overestimates amount of damage.</Paragraph>
  </Section>
  <Section position="16" start_page="0" end_page="0" type="metho">
    <SectionTitle>
EXAMPLE:
RESPONSE SOME DAMAGE
KEY
DESTROYED
3.2.18 Slot 18 -- EFFECT ON HUMAN TARGET(S)
FULLY CORRECT:
</SectionTitle>
    <Paragraph position="0"> Mismatch not allowed to be scored fully correct.</Paragraph>
  </Section>
  <Section position="17" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PARTIALLY CORRECT:
</SectionTitle>
    <Paragraph position="0"> The number of cases where it is justifiable to score this slot partially correct should be limited, especially for situations other than the following: System response contains less information than the key.</Paragraph>
  </Section>
  <Section position="18" start_page="0" end_page="0" type="metho">
    <SectionTitle>
EXAMPLE:
RESPONSE NO INJURY
KEY
NO INJURY OR DEATH
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
</Paper>