<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1002">
  <Title>ST Results on Walkthrough</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
IE Evaluation Tasks
* Named Entity Task [NE]: Insert SGML tags into
</SectionTitle>
    <Paragraph position="0"> the text to mark each string that represents a person, organization, or location name, or a date or time stamp, or a currency or percentage figure. * Template Element Task [TE]: Extract basic information related to organization, person, and artifact entities, drawing evidence from anywhere in the text. * Scenario Template Task [ST]: Extract prespecified event information and relate the event information to particular organization, person, or artifact entities involved in the event. * Coreference Task [CO]: Capture information on coreferring expressions: all mentions of a given entity, including those tagged in NE, TE tasks  relevant terms.</Paragraph>
    <Paragraph position="1"> - 2 sets of 100 articles (aircraft accident domain) for preliminary training, including the dry run. - 2 sets of 100 articles, selected and balanced for relevancy, type, and source, for the formal run (launch event domain).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Training and Data Sets (cont'd)
Training Set
</SectionTitle>
      <Paragraph position="0"> Training keys for NE, TE, TR available from preliminary set of  * NE mirrored Multilingual Entity Task * SGML tagging in text stream from SLUG, DATE, PREAMBLE, TEXT, TRAILER - Elements: ENAMEX, NUMEX, TIMEX - Attributes: TYPE, STATUS (keys), MIN (keys) * Markables - Names of organizations, persons, locations - Mentions of dates and times (relative and absolute) - Direct mentions of currency/percentage Named Entity (NE) (cont'd) * Non-markables - Artifacts (Wall Street Journal, MTV) - Common nouns used in anaphoric reference (the plane, the company) - Names of groups of people and laws named after people (Republicans, Gramm-Rudman amendment, the Nobel prize) - Adjectival forms of location names (American, Japanese) - Miscellaneous uses of numbers which are not specifically currency or percentages (1 1/2 points, 1.5 times) * Caveats: &quot;newspaper&quot; style, domain bias toward ST  - Common mistakes on TIMEX: missed early Thursday morning, within six months - Common mistakes on ENAMEX: missed Globo, MURDOCH, Xichang; Long March as TIMEX, ENAMEX - One site missed only one entity in whole document within six  * TEs are independent or neutral with respect to scenario: generic objects and slots.</Paragraph>
      <Paragraph position="1"> * Separates domain-independent from domain-dependent aspects of extraction.</Paragraph>
      <Paragraph position="2"> * Consists of object types defined for a given scenario, but unconcerned with relevance.</Paragraph>
      <Paragraph position="3"> * Answer key contains objects for all organizations, persons, and vehicle artifacts mentioned in the texts, whether relevant to scenario or not.  persons, and artifacts that enter into these relations, whether relevant to scenario or not.</Paragraph>
      <Paragraph position="4">  - Relational objects have pointers to Template Elements, set fills. - Set fills require inferences from the text. * Test set statistics: 63/100 documents relevant to the scenario.</Paragraph>
      <Paragraph position="5"> ST Overall Results * Systems scored points lower (F-measure) on ST than on TE.</Paragraph>
      <Paragraph position="6"> * Interannotator variability (measured on all articles) was between 85.15 and 96.64 on the F-measures. * Document-level relevance judgments (Text Filtering scores) were similar to those for MUC-6, although the percentage of relevant articles in the text set was greater.</Paragraph>
      <Paragraph position="7"> * Capture information on coreferring expressions: all mentions of a given entity, including those tagged in NE, TE tasks. * Focused on the IDENTITY (IDENT) relation: a symmetrical and transitive relation, with equivalence classes used for scoring.</Paragraph>
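      <Paragraph position="8"> * Illustration (not part of the original slides): because IDENT is symmetrical and transitive, pairwise coreference links can be collapsed into equivalence classes. The sketch below is a minimal, assumed illustration using a union-find structure; the function name ident_classes and the example mentions are hypothetical, and this is not the MUC scoring procedure itself, only the grouping step that such scoring presupposes.

```python
def ident_classes(links):
    """Group mentions into equivalence classes from pairwise IDENT links.

    `links` is an iterable of (mention_a, mention_b) pairs. The pair order
    does not matter (symmetry), and chains of links are merged (transitivity).
    """
    parent = {}

    def find(mention):
        # Register unseen mentions as their own class representative.
        parent.setdefault(mention, mention)
        while parent[mention] != mention:
            # Path halving: point the mention at its grandparent as we climb.
            parent[mention] = parent[parent[mention]]
            mention = parent[mention]
        return mention

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = {}
    for mention in list(parent):
        classes.setdefault(find(mention), set()).add(mention)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in classes.values())


# Hypothetical mentions, chosen only for illustration.
example_links = [("FCC", "agency"), ("agency", "it"), ("GM", "company")]
print(ident_classes(example_links))
```

 Reversing any pair, or listing the links in a different order, yields the same classes, which is why scoring can be defined over the classes rather than over the individual links.</Paragraph>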
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
* Markables: Nouns, Noun Phrases, Pronouns
CO Results Overall
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CO Results for Walkthrough
</SectionTitle>
    <Paragraph position="0"> * Walkthrough article non-relevant for ST * F-measures range from 23.2% to 62.3% * Missing: - Dates: Thursday, Sept. 10 - Money: $30 Million - Unusual Conjunctions: GM, GE PROJECTS - Miscellaneous:  Thursday's meeting, agency's meeting, FCC's allocation..., transmissions from satellites to earth stations, US satellite industry, federal regulators</Paragraph>
  </Section>
</Paper>