<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1006">
  <Title>Using Collocation Statistics in Information Extraction</Title>
  <Section position="1" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Our main objective in participating in MUC-7 is to investigate and experiment with the use of collocation statistics in information extraction. A collocation is a habitual word combination, such as "weather a storm", "file a lawsuit", and "the falling yen". Collocation statistics are the frequency counts of the collocational relations extracted from a parsed corpus. For example, out of 6577 instances of "addition" in a corpus, 5190 were used as the object of "in". Out of 3214 instances of "hire", 12 take "alien" as the object.</Paragraph>
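The counting behind these statistics can be illustrated with a minimal sketch. It assumes that each parsed sentence arrives as a token list plus a list of (head, relation, dependent) triples. The sentence format, the "obj" relation label, the toy corpus, and the function names collocation_statistics and fraction are all hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
from collections import Counter


# Hypothetical input format: each parsed sentence is (tokens, triples), where a
# triple is (head, relation, dependent). The "obj" label is an assumption; the
# paper does not specify its parser's output.
def collocation_statistics(parsed_sentences):
    """Count word occurrences and (head, relation, dependent) triples."""
    word_counts = Counter()
    triple_counts = Counter()
    for tokens, triples in parsed_sentences:
        word_counts.update(tokens)
        triple_counts.update(triples)
    return word_counts, triple_counts


def fraction(part, whole):
    """Share of a word's occurrences that fill a given collocational slot."""
    return part / whole if whole else 0.0


# Toy corpus for illustration only (not data from the paper).
toy_corpus = [
    (["in", "addition", "we", "hire", "staff"],
     [("in", "obj", "addition"), ("hire", "obj", "staff")]),
    (["an", "addition", "to", "the", "house"],
     []),
    (["they", "hire", "an", "alien"],
     [("hire", "obj", "alien")]),
]

words, triples = collocation_statistics(toy_corpus)
print(fraction(triples[("in", "obj", "addition")], words["addition"]))  # 0.5
print(fraction(triples[("hire", "obj", "alien")], words["hire"]))       # 0.5

# The paper's own counts yield the same kind of ratio.
print(round(fraction(5190, 6577), 3))  # 0.789
print(round(fraction(12, 3214), 4))    # 0.0037
```

With the counts quoted above, roughly 79% of the occurrences of "addition" fill the object slot of "in", while only about 0.4% of the occurrences of "hire" take "alien" as the object.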
    <Paragraph position="1"> We participated in two tasks: Named Entity and Coreference. In both tasks, the input text is processed in two passes. During the first pass, we use the parse trees of the input texts, combined with collocation statistics obtained from a large corpus, to automatically acquire or enrich lexical entries, which are then used in the second pass.</Paragraph>
  </Section>
</Paper>