<?xml version="1.0" standalone="yes"?> <Paper uid="M98-1006"> <Title>Using Collocation Statistics in Information Extraction</Title> <Section position="1" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Our main objective in participating in MUC-7 is to investigate and experiment with the use of collocation statistics in information extraction. A collocation is a habitual word combination, such as &quot;weather a storm&quot;, &quot;file a lawsuit&quot;, and &quot;the falling yen&quot;. Collocation statistics are the frequency counts of the collocational relations extracted from a parsed corpus. For example, out of 6577 instances of &quot;addition&quot; in a corpus, 5190 were used as the object of &quot;in&quot;. Out of 3214 instances of &quot;hire&quot;, 12 take &quot;alien&quot; as the object.</Paragraph> <Paragraph position="1"> We participated in two tasks: Named Entity and Coreference. In both tasks, the input text is processed in two passes. During the first pass, we use the parse trees of the input texts, combined with collocation statistics obtained from a large corpus, to automatically acquire or enrich lexical entries, which are then used in the second pass.</Paragraph> </Section> </Paper>