<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1032">
  <Title>Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation</Title>
  <Section position="4" start_page="0" end_page="234" type="intro">
    <SectionTitle>
2. Previous Works
</SectionTitle>
    <Paragraph position="0"> Church (1988) proposes a part-of-speech tagger and a simple noun phrase extractor. His extractor brackets the noun phrases of tagged input text according to two probability matrices: one for starting a noun phrase and the other for ending a noun phrase. The methodology is a simplified version of Garside and Leech's probabilistic parser (1985). Church lists a sample text in the Appendix of his paper to show the performance of his work; in it, only 5 out of 248 noun phrases are omitted. Because the tested text is too small to assess the results, an experiment on a large volume of text is needed.</Paragraph>
    <Paragraph position="1"> Bourigault (1992) reports a tool, LEXTER, for extracting terminology from texts. LEXTER works in two stages: 1) analysis (by identification of frontiers), which extracts the maximal-length noun phrases; 2) parsing (of the maximal-length noun phrases), which further acquires the terminology embedded in the noun phrases. Bourigault states that LEXTER extracts 95% of the maximal-length noun phrases, that is, 43500 out of 46000 from the test corpus. The result is validated by an expert. However, the precision is not reported in Bourigault's paper.</Paragraph>
    <Paragraph position="2"> Voutilainen (1993) presents NPtool for the acquisition of maximal-length noun phrases. NPtool applies two finite-state mechanisms (one NP-hostile, the other NP-friendly) to the task. The two mechanisms produce two NP sets, and any NP candidate with at least one occurrence in both sets is labeled as an &quot;ok&quot; NP. The reported recall is 98.5-100% and the precision is 95-98%, validated manually on some 20000 words. But from the sample text listed in the Appendix of his paper, the recall is about 85%, and we can find some inconsistencies among the extracted noun phrases.</Paragraph>
  </Section>
</Paper>