<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0113">
  <Title>Development of a Partially Bracketed Corpus with Part-of-Speech Information Only</Title>
  <Section position="8" start_page="171" end_page="171" type="concl">
    <SectionTitle>
6. Concluding Remarks
</SectionTitle>
    <Paragraph position="0"> Processing real text is indispensable for a practical natural language system. Probabilistic methods provide a robust way to tackle unrestricted text. This paper proposes a probabilistic chunker to help the development of a partially bracketed corpus. Rather than using a treebank as the training corpus, we use the LOB Corpus, which is tagged with part-of-speech information only. The experimental results show that the probabilistic chunker achieves a correct rate of more than 92% in the outside test. A well-formed partially bracketed corpus is a milestone in the development of a treebank. In addition, the simple but effective chunker can also be applied to many natural language applications, such as extracting predicate-argument structures [9,10], grouping words [11] and gathering collocations [12].</Paragraph>
    <Paragraph position="1"> The evaluation criterion adopted in this paper is not very strict. Under a strict criterion, the method proposed in this paper may not be suitable for short-fat trees; it is better suited to tall-thin trees. To solve this problem, a more general definition that considers more parts of speech in the contingency table is needed. However, that introduces another problem: the more general the definition we use, the larger the tagged corpus we need. This paper also presents a tag mapper, which sets up the mapping between different tagging sets. Such an algorithm facilitates the development of a large-scale tagged corpus from different sources. Much more reliable statistical information can then be trained from the large-scale tagged corpus, so that the feasibility of the chunker is assured. Besides the above problem, the critical points for local minima are not obvious in some cases. Their determination is therefore left for future work.</Paragraph>
  </Section>
</Paper>