<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1212">
  <Title>A Block-Based Robust Dependency Parser for Unrestricted Chinese Text</Title>
  <Section position="5" start_page="82" end_page="83" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We implemented a parsing system and performed extensive experiments with it.</Paragraph>
    <Paragraph position="1"> The system is written in C and was tested on a Pentium PC. In total, over 1,000 phrase structure rules and over 3,00 dependency rules are used for block-based parsing. We built a large lexicon of 220,000 word entries containing word category information and the necessary syntactic and semantic features. This approach has been incorporated as the Chinese parsing module of J-Beijing, a successful commercial Chinese-Japanese machine translation system (Zhou, 1999).</Paragraph>
    <Paragraph position="2"> The system accepts Chinese text and outputs a parse for each sentence. An input sentence is defined as a word string ending with a period, comma, question mark, semicolon, or exclamation mark.</Paragraph>
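    <Paragraph> The sentence definition above amounts to a simple splitter over the character stream. The following Python sketch is illustrative only: it assumes the five terminators are the full-width marks 。，？；！ (the paper names the marks but not their exact characters), and the function name split_sentences is hypothetical.

```python
# Illustrative sketch; the terminator set is an assumption, since the
# paper names the punctuation marks but not the exact characters.
TERMINATORS = frozenset("。，？；！")  # period, comma, question mark, semicolon, exclamation mark


def split_sentences(text):
    """Split raw Chinese text into parser input 'sentences': maximal
    strings that end with one of the terminators. Text after the last
    terminator is kept as a final fragment rather than dropped."""
    sentences = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in TERMINATORS:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    rest = "".join(current).strip()
    if rest:
        sentences.append(rest)
    return sentences


if __name__ == "__main__":
    print(split_sentences("我们学习中文，他们学习英文。你呢？"))
    # ['我们学习中文，', '他们学习英文。', '你呢？']
```

Because the comma counts as a terminator, one orthographic sentence can yield several parser inputs. ASCII punctuation could be added to the set, at the cost of splitting decimals such as 3.5.</Paragraph>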
    <Paragraph position="3"> We evaluated the parsing results on two corpora. (1) &quot;Primary School Textbook of Singapore&quot;: a corpus of 1,842 single sentences of modern Chinese, which not only covers most Chinese sentence types but also includes various morphological phenomena, such as word reduplication and affixation. (2) News articles collected from People's Daily (1998, 1999, and 2000): these sentences are real text, so they contain many unknown words (mainly proper nouns), long sentences, complicated sentences, ellipsis, etc. The evaluation results are listed in Table 3.</Paragraph>
    <Paragraph position="4"> Although this model has produced satisfactory initial results, some difficulties inherent to the Chinese language remain, and further improvement is needed. Through error analysis, we identified the main issues that seriously affect system performance; they are listed below.</Paragraph>
    <Paragraph position="5"> … also function as two words with totally different meanings.</Paragraph>
    <Paragraph position="7"> Since compound nouns cannot be exhaustively enumerated, errors are inevitable.</Paragraph>
    <Paragraph position="8"> 4) Identification of proper nouns</Paragraph>
    <Paragraph position="10"> For the pattern &quot;V+A+N&quot;, there are usually two possible reductions: [[V+A]vp+N] and [V+[A+N]np]. All of these problems require further work.</Paragraph>
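    <Paragraph> The two competing reductions of the V+A+N pattern can be reproduced with a toy grammar. The Python sketch below uses four hypothetical binary rules (they are not the paper's rule set) and enumerates every bracketing of a tag sequence; for V+A+N it finds both reductions, so the categorial rules alone cannot choose between them.

```python
from itertools import product

# Toy grammar (hypothetical, NOT the paper's rules): each key is a
# (left label, right label) pair and each value is the parent label.
RULES = {
    ("V", "A"): "vp",   # verb + adjectival complement
    ("vp", "N"): "vp",  # [V+A]vp takes the noun as its object
    ("A", "N"): "np",   # adjective modifies the noun
    ("V", "np"): "vp",  # verb takes [A+N]np as its object
}


def parses(tags):
    """Return every (label, bracketing) analysis spanning the whole tag
    list, trying each binary split and combining sub-analyses via RULES."""
    if len(tags) == 1:
        return [(tags[0], tags[0])]
    results = []
    for split in range(1, len(tags)):
        left_options = parses(tags[:split])
        right_options = parses(tags[split:])
        for (left_label, left_tree), (right_label, right_tree) in product(left_options, right_options):
            parent = RULES.get((left_label, right_label))
            if parent is not None:
                results.append((parent, f"[{left_tree}+{right_tree}]{parent}"))
    return results


if __name__ == "__main__":
    for _, tree in parses(["V", "A", "N"]):
        print(tree)
    # [V+[A+N]np]vp
    # [[V+A]vp+N]vp
```

Choosing between the two trees presumably requires lexical or semantic information about the particular verb, adjective, and noun, which a purely categorial rule set like this one does not encode.</Paragraph>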
    <Paragraph position="11"> Conclusion
In this paper, we have presented a practical Chinese parser. The block-based dependency parsing strategy is a novel integration of phrase-structure partial parsing and dependency parsing. Both partial parsing and dependency parsing can cope with ungrammatical, faulty, or complicated sentences, which makes the system highly robust. Furthermore, our top-down strategy of identifying Chinese-specific structures, such as frame structures, prepositional structures, and postpositional structures, produces a simplified sentence skeleton and thereby improves parsing efficiency.</Paragraph>
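    <Paragraph> The idea of reducing a sentence to a simpler skeleton by first identifying frame structures can be sketched as follows. This Python example is a hypothetical illustration, not the paper's algorithm: it assumes that a preposition such as 在 paired with a later localizer such as 上 forms a frame, uses invented tags ("p" for preposition, "f" for localizer, "pp" for the collapsed block), and closes each frame at the first following localizer, ignoring nesting.

```python
# Hypothetical sketch; the tags "p", "f", and "pp" are invented for
# illustration and are not the paper's tag set.
def collapse_frames(tokens):
    """Collapse each preposition ... localizer frame (e.g. 在 ... 上) into
    a single "pp" block, returning a simplified skeleton.

    tokens: list of (word, tag) pairs. A preposition with no later
    localizer is left untouched.
    """
    if not tokens:
        return []
    _, tag = tokens[0]
    if tag == "p":
        # Index of the first localizer; tokens[0] is "p", so this is at least 1.
        end = next((j for j, (_, t) in enumerate(tokens) if t == "f"), None)
        if end is not None:
            block = "".join(word for word, _ in tokens[: end + 1])
            return [(block, "pp")] + collapse_frames(tokens[end + 1 :])
    return [tokens[0]] + collapse_frames(tokens[1:])


if __name__ == "__main__":
    sentence = [("他", "r"), ("在", "p"), ("桌子", "n"), ("上", "f"), ("写", "v"), ("字", "n")]
    print(collapse_frames(sentence))
    # [('他', 'r'), ('在桌子上', 'pp'), ('写', 'v'), ('字', 'n')]
```

In the example, 他 在 桌子 上 写 字 reduces to the skeleton 他 [在桌子上] 写 字, leaving a later dependency stage fewer units to attach.</Paragraph>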
    <Paragraph position="12"> Although the model has shown satisfactory initial results, difficulties inherent to the Chinese language remain, and further work is needed. We currently determine word categories with a set of hand-crafted linguistic rules, which limits identification precision. In future work, we therefore plan to adopt statistical or hybrid approaches. We will also study new methods for resolving ambiguous word segmentation, identifying proper nouns and compound nouns, and performing block analysis, predicate identification, and dependency analysis.</Paragraph>
  </Section>
</Paper>