<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1212">
  <Title>A Block-Based Robust Dependency Parser for Unrestricted Chinese Text</Title>
  <Section position="3" start_page="79" end_page="79" type="intro">
    <SectionTitle>
2 Block-based Chinese dependency analysis
</SectionTitle>
    <Paragraph position="0"> As indicated in Fig. 3, block-based dependency analysis consists of four modules, i.e., word segmentation, part-of-speech tagging, block analysis and dependency analysis. A bi-directional heuristic longest matching method is applied to decide the optimal word sequence.</Paragraph>
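A minimal sketch of bi-directional longest matching may help make the segmentation step concrete. The paper does not specify its heuristic, so the tie-breaking rule used here (fewer words, then fewer single-character words, then the backward result) is an assumption, as are the toy lexicon and the function names.

```python
# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"研究", "研究生", "生命", "命", "的", "起源"}


def forward_match(text, lexicon, max_len):
    """Greedy left-to-right longest matching."""
    words, i = [], 0
    while i != len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_match(text, lexicon, max_len):
    """Greedy right-to-left longest matching."""
    words, j = [], len(text)
    while j != 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def segment(text, lexicon):
    """Run both directions and keep the cheaper result (assumed heuristic)."""
    max_len = max(map(len, lexicon), default=1)
    fwd = forward_match(text, lexicon, max_len)
    bwd = backward_match(text, lexicon, max_len)

    def cost(words):
        singles = sum(1 for w in words if len(w) == 1)
        return (len(words), singles)

    # min() returns the first minimal item, so ties go to the backward result.
    return min([bwd, fwd], key=cost)


if __name__ == "__main__":
    print(segment("研究生命的起源", LEXICON))
```

On this input the two directions disagree, which is the case the heuristic exists to resolve; any other scoring rule could be substituted in `cost`.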
    <Paragraph position="1"> A set of manually compiled linguistic rules is applied to decide the optimal word category sequence. Partial parsing then proceeds in stages. First, local structures (such as duplication, prefixes and suffixes) are identified by a set of word formation rules, and proper names are identified by a set of construction rules. Local structures of this kind are called meta-blocks. Next, frame structures (DP), which have a paired starting word and ending word, such as &amp;quot;~'&amp;quot;'. &amp;quot;I~&amp;quot;, &amp;quot;~'&amp;quot;'. &amp;quot;qa&amp;quot;, etc., are identified, but the analysis of their internal structure is delayed. An ATN network is then used to identify the basic blocks, called level-1 blocks (these blocks do not contain IP, LP or DP). We then use a set of heuristic rules to identify the boundaries of IP and LP. The ATN network is then used again to identify the more complicated blocks, called level-2 blocks, which may contain LP, DP and IP as components. The resulting sequence of blocks is passed to the dependency parser, which generates dependency relations among the blocks. After that, we recursively parse the internal parts of IP, LP and DP. The blocks are defined by a set of rules in the form of phrase structure rules. All of these rules, combined with syntactic and semantic constraints, are implemented as an ATN network (Allen, 1995).</Paragraph>
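The frame-structure step can be sketched as follows, under stated assumptions. The frame pairs in `FRAME_PAIRS` are illustrative stand-ins rather than the paper's own list, and pairing each starting word with the nearest matching ending word is an assumed policy. The sketch marks a frame as a single DP unit and leaves its contents unanalysed, mirroring the delayed internal analysis described above.

```python
# Hypothetical frame pairs: starting word mapped to its possible ending words.
FRAME_PAIRS = {"在": {"上", "中"}}


def find_frames(words, frame_pairs=FRAME_PAIRS):
    """Return inclusive (start, end) index pairs of non-overlapping frames."""
    spans, i = [], 0
    while i != len(words):
        enders = frame_pairs.get(words[i])
        end = None
        if enders:
            # Assumed policy: take the nearest matching ending word to the right.
            end = next(
                (j for j in range(i + 1, len(words)) if words[j] in enders),
                None,
            )
        if end is None:
            i += 1
        else:
            spans.append((i, end))
            i = end + 1
    return spans


def collapse_frames(words, spans):
    """Replace each frame span by one ("DP", words) unit, contents untouched."""
    out, cursor = [], 0
    for start, end in spans:
        out.extend(words[cursor:start])
        out.append(("DP", words[start:end + 1]))
        cursor = end + 1
    out.extend(words[cursor:])
    return out


if __name__ == "__main__":
    sentence = ["他", "在", "桌子", "上", "写", "字"]
    print(collapse_frames(sentence, find_frames(sentence)))
```

Keeping the frame as an opaque unit lets the later block and dependency stages treat it like any other block, with its contents parsed only in the recursive pass.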
    <Paragraph position="2"> We also define 17 kinds of dependency relations for Chinese, as shown in Table 2.</Paragraph>
    <Paragraph position="3"> For an input $S = w_1, w_2, \ldots, w_n$, the expected parse result includes two parts, as described below: (1) $T$: a set of sub-trees, where each sub-tree represents a block.</Paragraph>
    <Paragraph position="4"> $T = \{T_1, T_2, T_3, \ldots, T_n\}$
(2) $D$: a set of 3-tuples of the form $\langle$governor, dependant, dependency-relation$\rangle$, which represent dependency relations between blocks.
$D = \{\langle gov_1, dep_1, rela_1 \rangle, \langle gov_2, dep_2, rela_2 \rangle, \ldots, \langle gov_m, dep_m, rela_m \rangle\}$
Algorithm 1: The block-based parsing algorithm
1) Identification of DP by matching the starting word and ending word;
2) Identification of meta-blocks by bottom-up analysis;
3) Identification of NP, UP, UG, NTL, NTP, AP, FP, VP of level 1 by bottom-up analysis;
4) Identification of LP and IP by looking for the left boundary of LP and the right boundary of IP, using a set of Chinese linguistic rules;
5) Identification of NP, UP, UG, NTL, NTP, AP, FP, VP of level 2 by bottom-up analysis;
6) Dependency parsing with the blocks identified;
7) For blocks IP, DP and LP, recursively do steps 1 through 6.</Paragraph>
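The shape of the parse result and the recursion in step 7 can be sketched as below. This is an illustrative skeleton, not the authors' implementation: the class names, the `content` field, and the two stage functions are hypothetical. The toy stages treat parentheses as stand-in frame words and attach every block to the last block, purely so that the control flow can be run.

```python
from dataclasses import dataclass, field

# Block labels whose internal parts are re-parsed in step 7.
RECURSIVE_LABELS = {"IP", "LP", "DP"}


@dataclass
class Block:
    label: str
    words: list
    content: list = field(default_factory=list)  # internal part to re-parse
    inner: "ParseResult | None" = None            # filled in by step 7


@dataclass
class Dependency:
    governor: int    # index into ParseResult.blocks
    dependant: int   # index into ParseResult.blocks
    relation: str


@dataclass
class ParseResult:
    blocks: list = field(default_factory=list)        # T
    dependencies: list = field(default_factory=list)  # D


def parse(words, identify_blocks, link_blocks):
    """Steps 1-5 (blocks), step 6 (dependencies), step 7 (recursion)."""
    blocks = identify_blocks(words)
    result = ParseResult(blocks, link_blocks(blocks))
    for block in blocks:
        if block.label in RECURSIVE_LABELS and block.content:
            block.inner = parse(block.content, identify_blocks, link_blocks)
    return result


def toy_identify(words):
    """Toy stand-in for steps 1-5: "(" ... ")" is a DP, other words are W."""
    blocks, i = [], 0
    while i != len(words):
        if words[i] == "(" and ")" in words[i + 1:]:
            end = words.index(")", i + 1)
            blocks.append(
                Block("DP", words[i:end + 1], content=words[i + 1:end])
            )
            i = end + 1
        else:
            blocks.append(Block("W", [words[i]]))
            i += 1
    return blocks


def toy_link(blocks):
    """Toy stand-in for step 6: every block depends on the last block."""
    head = len(blocks) - 1
    return [Dependency(head, k, "dep") for k in range(head)]


if __name__ == "__main__":
    result = parse(["a", "(", "b", "c", ")", "d"], toy_identify, toy_link)
    print([b.label for b in result.blocks])
    print(result.dependencies)
```

Because each recursive call receives only the internal part of a block, which is strictly shorter than the input, the recursion terminates.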
    <Paragraph position="5">  In the following, we will illustrate the parsing process with an example.</Paragraph>
    <Paragraph position="6">  process is omitted).</Paragraph>
    <Paragraph position="7"> Much effort has been devoted to parsing languages into phrase structure, and many powerful computational models have been proposed (Gazdar et al., 1987; Tomita, 1986). We build an ATN-like network to identify these blocks. Since ATN approaches are described in the literature (Allen, 1995), we do not describe this algorithm in detail here. In the next section, we focus on a new, efficient algorithm for Chinese dependency parsing.</Paragraph>
  </Section>
</Paper>