<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0202"> <Title>Parsing Chinese with an Almost-Context-Free Grammar</Title> <Section position="2" start_page="13" end_page="14" type="metho"> <SectionTitle> 1. Right-hand-side contexts 2. Nonterminal functions </SectionTitle> <Paragraph position="0"> We would like to note at the outset that from the formal language standpoint, the complications introduced by the form of our production rules have so far hindered theoretical analyses of the formal expressiveness characteristics of this grammar. Because of the nature of the constraints, it is unclear how the expressiveness relates to, for example, the more powerful unification-based grammars that are widespread for English.</Paragraph> <Paragraph position="1"> At the same time, however, we will show that the natural format of the rules has greatly facilitated the writing of robust large grammars. Also, an efficient Earley-style parser can be constructed as discussed below for grammars of this form. For our applications, we therefore feel the effectiveness of the grammar form compensates for the theoretical complications.</Paragraph> <Paragraph position="2"> We now describe the extensions, but first define some notation used throughout the paper.</Paragraph> <Paragraph position="3"> A traditional context-free grammar (CFG) is a four-tuple G = (N, Σ, P, S), where N is a finite set of nonterminal symbols, Σ is a finite set of terminal symbols such that N ∩ Σ = ∅, P is a finite set of productions, and S ∈ N is a special designated start symbol. Productions in P are denoted by the symbol Pr, 1 ≤ r ≤ |P|, and have the form Dr → Zr,1 Zr,2 ... Zr,πr, πr ≥ 0, where Dr ∈ N and Zr,j ∈ N ∪ Σ, 1 ≤ j ≤ πr.</Paragraph> <Paragraph position="4"> Right-hand-side contexts We introduce right-hand-side contexts to improve rule applicability decisions for complex compounding phenomena. 
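The traditional CFG four-tuple just defined can be made concrete with a minimal sketch; all names here are illustrative and not from the paper's implementation:

```python
# Minimal sketch of the CFG four-tuple G = (N, Sigma, P, S) defined above,
# with its well-formedness conditions spelled out as code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    lhs: str          # D_r, drawn from N
    rhs: tuple        # Z_r,1 ... Z_r,pi_r, drawn from N union Sigma; may be empty

@dataclass
class CFG:
    nonterminals: set
    terminals: set
    productions: list
    start: str

    def is_well_formed(self):
        if self.nonterminals & self.terminals:      # N and Sigma must be disjoint
            return False
        vocab = self.nonterminals | self.terminals
        return (self.start in self.nonterminals and
                all(p.lhs in self.nonterminals and
                    all(z in vocab for z in p.rhs)
                    for p in self.productions))

g = CFG({"NP", "Nom", "RelPh"}, {"vn", "de"},
        [Production("Nom", ("NP", "vn", "de")),
         Production("NP", ("Nom",)),
         Production("RelPh", ("NP", "vn", "de"))],
        start="NP")
```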
The difficulty that ordinary CFGs have with complex compounding phenomena can be seen from the following example grammar fragment: 1. RelPh → NP vn ~(de) 2. Nom → NP vn ~J(de) 3. NP → Nom 4. NP → RelPh NP 5. NP → NP NP Here, RelPh is a relative phrase, Nom is a nominalization (similar to a gerund), vn is a lexical verb category requiring an NP argument, and ~J(de) is a genitive particle.</Paragraph> <Paragraph position="5"> The sequence (1) a. ~ ~L:~ ~J ~ b. j~ngwfichfi t~g6ng de dgdfi c. police provide -- answer d. the answer provided by police can be parsed either by [[[~] NP [~,] vn ~] RelPh [~] NP] NP or by [[[[~-~] NP [~.] vn ~J] Nom] NP [~] NP] NP However the latter parse is not linguistically meaningful, and is rather an artifact of the overly general noun compounding rule 5. The problem is that it becomes quite cumbersome in a pure CFG to specify accurately which types of noun phrases are permitted to compound, and this usually leads to excessive proliferation of features and/or nonterminal categories.</Paragraph> <Paragraph position="6"> Instead, the approach described here augments the CFG rules with a restricted set of contextual applicability conditions. A production in our extended formalism may have a left and/or right context, and is denoted as</Paragraph> <Paragraph position="8"> Pr: Dr → {L} Zr,1 ... Zr,πr {R}, where Zr,1 ... Zr,πr ∈ (N ∪ Σ)* and the left context condition L and the right context condition R are of a form described below. These context conditions help cut the parser's search space by eliminating many possible parse trees, increasing both parsing speed and accuracy. 
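The contextual rule form, and the "full form" rewrite the parser later applies to it, can be sketched as follows (a hypothetical representation, not the paper's data structures; the start/len convention follows the Algorithm section below):

```python
# Sketch of a production with optional left/right context conditions, and its
# rewrite into a flat "full form" RHS plus (start, length) marking the core.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ContextualRule:
    lhs: str
    rhs: tuple                    # core symbols Z_1 ... Z_pi
    left: Optional[str] = None    # left context condition L, if any
    right: Optional[str] = None   # right context condition R, if any

def to_full_form(rule):
    """Rewrite {L} core {R} into one flat RHS plus the core's start/length."""
    full = (() if rule.left is None else (rule.left,)) + rule.rhs \
           + (() if rule.right is None else (rule.right,))
    start = 1 if rule.left is None else 2   # 1-based index of first core symbol
    return full, start, len(rule.rhs)

# Rule 2 with the /!NP right context introduced below: Nom -> NP vn de {/!NP}
nom = ContextualRule("Nom", ("NP", "vn", "de"), right="/!NP")
```

Applied to the paper's own transformation examples, A → {L} B {R} yields (L B R, start = 2, len = 1) and C → D E {R} yields (D E R, start = 1, len = 2).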
Though ambiguities remain, the smaller number of parses per sentence makes it more likely that most-probable parsing can pick out the correct parse.</Paragraph> <Paragraph position="9"> Nonterminal functions In addition, a second extension is the introduction of a variety of nonterminal functions that may be attached to any nonterminal or terminal symbol.1 (1The term nonterminal functions was chosen for mnemonic purposes; it is actually a misnomer since they can be applied to terminal symbols as well.)</Paragraph> <Paragraph position="10"> These functions are designed to facilitate natural expression of conditions for reducing ambiguities. Some of the functions are simply notational sugar for standard CFGs, while others are context-sensitive extensions. These functions are listed in the following sections. By convention, we will use a and b for symbols that can be either terminals or nonterminals, c for terminal symbols only, d for the semantic domain of a terminal, and i for an integer index.</Paragraph> <Paragraph position="11"> The not function The not function is denoted as /!b, which means any constituent not labeled b. Note that this feature must not be used with rules that can cause circular derivations of the type A ⇒* A, since this would lead to a logical contradiction.</Paragraph> <Paragraph position="12"> In the previous example, if we change rule</Paragraph> </Section> <Section position="3" start_page="14" end_page="15" type="metho"> <SectionTitle> 2 to </SectionTitle> <Paragraph position="0"> Nom → NP vn ~ {/!NP} the new right condition /!NP prevents rule 2 from being used within cases such as rule 5, where the immediately following constituent is an NP. 
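The /!NP veto just described can be sketched as a check against the chart: after rule 2's core (NP vn de) has been matched, the parser inspects constituents starting just past the core and rejects the reduction if any of them is labeled NP. The chart encoding here is invented for illustration:

```python
# Sketch of a right-context check for the not function /!b: reject if any
# chart constituent beginning at end_pos carries the banned label.
def right_context_satisfied(chart, end_pos, condition):
    """chart: set of (label, start, end) subtrees; condition like '/!NP'."""
    if condition.startswith("/!"):
        banned = condition[2:]
        return not any(lab == banned and s == end_pos
                       for (lab, s, e) in chart)
    return True

# Toy chart for 'NP vn de NP' over word positions 0..4
chart = {("NP", 0, 1), ("vn", 1, 2), ("de", 2, 3), ("NP", 3, 4)}
```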
This causes the correct parse to be chosen: [[[~] NP [~,] vn] RelPh [~] NP] NP We have only found this function useful for left and right contexts, rather than the main body of production right-hand-sides.</Paragraph> <Paragraph position="1"> The excluded-category function The excluded-category function is denoted as a/!b, which means a constituent labeled a that moreover cannot be labeled b. Again, it must not be used with rules that can cause circular derivations.</Paragraph> <Paragraph position="2"> The main purpose of the excluded-category function is to improve robustness when inadequacies in grammar coverage prevent a full parse tree from being found. In such cases, our parser will instead return a partial parse tree, as discussed further in Section 5. The excluded-category function can help improve the chances of choosing the correct rules within the partial parse tree. For example, consider its use with the verb phrase construction NP verb (Obj), which is known as the ~t~(ba)-construction. If the verb has part of speech vn, then it is monotransitive and only one object is needed to form a VP, but if the verb is a ditransitive vnn, then a second object is needed to form the VP.</Paragraph> <Paragraph position="3"> An example of the monotransitive case is ~ (2) a.~ ~ y b. b~ fPSu ch~ le c. -- food eat -- d. have eaten the food while an example of the ditransitive case is (3) a.~J~ T b. b~t fPSu sbng rdn le c. -- food give somebody -- d. give food to somebody The former phrase can be correctly parsed by the monotransitive rule</Paragraph> <Paragraph position="4"> VP → ba NP vn</Paragraph> <Paragraph position="5"> Suppose that the parser is unable to find any full parse tree for some sentence that includes the latter phrase. The above monotransitive rule would still be considered by the parser, since it is performing partial parsing, and this rule matches the subsequence ~[~ ~ ~. 
In fact this is not the correct rule for the ditransitive phrase--the VP is not ~ ~ ~ but rather g~ ~ J~ ~--but we would not be able to distinguish the monotransitive and ditransitive cases ~ ~.~ ~g and ~[~ ~ ~, because both ~g and ~ can have part of speech vn. Thus the monotransitive subparse might incorrectly be chosen for the partial parse output (whether this happens depends rather arbitrarily on the possible subparses found over the rest of the sentence).</Paragraph> <Paragraph position="6"> The key to eliminating the incorrect possibility altogether is that only ~ can also have the part of speech vnn. We refine the rule with our excluded-category function:</Paragraph> <Paragraph position="7"> VP → ba NP vn/!vnn</Paragraph> <Paragraph position="8"> (2For this and all subsequent examples, (a) is the Chinese written form, (b) is its pronunciation, (c) is its word gloss ('--' means there is no directly corresponding word in English), and (d) is its approximate English translation.)</Paragraph> <Paragraph position="9"> The monotransitive phrase can still be parsed by this new rule since ~ cannot have the part of speech vnn: [~[~[~] NP [~Y~] vn] VP J'.</Paragraph> <Paragraph position="10"> But because ~ can be labeled as either vn or vnn, it does not match vn/!vnn, and therefore the rule cannot be applied to the ditransitive phrase. This leaves the ditransitive production VP → ba NP vnn NP as the only possibility, forcing the correct subparse to be chosen here. In a sense, this function allows a measure of redundancy in the grammar specification and thereby improves robustness.</Paragraph> <Paragraph position="11"> The substring-linking function The substring-linking function is denoted a/i. This is used to remember the string that was matched to a constituent a, so that the string can be compared to a subsequent appearance of a/i in the same production. 
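The bind-and-compare behavior of a/i just described can be sketched with a small helper (the dict-based encoding is illustrative, not the paper's implementation):

```python
# Sketch of substring-linking a/i: the first occurrence binds index i to the
# exact word string spanned by the matched constituent; every later
# occurrence in the same rule must span the identical string.
def match_linked(bindings, index, words):
    span = tuple(words)
    if index not in bindings:
        bindings[index] = span       # first occurrence of a/i: define the link
        return True
    return bindings[index] == span   # later occurrence: strings must be equal

# 'zuo bu zuo' (question) vs 'zuo bu dao' (negative declaration),
# as in examples (4)/(5) below
b = {}
```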
In general, we may have several occurrences of the same nonterminal, and it is occasionally useful to be able to constrain those occurrences to match exactly the same string.</Paragraph> <Paragraph position="12"> One important use of substring-linking in Chinese is for reduplicative patterns. Another use can be seen in the following two sentences: (4) a. ~ ~ ~ f~ ~ ~ g b. t~ zub bfl zu6 zh& ji~n sh] c. he do not do this -- thing d. will he do this thing (5) a. ~ ~ :~ ~iJ ~ ~ ;~ b. ta zub bh dPSo zh~ jiPSn sh~ c. he do not do this -- thing d. he cannot do this thing Let us consider two sequences {~ ~ {~ and {~ ~I] in (4) and (5) respectively, where {5 and ill can both be labeled as vn, but they have a different role. The former indicates a question, and the latter a negative declaration; clearly the parses must differentiate these two cases. If the only rule in the grammar to handle these examples is</Paragraph> <Paragraph position="13"> question_verb → vn Yg vn,</Paragraph> <Paragraph position="14"> then the two sequences will be parsed identically. However, with the substring-linking function we can refine the rule to question_verb → vn/1 Yg vn/1 Now the first vn/1 is defined as (~ in both cases when the first {~ is parsed. For the first sequence, the second ~ matches the second vn/1 when it is compared to the earlier-defined value of vn/1. Because the substrings match, the first sequence can be parsed by this rule as [[C/~] vn ~ [~] vn] question_verb In contrast, for the second sequence, when ~sJ is compared with the defined value of vn/1 -- f~ -- they are different, and therefore the second sequence cannot be parsed by the rule. In this example, the defined value of a nonterminal is only one word. However, in the general case it can be an arbitrarily long string of words spanned by a nonterminal (vn/1 in this example).</Paragraph> <Paragraph position="15"> The semantic-domain function The semantic-domain function is denoted by c/&d and designates a terminal c whose semantic domain is restricted to d. 
This is an ordinary feature that we use in conjunction with the BDC dictionary, which defines semantic domains.</Paragraph> <Paragraph position="16"> Given two sentences, (6) a. ~ ~&quot; ~ ~ b. zki gu~ngd6ngsh@ng de t6uz~ c. in Guangdong -- investment d. the investment in Guangdong Province (7) a. ~ 'J'~E ~ b. z~i xi~ozh~ng de ji~ c. in XiaoZhang -- house d. in XiaoZhang's house they have the same surface structure</Paragraph> </Section> <Section position="4" start_page="15" end_page="21" type="metho"> <SectionTitle> NP ~J NP </SectionTitle> <Paragraph position="0"> but they are quite different. In (6), :~ ~ -~ is the modifier of ~. In (7), tJx~ is a modifier of 5, and together they form an NP as the object of ~.</Paragraph> <Paragraph position="1"> It is very hard to distinguish these two cases in general. With traditional CFGs, this is problematic because both ~-~i&quot; and ,'J~ have the part of speech up, and both ~.~ and  have part of speech nc. We can do a somewhat better job by using the domain knowledge supplied by a dictionary with semantic classes.</Paragraph> <Paragraph position="2"> The difference between the two phrases is that although ~-~&quot; and ~ are both location nouns, not all NPs following a ~ can be formed into a locative phrase--only if the head noun of the NP is a location noun can it be parsed as a locative phrase. (6) is parsed as [[[:~[~] NP] LocPh ~] ModPh [~] NP] NP because :~PS ~g~&quot; is a locative phrase, where LocPh stands for locative phrase, and ModPh stands for modifier phrase. But in (7), the entire phrase :i~ dx~ ~J ~ forms a locative phrase, and is parsed as [:~ [[['J'~] NP ~I'~] ModPh [~] NP] NP] LocPh The key point here is how to define a location noun. We have rules such as</Paragraph> <Paragraph position="3"> location_noun → nc/&GE</Paragraph> <Paragraph position="4"> where GE is the abbreviation of geology. Because the domain of ~&quot; is GE, it is parsed as a location_noun, and together with the leader ~ is parsed as a locative phrase. 
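The c/&d check can be sketched with a toy lexicon standing in for the BDC dictionary (the entries and romanized keys below are invented for illustration):

```python
# Sketch of the semantic-domain function c/&d: a terminal matches nc/&GE only
# if the word's recorded semantic domain is GE.  Toy lexicon, not the BDC data.
TOY_LEXICON = {
    "guangdongsheng": {"domain": "GE"},       # a location noun
    "xiaozhang":      {"domain": "PERSON"},   # a person name
}

def domain_matches(word, required_domain, lexicon=TOY_LEXICON):
    entry = lexicon.get(word)
    return entry is not None and entry["domain"] == required_domain
```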
But ~J~ cannot be parsed as a locative phrase with the leader ~ since its domain is not GE; instead it is parsed as the modifier of , at which point the parser will further check whether :i~ plus ~J~ ~ ~ can be parsed as a locative phrase.</Paragraph> <Paragraph position="5"> The has-subconstituent function This function is denoted as a/@b, which means a constituent labeled a with any descendant of category b, where a is a nonterminal and b can be either a terminal or a nonterminal. In other words, this matches an internal node labeled a which has a subtree with root labeled b.</Paragraph> <Paragraph position="6"> Consider the two sentences (8) a. 4~-~ 7 ~ ~ ~ b. t~ xu~ le li-~ng g~ :~ngq{ c. he learn -- two -- week d. he has learned it for two weeks (9) b. t~ xu~ le li~ng pi~n k~w~n c. he learn -- two -- lesson d. he has learned two lessons In Sentence (8), ~ ~ ~[~ is the complement of -~-, while in Sentence (9), ~ ~ -~ is the object of ~. However, both NPs ~ ~ and ~ ~ ~ superficially have the same structure, and the parser may assign Sentence (8) the wrong parse tree [[~] NP [[--~-] vn T [[~ ~] ClPh [~] NP] NP] VP] clause instead of the correct one [[~] NP [[-~] vn ~ [[[[~ ~] ClPh [[~] time_particle] NP] NP] TP] Comp] VP] clause where ClPh stands for classifier phrase, TP stands for time phrase, and Comp stands for the complement of a verb.</Paragraph> <Paragraph position="7"> The difference between them is that ~ is a time particle, and therefore is parsed with its classifier ~ ~ as a time phrase, whereas -~ is a general noun, and is parsed with its classifier ~ ~ as a general NP. With the rule time_phrase → NP/@time_particle we can parse ~ ~ ~] as a time phrase, and since it is a time phrase, it will be parsed as the complement of ~a. 
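The a/@b test is a simple subtree search, which can be sketched as follows; trees are (label, children) pairs, and the labels mirror the NP/ClPh/time_particle example while the encoding itself is invented:

```python
# Sketch of the has-subconstituent test a/@b: succeed iff some proper
# descendant of the node carries label b (the root itself does not count).
def has_subconstituent(tree, target):
    def contains(t):
        label, children = t
        return label == target or any(contains(c) for c in children)
    _, children = tree
    return any(contains(c) for c in children)

# An NP whose classifier phrase is built over a time particle, vs a plain NP
time_np = ("NP", [("ClPh", []), ("NP", [("time_particle", [])])])
plain_np = ("NP", [("ClPh", []), ("NP", [("nc", [])])])
```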
But because ~ ~ ~5~ is just a general NP, it cannot be parsed with this rule, and it will serve only as the object of ~.</Paragraph> <Section position="1" start_page="16" end_page="18" type="sub_section"> <SectionTitle> Earley Parsing </SectionTitle> <Paragraph position="0"> We use a generalization of the Earley algorithm (3, 2) to parse grammars of our form.</Paragraph> <Paragraph position="1"> Although the time complexity rises compared to the Earley algorithm, it remains polynomial in the worst case.</Paragraph> <Paragraph position="2"> Algorithm The key to modifying the Earley algorithm to handle the left and right context conditions is that our rules can be rewritten into a full form which includes all symbols including the contexts, plus indices indicating the left and/or right context boundaries. For example, let A → {L} B {R} and C → D E {R} be two production rules. They are rewritten respectively as A → L B R, start = 2, len = 1, and C → D E R, start = 1, len = 2. Once this transformation has been made, the machinery from the Earley algorithm carries over remarkably smoothly.</Paragraph> <Paragraph position="3"> The main loop of the parsing algorithm employs the following schema.</Paragraph> <Paragraph position="4"> 1. Pop the first entry from the agenda; call the popped entry c.</Paragraph> <Paragraph position="5"> 2. If c is already in the chart, go to 1.</Paragraph> <Paragraph position="6"> 3. Add c to the chart.</Paragraph> <Paragraph position="7"> 4. For all rules whose left corner is b, call match(b, c). If the return value is 1, add an initial edge e for that rule to the chart; for all the chart entries (subtrees) c' beginning at end(e)+1, if g is the active symbol in the RHS (right-hand-side) of e and match(g, c') returns 1, then call extend(e, c').</Paragraph> <Paragraph position="8"> 5. If the edge e is finished, add an entry to the agenda.</Paragraph> <Paragraph position="9"> 6. 
For all edges d, if g is the active symbol in the RHS of d and match(g, c) returns 1, then call extend(d, c) and add the resulting edge.</Paragraph> <Paragraph position="10"> 7. Go to 1.</Paragraph> <Paragraph position="11"> extend(e, c): (extends an edge e with the chart entry (subtree) c) 1. Create a new edge e'.</Paragraph> <Paragraph position="12"> 2. Set start(e') to start(e).</Paragraph> <Paragraph position="13"> 3. Set end(e') to end(c).</Paragraph> <Paragraph position="14"> 4. Set rule(e') to rule(e) with the dot moved beyond c.</Paragraph> <Paragraph position="15"> 5. If the edge e' is finished (i.e., a subtree) then add e' to the agenda, else for all chart subtrees c' beginning at end(e')+1, if g is the active symbol in the RHS of e' and match(g, c') returns 1, call extend(e', c').</Paragraph> <Paragraph position="16"> match(g, c): (checks whether a subtree c can be matched by a symbol g) 1. If c's category does not equal g's category, return 0.</Paragraph> <Paragraph position="17"> 2. Check whether g's associated functions are satisfied by c: (a) If g has the form a/!b or /!b, check all the entries in the chart that span the same range as c, returning 0 if any have category b.</Paragraph> <Paragraph position="18"> (b) If g has the form a/i, then if a/i is not defined, link it to c and return 1. Otherwise, compare c with the defined value of a/i; if not the same, return 0.</Paragraph> <Paragraph position="19"> (c) If g has the form c/&d, if the semantic domain of c is not d, return 0.</Paragraph> <Paragraph position="20"> (d) If g has the form a/@b, check all the nodes of the subtree c; if no node of category b is found, return 0.</Paragraph> <Paragraph position="21"> 3. Return 1.</Paragraph> <Paragraph position="22"> The difference from standard Earley parsing (aside from the rule transformation mentioned above) lies in match. 
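The match procedure above can be sketched as runnable code; the symbol and subtree encodings (dicts with 'cat', 'span', 'words', 'labels' fields) are invented for illustration, and only the control flow mirrors the pseudocode:

```python
# Runnable sketch of match(g, c), covering the four nonterminal functions.
def match(g, c, chart, bindings, lexicon):
    if g.get("cat") is not None and g["cat"] != c["cat"]:
        return 0
    fn = g.get("fn")
    if fn == "not":                           # a/!b or /!b
        if any(e["cat"] == g["arg"] and e["span"] == c["span"] for e in chart):
            return 0
    elif fn == "link":                        # a/i (substring-linking)
        if g["arg"] not in bindings:
            bindings[g["arg"]] = tuple(c["words"])
        elif bindings[g["arg"]] != tuple(c["words"]):
            return 0
    elif fn == "domain":                      # c/&d (semantic domain)
        if lexicon.get(c["words"][0], {}).get("domain") != g["arg"]:
            return 0
    elif fn == "sub":                         # a/@b; 'labels' = descendant labels
        if g["arg"] not in c["labels"]:
            return 0
    return 1

# vn/!vnn from the ba-construction example: reject a constituent whose span
# can also be labeled vnn elsewhere in the chart.
g_vn = {"cat": "vn", "fn": "not", "arg": "vnn"}
c_song = {"cat": "vn", "span": (2, 3), "words": ("song",), "labels": ()}
```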
To check whether an entry matches the left corner of a rule or whether an edge can be extended by an entry, we need to check not only that the category of the constituent matches, but also that the attached function, if any, is satisfied. Recall that our application for the parsing algorithm is as the first stage of a robust bracketer. We therefore use an extension of this parsing approach that permits partial parsing. In this version, if the sentence cannot be parsed, a minimum-size subset of subtrees that cover the entire sentence is produced. In the following, we will use an example sentence to demonstrate how the algorithm works. The sentence and the grammar we use here are oversimplified, but show how a right context is handled.</Paragraph> <Paragraph position="23"> The sentence to be parsed is (10) a. ~ ~ fl,~ :~ b. t~ m~i de y~ffi c. he buy -- clothes d. the clothes bought by him and the grammar is 1. NP → pron 2. NP → nc 3. RelPh → NP vn ~l~ {NP} 4. NP → RelPh NP 5. pron → ~ 6. nc → ~ 7. vn → ~ The first portion of the parsing for this example is identical to standard Earley parsing. We pop the first entry from the agenda, ~, and since it is not already there we add it to the chart. The only initial edge to be added is pron → ~ * Since this edge is finished, we add it to the agenda.</Paragraph> <Paragraph position="24"> Next we pop pron from the agenda, create an initial edge NP → pron * and find it is also finished, and so add the NP to the agenda.</Paragraph> <Paragraph position="25"> Again we pop NP from the agenda, and create the initial edge RelPh → NP vn ~ {NP} We find this edge cannot be extended by any entry and is not finished, so we go to step 1 and pop the next entry ~ from the agenda. We continue this step until we pop :~ from the agenda, and add nc and later NP to the agenda. 
Up to this point, all we are doing is standard Earley parsing.</Paragraph> <Paragraph position="26"> Now we pop the NP which spans :~n~ from the agenda, and find that the edge RelPh → NP vn t]'~ {NP} can be extended by this entry. We find the extended edge is finished, so we add the RelPh to the agenda, then pop it, creating a new edge NP → RelPh NP An entry (subtree) NP which spans ;iJ~]~ is already in the chart when the last edge is created. Thus the last edge can be extended, creating a finished edge, so we have created a subtree NP that spans the whole sentence. Since there is now a nonterminal that spans the whole sentence, we can write down a parse tree of the sentence in subscripted bracket form as [[[[~] pron] NP [~] vn ~J] RelPh [[:~] nc] NP] NP We do not yet have a tight upper bound for this parsing algorithm in the worst case. Clearly the algorithm will be more time consuming than for CFGs because the match procedure will need to check not only the categories of the constituents, but also their associated functions, and this check will not take constant time as for CFGs.</Paragraph> <Paragraph position="27"> But though the algorithm is clearly worse than for CFGs in the worst case, the complexity in practice will depend heavily on the particular sentences and the grammar. The number and type of context conditions used in the grammar, and the kind of nonterminal functions, will greatly affect the efficiency of parsing. Thus empirical performance is the true judge, and our experience as described next has been quite encouraging.</Paragraph> </Section> <Section position="2" start_page="18" end_page="21" type="sub_section"> <SectionTitle> Results </SectionTitle> <Paragraph position="0"> We are currently developing a robust grammar of this form for the Chinese bracketing application. Although the number of rules is changing daily, the evaluation was performed on a version of the grammar containing 948 rules. 
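The partial-parsing fallback mentioned earlier (returning a minimum-size subset of subtrees covering the sentence) is essentially an interval-cover problem. The paper does not spell out its selection procedure, so the greedy sketch below is only one plausible realization; greedy farthest-reach selection is optimal for this kind of cover:

```python
# Sketch of a minimum-size cover of the sentence by chart subtrees.
def minimal_cover(subtrees, n):
    """subtrees: iterable of (label, start, end), end exclusive; n: word count."""
    chosen, pos = [], 0
    spans = sorted(subtrees, key=lambda t: (t[1], -t[2]))
    while pos < n:
        # among subtrees covering position pos, take the one reaching farthest
        best = max((t for t in spans if t[1] <= pos < t[2]),
                   key=lambda t: t[2], default=None)
        if best is None:
            return None        # some word is uncovered (an unlabeled word)
        chosen.append(best)
        pos = best[2]
    return chosen
```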
The lexicon used was the BDC dictionary containing approximately 100,000 entries with 33 part-of-speech categories (1).</Paragraph> <Paragraph position="1"> To evaluate our progress, we have evaluated precision on a previously unseen sample of 250 sentences drawn from our corpus, which contains Hong Kong legislative proceedings.</Paragraph> <Paragraph position="2"> The sentences were randomly selected in various length ranges of 4-10, 11-20, 21-30, 31-40, and 41-50 words, such that each of the five ranges contained 50 sentences. All those sentences were segmented by hand, though we will use an automatic segmenter in the future.</Paragraph> <Paragraph position="3"> We evaluated three factors:</Paragraph> <Paragraph position="4"> 1.</Paragraph> <Paragraph position="5"> The percentage of labeled words. A word is unlabeled if it cannot form deeper structure with at least one other word. Unlabeled words often indicate inadequacies with lexicon coverage rather than the grammar.</Paragraph> <Paragraph position="6"> 2. Weighted constituent precision, i.e., the percentage of correctly identified syntactic constituents. A constituent is judged to be correct only if both its bracketing and its syntactic label are correct.</Paragraph> <Paragraph position="7"> Because we do not yet choose a single parse tree for a sentence at the current stage, we uniformly weight the precision over all the parse trees for the sentence. 
Therefore this measure is a kind of weighted precision (6).</Paragraph> <Paragraph position="8"> [The representative parser outputs reproduced at this point -- full and partial parse trees over Chinese sentences, printed as labeled bracketings -- are omitted here because the Chinese characters did not survive text extraction.] In the future, we will give a single most probable parse tree for a sentence if it can be parsed. Note that the precision in this case is likely to be lower bounded by the weighted precision reported here, since we currently assign equal weight to all parses, even if they are improbable.</Paragraph> <Paragraph position="9"> 3. 
The average run time per sentence.</Paragraph> <Paragraph position="10"> Results are shown in Table 1. We have unfortunately found it impossible to perform comparison evaluations against other systems, due to the unavailability of Chinese parsers in general. However, we believe these performance levels to be quite competitive and promising.</Paragraph> <Paragraph position="11"> Meaningful baseline evaluations are currently difficult to design for Chinese parsing because of the unavailability of comparison standards. Examples of the Chinese output still give by far the most important indication of parsing quality. Some representative examples are shown in Figures 2 and 2. The parser produces two kinds of outputs. If no complete parse tree is found for the input sentence, a partial parse is returned; such examples are shown without a number preceding the parse. Otherwise, the first complete parse tree is shown, preceded by the number 0 (indicating that it was the first alternative produced). Conclusion We have described an extension to context-free grammars that admits a practical parsing algorithm. We have found the notation and the increased expressiveness to be well-suited for writing large robust grammars for Chinese, particularly for handling compounding phenomena without incurring the level of parsing ambiguity common to pure context-free grammars. Experiments show promising performance on Chinese sentences.</Paragraph> <Paragraph position="12"> With regard to the theme of this conference, we are clearly emphasizing representation over algorithms. We have developed a new representation that neatly captures the domain characteristics, and in our experience, greatly improves the coverage and accuracy of our bracketer. Algorithms follow naturally as a consequence of the representational features. 
It will be interesting to explore the relationships between our grammar and other context-sensitive grammar formalisms, a topic we are currently pursuing.</Paragraph> </Section> </Section> </Paper>