<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1107">
  <Title>Chinese Chunking with another Type of Spec</Title>
  <Section position="3" start_page="1" end_page="2" type="intro">
    <SectionTitle>
(1) Noun-noun compounds
</SectionTitle>
    <Paragraph position="0"> Compounds formed by more than two neighboring nouns are very common in Chinese, and the nouns on the left do not always all modify the head of the compound. Some compounds consist of several shorter sub-compounds. For example: (g19750g5192/younger g5547g5907g13785/volunteer g12197g6228/science and technology g7393g2165g19443/service team) 'young volunteer service team of science and technology'. Here 'g19750g5192 g5547g5907g13785' and 'g12197g6228 g7393g2165g19443' are two sub-compounds, and the former modifies the latter.</Paragraph>
    <Paragraph position="1"> But sometimes it is impossible to distinguish the inner structure, for example: g1002g11040/world g2656g5191/peace g1119g1006/career. It is impossible to tell whether the structure is {{g1002g11040 g2656g5191} g1119g1006} or {g1002g11040 {g2656g5191 g1119g1006}}. English chunking shows the same problem, and the common solution for English is to leave the inner structure unidentified and treat the compound as a flat noun phrase. The following is an example from the CoNLL2000 shared task: [NP employee assistance program directors] (2) Coordination. Coordination can in all cases be divided into two types: with conjunctions and without conjunctions. The former can be further divided into two subcategories: word-level and phrase-level coordination. For example: {g6931g12586g5627/policy g19146g15904/bank g994/and g2842g1006/commercial g19146g15904/bank} g11352/of {g13864g13007/relationship g994/and g2524g1328/cooperation} 'the relationship and cooperation between policy banks and commercial banks'.</Paragraph>
    <Paragraph position="2"> The former coordination is phrase-level and the latter is word-level. Unfortunately, it is sometimes difficult or even impossible to tell whether a coordination is word-level or phrase-level, for example: g7380g1314/least g5049g17176/salary g2656/and g10995g8975g17165/living maintenance 'the least salary and living maintenance'. It is impossible to tell whether 'g7380g1314' is a shared modifier or not. English chunking has the same kind of problem. The solution in CoNLL2000 is to leave the conjunction outside the chunks for phrase-level coordination, and to group the conjunction inside a chunk when the coordination is word-level or when its level cannot be distinguished. For example: [NP enough food and water] In Chinese, some coordinate constructions have no conjunction or punctuation inside, and cannot be distinguished from a modifier-head construction with syntactic knowledge alone. For example: [...] wagons, caution lights and alarm whistles' Such a problem does not exist in English, because in formal English almost all coordinations have a conjunction or punctuation between words or phrases of the same syntactic category. (Notes: n, v, a, d, m, q, p, f, c are the POS tags of noun, verb, adjective, adverb, number, measure, preposition, localizer, and conjunction respectively; '_' means neighboring; 'g11352/of' is a common auxiliary word in Chinese. This statistical work is done on our test corpus, whose setting is shown in Table 3.)</Paragraph>
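    <Paragraph position="3"> The CoNLL2000 conventions described above can be made concrete with a minimal sketch that converts bracketed chunks into per-token B/I/O tags, the tagging scheme used in the CoNLL2000 chunking data. The function name brackets_to_iob and the bracket syntax it accepts are assumptions made for this illustration only, and the second input in the note below is built from the English gloss of the bank example rather than taken from any corpus.

```python
import re


def brackets_to_iob(text):
    """Convert '[NP w1 w2] w3' style chunk brackets into (token, tag) pairs.

    Hypothetical helper for illustration; not part of any CoNLL tooling.
    """
    pairs = []
    # Each match is either a bracketed chunk "[TYPE words...]" or a bare token.
    for match in re.finditer(r"\[(\w+) ([^\]]+)\]|(\S+)", text):
        chunk_type, words, bare = match.groups()
        if bare is not None:
            # Token outside any chunk, e.g. a phrase-level conjunction.
            pairs.append((bare, "O"))
            continue
        for i, word in enumerate(words.split()):
            # The first token of a chunk gets B-, the following tokens get I-.
            prefix = "B" if i == 0 else "I"
            pairs.append((word, prefix + "-" + chunk_type))
    return pairs


if __name__ == "__main__":
    for example in (
        "[NP employee assistance program directors]",
        "[NP enough food and water]",
        "[NP policy banks] and [NP commercial banks]",
    ):
        print(brackets_to_iob(example))
```

In the flat chunk [NP enough food and water] the conjunction 'and' receives the tag I-NP, because it is grouped inside the chunk. In the illustrative input "[NP policy banks] and [NP commercial banks]" it receives the tag O, because a phrase-level conjunction is left outside the chunks.</Paragraph>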
  </Section>
</Paper>