File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/c94-2203_intro.xml

Size: 2,493 bytes

Last Modified: 2025-10-06 14:05:42

<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2203">
  <Title>CHINESE SEGMENTATION DISAMBIGUATION</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Processing Chinese texts is specifically difficult in its computation because normally sentences in Chinese texts are represented as strings of Chinese characters without spaces to indicate word boundaries. This causes a problem for Chinese machine translation, statistical analysis of Chinese corpora, Chinese information retrieval, etc., as usually these projects are based on the assumption that all lexicon distinctions have been recognized in advance.</Paragraph>
    <Paragraph position="1"> Several approaches aimed to transfer a Chinese character string into a word string have been studied in recent decades. Two competing approaches commonly used for Chinese text segmentation are the statistical approach (Chang, et al, 1991; Sproat and Shih, 1991; Chiang, et al, 1992) and the heuristic approach (Chen and Liu, 1992; He, et al, 1991; Jin and Nie, 1993; diil, ;1992; Liang and Zhen, 1991; Wang, et al, 1991). Although a high degree of precision has been reported for both methods, each has its limitations, particularly in identifying unknown words and disambiguating multiple segmentations. Recently, a hybrid approach incorporating heuristics with statistics has been studied in an attempt to solve unknown word recognition problems (Chen and Liu, 1992; Nie and Jin, 1994). However, ambiguous segmentation is still a difficult problem.</Paragraph>
    <Paragraph position="2"> In this paper a method of reasoning under uncertainty intending to disambiguate Chinese segmentation is presented. A model of evidential strength in inexact reasoning has been studied by (Buchanan and Shortliffe, 1984). In the process of Chinese segmentation, knowledge in morphology, syntax, semantics and pragmatics is used as evidence to support the disambiguation hypotheses. The similarity of uncertain knowledge and inexact reasoning between medical diagnosis and natural language interpretation makes it possible to apply the MYCIN technique to Chinese text segmentation.</Paragraph>
  </Section>
</Paper>