<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1060"> <Title>A Hybrid Japanese Parser with Hand-crafted Grammar and Statistics</Title> <Section position="3" start_page="411" end_page="412" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> In this section, we describe several models for Japanese dependency analysis and work on statistical approaches with grammars. Next, we introduce SLUNG, the HPSG-based Japanese grammar which is used in our hybrid parser.</Paragraph> <Section position="1" start_page="411" end_page="412" type="sub_section"> <SectionTitle> 2.1 Previous Dependency Analysis Models of Japanese </SectionTitle> <Paragraph position="0"> Several statistical models for Japanese dependency analysis which do not utilize a hand-crafted grammar have been proposed. Since we evaluate the accuracy of bunsetsu dependencies as they do, we introduce them here for comparison. All the models introduced below are based on the likelihood value of the dependency between two bunsetsus, but they differ from each other in the attributes or outputs which are considered when a likelihood value is calculated.</Paragraph> <Paragraph position="1"> Some models calculate the likelihood value of a dependency between bunsetsus i and j as in (6), such as a decision tree model (Haruno et al., 1998), a maximum entropy model (Uchimoto et al., 1999), and a model based on distance and lexical information (Fujio and Matsumoto, 1998). Attributes Φi and Φj consist of a part-of-speech (POS), a lexical item, presence of a comma, and so on, and Ai,j is the number of intervening bunsetsus between i and</Paragraph> <Paragraph position="3"> However, these models fail to reflect contextual information because attributes of the surrounding bunsetsus are not considered.</Paragraph> <Paragraph position="4"> Uchimoto et al. (2000) proposed a model using posterior context. 
The model utilizes not only attributes of bunsetsus i and j but also attributes of all bunsetsus (including j) which follow bunsetsu i. That is, instead of learning two output values &quot;T (true)&quot; or &quot;F (false)&quot; for the dependency between two bunsetsus, three output values are used for learning: the bunsetsu i is &quot;bynd (dependent on a bunsetsu beyond j)&quot;, &quot;dpnd (dependent on the bunsetsu j)&quot;, or &quot;btwn (dependent on a bunsetsu between i and j)&quot;. The probability is calculated by multiplying the probabilities for all bunsetsus which follow bunsetsu i, as in (7). They report that this kind of contextual information improves accuracy. However, the model has to assume the independence of all the random variables, which may cause some errors.</Paragraph> <Paragraph position="6"> The difference between our model and these previous models is discussed in Section 3.</Paragraph> </Section> <Section position="2" start_page="412" end_page="412" type="sub_section"> <SectionTitle> 2.2 Statistical Approaches with a Grammar </SectionTitle> <Paragraph position="0"> There have been many proposals for statistical frameworks particularly designed for parsers with hand-crafted grammars (Schabes, 1992; Briscoe and Carroll, 1993; Abney, 1996; Inui et al., 1997). The main issue in this type of research is how to assign likelihoods to a single linguistic structure generated by a grammar. Some of them (Briscoe and Carroll, 1993; Inui et al., 1997) treat information on contexts, but the contextual information is derived only from the structure to which the parser is trying to assign a likelihood value. 
Thus, the major difference between their method and ours is that we consider the attributes of alternative linguistic structures generated by the grammar in order to determine the likelihood for linguistic structures.</Paragraph> </Section> <Section position="3" start_page="412" end_page="412" type="sub_section"> <SectionTitle> 2.3 SLUNG : Japanese Grammar </SectionTitle> <Paragraph position="0"> The Japanese grammar which we adopted, SLUNG (Mitsuishi et al., 1998), is an HPSG-based under-specified grammar. It consists of 8 rule schemata, 48 lexical templates for POSs, and 105 lexical entries for functional words. As can be seen from these figures, the grammar does not contain detailed lexical information that needs intensive labor for development. However, it is precise in the sense that it achieves 83.7% dependency accuracy with a simple heuristics 2 on the EDR annotated corpus, and it can produce at least one parse tree for 98.4% of the sentences in the EDR annotated corpus. We use the grammar for generating parse tree forests, and our Triplet/Quadruplet Model is used for picking up a single tree from a forest.</Paragraph> </Section> </Section> </Paper>
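Editor's illustration: the posterior-context model of Uchimoto et al. (2000) described in Section 2.1 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: `label_prob` is a hypothetical classifier standing in for their maximum entropy model, and the per-candidate independence assumption criticized in the text is built in via the plain product of equation (7).

```python
def head_probability(i, j, n, label_prob):
    """P(bunsetsu i depends on bunsetsu j), in the style of Eq. (7):
    multiply, over every bunsetsu m following i (m = i+1 .. n-1), the
    probability of the label that candidate head j implies for m.

    label_prob(i, m) is a hypothetical classifier returning a dict of
    probabilities over the three labels "bynd", "dpnd", "btwn"; the
    per-candidate decisions are assumed independent, as noted above.
    """
    p = 1.0
    for m in range(i + 1, n):               # all bunsetsus following i
        if j > m:
            p *= label_prob(i, m)["bynd"]   # true head lies beyond m
        elif j == m:
            p *= label_prob(i, m)["dpnd"]   # i depends on m itself
        else:
            p *= label_prob(i, m)["btwn"]   # head lies between i and m
    return p


# Toy usage: 4 bunsetsus, a uniform classifier; score every candidate
# head for bunsetsu 0 and pick the most probable one.
uniform = lambda i, m: {"bynd": 1 / 3, "dpnd": 1 / 3, "btwn": 1 / 3}
scores = {j: head_probability(0, j, 4, uniform) for j in range(1, 4)}
best_head = max(scores, key=scores.get)
```

With a real classifier, the parser would compute these scores for each modifier bunsetsu and select the head maximizing the product, subject to the usual Japanese constraint that dependencies point rightward.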