<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1014"> <Title>Reusing an ontology to generate numeral classifiers</Title> <Section position="8" start_page="92" end_page="95" type="evalu"> <SectionTitle> 5 Evaluation and Discussion </SectionTitle>
<Paragraph position="0"> The algorithm was tested on a 3,700-sentence machine translation test set of Japanese with English translations, although we only used the Japanese (the test set is available at www.kecl.ntt.co.jp/icl/mtg/resources).</Paragraph>
<Paragraph position="1"> We only considered sentences with a noun phrase modified by a sortal classifier. Noun phrases modified by group classifiers, such as -soku &quot;pair&quot;, were not evaluated, as we reasoned that the presence of such a classifier would be marked in the input to the generator. We also did not consider the anaphoric use of numeral classifiers. Although there were many anaphoric examples, resolving them requires robust anaphor resolution, which is a separate problem. We estimate that we would achieve the same accuracy with the anaphoric examples if their referents were known; unfortunately, the test set did not always include the full context, so we could not identify the referents and test this. A typical example of anaphoric use is (10).</Paragraph>
<Paragraph position="2"> (10) shukka-ga     ruiseki-de  500-hon-wo  toppa-shita
     shipment-NOM  cumulative  500-CL-ACC  reached
     &quot;Cumulative shipments reached 500 ?barrels/rolls/logs/...&quot;
In total, there were 90 noun phrases modified by a sortal classifier. Our test of the algorithm was done by hand, as we have no Japanese generator. We assumed as input only the fact that a classifier was required, and the semantic classes of the head noun given in the lexicon. Using only the default classifiers predicted by the semantic classes, we were able to generate 73 (81%) correctly. A classifier was only judged to be correct if it was exactly the same as that in the original test set.
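The exact-match criterion gives no partial credit for near misses. A minimal Python sketch of that scoring rule follows; the evaluation itself was done by hand, and the function name exact_match_accuracy and the toy classifier lists are hypothetical illustrations, not data from the test set.

```python
def exact_match_accuracy(generated, gold):
    """Share of noun phrases whose generated classifier is identical to the
    classifier in the original test set; near misses get no partial credit."""
    if len(generated) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for g, t in zip(generated, gold) if g == t)
    return correct / len(gold)


# Hypothetical toy outputs: the second item differs, so 2 of 3 count as correct.
print(exact_match_accuracy(["-nin", "-tsu", "-hon"], ["-nin", "-sha", "-hon"]))

# The reported result: 73 exact matches out of 90 noun phrases.
print(f"{73 / 90:.1%}")  # prints 81.1%
```

Under this rule, 73 exact matches out of 90 noun phrases give the 81% accuracy reported above.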
This accuracy of 81% was almost double the baseline of generating the most common classifier (-nin) for all noun phrases, which would have achieved 41%. The results, with a breakdown of the errors, are summarized in Table 2.</Paragraph>
<Paragraph position="3"> In this small sample, 6 of the 90 noun phrases (6.7%) needed to have the default classifier marked for the noun. In fact, there were only 4 different nouns, as two were repeated. We therefore estimate that fewer than 6% of nouns will need to have their own default classifier marked. Had the default classifier for these nouns been marked in the lexicon, our accuracy would have been 88%, the maximum achievable for our method.</Paragraph>
<Paragraph position="4"> Looking at it from another point of view, the Goi-Taikei ontology, although initially designed for Japanese analysis, was also useful for generating Japanese numeral classifiers. We consider that it would be equally useful for the same task with Korean, or even the unrelated language Malay.</Paragraph>
<Paragraph position="5"> We generated the residual classifier -tsu for nouns not in the lexicon; this proved to be a bad choice for three unknown words. If we had a method of deducing semantic classes for unknown words, we could have used it to predict the classifier more successfully. For example, kikan-toshika &quot;institutional investor&quot; was not in the dictionary, and so we used the semantic class for toshika &quot;investor&quot;, which was 175:investor, a sub-type of 5:person. Had kikan-toshika &quot;institutional investor&quot; been marked as a subtype of company, or had we deduced the semantic class from the modifier, we would have been able to generate the correct classifier -sha. (Institutional investors are financial institutions that invest the savings of individuals and non-financial companies in the financial markets.)</Paragraph>
<Paragraph position="6">
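The lookup order discussed in this section (residual classifier for unknown nouns, a noun-specific default when one is marked, otherwise the default of the first listed semantic class) can be illustrated with a minimal Python sketch. It is not the implementation evaluated here. The names LexEntry, class_default, generate_classifier, CLASS_DEFAULTS, PARENT and LEXICON are hypothetical, and the assumption that a class without its own default inherits one from its nearest ancestor in the ontology is made only for illustration. The class numbers come from the examples in the text; the class-to-classifier table is toy data.

```python
from dataclasses import dataclass
from typing import List, Optional

RESIDUAL = "-tsu"  # residual classifier when nothing more specific is known


@dataclass
class LexEntry:
    """Hypothetical lexicon entry for a noun."""
    classes: List[int]                        # semantic classes, in lexicon order
    default_classifier: Optional[str] = None  # optional noun-specific default


def class_default(cls, class_defaults, parent):
    """Walk up the ontology from cls to the nearest class with a default
    classifier (assumed inheritance); return None if there is none."""
    while cls is not None:
        if cls in class_defaults:
            return class_defaults[cls]
        cls = parent.get(cls)  # None once we pass the top of the toy hierarchy
    return None


def generate_classifier(noun, lexicon, class_defaults, parent):
    """Choose a sortal classifier for noun, most specific source first."""
    entry = lexicon.get(noun)
    if entry is None:
        return RESIDUAL                    # unknown word
    if entry.default_classifier is not None:
        return entry.default_classifier    # noun-specific default is marked
    for cls in entry.classes:              # first listed class with a default wins
        found = class_default(cls, class_defaults, parent)
        if found is not None:
            return found
    return RESIDUAL


# Hypothetical toy data; class numbers follow the examples in the text.
CLASS_DEFAULTS = {5: "-nin", 673: "-hon", 854: "-ko"}  # person, tree, edible fruit
PARENT = {175: 5}                                     # 175:investor under 5:person
LEXICON = {
    "toshika": LexEntry(classes=[175]),     # "investor"
    "ringo": LexEntry(classes=[673, 854]),  # "apple", tree listed first
}
LEXICON_WITH_OVERRIDE = dict(
    LEXICON, ringo=LexEntry(classes=[673, 854], default_classifier="-ko"))

for noun in ["toshika", "ringo", "unknown-noun"]:
    print(noun, generate_classifier(noun, LEXICON, CLASS_DEFAULTS, PARENT))
print("ringo", generate_classifier("ringo", LEXICON_WITH_OVERRIDE,
                                   CLASS_DEFAULTS, PARENT))
```

With this toy data, toshika inherits -nin from 5:person, ringo receives -hon because 673:tree is listed first, an unlisted noun falls back to -tsu, and marking -ko as a noun-specific default changes the result for ringo. Marking noun-specific defaults is the lexicon change discussed for the 6 noun phrases above; correcting those 6 would give 79 of 90, about 88%.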
In one case, we felt the default ordering of the semantic classes should have been reversed: 673:tree was listed before 854:edible fruit for ringo &quot;apple&quot;.</Paragraph>
<Paragraph position="7"> The remaining errors were more problematic.</Paragraph>
<Paragraph position="8"> There was one example, 80,000-nin-amari-no shomei &quot;about 80,000 signatures&quot;, which could be treated as referent transfer: shomei &quot;signature&quot; was being counted with the classifier for people.</Paragraph>
<Paragraph position="9"> Another possible analysis is that the classifier is the head of a referential noun phrase with deictic/anaphoric reference, equivalent to the signatures of about 80,000 people. A couple were quite literary in style: for example, 10-nen-no toshi &quot;10 years (Lit: 10 years of years)&quot;, where the toshi &quot;year&quot; part is redundant and would not normally be used. In two of the errors, the residual classifier was used instead of the more specific default. Shimojo (1997) predicts that this will happen in expressions where the amount is being emphasized more than what is being counted. Intuitively, this applied in both cases, but we were unable to identify any features we could exploit to make this judgment automatically.</Paragraph>
<Paragraph position="10"> A more advanced semantic analysis may be able to dynamically determine the appropriate semantic class for cases of referent transfer, unknown words, or words whose semantic class can be restricted by context. Our algorithm, which ideally generates the classifier from this dynamically determined semantic class, allows us to generate the correct classifier in context, whereas using a default listed for a noun does not. This was our original motivation for generating classifiers from semantic classes, rather than using a classifier listed with each noun as Sornlertlamvanich et al. (1994) do.</Paragraph>
<Paragraph position="11"> In this paper we have concentrated on solving the problem of generating appropriate Japanese numeral classifiers using an ontology. In future work, we would like to investigate in more detail the conditions under which a classifier needs to be generated.</Paragraph> </Section> </Paper>