<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2116">
  <Title>VERBAL CASE FRAME ACQUISITION FROM A BILINGUAL CORPUS: GRADUAL KNOWLEDGE ACQUISITION</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
VERBAL CASE FRAME ACQUISITION FROM A BILINGUAL CORPUS:
GRADUAL KNOWLEDGE ACQUISITION
Hideki Tanaka
NHK Science and Technical Research Laboratories
</SectionTitle>
    <Paragraph position="0"> tanakah@strl.nhk.or.jp</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes acquisition of English surface case frames from a corpus, based on a gradual knowledge acquisition approach. To acquire and unambiguously accumulate precise knowledge, the process is divided into three steps which are assigned to the most appropriate processor: either a human or a computer. The data is prepared by human workers and the knowledge is acquired and accumulated by a learning program. By using this method, inconsistent human judgement is minimized. The acquired case frames basically duplicate human work, but are more precise and intelligible.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="727" type="metho">
    <SectionTitle>
1 Gradual Knowledge Acquisition
</SectionTitle>
    <Paragraph position="0"> We have been developing an English-to-Japanese machine translation (MT) system for news reports in English (Aizawa T., 1990) (Tanaka H., 1991) and have so far studied the translation selection problem for common English verbs (Tanaka H., 1992). Recently, we examined the problem of multiple translations for common English verbs (Tanaka H., 1993). Our MT system uses surface verbal case frames (simply written as case frames) to select a Japanese translation for an English verb. The need to acquire and accumulate case frames leads directly to three problems.</Paragraph>
    <Paragraph position="1"> (1) How to obtain detailed case frames which are accurate enough to translate highly polysemous verbs? (2) How to accumulate a number of case frames in an unambiguous way?</Paragraph>
    <Paragraph position="2"> (3) Manual case frame acquisition tends to yield inconsistent results since human judgements are changeable. How can we maintain consistency? We need to devise a clear methodology for acquiring sufficient case frames and accumulating them in a way that is unambiguous and consistent.</Paragraph>
    <Paragraph position="3"> In this paper, we propose gradually building up a knowledge base from a bilingual corpus to cope with these three problems. The knowledge base is a collection of case frames. Fig. 1 shows an overall view of our approach.</Paragraph>
    <Paragraph position="4"> The process is divided into three steps which are assigned to the most appropriate processor: a human or a computer. Using this method, detailed knowledge is obtained from the target domain texts, unstable human judgement is confined, and case frames are accumulated unambiguously by using a learning algorithm.</Paragraph>
    <Section position="1" start_page="0" end_page="727" type="sub_section">
      <SectionTitle>
Fig. 1: Case-Frame Tree Acquisition from a Bilingual Corpus
</SectionTitle>
      <Paragraph position="1"> We begin by preparing a tagged bilingual corpus, seeking detailed knowledge in target domain texts. The annotation described in the corpus is the syntactic information of the texts and the translation. They are assigned manually since human translators can do such jobs as syntactic tagging and translation with far more consistency than writing case frames directly.</Paragraph>
      <Paragraph position="2"> Next, the corpus is converted into an intermediate data form called the primitive case-frame table (PCFT). Finally, a statistical learning algorithm is used to extract the case frames from the PCFT and accumulate them in a clear-cut fashion.</Paragraph>
      <Paragraph position="3"> While this approach lets us avoid writing case frames directly through linguistic contemplation, human activity plays an important role in designing and constructing the corpus and converting it into the PCFT (Fig. 1).</Paragraph>
      <Paragraph position="4"> The case frames are represented in a discrimination tree, which has several attractive features for word-sense selection (Okumura M., 1990). The biggest attraction of the learning algorithm, we think, is its intelligibility; compared with the algorithms for neural networks, for example, it produces highly intelligible results if the input is appropriate.</Paragraph>
      <Paragraph position="5"> Knowledge acquisition by machine learning from a corpus has recently been getting more attention than ever in some natural language processing fields. Cardie (1992, 1993) applied this approach to predict the antecedent of relative pronouns and attributes of unknown words.</Paragraph>
      <Paragraph position="6"> Utsuro (1993) introduced a methodology for automatically acquiring verbal case frames from bilingual corpora in a different way from ours.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="727" end_page="727" type="metho">
    <SectionTitle>
2 Case Frames for Translation
</SectionTitle>
    <Paragraph position="0"> Our machine translation system uses case frames for the translation of English verbs. Fig. 2 shows illustrative case frames for the word take.</Paragraph>
    <Paragraph position="1"> Fig. 2 (illustrative case frames for take):
SN[man] take ON[boy] (select)
SN[I] take ON[him] PN[to] PNc[BUILD] (escort)
SN[HUMAN] take ON[CON] PN[to] PNc[BUILD] (bring)
We write case categories (SN (subject noun) and PN (preposition) here) and specify their restrictions. The restriction can be a semantic category like HUMAN or a word form itself like boy. There may be several hundred case frames for the most common English verbs.</Paragraph>
    <Paragraph position="2"> The translation selection is performed after the parser produces a syntactic structure for the input sentence. The system compares the syntactic structure with the case frames and selects the translation from the best-matching case frame. Translation selection is performed without considering the context. Our new case frames are designed to follow the same protocol.</Paragraph>
    <Paragraph position="3"> There are three factors to consider at this point.
(1) How many and what kinds of case categories should be used?
(2) In which order should the system compare the syntactic structure and the case categories in a case frame?
(3) What kind of restriction should we use?
In this paper, we will deal mainly with the first two factors. Our solution is to use a discrimination tree for the case-frame representation and a statistical algorithm for learning. The necessary case categories are selected and stacked in a tree form, one by one, according to their contribution to the translation selection. We call the obtained tree the case-frame tree. Fig. 3a is an example of a case-frame tree for take.</Paragraph>
    <Paragraph position="5"> Fig. 3b: Linear Case Frames for Fig. 3a. Comparison with the syntactic structure is made from the root node to the leaf nodes of the case-frame tree and no backtracking is allowed. The comparison is executed deterministically. If we read the tree from the root to the leaves, it can be expanded into a linear case frame, as shown in Fig. 3b.</Paragraph>
    <Paragraph position="6"> This increases the intelligibility of the case-frame tree, enabling a human lexicographer to evaluate it from a linguistic viewpoint.</Paragraph>
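    <Paragraph> The following is a minimal Python sketch of this deterministic root-to-leaf matching, assuming a simple dictionary-based tree; the Node layout, the "0" marker for an absent category, and the toy branches are illustrative assumptions, not the system's actual data structures.
# Minimal sketch of deterministic case-frame-tree matching (hypothetical
# representation).  Each internal node tests one case category against the
# value found in the parsed input; each leaf holds a translation.
class Node:
    def __init__(self, category=None, branches=None, translation=None):
        self.category = category        # case category tested at this node
        self.branches = branches or {}  # value -> child Node
        self.translation = translation  # set only on leaf nodes

def select_translation(tree, case_values, default="(no match)"):
    """Walk from the root to a leaf with no backtracking."""
    node = tree
    while node.translation is None:
        value = case_values.get(node.category, "0")  # "0" marks an absent category
        child = node.branches.get(value)
        if child is None:
            return default
        node = child
    return node.translation

# Toy tree loosely modelled on the take examples discussed in this paper.
tree = Node(category="ON", branches={
    "0":    Node(translation="kakaru (need time)"),   # e.g. "take long"
    "part": Node(translation="sanka suru"),           # "take part (in)"
    "him":  Node(category="PN", branches={"to": Node(translation="(escort)")}),
})
print(select_translation(tree, {"ON": "part"}))             # sanka suru
print(select_translation(tree, {"ON": "him", "PN": "to"}))  # (escort)
</Paragraph>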
  </Section>
  <Section position="5" start_page="727" end_page="728" type="metho">
    <SectionTitle>
3 Learning from the PCFT
</SectionTitle>
    <Paragraph position="0"> A case-frame tree can be regarded as a decision tree.</Paragraph>
    <Paragraph position="1"> Decision-tree learning has a long research history and many algorithms have been developed. Among them, the ID3 group of programs (Quinlan J., 1993) and its descendants satisfy our solution in Sec. 2. We apply the latest program, C4.5 (Quinlan J., 1993), to our problem. This algorithm learns a decision tree from an attribute-value and class table.</Paragraph>
    <Paragraph position="2"> An example of such a table is shown in Table 1.</Paragraph>
    <Paragraph position="3"> Table 1: Example of a Primitive Case-Frame Table
SN    V     ON    PN    PNc       translation
I     take  him   to    theater
you   take  him   to    school
you   take  him   to    park
you   take  box   to    theater
you   take  box   to    park
I     take  box   to    school</Paragraph>
    <Paragraph position="5"> The first row of the table represents the attributes, or the case categories. The values of the attributes are the restrictions of the case categories. Word forms are used as the values in this table.</Paragraph>
    <Paragraph position="6"> Since the algorithm produces a case-frame tree from this table, we term the table a "Primitive Case-frame Table (PCFT)." C4.5 first puts all translations listed in the PCFT under a root node, then recursively selects one case category and partitions the translations according to the word forms of the selected category. For the case category selection, a criterion based on the entropy reduction of translations gained by the partitioning is used. See (Quinlan J., 1993) for more details. In a word, this algorithm places case categories from the root node to the leaf nodes according to the category's ability for translation discrimination. The case-frame tree in Fig. 3a was produced from Table 1. It does not have a node corresponding to a subject. This simply means the subject information is redundant in selecting the translation of take in Table 1.</Paragraph>
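    <Paragraph> As a concrete illustration of the category-selection criterion, the Python sketch below computes the entropy reduction (information gain) that each case category yields on a PCFT patterned on Table 1. It is a simplified ID3-style calculation, not the full C4.5 procedure (no gain ratio, no pruning), and the translation labels t1/t2 are placeholders for the Japanese translations, which are not reproduced here.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels (translations)."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attribute, class_key="translation"):
    """Entropy of the translations minus entropy after splitting on attribute."""
    labels = [r[class_key] for r in rows]
    before = entropy(labels)
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[class_key] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

# PCFT rows patterned on Table 1 (translation labels are placeholders).
pcft = [
    {"SN": "I",   "ON": "him", "PN": "to", "PNc": "theater", "translation": "t1"},
    {"SN": "you", "ON": "him", "PN": "to", "PNc": "school",  "translation": "t1"},
    {"SN": "you", "ON": "him", "PN": "to", "PNc": "park",    "translation": "t1"},
    {"SN": "you", "ON": "box", "PN": "to", "PNc": "theater", "translation": "t2"},
    {"SN": "you", "ON": "box", "PN": "to", "PNc": "park",    "translation": "t2"},
    {"SN": "I",   "ON": "box", "PN": "to", "PNc": "school",  "translation": "t2"},
]
for attr in ("SN", "ON", "PN", "PNc"):
    print(attr, round(information_gain(pcft, attr), 3))
# With this labelling ON removes all uncertainty and would become the top node,
# while SN contributes nothing, mirroring the observation that the subject is
# redundant for Table 1.
</Paragraph>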
  </Section>
  <Section position="6" start_page="728" end_page="728" type="metho">
    <SectionTitle>
4 Data Preparation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="728" end_page="728" type="sub_section">
      <SectionTitle>
4.1 Construction of the Bilingual Corpus
</SectionTitle>
      <Paragraph position="0"> As mentioned in Sec. 1, the data for machine learning is prepared in two steps: construction of a bilingual corpus and its conversion into a PCFT. Following are the factors considered and the steps taken to put together our corpus.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="728" end_page="728" type="metho">
    <SectionTitle>
* Source
</SectionTitle>
    <Paragraph position="0"> Since we could not find a readily available bilingual corpus from the news domain, we decided to make one ourselves by using the Associated Press (AP) wire service news text and adding a Japanese translation to it.</Paragraph>
  </Section>
  <Section position="8" start_page="728" end_page="729" type="metho">
    <SectionTitle>
* Target
</SectionTitle>
    <Paragraph position="0"> We selected 15 verbs known to be problematic for machine translation: come, get, give, go, make, take, run, call, cut, fill, keep, look, put, stand, and turn.</Paragraph>
    <Paragraph position="1"> Since case frames correspond to simple sentences, we did not deal with long sentences. The maximum sentence length was set at 15 words.</Paragraph>
    <Section position="1" start_page="728" end_page="728" type="sub_section">
      <SectionTitle>
* Quantity of Data
</SectionTitle>
      <Paragraph position="0"> To estimate the necessary amount of data, we investigated the monthly frequency of each verb appearing over six months. The frequency showed a fixed tendency over the measurement periods, suggesting that the data for one month is a good starting point. We decided to use two months, January 1990 and January 1991, for the English sentence extraction.</Paragraph>
      <Paragraph position="1"> * Construction
(1) Preparing the English text
Sentences up to 15 words long which contain one or more of the 15 target verbs were automatically extracted from the two-month AP source text.</Paragraph>
      <Paragraph position="2"> (2) Identifying the range governed by the verb
The range which the target verb directly governs in the English text was manually identified. The two lines starting with ENG in Fig. 4 are an example.</Paragraph>
      <Paragraph position="3"> (3) Constructing the English case data
The a priori-defined category labels for each part of the ENG data were manually marked, and the head word and functional word in each category were identified. The lines starting with CASE in Fig. 4 correspond to this data. We had defined 34 category labels beforehand. Twelve of them (sentence category labels) were assigned to verbs to identify the sentence category from which the verb was extracted. Example categories are: V (declarative sentence), PVQ (polar question), IMV (imperative sentence), PASV (passive sentence), and IV (to-infinitive clause). Twenty-two of the category labels (case category labels) identify the surface cases or the syntactic categories of other components in the sentence. Examples are: SN (subject noun clause), SIN (subject to-infinitive clause), and PN (prepositional phrase [modifying the target verb]).
(4) Constructing the Japanese data
Japanese translations were assigned to each of the English head words and functional words. When translation was not possible by simply reading the English sentence, its context was given to the translators. The two lines starting with JAP in Fig. 4 show the translations.</Paragraph>
      <Paragraph position="4"> The complete corpus took about 12 man-months of labor to construct. Table 2 shows the corpus statistics for seven verbs. Row (2) shows the percentage of sentences that required the context for translation. This figure indicates the limitations of manual translation without context. Most of these sentences had pronouns like it, and the translators needed the context to clarify the referents.</Paragraph>
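      <Paragraph> A minimal sketch of construction step (1) is given below; the whitespace tokenization and the hand-listed inflected forms are simplifying assumptions for illustration, not the extraction tool actually used on the AP text.
# Sketch of step (1): keep sentences of up to 15 words that contain at least
# one of the 15 target verbs (surface-form matching is a rough assumption).
TARGET_VERBS = {"come", "get", "give", "go", "make", "take", "run", "call",
                "cut", "fill", "keep", "look", "put", "stand", "turn"}

def inflections(verb):
    """Very rough set of surface forms for a base verb (illustrative only)."""
    forms = {verb, verb + "s", verb + "ed", verb + "ing"}
    irregular = {"come": {"came"}, "get": {"got"}, "give": {"gave", "given"},
                 "go": {"went", "goes", "gone"}, "make": {"made"},
                 "take": {"took", "taken"}, "run": {"ran"}, "keep": {"kept"},
                 "stand": {"stood"}}
    return forms | irregular.get(verb, set())

ALL_FORMS = {f for v in TARGET_VERBS for f in inflections(v)}

def extract(sentences, max_words=15):
    for s in sentences:
        words = s.lower().rstrip(".").split()
        if len(words) <= max_words and not ALL_FORMS.isdisjoint(words):
            yield s

sample = ["The talks will take place in Geneva next week.",
          "Analysts said the long report would not be finished before the end of the fiscal year under current plans."]
print(list(extract(sample)))  # only the first sentence passes the filter
</Paragraph>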
    </Section>
    <Section position="2" start_page="728" end_page="729" type="sub_section">
      <SectionTitle>
4.2 Conversion into a PCFT
</SectionTitle>
      <Paragraph position="0"> Table 2: Corpus statistics (per verb)
                                                  come   get    give   go     make   run    take
(1) Number of English sentences                   795    867    635    1204   1024   440    1062
(2) Percentage requiring context to translate     3.4%   5.2%   4.1%   3.7%   6.6%   6.0%   4.0%
(3) Number of obtained quadruplets                782    849    637    941    1020
The bilingual corpus must be converted into a PCFT before a case frame can be learned. We can now directly control the information used for learning. We followed the principles below.</Paragraph>
      <Paragraph position="1"> * Develop one case-frame tree from each sentence category
This was intended to observe how the sentence category affects the appearance of case-frame trees.
* Use all case categories in the corpus as attributes
This was to select effective case categories without any bias.</Paragraph>
      <Paragraph position="2"> * Use head words and functional words as values for case categories
These words are the primary elements representing each case category, so it is reasonable to use them as the values, as sketched below.</Paragraph>
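      <Paragraph> The sketch below illustrates a conversion that follows these principles, grouping entries by sentence category and using head/functional words as attribute values. The record layout is a hypothetical stand-in for the Fig. 4 annotation format (not reproduced here), and the translations shown are placeholders.
from collections import defaultdict

# Hypothetical annotated entries for one verb: sentence-category label,
# case-category -> head/functional word, and the verb's translation.
entries = [
    {"verb": "take", "sent_cat": "V",
     "cases": {"SN": "I", "ON": "him", "PN": "to", "PNc": "theater"},
     "translation": "(escort)"},
    {"verb": "take", "sent_cat": "V",
     "cases": {"SN": "it", "ON": "0", "D": "0"},
     "translation": "kakaru"},
]

def build_pcfts(entries):
    """Return {sentence_category: (header, rows)}: one PCFT per sentence category."""
    grouped = defaultdict(list)
    for e in entries:
        grouped[e["sent_cat"]].append(e)
    pcfts = {}
    for cat, group in grouped.items():
        # every case category seen in this group becomes an attribute (no bias)
        attrs = sorted({c for e in group for c in e["cases"]})
        rows = [[e["cases"].get(a, "0") for a in attrs] + [e["translation"]]
                for e in group]
        pcfts[cat] = (attrs + ["translation"], rows)
    return pcfts

for cat, (header, rows) in build_pcfts(entries).items():
    print(cat, header)
    for row in rows:
        print("   ", row)
</Paragraph>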
    </Section>
  </Section>
  <Section position="9" start_page="729" end_page="729" type="metho">
    <SectionTitle>
5 Case-frame Tree Learning Experiments
</SectionTitle>
    <Paragraph position="0"> Several learning experiments were conducted on the PCFT obtained from each sentence category of the target verbs. Complete results from the experiments are not presented here due to space limitations. Table 3 shows the statistical results for seven verbs.</Paragraph>
    <Paragraph position="1"> (4) Number of translations (class size)
(5) Number of case categories appearing in the case-frame tree
(6) Error rate when the tree was used to re-classify the training data
We are now increasing the corpus for give, make, and take by 4,000 sets.</Paragraph>
    <Paragraph position="2"> Translations occurring less than ten times were not included in the PCFT for this experiment. The overall error rate in Table 3 was quite low. Part of the take tree is shown in Fig. 5. The figures at the end of each line show the result of the reclassification of the training PCFT by the learned tree: (number of data items which fell on this leaf / number of errors, if any). As is shown, the case-frame tree is highly intelligible.</Paragraph>
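    <Paragraph> For clarity, the re-classification figure in row (6) of Table 3 and the per-leaf counts of Fig. 5 can be read as a simple resubstitution check: the learned tree is applied back to its own training PCFT and disagreements are counted. The sketch below assumes any predictor function (such as the tree traversal sketched after Sec. 2); it is an illustration, not the authors' evaluation code.
def resubstitution_error(rows, predict, class_key="translation"):
    """Fraction of training rows whose predicted translation disagrees with the annotation."""
    errors = sum(1 for r in rows if predict(r) != r[class_key])
    return errors / len(rows)

# Toy check with a trivial majority-class predictor.
toy = [{"ON": "part", "translation": "sanka suru"},
       {"ON": "part", "translation": "sanka suru"},
       {"ON": "0",    "translation": "kakaru"}]
labels = [r["translation"] for r in toy]
majority = max(set(labels), key=labels.count)
print(resubstitution_error(toy, lambda r: majority))  # 1/3, about 0.33
</Paragraph>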
    <Paragraph position="4"/>
  </Section>
  <Section position="10" start_page="729" end_page="729" type="metho">
    <SectionTitle>
* Similarity
</SectionTitle>
    <Paragraph position="0"> The number of case categories actually used in the case-frame tree was drastically smaller than the number used in the PCFT (row (3) vs. row (5) of Table 3). In the case-frame tree for take, for example, the following case categories were used: AX (adverb equivalents), D (adverbial particles), ON (object noun clause), SIN (subject to-infinitive clause), and SN (subject noun clause). The top node, i.e. the most important node, became D, the adverbial particle, following the description in an ordinary dictionary. Most of these syntactical categories are usually used to describe the verb patterns in ordinary dictionaries. The case-frame tree basically duplicates the verb patterns found in an ordinary dictionary.</Paragraph>
  </Section>
  <Section position="11" start_page="729" end_page="730" type="metho">
    <SectionTitle>
* Precision
</SectionTitle>
    <Paragraph position="0"> From the line marked A in Fig. 5, the translation became kakaru (need time) under the condition (ON=0). Since take is usually used as a transitive verb, the lack of an object noun looks unnatural; this part of the tree, however, corresponds to time expressions like "take long" and "take awhile" which do not have object nouns. This is reasonable learning.</Paragraph>
    <Paragraph position="1"> From the line marked B, the idiomatic expression "take part in" was learned as "take part." The word in was judged to be redundant and thus an ineffective element. Indeed, our corpus did reveal one example that did not have in, yet it still had the same translation: "sanka suru." This learning is more precise than the description in an ordinary dictionary.</Paragraph>
  </Section>
  <Section position="12" start_page="730" end_page="730" type="metho">
    <SectionTitle>
* Complementary learning
</SectionTitle>
    <Paragraph position="0"> The lines marked C in Fig. 5 show an example of what we call complementary learning. The case-frame tree surprisingly distinguished "kakutoku suru" (win) from "okonawareru" (happen). The former was learned from "take third place." The latter corresponds to an idiomatic expression, "take place". The way the algorithm learns is unique. The key to discrimination was found in SN, the subject noun, which sounds reasonable. Discrimination is done in terms of the subject's nature: person vs. action noun. However, this could also be distinguished by the existence of a modifier on place, since in the idiomatic sense, no modification is allowed between take and place. In our PCFT, modifiers were not included and the system found complementary knowledge to distinguish the translations. The same phenomenon was found in many parts of the trees. The learning algorithm does its best to sub-categorize the translations within the given case categories. While this can yield linguistically-skewed case frames, they are still effective, at least in the corpus.</Paragraph>
    <Paragraph position="1"> * Differences among sentence categories
The results from other sentence categories had a much different appearance. Trees for make and take which were obtained from the PCFT for the to-infinitive clause contained only one case category, ON (object noun clause).</Paragraph>
    <Paragraph position="2"> The case categories effective in the declarative sentence, like the adverbial particle, were not effective for this sentence category. This strongly suggests that translations should be selected by using the case frames for the sentence type.</Paragraph>
  </Section>
</Paper>