<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1004">
  <Title>STRUCTURAL MATCHING OF PARALLEL TEXTS</Title>
  <Section position="9" start_page="27" end_page="27" type="evalu">
    <SectionTitle>
EXPERIMENTS
</SectionTitle>
    <Paragraph position="0"> We have tested the structural matching algorithm with 82 pairs of sample sentences randomly selected from a Japanese-English dictionary.</Paragraph>
    <Paragraph position="1"> We used a machine-readable Japanese-English dictionary (Shimizu 79) and Roget's thesaurus (Roget 11) to measure the similarity of pairs of content words, which is used to define the function f.</Paragraph>
    <Paragraph position="2"> Similarity of word pairs. Given a pair of Japanese and English sentences, we use two methods to measure the similarity between the Japanese and English content words appearing in the sentences.</Paragraph>
    <Paragraph position="3"> For each Japanese content word w_J appearing in the Japanese sentence, we can find a set of translatable English words from the Japanese-English dictionary. When the Japanese word is polysemous, we select an English word from each polysemous entry. Let C_EJ be the set of such translatable English words of w_J. Suppose C_E is the set of content words in the English sentence. The translatable pairs of w_J, Tp(w_J), are defined as follows: $Tp(w_J) = \{(w_J, w_E) \mid w_E \in C_{EJ} \cap C_E\}$. We use Roget's thesaurus to measure the similarity of other word pairs. Roget's thesaurus is regarded as a tree structure in which words are allocated at the leaves. For each Japanese content word w_J appearing in the Japanese sentence, we can define the set of translatable English words of w_J, C_EJ. From each English word in the set, the minimum distance to each of the English content words appearing in the English sentence is measured (see footnote 5). This minimum distance defines the similarity between pairs of Japanese and English words.</Paragraph>
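As a minimal sketch of the translatable-pair definition above, the following Python fragment builds Tp(w_J) as a set intersection. The toy dictionary, the romanized Japanese words, and the function name `translatable_pairs` are invented for illustration and are not taken from the dictionary used in the experiments.

```python
# Hypothetical toy bilingual dictionary: Japanese word -> one English word per sense.
# The entries are invented for illustration only.
TOY_DICTIONARY = {
    "boueki": {"trade", "commerce"},
    "rieki": {"benefit", "profit"},
}


def translatable_pairs(w_j, english_content_words, dictionary=TOY_DICTIONARY):
    """Return Tp(w_J) = {(w_J, w_E) | w_E in C_EJ intersect C_E}."""
    c_ej = dictionary.get(w_j, set())    # C_EJ: dictionary translations of w_J
    c_e = set(english_content_words)     # C_E: content words of the English sentence
    return {(w_j, w_e) for w_e in c_ej.intersection(c_e)}


pairs = translatable_pairs("rieki", ["japan", "benefit", "trade"])
print(pairs)
```

A word with no dictionary entry simply yields an empty set, so it contributes no translatable pairs.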
    <Paragraph position="4"> We decided to use this similarity only for estimating dissimilarity between Japanese and English word pairs. We set a predetermined threshold distance. If the minimum distance exceeds the threshold, the excess is counted as negative similarity.</Paragraph>
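The thresholded thesaurus distance can be sketched as follows, assuming the thesaurus is available as a child-to-parent map and each word occupies a single node. The miniature tree, the threshold value of 2, and all function names are hypothetical; the threshold actually used is not stated in this text.

```python
# Hypothetical miniature thesaurus: each node maps to its parent (the root maps to None).
TOY_PARENT = {
    "root": None,
    "commerce": "root", "nature": "root",
    "trade": "commerce", "profit": "commerce",
    "river": "nature",
}


def path_to_root(node, parent):
    """List the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(a, b, parent=TOY_PARENT):
    """Number of edges on the shortest path between two nodes of the tree."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in depth_from_a:    # first common ancestor on the way up from b
            return depth_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def excess_distance(translations, english_word, threshold=2, parent=TOY_PARENT):
    """Return k, the amount by which the minimum distance exceeds the threshold (0 if none)."""
    minimum = min(tree_distance(t, english_word, parent) for t in translations)
    return max(0, minimum - threshold)


print(tree_distance("trade", "profit"))       # siblings in the toy tree
print(excess_distance({"trade"}, "river"))    # far apart in the toy tree
```

In this sketch a positive k would be counted as a similarity of -k, while pairs within the threshold contribute no penalty.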
    <Paragraph position="5"> The similarity of two words w1 and w2 appearing in the given pair of sentences, sim((w1, w2)), is defined as follows:</Paragraph>
    <Paragraph position="7"> and the distance between w1 and w2 exceeds the threshold by k.</Paragraph>
  </Section>
  <Section position="10" start_page="27" end_page="28" type="evalu">
    <SectionTitle>
0 otherwise
</SectionTitle>
    <Paragraph position="0"> Similarity of semi-complete DSs. The similarity between corresponding semi-complete DSs is defined based on the similarity between their content words. Suppose that s and t are semi-complete DSs to be matched, and that Vs and Vt are the sets of content words in s and t. Let A be whichever of Vs and Vt is not the larger, and B be the other (|A| ≤ |B|). For each injection p from A into B, the set of word pairs D derived from p can be defined as follows: $D = \{(a, p(a)) \mid a \in A\}$.</Paragraph>
    <Paragraph position="1"> Now, we define the similarity function f over Japanese and English semi-complete DSs to give the maximum value of the following expression over all possible injections:</Paragraph>
    <Paragraph position="3"> The summation gives the maximum sum of the similarities of the content words in s and t. The factor 0.95 is the penalty applied when semi-complete DSs with more than one content word are used in the matching.</Paragraph>
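A brute-force sketch of the maximization over injections is given below. Because the exact expression is not reproduced in this text, two points are assumptions rather than the authors' definition: the score given to a translatable pair (`PAIR_SCORE`, set to 1.0 here) and the way the 0.95 penalty is applied (one factor for each content word beyond the first in the smaller set). The toy word pairs and all names are hypothetical.

```python
from itertools import permutations

PAIR_SCORE = 1.0   # assumed score of a translatable pair (not given in this text)
PENALTY = 0.95     # penalty factor mentioned in the text

# Hypothetical translatable pairs, stored unordered so that lookup is symmetric.
TRANSLATABLE = {frozenset(["rieki", "benefit"]), frozenset(["boueki", "trade"])}


def toy_sim(pair):
    """Toy word similarity: PAIR_SCORE for a translatable pair, 0 otherwise."""
    return PAIR_SCORE if frozenset(pair) in TRANSLATABLE else 0.0


def ds_similarity(words_s, words_t, sim):
    """Maximize the summed word similarity over all injections from A into B."""
    a, b = sorted([list(words_s), list(words_t)], key=len)   # A is the set that is not larger
    if not a:
        return 0.0
    best = max(
        sum(sim((x, y)) for x, y in zip(a, image))
        for image in permutations(b, len(a))                 # each injection from A into B
    )
    # Assumption: one factor of 0.95 per additional content word in A.
    return best * PENALTY ** (len(a) - 1)


score = ds_similarity(["rieki", "boueki"], ["trade", "benefit", "japan"], toy_sim)
print(round(score, 3))
```

Enumerating permutations is exponential in the size of B, which is tolerable only because semi-complete DSs contain few content words; an assignment solver would be the natural replacement for larger sets.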
    <Paragraph position="4"> Figures 1, 2 and 3 show the results of the structural matching algorithm, in which the translatable pairs obtained from the Japanese-English dictionary are shown by the equations.</Paragraph>
    <Paragraph position="5"> Footnote 5: The distance between words is the length of the shortest path in the thesaurus tree.</Paragraph>
    <Section position="1" start_page="28" end_page="28" type="sub_section">
      <SectionTitle>
Results of the experiments
</SectionTitle>
      <Paragraph position="0"> We used 82 pairs of Japanese and English sentences appearing in a Japanese-English dictionary.</Paragraph>
      <Paragraph position="1"> The results were checked and examined in detail by hand. Some of the sentences are not parsable because of the limited coverage of our current grammars. Although 59 of the pairs are parsable, 6 of these do not include correct parse results.</Paragraph>
      <Paragraph position="2"> The structural matching algorithm with the setting described above is applied to the remaining 53 pairs. In 6 of them, the correct matching is not included among the best-rated answers. The other 47 pairs include the correct matching, and 31 of these yield the correct matching uniquely. Table 1 summarizes the results.</Paragraph>
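The counts reported above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; only the figures 82, 59, 6, 6 and 31 come from the text, and the two ratios are computed here rather than quoted from the paper.

```python
total_pairs = 82
parsable = 59
incorrect_parse = 6
matched_input = parsable - incorrect_parse            # pairs given to the matcher
correct_missing = 6
correct_included = matched_input - correct_missing    # pairs whose best answers include the correct matching
unique_correct = 31

print(matched_input, correct_included)
print(round(unique_correct / correct_included, 2))    # share of unique matchings among pairs with a correct answer
print(round(unique_correct / matched_input, 2))       # share of unique matchings among all matched pairs
```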
    </Section>
  </Section>
  <Section position="11" start_page="28" end_page="29" type="evalu">
    <SectionTitle>
EVALUATION AND DISCUSSION
</SectionTitle>
    <Paragraph position="0"> Although the number of sentences used in the experiments is small, the results show that about two thirds of the pairs give a unique matching, in which every syntactic ambiguity is resolved.</Paragraph>
    <Paragraph position="1"> The cases where no correct matching was obtained need to be examined. Some sentences contain an idiomatic expression whose syntactic structure is completely different from the structure of the other sentence. Such an expression can in no way be matched correctly unless the whole structures are matched intact. Other cases are caused by complex sentences that include an embedded sentence. When the verbs at the roots of the dependency trees are unrelated, extraordinary matchings are produced. We do not intend to use our method to match complex or compound sentences as a whole.</Paragraph>
    <Paragraph position="2"> We will rather use our method to find structural matching between simple sentences or verb phrases of two languages.</Paragraph>
    <Paragraph position="3"> The matching problem of complex sentences is regarded as a different problem, though a similar technique is usable. We think that the scores of matched phrases will help to identify the corresponding phrases when we match complex sentences. Taking the sources of other errors into consideration, possible improvements are: 1. Enhancement of English and Japanese grammars for wider coverage and lower error rate.</Paragraph>
    <Paragraph position="4"> 2. Introduction of more precise similarity measurement of content words.</Paragraph>
    <Paragraph position="5"> 3. Utilization of grammatical information:  voice, etc.</Paragraph>
    <Paragraph position="6"> The first two improvements are undoubtedly important. As for the similarity measurement of content words, completely different approaches such as statistical methods may be useful for obtaining good translatable pairs (Brown 90), (Gale 91).</Paragraph>
    <Paragraph position="7"> Various grammatical information is kept in the feature descriptions produced in the parsing process. However, we should be very prudent in using it. Since English and Japanese are grammatically quite different, some grammatical relations may not be preserved between them. In Figure 3, solid arrows and circles show the correct matching. While 'benefit' matches with the structure consisting of ' ,~,,~ ' and ' ~_.~ ~ ', their dependent words 'trade' and ' H~:~' modify them as a verb modifier and as a noun modifier, respectively, and these grammatical relations are quite different.</Paragraph>
    <Paragraph position="8"> This example highlights another interesting point. Dotted arrows and circles show another matching with the same highest score. In this case, 'japan' is taken as a verb. This rather strange interpretation insists that 'japan' matches with ' H~ ' and ' .~ 6 '. Since 'japan' as a verb has little semantic relation with ' \[\]:~ ' as a country, discrimination of part-of-speech seems to be useful. On the other hand, the correspondence between 'benefit' and ' ~,~ ' is found in their noun entries in the dictionary. Since 'benefit' is used as a verb in the sentence, taking part-of-speech into consideration may also jeopardize the correct matching. The fact that the verb and noun usages of 'benefit' share a common concept implies that a more precise similarity measurement will solve this particular problem. Since the interpretations of the sample English sentences are in different moods, imperative and declarative, the mood of a sentence is also useful for removing irrelevant interpretations.</Paragraph>
  </Section>
</Paper>