File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/88/c88-2133_abstr.xml
Size: 15,554 bytes
Last Modified: 2025-10-06 13:46:35
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2133"> <Title>S|,M~TIC AND Sg~A6TIC ~1~ OF ~ FUNCTION</Title> <Section position="1" start_page="0" end_page="643" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In a Machine Translation System (MTS), the number of possible analyses for a given sentence is largely dve to the ambiguous characteristics of the source language. In this paper, a mechanism, called &quot;Score Function&quot;, is proposed for measuring the &quot;quality&quot; of the ambiguous syntax trees such that the one that best fits interpretation by human is selected. It is featured by incorporating the objectiveness of the probability theory and the subjective expertise of linguists. The underlying uncertainty that is fundamental to \]inguistic knowledge is also allowed to be incorporated into this system. This feature proposes an easy resolution to select the best syntax tree and provides some strategic advantages for scored parsing.</Paragraph> <Paragraph position="1"> The linguists can also be relieved of the necessity to describe the language in strictly &quot;correct&quot; linguistic rules, which, if not impossible, is a very hard task.</Paragraph> <Paragraph position="2"> Motivation In a Machine Translation System (Mrs), where the underlying grammar is large, there are many sources which may cause the system to become highly ambiguous.</Paragraph> <Paragraph position="3"> The system must choose a better syntax tree among all the possible ones to reduce the load of the posteditor. Some systems attack this problem by arranging the gram,~r rules in a descending order of their relative frequency, following the .parsing paths in a depth-first manner, and selecting the first syntax tree successfully parsed as the desired one. However, rule ordering is just a locally preferred static scoring of the rule usage. Therefore, the possibility is small that the first tree selected is the correct one.</Paragraph> <Paragraph position="4"> Several MT systems based on the ATN formalism \[Wood 70\] adopt another approach. They impose condition cheeks to prevent the parser from trying all possible states allowed by the underlying grammar. This approach has been widely accepted and is useful in eliminating the unnecessary trials. However, there are times when legal paths are blocked inadvertently by condition checks. Therefore, the system must be tuned fre.~luently to achieve an equilibrium between the over-generative grammar and the over-restrictive condition checks. This kind of &quot;hard rejection&quot; is obviously too variant and too restrictive.</Paragraph> <Paragraph position="5"> A better solution is to adopt the &quot;Truncation Strategy&quot; (proposed by \[Su 87a, 87b\] for biT system) to restrict the number of parsing paths to be tried according to the relative preference of all the possible paths. The measuring mechanism of preference for the truncation strategy is called the &quot;Score Function&quot;. It bears similaritY to the select-by-preference found in other scored MT systems like the DIAGPCLM grammar system \[Robi 82\] and METAL system \[Benn 82\].</Paragraph> <Paragraph position="6"> Under a scoring mechanism, the parsing paths are not rejected because of the over-restrictive condition checks but rather for their low scores. This kind of &quot;soft-rejection&quot; prevents legal path from being blocked too early because of unsuitable condition checks. 
<Paragraph position="7"> Criteria for Score Function In order to define a reasonable score function, it is essential to set up some criteria first. Eight basic criteria are listed here.</Paragraph> <Paragraph position="8"> \[1\] The score function should reflect the absolute degrees of preference of two ambiguous (sub)trees as well as their relative preferences.</Paragraph> <Paragraph position="9"> \[2\] A good score function should be applicable either locally to a subtree or globally to a complete tree.</Paragraph> <Paragraph position="10"> \[3\] The score function should be compositional. This means the score of a tree should be directly evaluated from the scores of its constituent subtrees. \[4\] Relative rule application frequency should be included in the score function. The rule that is used most frequently should receive a higher preference.</Paragraph> <Paragraph position="11"> \[5\] The score function should also include the semantic information embedded in the sentence, so that semantic preference can be involved in the score function. (Since our present translation unit is a single sentence, no discourse information needs to be included.) \[6\] The implementation of the score function should not be too complicated. In our case, it should be practical for a large-scale MT system.</Paragraph> <Paragraph position="12"> \[7\] The database for score computation should be easy to build and easy to maintain.</Paragraph> <Paragraph position="13"> \[8\] The preference order of ambiguous trees assigned by the score function should match the order assigned by humans. In addition, the way the scores are given should match the way that people assign their preferences to ambiguous trees (i.e., how people recognize the true meaning of a given sentence from among several different interpretations). Keeping these criteria in mind, we define a score function as follows. The score function for a subtree Xo(i,j), with derivation sequence D of Xo(i,j) =D=> X1 ... Xn, is</Paragraph> <Paragraph position="14"> SCORE(Xo(i,j)) = SCsem(Xo(i,j)) * SCsyn(Xo(i,j)) </Paragraph> <Paragraph position="15"> In the above, Xo(i,j) is a subtree made up of terminals X1 to Xn; i and j are the word indices in the sentence; and SCORE is the score of the subtree Xo.</Paragraph> <Paragraph position="16"> SCsyn is the unweighted syntax score. SCsem is the semantic weighting. KI is defined as the knowledge about the inherent properties of the nodes, and KC is the well-formedness condition, either syntactic or semantic, of the Xi under the given syntactic construction. To decrease the computational complexity, we can convert this multiplication equation into an addition equation with logarithmic entries:</Paragraph> <Paragraph position="17"> log SCORE(Xo(i,j)) = log SCsem(Xo(i,j)) + log SCsyn(Xo(i,j)) </Paragraph> <Paragraph position="18"> In order to obtain the score without excessive computation and complicated algorithms, the probability model is probably one of the most common and promising approaches. Under this approach, the preference measurement in a scoring mechanism can be seen as a probability assignment. The best syntax tree should be the one with the highest preference probability assigned to it. This probability model can be divided into two parts. One is the syntactic score model, which gives SCsyn, and the other is the semantic score model, which gives SCsem. The syntactic score model uses the syntax probability as the base to generate an unweighted syntactic score for each syntax tree. The semantic score model then supplements the unweighted score with weights derived from the semantic knowledge. Incorporation of semantic information is essential for a good score function because pure syntax probability can only provide partial information for sentence preference.</Paragraph>
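A minimal sketch, in Python, of the multiplicative score and its logarithmic form, showing how compositionality (criterion \[3\]) reduces tree scoring to a sum of subtree log-scores; the function names and the numeric values are assumed for illustration only.

```python
import math

# Sketch of composing a subtree score from its unweighted syntactic score
# (SCsyn) and its semantic weighting (SCsem). The numbers are made up;
# only the general form SCORE = SCsem * SCsyn comes from the text.

def score(sc_syn, sc_sem):
    """Multiplicative form: SCORE = SCsem * SCsyn."""
    return sc_sem * sc_syn

def log_score(sc_syn, sc_sem):
    """Additive form with logarithmic entries (cheaper to accumulate)."""
    return math.log(sc_sem) + math.log(sc_syn)

# Compositionality: the log-score of a tree is evaluated directly from
# the log-scores of its constituent subtrees by summation.
subtree_log_scores = [log_score(0.40, 0.90), log_score(0.70, 0.80)]
tree_log_score = sum(subtree_log_scores)

# The additive form agrees with the product of the subtree scores.
assert abs(math.exp(tree_log_score)
           - score(0.40, 0.90) * score(0.70, 0.80)) < 1e-12
```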
<Paragraph position="19"> Syntactic Score Model For a syntax tree given below, we define a phrase level as a sequence of terminals and nonterminals that are being reduced at a single step of the derivation, or reduction, sequence. The following example shows the reduction sequence of a bottom-up parse. The sequence is indicated by the time series t1, ..., t7.</Paragraph> <Paragraph position="20"> (Figure: an example syntax tree and its bottom-up reduction sequence over phrase levels, at times t1, ..., t7.) </Paragraph> <Paragraph position="21"> The unweighted score for this tree A is modeled as the following conditional probability:</Paragraph> <Paragraph position="22"> SCsyn(A) = P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) * ... * P(X2 | X1) * P(X1) ≈ P(X1) * P(X2 | X1) * ... * P(Xn | Xn-1) </Paragraph> <Paragraph position="23"> An assumption was made in the above equation. We assumed that terms like P(Xi | Xi-1, Xi-2, ..., X1) can be simplified into P(Xi | Xi-1). This is reasonable because phrase level Xi-1 contains most of the information that is percolated up from the lower levels and needed by Xi, so the extra information Xi needs from Xi-2 is small. We completed a simulation for testing this model and also conducted several tests on the context sensitivity of this probability model. First, we checked whether a left context (i.e., L) is relevant to the probability assignment. Using P(X3 | X2) = P(E | D, w2, w3, w4) as an example, with D as the left context of the current derivation symbol w2, we checked whether P(X3 | X2) = P(E | D, w2) holds. We also checked whether a right context (i.e., R) has influence on the assignment, that is, whether P(X3 | X2) = P(E | w2, w3) holds. Other test cases are LL, LR, RR, LRR, LLR, LLL, RRR, LLRR and LLLR.</Paragraph> <Paragraph position="24"> Semantic Score Model The weight-assigning process of the semantic score can be seen as an expert task where the linguist gives the syntax tree a diagnosis. The linguist will assign a preference to a tree according to some linguistic knowledge or heuristic rules. Very often these linguistic rules are not very precise. Therefore, a good semantic score model must allow this type of inexact knowledge. Now the problem is transformed into building a rule-based expert system that can calculate semantic scores (weightings) and handle inexact knowledge encountered during calculation. We propose a model similar to the CF model (certainty factor model) in the MYCIN system \[Buch 85\]. It has a knowledge-rule base where each rule has a certainty factor based on the degree of belief and disbelief.</Paragraph> <Paragraph position="25"> The confirmation of a hypothesis is then calculated from the applicable rules and from other pieces of evidence. The CF of a hypothesis is accumulated gradually with each additional piece of evidence.</Paragraph> <Paragraph position="26"> Each tree node will have a well-formedness factor (WFF) associated with it, which is the CF for the derivation of this node. As the knowledge of the leaf nodes, which may contain the word sense, syntactic category, attributes, etc., propagates up along the syntax structure, every node's WFF will be calculated according to the rules stored in the knowledge rule-base. This WFF then becomes the semantic score of the subtree:</Paragraph> <Paragraph position="27"> SCsem(Xo(i,j)) = WFF(Xo) </Paragraph> <Paragraph position="28"> where the derivation sequence D is Xo =D=> X1, ..., Xn.</Paragraph>
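A minimal Python sketch of MYCIN-style certainty-factor accumulation, the model the WFF computation is patterned after; the combination formulas are the standard CF-model ones \[Buch 85\], while the rule CFs below are hypothetical examples, not values from the paper.

```python
# MYCIN-style certainty-factor combination: each applicable linguistic
# rule contributes a CF (confirming if positive, disconfirming if
# negative), and the node's overall CF is accumulated incrementally.

def combine_cf(cf1, cf2):
    """Standard CF-model combination of two pieces of evidence."""
    if cf1 >= 0 and cf2 >= 0:            # both confirming
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:              # both disconfirming
        return cf1 + cf2 * (1 + cf1)
    # mixed evidence
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Hypothetical rules firing on one node's derivation: a semantic-marker
# match (+0.6), a satisfied case-role restriction (+0.3), and a weak
# disconfirming heuristic (-0.2).
wff = 0.0
for rule_cf in (0.6, 0.3, -0.2):
    wff = combine_cf(wff, rule_cf)

# `wff` (here 0.65) would then serve as the semantic score (SCsem)
# of the subtree rooted at this node.
```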
<Paragraph position="29"> There are three major advantages of this scheme.</Paragraph> <Paragraph position="30"> First, linguists do not have to write a single exact rule that covers all possible exceptions, because CFs are given in accordance with their degrees of confirmation or disconfirmation. When an exception appears, all that needs to be done is to add the necessary rules and alter the CFs of certain existing rules. Second, the CF model simplifies the implementation of &quot;soft rejection&quot; for inexact knowledge. For example, conditions (like those in ATNs) can be included for disambiguation even if they are not absolute in their generality. Third, we can combine various traditional techniques for analyzing semantics with the CF model to construct a uniform and flexible control strategy. This allows the inclusion of uncertain factors like the semantic markers of lexicon entries, the assignment of case roles (from case grammar), and the restrictions on case fillers. Under this control strategy, word sense disambiguation and structure disambiguation are also possible. The relative preference will be given according to the CFs associated with the different word senses and by the linguistic rules from the knowledge base.</Paragraph> <Paragraph position="31"> All in all, the score function defined above satisfies all eight criteria we set initially, and it is a good systematic approach for assigning preferences to a set of ambiguous trees.</Paragraph> <Paragraph position="32"> Simulation Result A simulation, based on 1468 source sentences, was conducted to test the syntactic score model. The probability assigned to the entries, e.g. P(E | w2, w3), in the SCsyn equation is estimated with the relative frequency of these entries. That is, we approximate P(E | w2, w3) by the ratio of the number of events {E, w2, w3} in the database to the number of events {w2, w3}. Several tests were conducted to check the influence of the context on the probability assignment. These tests include L, R, LL, LR, RR, LLL, LLR, LRR, RRR, LLRR and LLLR. Table 1 shows some of the results of the simulation, using sentences in the database as the test inputs.</Paragraph> <Paragraph position="33"> The number of entries in the table is the number of different conditional probabilities, e.g. P(E | w2, w3), in the database. Each entry is assigned a probability according to its usage frequency, as explained before. The preference of a tree is the parameter that we want to estimate from these entries. If the size of the database is not large enough, these probabilities cannot be approximated well by the relative frequencies. In general, as the size of a database increases, so does the accuracy of the approximation. But how big the database should be is difficult to determine. This led us to build two databases, one having 1468 source sentences and the other having 820 sentences. If the simulation results from the different databases are close, we may assume that the database size is large enough.</Paragraph> <Paragraph position="34"> Comparing the results from these two databases, it is apparent that the size is adequate for the present simulation. Furthermore, it is also apparent that a context-sensitive scoring function must be adopted for a good preference estimation.</Paragraph> <Paragraph position="35"> Two conclusions can be drawn from this simulation result. First, we should adopt three constituents in calculating the probability. The reason is that although the result of the LLRR case is better than that of the LRR case, the number of entries required by LLRR is considerably greater. Second, approximately 85% of the syntax trees are accurately selected with only syntactic information available. Therefore, if we want to improve this result further, we must include the semantic information.</Paragraph>
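A minimal Python sketch of the relative-frequency estimation described above, approximating P(E | w2, w3) by the ratio count{E, w2, w3} / count{w2, w3}; the encoding of events as tuples is an assumption for illustration.

```python
from collections import Counter

# Relative-frequency estimation of the conditional probabilities in the
# SCsyn equation, from events observed in a parsed sentence database.

joint_counts = Counter()    # counts of (derived symbol, context...) events
context_counts = Counter()  # counts of the conditioning context alone

def observe(derived, context):
    """Record one phrase-level transition extracted from the database."""
    joint_counts[(derived,) + context] += 1
    context_counts[context] += 1

def estimate(derived, context):
    """Relative-frequency estimate of P(derived | context)."""
    if context_counts[context] == 0:
        return 0.0  # unseen context; a larger database would be needed
    return joint_counts[(derived,) + context] / context_counts[context]

# e.g. observing that E was derived given the context (w2, w3):
observe("E", ("w2", "w3"))
p = estimate("E", ("w2", "w3"))  # -> 1.0 with this single observation
```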
<Paragraph position="36"> Conclusion and Perspective In a Machine Translation System, to reduce the load of the post-editor we must select the best syntax tree from a set of ambiguous trees and pass it to the post-editor. There are systems that rely on a set of ordered grammar rules or on a set of restrictive condition checks to achieve this. Unfortunately, they all have some drawbacks: one being too uncertain and the other being too restrictive. In this paper we have proposed a score mechanism for the truncation strategy to perform disambiguation during parsing. The score function, with the adoption of three context symbols, gives the power of a context-sensitive grammar to an efficient context-free parser. From our simulation, the score function with just syntactic information will achieve an accuracy rate of 85%. This accuracy rate is expected to increase in the near future, when the semantic information is included. Currently, two databases, one for the unweighted score computation and the other for the linguistic rule base (for weighting assignment), are under development at the BTC R&D center. After completion, they will be incorporated into the truncation parsing algorithm of our third-generation parser.</Paragraph> </Section> </Paper>