<?xml version="1.0" standalone="yes"?>
<Paper uid="E87-1023">
  <Title>A MODEL FOR PREFERENCE</Title>
  <Section position="3" start_page="0" end_page="137" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper we address the problem of choosing the best solution(s) from a set of interpretations of the same text segment (For the sake of brevity, throughout this text we use the term interpretation, where in fact we should write representation of an interpretation). Although developed in the context of a machine translation system (the Eurotra project, Arnold 1986, Arnold and des Tombe 1987), we believe that our approach is suited to many other fields of computational linguistics and even outside (pattern recognition, etc.).</Paragraph>
    <Paragraph position="1"> After a brief overview of the problem (section 2), we suggest a general method to deal with preference (section 3) and then describe a possible implementation (section 4). An appendix gives actual examples of preference statements.</Paragraph>
    <Paragraph position="2"> 2. What is preference? In the computational linguistics literature, the term 'preference' has been used in different contexts. We shall mention a few selectively (in section 2.1, which may be skipped) and then state our own view (in section 2.2).</Paragraph>
    <Section position="1" start_page="0" end_page="134" type="sub_section">
      <SectionTitle>
2.1. Various approaches
</SectionTitle>
      <Paragraph position="0"> Preference strategies have often been used for dealing with the problem of ill-formed input (a particular case of robustness, cf. section 2.2 below) (AJCL 1983, Charniak 1983). Following Weischedel and Sondheimer (1983) we distinguish the case where preference is part of the particular computation being performed (Wilks 1973, Fass and Wilks 1983, Pereira 1985) from the case where it is a separate process, run after the results of the computation have been obtained (Jensen et al 1983, Weischedel and Sondheimer 1983).</Paragraph>
      <Paragraph position="1"> A frequent approach to preference is scoring. A numeric score is calculated, independently, for each competing interpretation and is then used to rank the interpretations. The best interpretations are then chosen. The score can be the number of constraints satisfied by the interpretation (Wilks 1973, Fass &amp; Wilks 1983), where these constraints might be assigned relative weights by the linguist (Robinson 1982, Charniak 1983, Bennett and Slocum 1985) or calculated by the computer (Papegaaij 1986). Such techniques have been used extensively for speech recognition (Paxton 1977, Walker et al 1978) and in the field of expert systems (such as Mycin, Buchanan &amp; Shortliffe 1984), where the calculation of both score and ranking becomes quite complex with probabilities and thresholds.</Paragraph>
      <Paragraph position="2"> The problem with scoring is that it seems quite unnatural for a linguist to associate a score (or weight or probability) to a particular rule or piece of data when the knowledge being encoded is in fact qualitative. Furthermore, combining the scores based on different types of reasoning to calculate a global score for a representation seems a rather arbitrary procedure. Such a uniform metric, even if it can model actual linguistic knowledge, forces the grammar writer to juggle with numbers to get the behaviour he wants, thus making the preference process obscure.</Paragraph>
      <Paragraph position="3"> A further disadvantage of this approach is that the score is often based on the way interpretations are built, rather than on the properties of the interpretations themselves.</Paragraph>
      <Paragraph position="4"> Preference is also mentioned in a linguistic controversy started by Frazier and Fodor (1979) with their principles of right association and minimal attachment (Schubert 1984). There the problem is to disambiguate many readings (or interpretations) of a sentence in order to find the good (preferred) one(s). Various contributions on that issue have in common that bad interpretations are abandoned before being finished, during computation (Shieber 1983, Pereira 1985). Although this method speeds up the computation, there is a risk that a possibility will be abandoned too early, before the relevant information has been found. This is shown by Wilks et al (1985), who claim to have the ideal solution in Preference Semantics, which uses scoring and ranking as part of its computation.</Paragraph>
    </Section>
    <Section position="2" start_page="134" end_page="136" type="sub_section">
      <SectionTitle>
2.2. Our notion of preference
</SectionTitle>
      <Paragraph position="0"> Our approach, although stemming from earlier work in the Eurotra project (McNaught et al 1983, Johnson et al 1985), is, we believe, new and original.</Paragraph>
      <Paragraph position="1"> We make the following assumptions: (i) The relation 'translation of' between texts as established by a machine translation system has to be one to one (1-1). (ii) There is a priori no formal or linguistic guarantee that this will be the case for the relation as a whole or for the translation steps between intermediate levels of representation. (An attempt to formalize this can be found in Krauwer and des Tombe 1984 or in section 4 of Johnson et al 1985).</Paragraph>
      <Paragraph position="2"> The problem we want to address here is the following: Given the fact that one to many (1-n) translations do occur, how do we ensure that the final result is still 1-1?</Paragraph>
      <Paragraph position="3"> This problem is not restricted to machine translation: Often a program (for example a parser or a text generator) produces many interpretations of the same object (usually a text segment) when in the ideal case only one is wanted. In the following we use the term '1-n translation' for this general phenomenon.</Paragraph>
      <Paragraph position="4"> We see two types of solutions to this problem, each of them applicable to specific classes of cases: (i) Spurious results can be eliminated on the basis of their own individual properties (e.g. well-formedness, completeness); for this we will use the term 'filtering'.</Paragraph>
      <Paragraph position="5"> (ii) Spurious results can be eliminated via comparison of competing representations, where only the best one(s) will have the right to survive; for this we will use the term 'preference'.</Paragraph>
      <Paragraph position="6"> It is important to note that we restrict ourselves to reducing 1-n translations to (ideally) 1-1. We will assume that the 'good' translation is one of the candidates. The problem of forcing the system to come up with at least 1 translation (i.e. do something about possible 1-0 cases) will not be addressed here. In order to avoid confusion we will use the term 'robustness' to refer to this type of problem. We are aware of the fact that we deviate slightly from the standard use of the term preference.</Paragraph>
      <Paragraph position="7"> There are two main types of 1-n-ness: (i) linguistically motivated (i.e. real ambiguity in analysis, or true synonymy in generation).</Paragraph>
      <Paragraph position="8"> (ii) accidental, caused by overgeneration of the descriptive devices that define the resulting (or intermediate) interpretations. Note that overgeneration and ambiguity or synonymy may hide cases of undergeneration (cf. the robustness problem).</Paragraph>
      <Paragraph position="9"> We define the application of preference as the selection of the best element(s) from a set of competing interpretations of the same object.</Paragraph>
      <Paragraph position="10"> According to this definition the scoring and ranking mechanism described in the previous section is a case of preference. In the rest of this paper we will describe a preference device that is different from the scoring and ranking mechanism in the sense that it is not based on the way interpretations are built, but rather on linguistic properties of the objects themselves. Its main characteristics are that: (i) it applies to complete and sound (well formed) interpretations only. That is, all the other modules of construction, transformation and filtering have been applied (e.g. parsing, Wh-movement, etc). Thus, for these modules all competing representations are equivalent, and all the information needed for comparing them has been found.</Paragraph>
      <Paragraph position="11"> (ii) it is based on pairwise comparison between alternative (competing) interpretations of the same object.</Paragraph>
      <Paragraph position="12"> The problem can then be stated as follows: How do we make use of the linguistic knowledge in order to ensure a 1-1 translation? It is our basic belief that it is impossible for the linguist to know the exact nature of a class of competing interpretations in advance. This implies that he cannot in general formulate one single rule that picks out the best one.</Paragraph>
      <Paragraph position="13">  - It should be possible to make (linguistic) statements of the type: if representation A has property X, and B property Y, then A is to be preferred over B (e.g. 'in law texts declarative sentences are better than questions', or 'sentences with a main verb are better than sentences without one').</Paragraph>
      <Paragraph position="14"> - On the basis of a set of such statements it should be possible to establish a partial order over the set of competing representations.</Paragraph>
      <Paragraph position="15"> - And in that case the number of candidates can be reduced by, for example, letting only the maximal elements survive, or discarding the minimal ones.</Paragraph>
      <Paragraph position="16"> 3.2. Problems with the method The first (but least serious) problem is that it is not certain that linguists will always be able to make such statements (we will call them 'preference statements') over pairs of representations. Experimentation is necessary. The second one is more serious: it would be highly unrealistic to expect that the result of applying the preference statements will be a linear order; in fact there is not even a guarantee that the order will be partial. In general the outcome will be a directed graph. There are three ways of tackling this problem: (i) The linguist should try to make the set of preference statements homogeneous and constrained, and should have control over the way in which they are applied, so that he can avoid contradictory statements.</Paragraph>
      <Paragraph position="17"> (ii) One tries to make a formal device that checks whether contradictions can occur.</Paragraph>
      <Paragraph position="18"> (iii) One tries to compare pairs of competitors in a specific order such that it can be guaranteed that the result is always a partial order.</Paragraph>
      <Paragraph position="19"> At the moment (iii) is the most feasible, (ii) the most ambitious, and (i) the most desirable solution. Currently we envisage a combination of (i) and (iii).</Paragraph>
      <Paragraph position="20"> The third problem is that of the maximal elements. Ideally there would be just one maximal element, i.e. the preferred representation. This cannot be guaranteed to be true.</Paragraph>
      <Paragraph position="21"> The problems sketched here are by no means trivial. That is why we want to experiment with a first implementation of this method, to identify the various relevant parameters in the specific context of Eurotra.</Paragraph>
      <Paragraph position="22"> 4. The proposed implementation The implementation proposed here is described in very general terms, and can be adapted for a wide range of applications. We give in the appendix some commented examples specific to our particular context.</Paragraph>
    </Section>
    <Section position="3" start_page="136" end_page="137" type="sub_section">
      <SectionTitle>
4.1. Preference rules
</SectionTitle>
      <Paragraph position="0"> Preference statements are expressed by the user in the form of rules (preference rules). There are three types of preference rules: simple rules, predefined rules and composite rules. A preference rule applied to two representations of interpretations tries to decide which one is better than the other (preferred to the other). It is not guaranteed that a rule can always take a decision.</Paragraph>
      <Paragraph position="1"> A simple preference rule is of the form</Paragraph>
      <Paragraph position="3"> The name of the rule is p, and Pattern1 and Pattern2 are current patterns. When given two arguments (two representations or subparts) A and B (written p(A,B)) the system will try to match Pattern1 with A and Pattern2 with B. If this succeeds then A is better than B (or A is preferred to B or A&gt;B). If it fails then the system will try to match A with Pattern2 and B with Pattern1. If this succeeds then B is better than A.</Paragraph>
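      <Paragraph> The two matching attempts of a simple rule can be sketched in Python as follows. This is only an illustration under assumptions that are not part of the formalism: an interpretation is modelled as a feature dictionary, a pattern as a Boolean predicate over it, and the names simple_rule, is_declarative and is_question, as well as the integer verdicts, are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
from typing import Callable

# Verdict convention (an assumption of this sketch):
# FIRST = the first argument is preferred, SECOND = the second one is.
FIRST, SECOND, UNDECIDED = 1, -1, 0

# Assumption: a pattern is a predicate that says whether an interpretation matches.
Pattern = Callable[[dict], bool]


def simple_rule(pattern1: Pattern, pattern2: Pattern) -> Callable[[dict, dict], int]:
    """Build a rule p(A, B) from two patterns (hypothetical helper)."""

    def p(a: dict, b: dict) -> int:
        # First attempt: Pattern1 against A and Pattern2 against B.
        if pattern1(a) and pattern2(b):
            return FIRST
        # Second attempt: A against Pattern2 and B against Pattern1.
        if pattern2(a) and pattern1(b):
            return SECOND
        # Neither order matches: the rule cannot take a decision.
        return UNDECIDED

    return p


# Illustrative patterns for "declarative sentences are better than questions".
def is_declarative(x: dict) -> bool:
    return x.get("mood") == "declarative"


def is_question(x: dict) -> bool:
    return x.get("mood") == "question"


declarative_over_question = simple_rule(is_declarative, is_question)
```
]]></Paragraph>
      <Paragraph> Applied to a declarative reading and a question reading, in either argument order, the sketched rule prefers the declarative one; applied to two declarative readings it cannot decide.</Paragraph>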
      <Paragraph position="4"> Predefined rules are provided for the cases where simple rules cannot express some useful basic preference statement.</Paragraph>
      <Paragraph position="5"> For example, in our actual implementation (cf. appendix), two predefined rules say that a tree structure with fewer (more) branches than the other is to be preferred to one with more (fewer) branches. This cannot be expressed with the particular language for patterns.</Paragraph>
      <Paragraph position="6"> A composite preference rule is of the form</Paragraph>
      <Paragraph position="8"> and $V, $W, $X, $Y, ... are variable identifiers that should also occur in Pattern1 ($V, $X) and Pattern2 ($W, $Y), where they identify sub-parts of the interpretations. When given two arguments A and B, the system tries to match A with Pattern1 and B with Pattern2. If this succeeds, the variables $V, $X, ... occurring in Pattern1 and $W, $Y, ... occurring in Pattern2 are instantiated to sub-parts of A and B respectively. Then the system tries each preference rule of the list, with the instantiated arguments, until one rule can decide. In this case the relationship holding between A and B is the same as that holding between the sub-part of A and the sub-part of B. If no rule of the list can decide then preference is not decided.</Paragraph>
      <Paragraph position="9"> If the initial match does not succeed, then an attempt will be made to match A with Pattern2 and B with Pattern1. If this succeeds the system tries the rules of the list in the same way as above. Composite preference rules allow recursion.</Paragraph>
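      <Paragraph> The behaviour of a composite rule can be sketched in Python as follows. The sketch rests on assumptions of ours: a pattern is modelled as a function that returns variable bindings on success and None on failure, each entry of the rule list names the two variables it compares, and the names composite_rule, tree_pattern and fewer_branches are hypothetical. The fewer_branches function is only a stand-in for a predefined rule of the kind mentioned above.</Paragraph>
      <Paragraph><![CDATA[
```python
from typing import Callable, Dict, List, Optional, Tuple

FIRST, SECOND, UNDECIDED = 1, -1, 0  # which argument is preferred

# Assumption: a pattern returns variable bindings on success, None on failure.
Bindings = Dict[str, object]
Pattern = Callable[[object], Optional[Bindings]]
Rule = Callable[[object, object], int]
# One entry of the rule list: a rule plus the variables it compares
# (one bound by Pattern1, one bound by Pattern2).
SubGoal = Tuple[Rule, str, str]


def composite_rule(pattern1: Pattern, pattern2: Pattern,
                   subgoals: List[SubGoal]) -> Rule:
    def try_subgoals(b1: Bindings, b2: Bindings) -> int:
        # Try each rule of the list, in order, until one can decide.
        for rule, var1, var2 in subgoals:
            verdict = rule(b1[var1], b2[var2])
            if verdict != UNDECIDED:
                return verdict
        return UNDECIDED

    def p(a: object, b: object) -> int:
        b1, b2 = pattern1(a), pattern2(b)
        if b1 is not None and b2 is not None:
            # A and B inherit the relationship found between their sub-parts.
            return try_subgoals(b1, b2)
        b1, b2 = pattern1(b), pattern2(a)
        if b1 is not None and b2 is not None:
            # Swapped match: the verdict concerns (B, A), so it is flipped.
            return -try_subgoals(b1, b2)
        return UNDECIDED

    return p


def fewer_branches(x: list, y: list) -> int:
    # Stand-in for a predefined rule: the tree with fewer branches wins.
    if len(x) == len(y):
        return UNDECIDED
    return FIRST if len(x) < len(y) else SECOND


def tree_pattern(var: str) -> Pattern:
    # Matches any dict with a "tree" part and binds that part to `var`.
    def match(x: object) -> Optional[Bindings]:
        if isinstance(x, dict) and "tree" in x:
            return {var: x["tree"]}
        return None
    return match


prefer_flatter = composite_rule(tree_pattern("$V"), tree_pattern("$W"),
                                [(fewer_branches, "$V", "$W")])
```
]]></Paragraph>
      <Paragraph> Because an entry of the rule list may itself be a composite rule, the same construction can descend recursively into sub-parts; the sketch does nothing to prevent such a recursion from being infinite.</Paragraph>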
      <Paragraph position="10"> This formalism is very much inspired by the programming language Prolog: a preference rule is analogous to a three argument predicate (two interpretations and the resulting relationship), a simple rule to an assertion, and a composite rule to a clause with sub-goals.</Paragraph>
      <Paragraph position="11"> 4.2. General algorithm Initially, all competing objects are in the set of non ordered objects N and the set of ordered objects O is empty. Then, the following is repeated until N is empty: an object is removed from N and is compared to each object of O (if any), then it is added to O.</Paragraph>
      <Paragraph position="12"> This algorithm does not ensure that the resulting directed graph of preference relationships among the competing objects has no cycle. Anyway, maximal (minimal) elements can be defined in the following way: An object E is a maximal (minimal) element if no competing object is better (worse) than E.</Paragraph>
      <Paragraph position="13"> Thus an object in a cycle of the graph cannot be maximal (minimal).</Paragraph>
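      <Paragraph> The general algorithm and the definition of maximal elements can be sketched in Python as follows. This is a minimal illustration, not the Eurotra implementation: the names order_objects, maximal_elements and table_rule are hypothetical, objects are assumed to be hashable, and the single rule passed in is assumed to return one of three integer verdicts.</Paragraph>
      <Paragraph><![CDATA[
```python
from typing import Callable, Hashable, List, Set, Tuple

FIRST, SECOND, UNDECIDED = 1, -1, 0  # which argument is preferred

Edge = Tuple[Hashable, Hashable]  # (x, y) means "x is better than y"


def order_objects(objects: List[Hashable],
                  rule: Callable[[Hashable, Hashable], int]) -> Set[Edge]:
    """Compare each object with the already ordered ones; return the graph edges."""
    unordered = list(objects)          # the set N
    ordered: List[Hashable] = []       # the set O
    better_than: Set[Edge] = set()
    while unordered:
        candidate = unordered.pop(0)
        for other in ordered:
            verdict = rule(candidate, other)
            if verdict == FIRST:
                better_than.add((candidate, other))
            elif verdict == SECOND:
                better_than.add((other, candidate))
        ordered.append(candidate)
    return better_than


def maximal_elements(objects: List[Hashable],
                     better_than: Set[Edge]) -> List[Hashable]:
    # E is maximal if no competing object is better than E.
    beaten = {loser for _, loser in better_than}
    return [obj for obj in objects if obj not in beaten]


# A deliberately contradictory rule: a beats b, b beats c, c beats a.
CYCLE = {("a", "b"), ("b", "c"), ("c", "a")}


def table_rule(x: Hashable, y: Hashable) -> int:
    if (x, y) in CYCLE:
        return FIRST
    if (y, x) in CYCLE:
        return SECOND
    return UNDECIDED


edges = order_objects(["a", "b", "c", "d"], table_rule)
survivors = maximal_elements(["a", "b", "c", "d"], edges)  # only "d" survives
```
]]></Paragraph>
      <Paragraph> In this example the three objects caught in the cycle are each beaten by another object, so only the incomparable object d is maximal; without d the set of maximal elements would be empty.</Paragraph>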
      <Paragraph position="14"> To give the user control of how rules are tried on the competing objects, only one distinguished rule is applied to each competing pair. In the general case it should be a composite rule that just passes its two arguments to the rules of the list, thus ensuring that only these rules are tried and in that order.</Paragraph>
      <Paragraph position="15"> The pattern matching mechanism of composite rules is quite powerful (see also the appendix): it allows some preference rules to be applied only to selected objects (satisfying a precondition). It also allows (recursive) exploration of sub-parts of representations (a derivation tree for example), in parallel or not.</Paragraph>
      <Paragraph position="16"> Finally it enables the user to give priority to some preference rules over some others.</Paragraph>
      <Paragraph position="17"> 4.3. Problems with the implementation Although we decided that this model is good enough for preliminary experimentation, certain problems are already apparent: - The system takes arbitrary decisions in the case of a contradiction, that is if some rule can be applied to a pair of arguments in both orders (if p(A,B) and p(B,A) are both possible). In particular a preference decision should not be taken between identical objects.</Paragraph>
      <Paragraph position="18"> - Infinite recursion can occur with composite preference rules.</Paragraph>
      <Paragraph position="19"> - Maximal (minimal) elements may not exist in the resulting graph of preference relationships (for example if all elements are in a cycle).</Paragraph>
      <Paragraph position="20"> - Arbitrary decisions may be taken if the patterns allow multiple matches: the current model will stop with the first match that produces a decision.</Paragraph>
      <Paragraph position="21"> Currently it is the user's responsibility to avoid these problems by writing &quot;sensible&quot; rules. In the next section we sketch some possible solutions that are considered for a future implementation.</Paragraph>
    </Section>
  </Section>
</Paper>