File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-1415_metho.xml
Size: 23,952 bytes
Last Modified: 2025-10-06 14:07:25
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1415"> <Title>An Empirical Analysis of Constructing Non-restrictive NP Modifiers to Express Semantic Relations</Title> <Section position="4" start_page="108" end_page="109" type="metho"> <SectionTitle> 2 Corpus annotation </SectionTitle> <Paragraph position="0"> To answer the first question, we annotated the MUSE corpus, from which we have observed three types of modifier uses in an NP: Firstly, providing properties to uniquely identify the objects or concepts denoted by the NP. tions through NR constructions, which is important in two aspects. Firstly, an NR construction gives a more concise alternative realisation for a relation, where the relation is expressed implicitly rather than explicitly and usually more subtly. It does not need 3 We acknowledge that these cue phrases are controversial in their semantic interpretations, but not using cue phrases would be even more ambiguous. Besides, our experiment does not heavily depend on these cue phrases.</Paragraph> <Paragraph position="1"> Without these modifiers, the NP can denote more than one object/concept or sets of objects/concepts and is ambiguous in its interpretation, e.g. those in (6a).</Paragraph> <Paragraph position="2"> Such modifiers usually appear in phrases headed by the definite article 'the', which according to Loebner (1987) has the same meaning in all its uses, including in generic references and predicatives.</Paragraph> <Paragraph position="3"> (4) a. Private Eye had been threatened with closure because it couldn't afford the libel payment.</Paragraph> <Paragraph position="4"> b. Private Eye, which couldn't afford the libel payment, had been threatened with closure.</Paragraph> <Paragraph position="5"> (5) a. But P&amp;G contends the new Cheer is a unique formula that also offers an ingredient that prevents colors from fading. And retailers are expected to embrace the product, because it will take up less shelf space.</Paragraph> <Paragraph position="6"> b. And retailers are expected to embrace the product, which will take up less shelf space.</Paragraph> <Paragraph position="7"> Modifiers in other types of generic references, e.g. indefinites, also belong here.</Paragraph> <Paragraph position="8"> This type subsumes the modifiers normally considered by the referring expression generation module of an NLG system for uniquely identifying the referents (e.g. (Dale, 1992)).</Paragraph> <Paragraph position="9"> Secondly, having no effect in constraining a unique or unambiguous concept out of the NP which is either already unique or not required to have a unique interpretation, but being important to the situation presented in the main proposition containing the NP.</Paragraph> <Paragraph position="10"> This type includes the modifiers described in the previous section and many modifiers in indefinite predicatives, e.g. that in (6b).</Paragraph> <Paragraph position="11"> Thirdly, providing additional details about the referents of the NP, which functions the same way as the NP without these modifiers, e.g. those in (6c). The effect of such modifiers is usually local to the heads they describe rather than to the main propositions as a whole, which is the main difference between this and the second type of modifier.</Paragraph> <Paragraph position="12"> This type subsumes the modifiers normally generated by an aggregation module, in particular one using embedding (e.g. (Shaw and McKeown, 1997), (Cheng, 1998)).</Paragraph> <Paragraph position="13"> (6) a. the decoration on this cabinet; the best looking food I ever saw b. This is a mighty empty country.</Paragraph> <Paragraph position="14"> c. the wide gilt bronze straps on the coffer fronts and sides; He lived in a five-room apartment in the Faubourg Saint-Antoine. 
To find out whether the above distinctions make sense to human subjects, we designed an annotation scheme for modifiers in NPs, describing which elements of an NP should be marked as a modifier and how to mark the features for a modifier. Apart from other features, each modifier should be annotated with a pragmatic function feature (PRAGM), which specifies why a modifier is used in an NP. The possible values for this feature are unique, int and attr, corresponding to the three types of modifier uses described above (we will use the value names to refer to the different types of modifier in the rest of this paper). XML was used as the markup language.</Paragraph> <Paragraph position="15"> We had two trained annotators mark the NP modifiers in the MUSE corpus according to their understanding of the scheme. The agreement between them on the PRAGM feature, measured by the Kappa statistic (Carletta, 1996), is .734, which means that the distinctions we are trying to make can be identified by human subjects to some extent. The main ambiguity exists between int and attr modifiers. There seems to be a gradual difference between them and where to draw the line is a bit arbitrary.</Paragraph> <Paragraph position="16"> In the MUSE corpus annotated so far, 19% of 1078 modifiers in all types of NPs are identified as int. So this is not a trivial phenomenon.</Paragraph> </Section> <Section position="5" start_page="109" end_page="114" type="metho"> <SectionTitle> 3 An experiment </SectionTitle> <Paragraph position="0"> We reduced the size of the problem of when to use an NR construction by focusing on two relations: a causal relation signalled by 'because' and a temporal relation signalled by 'then'. The reason for choosing these relations is that the possibilities of expressing them through NR constructions have already been shown by linguists. 
The two cue phrases are typical for the corresponding relations and can often substitute for other cue phrases for the same relations. In the rest of this paper, we will still use the term causal or temporal relation, but what we actually mean is the specific relation signalled by 'because' or 'then'.</Paragraph> <Section position="1" start_page="109" end_page="111" type="sub_section"> <SectionTitle> 3.1 Independent variables and hypotheses </SectionTitle> <Paragraph position="0"> From the generation point of view, our question is: given two facts and the semantic relation between them, what extra input do we need for making realisation decisions? We collected examples of 'because' sentences from the MUSE corpus and Wall Street Journal source data, and transformed them into NR sentences by hand. Comparing the two constructions, we found some interesting variation. For example, comparing the sentences in Figure 2, we found intuitively that the meanings of (4a) and (4b) are much closer than those of (5a) and (5b). In other words, (4b) can be used in place of (4a), whereas (5b) cannot, so easily 4 In (Carletta, 1996), a value of K between .8 and 1 indicates good agreement; a value between .6 and .8 indicates some agreement.</Paragraph> <Paragraph position="1"> number of other collected sentences.</Paragraph> <Paragraph position="2"> We claim that it is the degree of inferrability of the relation between the semantics expressed through the two clauses that makes the difference. 
We define the inferrability of a causal/temporal relation as: Given two separate facts, the likelihood of human subjects inferring from their world knowledge that a causal/temporal connection between the facts might plausibly exist.</Paragraph> <Paragraph position="3"> In examples (4) and (5), the fact that Private Eye cannot afford the libel payment is very likely to directly cause the threat of closure, whereas a product occupying less space is not usually a cause of it being accepted by retailers according to common sense. Therefore, the two realisations in (4) can be used in place of one another whereas those in (5) cannot.</Paragraph> <Paragraph position="4"> Inferrability is dynamic and user dependent.</Paragraph> <Paragraph position="5"> Given two facts, people with different background knowledge can infer the relation between them with different ease. If a relation is easily recognisable according to general world knowledge, we say that the inferrability of the relation is globally strong, in which case a hypotactic and an NR construction can express the relation almost equally well (if not considering rhetorical effect). Context can also contribute to the inferrability of a relation. A relation not easily recognisable from world knowledge may be identified by a reader with ease as the discourse proceeds. In this case, we say that the inferrability of the relation is locally strong, where the two constructions can express the relation equally well only in a certain context. In this paper, we mainly consider the global aspect of a relation and we will describe how we decided the value of inferrability in the next section.</Paragraph> <Paragraph position="6"> In Table 1, we summarise the factors (independent variables) that might play a role in the closeness judgement between the semantics of a hypotactic construction and an NR construction. The levels are possible values of these factors. Besides Relation and Inferrability, 
Position gives the location of the NP that contains the NR modifier. It can be the first (initial) or the last (final) phrase in a sentence (footnote 5); Order gives the order of presentation: a hypotactic sentence to be compared with an NR sentence or vice versa, which is used to balance the influence of cue phrases on human judgement; Subordination specifies whether the nucleus or the satellite is realised as an NR clause (footnote 6); and Cued/NoCue means using a cue phrase in the NR clause or not, which is only applicable to the temporal relation, for example, (7) The health-care services announced the spinoff plan last January, which was then revised in May.</Paragraph> <Paragraph position="7"> Based on our observation of human written sentences, we have the following hypotheses: Hypothesis 1 For both causal and temporal relations, the inferrability of the relation between the semantics of two facts contributes significantly to the semantic similarities between a hypotactic construction and an NR construction.</Paragraph> <Paragraph position="8"> In other words, if the inferrability of the relation between the two facts is strong, the semantic relation can be expressed similarly through an NR construction; otherwise, the similarity is significantly reduced. Hypothesis 2 For the causal relation, the satellite subordination bears significantly higher similarity in meaning to the hypotactic construction than the nucleus subordination does.</Paragraph> <Paragraph position="9"> For example, (4b) would be preferred to &quot;Private Eye, which had been threatened with closure, couldn't afford the libel payment.&quot; Hypothesis 3 For the temporal relation, both the position of subordination and the use of an appropriate cue phrase in the NR clause make a significant difference to the semantic similarities between a hypotactic and an NR construction. 
This hypothesis prefers Example (7) to the realisation that does not have 'then'.</Paragraph> <Paragraph position="10"> 5 In our implementation, we restrict ourselves to sentences with two NPs.</Paragraph> <Paragraph position="11"> 6 We assume that in the causal relation, the clause bearing 'because' is always the satellite. Since the temporal relation is a multinuclear relation, this factor does not apply.</Paragraph> </Section> <Section position="2" start_page="111" end_page="111" type="sub_section"> <SectionTitle> 3.2 The design of the experiment </SectionTitle> <Paragraph position="0"> To assess a semantic similarity, which is thought to be influenced by the independent variables, we use human subjects to judge the following two dependent variables: Naturalness: how fluent a sentence is on its own. Similarity: how similar the meanings of two sentences are without considering their naturalness. The scales of the variables are selected such that all values on the scale have natural verbal descriptions that could be grasped easily by our subjects (see Table 2). Similar rating methods have been described in (Jordan et al., 1993) to compare the output of a machine translation system with that of expert humans.</Paragraph> <Paragraph position="1"> Since we want to measure different groups of similarity judgement based on different inferrability, order or position levels, a between-groups design (Hatch and Lazaraton, 1991) seems to be most appropriate. The design we used is illustrated in Table 3, where all possible combinations of the independent variables are listed. In the table, paraphrases gives the types of alternative sentences each original sentence has. They should be scored by human subjects for their similarities to the original sentences and their naturalness.</Paragraph> <Paragraph position="2"> We used a method similar to random selection to create a stratified random sample. 
The sample should contain 12 hypotactic sentences and 12 NR sentences: two for each combination of the causal relation and one for each combination of the temporal relation. These numbers were used to obtain as big a sample as possible which could still be judged by human subjects in a relatively short period of time (say less than 30 minutes).</Paragraph> <Paragraph position="3"> Using cue phrases as the indicators of the semantic relations between clauses, we collected all the sentences containing 'because' or 'then' from the Wall Street Journal source data, and went through each of them to pick out those that actually signal the desired relations and can potentially have NR realisations, i.e. where there is a coreference relation between the two NPs in the two clauses. Sentences containing NR clauses signalled by ', which' or ', who' were collected similarly. From these sentences, we randomly selected one by category. If it realised an unused factor combination, it was kept in the sample. This process was repeated until we collected the right number of test items which instantiated all combinations of properties in Table 3.</Paragraph> <Paragraph position="4"> We asked two subjects to mark the 24 selected items with regard to their inferrability on a five-point scale: 5 for very likely, 4 for quite likely, 3 for possibly, 2 for even less possibly and 1 for unknown. We took values of 4 and 5 as strong and the others as weak. 
The subjects and an author agreed on 19 items, and the author's version was used for the experiment.</Paragraph> <Paragraph position="5"> For the test items, we manually produced the corresponding paraphrases, which were then put into a questionnaire for human assessment of the two dependent variables for each paraphrase.</Paragraph> </Section> <Section position="3" start_page="111" end_page="112" type="sub_section"> <SectionTitle> 3.3 Results </SectionTitle> <Paragraph position="0"> We had ten native English speakers evaluating the similarity and naturalness on the sample.</Paragraph> <Paragraph position="1"> Since the similarity data is ordinal data and departs significantly from a theoretical normal distribution according to the One-Sample Kolmogorov-Smirnov Test, we chose Mann Whitney U, which is a test for comparing two groups on the basis of their ranks above and below the median. The result is summarised in Table 4, with statistically significant items in bold-face (taking the conventional .05 p level). The Z scores tell how many standard deviations above or below the mean an observation might be. Means gives the means of the similarity scores with respect to the values of the independent variables in Table 1. For the causal relation, there is a significant difference between the means of similarities of the two groups of different inferrabilities (P&lt;.0005). So we have high confidence to accept part of Hypothesis 1.</Paragraph> <Paragraph position="2"> i.e. the strong inferrability of the causal relation between the semantics of two facts makes the semantic similarities between a hypotactic construction and an NR construction significantly higher than the weak case does. In the strong case, the mean of similarity is 4.59, which is close to very similar.</Paragraph> <Paragraph position="3"> We treated order as a factor to be balanced and did not expect it to have a significant effect, but it does (P=.008). 
An NR paraphrase shows much higher similarity to its corresponding hypotactic sentence (with a mean of 4.46) than the other way round (with a mean of 3.83), but the difference becomes smaller for the strong inferrability case. This could be because the causal relations expressed in NR sentences generally sound weaker than those in hypotactic sentences and the cue phrase has a big influence on the perceptibility of a relation.</Paragraph> <Paragraph position="4"> For the temporal relation, position is the only significant factor (P=.0389). So part of Hypothesis 3 is confirmed, that is, the final position subordination makes an NR paraphrase significantly more similar to the corresponding hypotactic construction than the initial position does.</Paragraph> <Paragraph position="5"> We do not have enough evidence to accept the claim that the inferrability of the temporal relation contributes significantly to the similarity judgement (as in Hypothesis 1). However, when we calculated the similarity mean for the alternative sentences using cue phrases, strong or weak in inferrability, we got 4.94 (very similar). Comparing this with that of the strong causal case using the Mann Whitney U test, we get a significance level of 0.0294. This means that we have strong confidence to believe that the similarity mean for the temporal relation if using a cue phrase is significantly higher than that for the strong causal relation. Therefore, the temporal relation can always be realised by an NR construction as long as an appropriate cue phrase is used in the NR clause.</Paragraph> <Paragraph position="6"> The assumption of normality is also not met by the subset of the data related to Hypotheses 2 and 3 (i.e. 
the similarity scores for nucleus/satellite subordination paraphrases and cued/nocue paraphrases).</Paragraph> <Paragraph position="7"> We used the Wilcoxon Matched-Pairs Signed-Ranks Test because we were comparing pairs of paraphrases. The result is given in Table 5. We accept the hypothesis that the similarity means of nucleus and satellite subordination are significantly different in the initial position (Hypothesis 2). This confirms the linguistic observation that information of greater importance should be presented in a main position rather than a subordinate position. We can also accept the hypothesis that for the temporal relation, using cue phrases in NR clauses can significantly improve the similarity score of the NR construction (Hypothesis 3).</Paragraph> </Section> <Section position="4" start_page="112" end_page="114" type="sub_section"> <SectionTitle> 3.3.2 Naturalness </SectionTitle> <Paragraph position="0"> We used the Mann Whitney U test on naturalness with regard to order, inferrability and position, and found no significant connection. Figure 3 shows the distribution of naturalness assessment of the paraphrases for the causal and temporal relation respectively. The majority of the NR constructions are natural or fairly natural, which suggests that they could be good alternative realisations.</Paragraph> <Paragraph position="1"> We briefly summarise the heuristics drawn from the experiment for expressing the causal and temporal relations with an NR construction. 
This is an acceptable realisation in the following circumstances: the causal relation holds between two facts and the inferrability of the relation is strong, in which case satellite subordination should be used; or the temporal relation holds between two facts, in which case a final position subordination and an appropriate cue phrase, like 'then', should be used in the NR clause.</Paragraph> <Paragraph position="2"> We also found that an NR construction can express the causal/temporal relation and the object-attribute elaboration relation at the same time, irrespective of the inferrability of the relation. Generally speaking, a semantic relation expressed by an NR construction sounds weaker than a hypotactic realisation with a cue phrase. Therefore, if a relation is to be emphasised, NR constructions should not be used.</Paragraph> <Paragraph position="3"> 4 Implementing the results in a GA-based text planner int-modifiers have a mixed character, i.e. like attr-modifiers they are not essential for identifying the referents, but like unique-modifiers they are not optional. Because of their role in supporting the semantics of the main propositions, the selection of int-modifiers should be a part of the text planning process, where a text structure is constructed to fulfill the overall goals for producing the text. However, compared with unique-modifiers, int-modifiers are less essential for an NP and they can only be added if there are available syntactic slots.</Paragraph> <Paragraph position="4"> Since embedding deals with attr-modifiers at both a content selection and an abstract realisation level, it could coordinate the addition of int-modifiers.</Paragraph> <Paragraph position="5"> Therefore, the text planner could consult the embedding module as to whether a property can be realised as an NP modifier, under the constraints from the NP type and the unique-modifiers that are already there. 
In other words, the text planner chooses facts to satisfy certain goals and the embedding process decides if the facts can be realised as NP modifiers in an abstract sense.</Paragraph> <Paragraph position="6"> We need a generation architecture that allows a certain degree of interaction between text planning, referring expression generation and embedding. So we chose the Genetic Algorithm based text planner described in (Mellish et al., 1998). Their task is, given a set of facts and relations between facts, to produce a legal RST tree using all the facts and some relations. The text planning is basically a two step process. Firstly sequences of facts are generated by applying GA operators, and secondly the rhetorical structure trees built from these sequences are evaluated and the good sequences are kept for producing better offspring.</Paragraph> <Paragraph position="7"> We extended the text planner by adding a GA operator called embedding mutation, which randomly selects two items mentioning a common entity from a sequence and assumes an embedding on them. Embeddings are evaluated together with the other properties an RST tree has. In this way, embedding is performed during text planning. The ultimate score of a tree is the sum of positive and negative scores for all the good and bad properties it bears. Since good embeddings are scored higher, they are kept in the sequences for producing better offspring and are very likely to be included in the final output.</Paragraph> <Paragraph position="8"> We incorporated the results from the experiment into the GA planner by using them as preferences for evaluating RST trees. We treated inferrability as an input to the system. If a good embedding can be formed from two facts connected by an RST relation (i.e. either of the two cases in Section 3.3.3 is satisfied and the required syntactic slot is free), the embedding is scored higher than the hypotactic realisation. 
However, this emphasis on embedding might not be appropriate. In a real application environment, other communicative intentions should be incorporated to balance the scoring for different realisations. And generally, inferrability has to be implemented based on limited domain-dependent knowledge and user configuration.</Paragraph> </Section> </Section> </Paper>