<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2907"> <Title>Investigating Lexical Substitution Scoring for Subtitle Generation</Title> <Section position="7" start_page="49" end_page="50" type="evalu"> <SectionTitle> 4.2 Results </SectionTitle> <Paragraph position="0"> Figure 1 shows the accuracy and average precision results of the various models on our test set. The random baseline and corresponding significance levels were obtained by averaging multiple runs of a system that assigned random scores. As can be seen in the figures, the models' behavior appears consistent across both evaluation measures.</Paragraph> <Paragraph position="1"> Overall, the distributional similarity based method (sim) performs much better than the other methods. In particular, Lin's similarity also performs better than semcor, the other context-independent model. Generally, the context-independent models perform better than the contextual ones. Between the two contextual models, nntr is superior to Bayes. In fact, the Bayes model is not significantly better than random scoring.</Paragraph> <Section position="1" start_page="49" end_page="50" type="sub_section"> <SectionTitle> 4.3 Analysis and Discussion </SectionTitle> <Paragraph position="0"> When analyzing the data we identified several reasons why some of the WordNet substitutions were judged as false. In some cases the source word, as it appears in the original sentence, is not in a sense for which it is a synonym of the target word. For example, in many situations the word answer is used in the sense of a statement made in reply to a question or request. In such cases, such as in example 2 from Table 1, answer can be successfully replaced with reply, yielding a substitution that conveys the original meaning. However, in situations such as in example 1, the word answer is used in the sense of a general solution and cannot be replaced with reply. 
This is also the case in examples 4 and 5, in which subject does not appear in the sense of topic or theme.</Paragraph> <Paragraph position="1"> Having an inappropriate sense, however, is not the only reason for incorrect substitutions. In example 8, approach appears in a sense which is synonymous with attack, and in example 9, problem appears in a sense which is synonymous with a quite uncommon use of the word job. Nevertheless, these substitutions were judged as unacceptable since the desired sense of the target word after the substitution is not very clear from the context. In many other cases, such as in example 7, the substitution, though semantically correct, was judged as incorrect due to stylistic considerations. Finally, there are cases, such as in example 6, in which the source word is part of a collocation and cannot be replaced with semantically equivalent words.</Paragraph> <Paragraph position="2"> When analyzing the mistakes of the distributional similarity method, it seems that many were not necessarily due to the method itself but rather to implementation issues. The online source we used contains only the top most similar words for any word.</Paragraph> <Paragraph position="3"> In many cases substitutions were assigned a score of zero since they were not listed among the top-scoring similar words in the database. Furthermore, the corpus that was used for training the similarity scores consisted of news articles in American English spelling and does not always supply good scores for words with British spelling in our BBC dataset (e.g. analyse, behavioural, etc.).</Paragraph> <Paragraph position="4"> The similarity based method seems to perform better than the SemCor based method since, as noted above, even when the source word is in the appropriate sense it is not necessarily substitutable with the target. For this reason we hypothesize that applying Word Sense Disambiguation (WSD) methods to target words may have only a limited impact on performance. 
Overall, context-independent models seem to perform relatively well since many candidate synonyms are a priori not substitutable. This demonstrates that such models are able to filter out many quirky WordNet synonyms, such as problem and job.</Paragraph> <Paragraph position="5"> Fitness to the sentence context seems to be a less frequent factor and not trivial to model. Local context (adjacent words) seems to play more of a role than the broader sentence context. However, these two types of context were not distinguished in the bag-of-words representations of the two contextual methods that we examined. It will be interesting in future research to investigate using different feature types for local and global context, as is commonly done for Word Sense Disambiguation (WSD). Yet, it would still remain a challenging task to correctly distinguish, for example, the contexts in which answer is substitutable by reply (as in example 2) from the contexts in which it is not (as in example 1).</Paragraph> <Paragraph position="6"> So far we have investigated separately the performance of context-independent and contextual models. In fact, the accuracy of the (context-independent) sim method is not far from the upper bound, and the analysis above indicated rather small potential for improvement by incorporating information from a contextual method. Yet, there is still substantial room for improvement in the ranking quality of this model, as measured by average precision, and it is possible that a smart combination with a high-quality contextual model would yield better performance. In particular, we would expect a good contextual model to identify the cases in which, for a potentially good synonym pair, the source word appears in a sense that is not substitutable with the target, such as in examples 1, 4 and 5 in Table 1. 
Investigating better contextual models and their optimal combination with context-independent models remains a topic for future research.</Paragraph> </Section> </Section> </Paper>