File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/02/c02-1009_concl.xml
Size: 2,047 bytes
Last Modified: 2025-10-06 13:53:12
<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1009"> <Title>A Robust Cross-Style Bilingual Sentences Alignment Model</Title> <Section position="8" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> Although length-based approaches are simple and achieve good performance when they are trained and tested on corpora of the same style, their performance drops significantly when they are tested on a style different from that of the training corpus. (For instance, the F-measure error increases from 1.8% to 14.4% in our experiment.) The main reason is that the statistical characteristics of the features adopted by the length-based approaches (such as length-distribution, alignment-type-distribution and cognate-frequency) vary significantly from one style to another.</Paragraph> <Paragraph position="1"> Since humans align sentences mainly by examining the similarity of the meanings conveyed by a given bilingual sentence pair, not by counting the number of characters in the sentences, the transfer-lexicon is expected to be a more reliable cue than sentence length. A robust statistical sentence alignment model, which integrates the associated transfer-lexicons into the original length-based model, is thus proposed in this paper. In our experiment, the proposed approach yields a large improvement in the cross-style case, reducing the F-measure error of the length-based model from 14.4% to 5.8%.</Paragraph> <Paragraph position="2"> Last, the length-features, the cognate-feature and the transfer-lexicon-feature are implicitly assumed to contribute equally to sentence alignment in this paper; however, this assumption does not usually hold, because different features may have different dynamic ranges for their scores and thus contribute differently to the discrimination power.
To overcome this problem, the features will be weighted differently in future work.</Paragraph> </Section> </Paper>