<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0308"> <Title>TREQ-AL: A word alignment system with limited language resources</Title> <Section position="7" start_page="1" end_page="1" type="concl"> <SectionTitle> 6 Conclusions and further work </SectionTitle> <Paragraph position="0"> TREQ-AL was developed in a short period of time and has not been completely tested and debugged. At the time of writing we had already noticed two errors that were responsible for several wrong or missed links. There are also some conceptual limitations whose removal is likely to further improve performance. For instance, all the words in virgin alignment zones are automatically given null links, but the algorithm could be modified to assign all the links in the Cartesian product of the words in the corresponding virgin zones.</Paragraph> <Paragraph position="1"> The typical example of such a case is idiomatic expressions (tanda pe manda = the list that sum up). A bilingual dictionary of idioms as an external resource would certainly improve the results significantly. Also, an additional preprocessing phase for collocation recognition could recover many missing links. At present only those collocations that represent 1-2 or 2-1 alignments are recovered.</Paragraph> <Paragraph position="2"> A major improvement will be to make the algorithm symmetric. There are many cases in which reversing the source and target languages allows new links to be established. This can be explained by the different degrees of polysemy of the translation-equivalent words and by the way we associate alignment zones.</Paragraph> <Paragraph position="3"> Word order in Romanian and English is similar to some extent, but the present version of TREQ-AL does not explicitly use this similarity. 
One obvious and easy improvement to TREQ-AL's performance would be to take advantage of the similarity in word order: first map the virgin zones to each other, and afterwards map the words within them.</Paragraph> <Paragraph position="4"> Finally, we noticed some wrong alignments in the gold standard. One example is the following: &quot;... a XI - a ...&quot; = &quot;... eleventh...&quot; Our program aligned all four Romanian tokens (a, XI, -, a) to the English token (eleventh), while the gold standard linked only &quot;XI&quot; to &quot;eleventh&quot; and gave the other three Romanian tokens a null link. We also noticed some alignments that are very hard to achieve (anaphoric links).</Paragraph> </Section> </Paper>