<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0414"> <Title>Using 'smart' bilingual projection to feature-tag a monolingual dictionary</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Discussion and conclusion </SectionTitle> <Paragraph position="0"> We have presented an approach to tagging a monolingual dictionary with linguistic features such as POS, number, and tense. We use a bilingual corpus and the English POS tags to extract information that can be used to infer the feature values for the target language.</Paragraph> <Paragraph position="1"> We have further argued that our approach can be used to infer the morphemes that mark the linguistic features in question and to assign the morphemes linguistic meaning. While various frameworks for unsupervised morpheme extraction have been proposed, many of them more sophisticated than ours, the main advantage of this approach is that the annotation of morphemes with their meaning is immediate. We believe that this is an important contribution, as role assignment becomes indispensable for tasks such as Machine Translation.</Paragraph> <Paragraph position="2"> One area of future investigation is the improvement of the classification algorithm. We have only presented one approach to classification. In order to apply established algorithms such as Support Vector Machines, we will have to adapt our algorithm to extract a set of likely positive examples as well as a set of likely negative examples. This will be the next step in our process, so that we can determine the performance of our system when using various well-studied classification methods.</Paragraph> <Paragraph position="3"> This paper represents our first steps in bilingual feature annotation. In the future, we will investigate tagging target language words with gender and case. This information is not available in English, so it will be a more challenging problem.
The extracted training data will have to be fragmented based on what has already been learned about other features.</Paragraph> <Paragraph position="4"> We believe that our approach can be useful for any application that can gain from linguistic information in the form of feature tags. For instance, our system (Carbonell et al., 2002) infers syntactic transfer rules, but it relies heavily on the existence of a fully-inflected, tagged target language dictionary. With the help of the work described here we can obtain such a dictionary for any language for which we have a bilingual, sentence-aligned corpus.</Paragraph> <Paragraph position="5"> Other approaches to Machine Translation as well as applications like shallow parsing could also benefit from this work.</Paragraph> </Section> </Paper>