<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2023">
  <Title>An Unsupervised System for Identifying English Inclusions in German Text</Title>
  <Section position="8" start_page="137" end_page="137" type="concl">
    <SectionTitle>
7 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have presented an unsupervised system that exploits linguistic knowledge resources, including lexicons and the Web, to classify English inclusions in German text from different domains. Our system can be applied to new texts and domains at little computational cost and extended to new languages as long as lexical resources are available. Its main advantage is that no annotated training data is required.</Paragraph>
    <Paragraph position="1"> The evaluation showed that our system performs well on non-sparse data sets. Although it is outperformed by a machine learner, which requires a trained model and therefore manually annotated data, the output of our system improves the learner's performance when it is incorporated as an additional feature. Combining statistical approaches with methods that use linguistic knowledge resources can therefore be advantageous.</Paragraph>
    <Paragraph position="2"> The low results obtained in the CD experiments indicate, however, that the machine learner merely learns a lexicon of the English inclusions encountered in the training data and is unable to classify many unknown inclusions in the test data. The Google lookup module implemented in our system represents a first attempt to overcome this problem, as the information on the Web never remains static and, at least to some extent, reflects language in use.</Paragraph>
    <Paragraph position="3"> The current system tracks full English word forms. In future work, we aim to extend it to identify English inclusions within mixed-lingual tokens, that is, words containing morphemes from different languages, such as English words with German inflection (Receivern) or mixed-lingual compounds (Shuttleflug). We will also test the hypothesis that automatic classification of English inclusions can improve text-to-speech synthesis quality.</Paragraph>
  </Section>
</Paper>