<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1080"> <Title>Self-Organizing n-gram Model for Automatic Word Spacing</Title> <Section position="14" start_page="637" end_page="639" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> In this paper we have proposed a new method for learning word spacing in Korean by adaptively organizing the context size. Our method is based on the simple n-gram model, but the context size n is changed as needed. When the increased context differs substantially from the current one, the context size is increased. Likewise, the context size is decreased if the decreased context does not differ much from the current one. The benefits of this method are that it can consider a wider context by increasing the context size as required, and that it reduces computational cost by shrinking the context where possible.</Paragraph> <Paragraph position="1"> The experiments on the HANTEC corpora showed that the proposed method improves the accuracy of the trigram model by 3.72%. Even compared with some well-known machine learning algorithms, it achieved an improvement of 2.63% over decision trees and 2.21% over support vector machines. In addition, we showed two ways of improving the proposed method: considering the right context and considering the word spacing sequence. Considering the left and right contexts at the same time improves the accuracy by 1.23%, and considering the word spacing sequence improves it by 2.34%.</Paragraph> <Paragraph position="2"> The n-gram model is one of the most widely used methods in natural language processing and information retrieval. In particular, it is one of the most successful language models, a key technique in language and speech processing. Therefore, the proposed method can be applied not only to word spacing but also to many other tasks. 
Although word spacing is an important task in Korean information processing, it is a simple task in many other languages, such as English, German, and French. Nevertheless, because the proposed method is general, it remains relevant for such languages.</Paragraph> </Section> </Paper>