<?xml version="1.0" standalone="yes"?> <Paper uid="W95-0109"> <Title>Automatic Construction of a Chinese Electronic Dictionary</Title> <Section position="6" start_page="118" end_page="118" type="concl"> <SectionTitle> I </SectionTitle> <Paragraph position="0"> in seed size does provide significant improvement on precision and recall. With the large seed corpus, the weighted precision and recall are 71% and 73%, respectively. Considering that the parts of speech are optimized from 10 parts of speech for each word, the results are reasonably acceptable.</Paragraph> <Paragraph position="1"> In this paper, we propose an unsupervised reestimation approach and a two-class classification method to extract embedded words from a large unsegmented Chinese text, and we assign possible parts of speech to each word with a similar reestimation method. An electronic dictionary with part-of-speech information can thus be acquired automatically.</Paragraph> <Paragraph position="2"> We observe that the system can acquire POS-tagged lexicon entries with reasonably acceptable precision and recall. Because the approach constructs the dictionary by unsupervised learning, its precision and recall are less satisfactory than those of a supervised learning strategy, which uses a large tagged corpus and dictionary. However, it requires little human intervention in the whole process, so the cost of constructing the dictionary, in terms of budget and time for pre-tagging, is much smaller than that of a supervised learning approach. It is therefore worthwhile to trade some precision for a lower cost of dictionary construction. Given the results of this preliminary study, we expect that the techniques described here can form a good basis for building a better automatic dictionary construction system.</Paragraph> </Section> </Paper>