<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1705">
  <Title>A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> In this research, Chinese word segmentation and unknown word extraction have been integrated into a single framework. To increase the coverage of the morphological rules, we first derive a set of general rules that represent all kinds of unknown words. To avoid extracting superfluous character strings, we then augment these rules with linguistic and statistical constraints. We propose an efficient bottom-up merging algorithm that consults the general rules to extract unknown words and uses priority measures to resolve rule-matching ambiguities. In the experiments, we compare the effects of different priority strategies, and the results show that the co-occurrence measure performs best.</Paragraph>
    <Paragraph position="1"> We find that the performance of unknown word detection significantly affects the overall performance. Although unknown word detection already performs reasonably well, there is still room for improvement. Possible strategies for future work include using contextual semantic relations in detection and applying more recent statistical methods, such as support vector machines and maximum entropy models, to improve unknown word detection.</Paragraph>
  </Section>
</Paper>