<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3013"> <Title>NIL Is Not Nothing: Recognition of Chinese Network Informal Language Expressions</Title> <Section position="2" start_page="0" end_page="95" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The global proliferation of Internet applications has shown no sign of slowing since the turn of the millennium. In commerce, for example, more and more physical customer service and call centers are being replaced by Internet solutions such as MSN and ICQ. Network informal language (NIL) is actively used in these applications. Following this trend, we forecast that NIL will become a key language for human communication over networks.</Paragraph> <Paragraph position="1"> Today NIL expressions are ubiquitous. They appear, for example, in chat rooms, BBS, email and text messages. Understanding NIL expressions is increasingly important from both technology and humanities research points of view. For instance, comprehension of customer-operator dialogues in the aforementioned commercial applications would facilitate effective Customer Relationship Management (CRM).</Paragraph> <Paragraph position="2"> Recently, sociologists have shown great interest in studying the impact of network-mediated communication on language evolution from psychological and cognitive perspectives (Danet, 2002; McElhearn, 2000; Nishimura, 2003). Researchers claim that languages have never changed as fast as they have since the inception of the Internet, and that the language of Internet communication, i.e. NIL, is becoming more concise and effective than formal language.</Paragraph> <Paragraph position="3"> Processing NIL text requires unconventional linguistic knowledge and techniques. Unfortunately, existing natural language processing (NLP) approaches were developed to handle formal language text and are less effective in dealing with NIL text. 
For example, we use the ICTCLAS tool (Zhang et al., 2003) to process the sentence &quot;4 ?4?U (Is he going to attend a meeting?)&quot;. The word segmentation result is &quot; |4 |? |4 |?U &quot;. In this sentence, &quot;4 ? 4 (xi4 ba1 xi4)&quot; is a NIL expression which means 'is he ....?' in this case. Without identifying the expression, further Chinese text processing techniques cannot produce reasonable results.</Paragraph> <Paragraph position="4"> This problem motivates our recent research in the &quot;NIL is Not Nothing&quot; project, which aims to produce techniques for NIL processing and thereby to support understanding of the change patterns and behaviors in language evolution, particularly in Internet language. Such understanding could make us more adaptive to the dynamic language environment of the cyber world.</Paragraph> <Paragraph position="5"> Recently, some linguistic work has been carried out on NIL for English. A shared dictionary has been compiled and made available online. It contains 308 English NIL expressions, including English abbreviations, acronyms and emoticons.</Paragraph> <Paragraph position="6"> Similar efforts for Chinese are rare. This is because the Chinese language was not widely used on the Internet until ten years ago. Moreover, Chinese NIL expressions involve processing of Chinese Pinyin and dialects, which results in higher complexity in Chinese NIL processing.</Paragraph> <Paragraph position="7"> In the &quot;NIL is Not Nothing&quot; project, we develop a comprehensive Chinese NIL dictionary. This is a difficult task because NIL text resources are rather limited. We download a collection of BBS text from an Internet BBS system and construct a NIL corpus by annotating NIL expressions in this collection by hand. An empirical study of NIL expressions is conducted on the NIL corpus, and a knowledge mining tool is designed to construct the NIL dictionary and generate statistical NIL features automatically. 
With this knowledge and these resources, the NIL processing system, NILER, is developed to extract NIL expressions from NIL text by employing state-of-the-art information extraction techniques.</Paragraph> <Paragraph position="8"> The remaining sections of this paper are organized as follows. In Section 2, we observe the formation of NIL expressions. In Section 3, we present related work. In Section 4, we describe the NIL corpus and the knowledge engineering component used for NIL dictionary construction and NIL feature generation. In Section 5, we present the methods for NIL expression recognition. We outline the experiments, discussion and error analysis in Section 6, and Section 7 concludes the paper.</Paragraph> </Section> </Paper>