<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1713"> <Title>From Text to Exhibitions: A New Approach for E-Learning on Language and Literature based on Text Mining</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Digital Museum Framework </SectionTitle> <Paragraph position="0"> Unlike digital libraries and traditional digital museum systems, which provide the single function of exhibition, a modern digital museum provides multidimensional functions. Generally, a modern digital museum has three key functions: exhibition, education and research. In our design of the Digital Museum for Language and Literature, the three dimensions are: interactive theme-based exhibitions built from texts, E-Learning modules on language and literature, and related research on computational linguistics.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Digital Museum and E-Learning on Language and Literature </SectionTitle> <Paragraph position="0"> Digital Museum systems have gone beyond exhibitions of digital collections. They increasingly emphasize educational uses rather than traditional exhibitions, providing users with educational and well-motivated exhibitions [13]. The UK-wide Digital Museum linked exhibitions connected by subject and theme with an integrated learning environment [6]. By 2000, the National Science Plan of Digital Museums of Taiwan had defined a specific and integrated program on how to use scientific technology, especially information technology, to digitalize archives of significant humanistic meaning in both cultural and natural fields. It also discussed how to apply such digital projects and products to education, research and industrialization, for the sake of conserving culture, promoting education, inspiring research and advancing industrialization [3].</Paragraph>
<Paragraph position="1"> Knowledge on a learning topic should be organized as an exhibition theme, which is represented by a series of real or virtual objects and detailed descriptions. Exhibitions on various themes are linked together according to the relatedness of their themes. Learners can participate in the Digital Museum by choosing a pathway of linked exhibitions on a particular topic. Special modules are also provided for participants to interact with the system, as discussed in section 4.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 General Architecture Design of a Digital Museum </SectionTitle> <Paragraph position="0"> The life cycle of a modern digital museum resembles a fountain model [11]: each design phase feeds back into the previous phases. There are several milestones in the life cycle, each of which acts as a knowledge container and as a foundation for knowledge processing at higher levels [14]. 
These knowledge containers are as follows:</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Milestones Functionality Information Origin Pool: (Primitive Corpus) </SectionTitle> <Paragraph position="0"> The mass storage of large-scale information from preliminary digitalization work.</Paragraph> <Paragraph position="1"> individual or integrated, for regular accessing by system.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Reusable Tool Base for Functional Modules </SectionTitle> <Paragraph position="0"> Tool pool for reusable module functions: individual or integrated components for various uses.</Paragraph> <Paragraph position="1"> Multi-functional Interface: Web-based interface for exhibitions, education and research.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Architecture </SectionTitle> <Paragraph position="0"> Based on these milestones, the general architecture of a Digital Museum on Language and Literature can be represented in the following figure. Figure 1: General Architecture of a Digital Museum based on language processing. 2.3 Example: Introduction to the Digital Museum of Chinese Ancient Poetry. The Digital Museum of Chinese Ancient Poetry Art [10] is a research model developed at Peking University, Beijing, combining E-Learning, computer-assisted research on Chinese Ancient Poetry and computational linguistics. A prototype of this Digital Museum was designed to meet the needs of exhibition, education and research on the art of Chinese Ancient Poetry. The analysis, design and implementation of this project were carried out at a high level of abstraction.</Paragraph> <Paragraph position="1"> The information origin pool and the refined knowledge base of this project were also the corpus for related computational linguistics research. 
It covers Chinese Ancient Poetry across 2,000 years, approximately 100,000 items [10]. Other advanced knowledge bases, such as an author information base, an image and media base, a location information base and word lists, were also constructed. In the design of this Digital Museum system, knowledge mining was divided into two types: item entity information mining and relational information mining. Item entity information was broken down into exhibition items, characters, images, media, locations and words. Relational information reflected all aspects of the relations among items. Metadata for each category of instances was defined in the design phase. In particular, a group of items with related meanings was structured as a virtual item class, which was itself treated as a specific item.</Paragraph> <Paragraph position="2"> In the prototype system, items such as poems, characters and locations were exhibited along with all related formats of knowledge. Users can jump from one item to its related items and study them in the context where they originally belong. Sample exhibitions on specific themes, such as clothing, plants, food and spring, were also designed.</Paragraph> <Paragraph position="3"> this Digital Museum In the dimension of learning, the Digital Museum of Chinese Ancient Poetry explored an E-Learning system for the linguistic and literary features of Chinese Ancient Poetry. It enabled learners to study a poem in its background environment, with reference to related poetry and other related objects in multiple formats. 
The system also presented statistical results from the corpus to users, such as the word usage of authors, the co-occurrence of words and the likelihood of hidden meanings of words, which keep users well informed and make a poem or a word easier to understand.</Paragraph> <Paragraph position="4"> In the dimension of research, the digital museum is closely related to specific research topics in computational linguistics, especially statistical natural language processing. We refined unknown words from the corpus through statistical methods and explored clustering them into concepts. In this way, we studied the hidden meanings of words and poetry in context and the discovery of relations among poems. We also conducted some research on knowledge mining and discovery from the corpus, which can also inspire extended researches like Knowledge of humanities areas, especially language and literature, is commonly carried by texts. Therefore, language processing, and specifically text processing, is vital for transforming pure texts and domain knowledge into abstracted exhibitions. In practice, most digital museums today have not made good use of computational linguistics techniques. Most of them still organize exhibitions manually and provide them online, so those exhibitions are relatively isolated from each other.</Paragraph> <Paragraph position="5"> However, there are remarkable relations among text units, real objects and topics, which are hidden in the texts. For example, the word &quot;willow&quot; seems to have nothing to do with &quot;parting&quot; by its semantic definition, but in context, &quot;breaking a willow branch&quot; does indicate &quot;sending off friends&quot; or &quot;seeing a friend leave&quot; in Chinese Ancient Poetry.</Paragraph> <Paragraph position="6"> These meaningful entities and relations can be learned from the statistical analysis of large-scale poetry texts. 
The use of computational linguistics methods here is crucial, and it distinguishes this framework from traditional Digital Museum models. Statistical natural language processing over a large-scale corpus is the most significant approach we have adopted in this research.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Construction of Corpora and Integrated Knowledge Bases </SectionTitle> <Paragraph position="0"> The first phase of language processing is to build corpora and knowledge bases. Primitive corpora are constructed by digitalizing archives. Refined corpora are constructed by applying language processors to the primitive corpora. We take the Digital Museum of Chinese Ancient Poetry as an example.</Paragraph> <Paragraph position="1"> For the Digital Museum of Chinese Ancient Poetry Art, the primitive corpora include texts of poems totaling over 1,200,000 lines, descriptions of 4,000 authors, a name dictionary and a location dictionary. The refined corpora include a word dictionary discovered entirely from the texts, a concept base constructed by supervised word clustering and a store of word co-occurrences. Other knowledge bases include images, music, media (readings), relics, events, and a body of expert knowledge on Chinese</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Ancient Poetry. </SectionTitle> <Paragraph position="0"> The general ontology of the domain knowledge was carefully studied. Important entities and relations from the texts and related domains were determined.</Paragraph> <Paragraph position="1"> Consequently, we carefully designed the metadata and chose a database system to maintain the knowledge base. 
This knowledge base should be expandable so that it can contain texts, entities from related domains, and relations.</Paragraph> <Paragraph position="2"> The last step of this phase is to design a referencing mechanism for issuing queries and retrieving answers. The outcome of this phase is an integrated knowledge base, the textual part of which is the corpus for mining and knowledge discovery.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Text Mining: Extracting Objects from Texts </SectionTitle> <Paragraph position="0"> As soon as the corpora and knowledge bases are constructed, higher-level natural language processing methods are applied to mine the corpus. The goal is to find objects abstracted from texts, organized by individual topics.</Paragraph> <Paragraph position="1"> Statistical natural language processing plays a very important role in this procedure, which can be described at the following three levels.</Paragraph> <Paragraph position="2"> Texts.</Paragraph> <Paragraph position="3"> Textual knowledge is not &quot;dead&quot; in the fields of language and literature. It interacts with knowledge in other forms, in other carriers or at other levels of abstraction. Taking Chinese ancient poetry as an example, a poem is associated with its author, its era and the background of its writing. The text of a poem also refers to certain persons, events, locations, plants, scenes, feelings and other entities, either real or virtual. In addition, there are various sources of objects relevant to the poem, such as paintings, calligraphy, music and cultural relics. All these entities are so important to understanding the poem that it is advisable to study the poem alongside these objects. 
Furthermore, relying on these directly relevant objects makes teaching and learning much more open and engaging than focusing on the texts alone.</Paragraph> <Paragraph position="4"> In the early phase of Digital Museum design, an integrated exhibition base is built, in which entities directly relevant to the texts are refined, stored in relational or XML databases and associated with the body of the texts.</Paragraph> <Paragraph position="5"> Associated with Language Units.</Paragraph> <Paragraph position="6"> As computer-assisted research develops in these fields, we can work on the hidden knowledge of texts by means of text mining and retrieval. As language technology evolves, a computational age of language has arrived [1]. We can conduct computer-assisted analytical research on language with both linguistic and statistical approaches. In our research on the language of Chinese ancient poetry, we studied statistical co-occurrences and meaningful units in the texts, extracted words from collocations and clustered words into meaningful concepts. In further research, we explored ways to study the hidden meanings of words and collocations, especially those related to human emotions. Consequently, we expect to learn the emotional characteristics of a poem and to associate it with the words, concepts and other units it refers to that share similar characteristics.</Paragraph> <Paragraph position="7"> On the other hand, language and texts are the most important carriers of cultural fragments.</Paragraph> <Paragraph position="8"> Many interesting knowledge patterns are hidden in the texts. A considerable proportion of Chinese ancient history and culture is buried in the texts of Chinese ancient poetry, which evolved over more than 2,000 years and involves locations all over China. 
Using language technology, fragments of culture can be mined from the texts, refined, stored and finally integrated into interactive virtual scenes.</Paragraph> <Paragraph position="9"> In this way we can discover hidden entities and relations associated with a text and expand it into analytically meaningful segments.</Paragraph> <Paragraph position="10"> In our framework, knowledge entities do not live alone but interact. Both textual entities and other objects are associated with their relevant entity sets. Two kinds of relation indicate that two entities interact: direct relations, which have been discussed above, and indirect relations. For instance, a poem refers to various knowledge objects, so poems referring to the same objects interact indirectly with each other. These poems are included in one another's relevant entity sets, with &quot;identical reference&quot; as an indirect relation. At a more intelligent level, poems with similar hidden meanings or related emotions are arranged together as a set. This set can be associated with a topic, a subject, a scene or a specific semantic cluster.</Paragraph> <Paragraph position="11"> Through these three approaches to expanding textual knowledge into relevant objects, a formerly purely textual entity comes to be surrounded by various relevant objects, real or virtual. This completes the procedure of extracting objects for exhibitions from texts. An example of going from a poem to objects is as follows: Figure 2: Expanding Objects Set from a Poem Text.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Theme Driven Knowledge Discovery </SectionTitle> <Paragraph position="0"> From the statistical analysis of character co-occurrences, we applied various methods to discover unknown words from the texts. Chinese differs from many other languages in that there is no natural delimiter between one word and the next. 
We consider all words to be unknown at the beginning and generate a word dictionary by filtering with mutual information values, the u-test and other statistical methods.</Paragraph> <Paragraph position="1"> On the basis of the word dictionary, we clustered words by the distance between their co-occurrence vectors. This procedure abstracts concepts from words. After supervised filtering, these concepts indicate some hidden semantic meanings.</Paragraph> <Paragraph position="2"> The subsequent knowledge discovery work is theme driven. First, a theme, or learning topic, is chosen, and some features and key concepts of this theme are determined using expert knowledge. Using statistical methods, we can then find the concepts and words that are semantically similar or otherwise related to this theme. Next, directly and indirectly related objects (discussed in section 3.2) are associated with the topic, and redundant units are eliminated. We then filter the most significant entities and relations, which can be represented by combinations of concepts and words, and organize them around the theme. In this way, we can put the topic or theme back into its ancient living environment.</Paragraph> <Paragraph position="3"> Further work includes rebuilding the ancient scenarios to which a topic belongs, and mining for relations among topics.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Knowledge Processing and Integration of the Digital Museum </SectionTitle> <Paragraph position="0"> Knowledge processing plays a very significant role in the Digital Museum framework. It runs as a thread throughout the life cycle of the digital museum. The entire design and implementation of the digital museum focuses on language processing, knowledge discovery and exhibition integration. 
The knowledge processing procedures can be represented in the following figure: Figure 3: Knowledge Processing in this digital museum.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Knowledge Processing Hierarchy </SectionTitle> <Paragraph position="0"> An intelligent knowledge platform deals with knowledge in five primary hierarchies, namely knowledge citation, knowledge application, knowledge transmitting, knowledge learning and knowledge developing [8]. This division of knowledge hierarchies suits the needs of an E-Learning program remarkably well. In this article, we slightly modify this division and apply it to the Digital Museum system as follows:</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Texts Images Medias Virtual Realities </SectionTitle> <Paragraph position="0"> This division is relative rather than absolute. For instance, in some activities defined as knowledge representation and knowledge developing, we may also need knowledge citation and application. However, this division of the knowledge hierarchy helps to define the functions of the Knowledge Platform and to meet the knowledge needs of systems and users [8]. The Digital Museum presents multiple dimensions according to its three functions of exhibition, education and research. 
The targets, procedures and emphases of knowledge processing vary among these dimensions.</Paragraph> <Paragraph position="1"> In the dimension of exhibition, the system focuses on knowledge citation and knowledge representing in the hierarchy above.</Paragraph> <Paragraph position="2"> In the dimension of e-learning, the system focuses on knowledge applying, learning and teaching, knowledge representing and information interaction.</Paragraph> <Paragraph position="3"> In the dimension of computational linguistics research, the system emphasizes knowledge mining and knowledge developing.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Two Types of Integration for Knowledge Objects </SectionTitle> <Paragraph position="0"> Having discussed how objects are generated from the texts, we now turn to how to integrate them for E-Learning.</Paragraph> <Paragraph position="1"> Related and interacting objects are extracted from texts and stored in the exhibition base. The next phase is to arrange exhibitions by selecting, dividing and integrating these objects, and to construct the digital museum interface.</Paragraph> <Paragraph position="2"> There are two key forms of object integration: tutored theme-oriented exhibitions and virtual scenarios.</Paragraph> <Paragraph position="3"> In the first form, the tutored theme-oriented exhibition, objects relevant to a specific subject or theme are integrated and presented in multiple modalities. This interface design provides a dynamic exhibition module by grouping texts together with their relevant objects in various formats, and by providing docent knowledge for the topic and links to exhibitions on related topics. Learners participate in one exhibition and follow links that fit their needs or that are suggested by instructions, thus forming personalized learning paths.</Paragraph> <Paragraph position="4"> There are two key points in tutored theme-oriented exhibitions. 
One is &quot;multi-modal&quot;. Personalized exhibitions in our framework enable learning through multiple channels, in the form of text, images, music, virtual reality and so on. Again taking Chinese ancient poetry as an example, we first discover the relevant scenes and hidden emotions of a poem, select objects referring to similar scenes and emotions, provide them as background materials and then integrate them with the poem. A more detailed instance is the automatic matching of poems and paintings. The other is &quot;interactive&quot;. In our framework, a learner can add remarks or join discussions on every exhibition topic. These remarks are processed and stored as new objects relevant to the topic. Users can also provide materials or background information for an object or a topic, and can propose their own exhibition plans that organize objects in new ways. The system studies this feedback and provides users with personalized participation paths.</Paragraph> <Paragraph position="5"> The second integration form is scenarios.</Paragraph> <Paragraph position="6"> Knowledge objects were recorded in texts from their original living environments. By collecting and extracting relevant objects from texts and analyzing their relevant environmental elements, such as emotions, we are able to put a textual object back into a scene representing its original living environment by rebuilding these original scenes. Teaching and learning become easier and more engaging when learners participate in the original scenes in which a topic really lived. With multimedia and virtual reality technology, we are able to integrate the objects and environmental elements surrounding a specific topic and rebuild a virtual scene, which is represented in our framework as multimedia demonstrations, tests and games.</Paragraph> <Paragraph position="7"> These two key integration patterns organize objects in various formats and present the integrated exhibitions to users in an interactive and personalized way. 
This maximizes the educational use of a digital museum in the fields of language and literature.</Paragraph> </Section> </Section> </Paper>