<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2003">
  <Title>A Hybrid Chinese Language Model based on a Combination of Ontology with Statistical Method</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Artificial Intelligence
</SectionTitle>
    <Paragraph position="0"> [9], it has been closely combined with natural language processing and is widely applied in many fields, such as knowledge engineering, digital libraries, information retrieval, and the Semantic Web.</Paragraph>
    <Paragraph position="1"> In this paper, we combine the characteristics of ontology with statistical methods to present a hybrid Chinese language model. We determine the structure of the Chinese language model and evaluate its performance with two groups of experiments: text re-ranking for Chinese information retrieval and text similarity computing.</Paragraph>
    <Paragraph position="2"> The rest of this paper is organized as follows. Section 2 describes the Chinese language model. Section 3 evaluates the language model with several natural language processing experiments. Section 4 presents the conclusion and some future work.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="14" type="metho">
    <SectionTitle>
2 The language model description
</SectionTitle>
    <Paragraph position="0"> A traditional statistical language model (SLM) is used to estimate the likelihood (or probability) of a word string. In this study, we determined the structure of the Chinese language model. First, we defined the ontology description framework for Chinese words and the representation of Chinese lingual ontology knowledge. Then, we automatically acquired from the corpus the usage of each word, together with its co-occurrences in context, at the semantic, pragmatic, and syntactic levels, to serve as the Chinese lingual ontology knowledge bank. When an actual document is processed, the usage of lingual knowledge is retrieved from this knowledge bank.</Paragraph>
    <Section position="1" start_page="13" end_page="13" type="sub_section">
      <SectionTitle>
2.1 Ontology description framework
</SectionTitle>
      <Paragraph position="0"> Traditional ontology mainly emphasizes the interrelations between essential concepts; a domain ontology is a shared set of concepts for its domain [10]. We use this idea to represent the Chinese lingual ontology knowledge bank.</Paragraph>
      <Paragraph position="1"> In practical applications, an ontology can be represented in many ways [11]: natural languages, frameworks, semantic webs, logical languages, etc. Currently popular models, such as Ontolingua, CycL and Loom, are all based on logical languages. Although logical languages are highly expressive, inference over lingual knowledge with them is very difficult. Semantic webs and natural languages are informal and have disadvantages in grammar and expression.</Paragraph>
      <Paragraph position="2"> For each Chinese word, we provide a machine-readable framework structure that draws on WordNet, HowNet and the Chinese Thesaurus. The framework describes a Chinese word by its concept, part of speech (POS), semantics, synonyms, and English translation. Figure 1 shows the ontology description framework.</Paragraph>
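      <Paragraph> The framework described above can be pictured as a simple record type. The following is a minimal sketch in Python, not the authors' implementation: the class name WordFrame, the field names, and the sample values are all assumptions made for illustration, and only the list of attributes (concept, POS, semantics, synonyms, English translation) comes from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordFrame:
    """Hypothetical frame for one Chinese word (all field names are assumptions)."""
    word: str                     # the Chinese word itself (pinyin used here)
    concept: str                  # concept the word denotes
    pos: str                      # part of speech (POS)
    semantic: str                 # semantic label, e.g. one taken from HowNet
    synonyms: List[str] = field(default_factory=list)
    english: str = ""             # English translation


# Illustrative entry; the values are made up for this example.
frame = WordFrame(
    word="you ke",
    concept="person who travels",
    pos="N",
    semantic="human",
    synonyms=["lv ke"],
    english="tourist",
)
print(frame.pos, frame.english)
```

A flat record like this keeps every attribute directly addressable, which is convenient when the synonym list is later used to unify keywords.</Paragraph>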
    </Section>
    <Section position="2" start_page="13" end_page="14" type="sub_section">
      <SectionTitle>
2.2 Lingual ontology knowledge representation
</SectionTitle>
      <Paragraph position="0"> A word is the basic unit of natural language. To acquire lingual ontology knowledge, we need to know the POS, meaning, and semantics of a word in a sentence. For example, the POS, meaning, and semantic label of &quot;Da&quot; in HowNet are shown in Table 1. For the Chinese sentence &quot;Wai Guo You Ke Lai Bei Jing You Wan.&quot;, word segmentation, POS tagging, and semantic tagging yield a characteristic string, shown in Table 2.</Paragraph>
      <Paragraph position="1">  sents not to be defined or exist this semantic in HowNet.</Paragraph>
      <Paragraph position="2"> For ease of use and expression, we give a description of the ontology knowledge of every Chinese word, learned from the corpus, as shown in expression 1. Together, these descriptions compose the Chinese lingual ontology knowledge bank.</Paragraph>
      <Paragraph position="3">  tic relation pair between the keyword and its co-occurrence in current context.</Paragraph>
      <Paragraph position="4"> The multi-grams of a Chinese word in context, including its co-occurrences and their positions, also form part of the lingual ontology knowledge. In figure 2, the charac-</Paragraph>
    </Section>
    <Section position="3" start_page="14" end_page="14" type="sub_section">
      <SectionTitle>
2.3 Lingual ontology knowledge acquisition
</SectionTitle>
      <Paragraph position="0"> Following the way human beings acquire and accumulate knowledge, we propose a measurable description of Chinese lingual ontology knowledge that is learned automatically from a typical corpus. In this approach, we acquire the semantic, pragmatic, and syntactic usage of a Chinese word across all documents, and combine it with the multi-grams in context, including co-occurrence, POS, semantics, synonyms, and position. In practical applications, we process every Chinese keyword whose grammatical expression, semantic representation, and syntactic structure match those in the Chinese lingual ontology knowledge bank.</Paragraph>
      <Paragraph position="2"> B in the document set {D}, we treat each sentence that contains the keyword as a processing unit. First, we perform Chinese word segmentation, POS tagging, and semantic label tagging based on HowNet; then we select a word to act as the keyword whose co-occurrence knowledge is to be acquired. We discard words that contribute little to the lingual ontology knowledge, such as prepositions, conjunctions, and auxiliary words.</Paragraph>
      <Paragraph position="3"> Step 2: Unify the keyword.</Paragraph>
      <Paragraph position="4"> Using the ontology description of each Chinese word, we map its synonyms to a single uniform form.</Paragraph>
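      <Paragraph> The pre-processing and keyword unification steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the tag set FUNCTION_POS, the synonym table CANONICAL, and the function name preprocess are hypothetical, and the input is assumed to be already segmented and tagged as (word, POS, semantic label) triples.

```python
# POS tags treated as function words; this tag set is an assumption.
FUNCTION_POS = {"PREP", "CONJ", "AUX"}

# Hypothetical synonym table: each synonym maps to one canonical form.
CANONICAL = {"lv ke": "you ke", "you ren": "you ke"}


def preprocess(tagged_sentence):
    """Drop function words, then map each remaining word to its canonical form.

    tagged_sentence: list of (word, pos, semantic_label) triples.
    """
    kept = [t for t in tagged_sentence if t[1] not in FUNCTION_POS]
    return [(CANONICAL.get(w, w), pos, sem) for w, pos, sem in kept]


# Made-up tagged sentence for illustration only.
sentence = [("lv ke", "N", "human"), ("zai", "PREP", "-"), ("bei jing", "N", "place")]
result = preprocess(sentence)
print(result)
```

Unifying synonyms before counting means that statistics for words with the same meaning accumulate under one key instead of being split across several.</Paragraph>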
      <Paragraph position="5"> Step 3: Calculate the co-occurrence distance. As before, we treat each sentence that contains the keyword as a processing unit, apply POS tagging and semantic label tagging, and obtain a characteristic string. Taking the keyword as the center, we define the left and right distances, where m and n are the numbers of words to the left and right of the keyword. This is meant to capture linguistic intuition: the nearer a co-occurring word is to the keyword, the larger its co-occurrence distance. Finally, we obtain the left-side and right-side co-occurrence distances from the keyword to each of its co-occurrences that appear in the corpus; these act as the average co-occurrence distance. To improve processing speed, we first build an index over the acquired lingual ontology knowledge bank by Chinese word, and then sort the entries according to the semantic label SemB</Paragraph>
      <Paragraph position="7"> Chinese word.</Paragraph>
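      <Paragraph> To make Step 3 above concrete, here is a hedged sketch of a position-based co-occurrence weight and its corpus average. The paper's own formula is not reproduced here; the linear decay used below (nearest neighbour gets 1.0, farthest gets 1/m on the left or 1/n on the right) is purely an assumption chosen to match the stated intuition that nearer words receive larger values. The function names are hypothetical.

```python
from collections import defaultdict


def cooccurrence_weights(words, k):
    """Return {(side, word): weight} for the keyword at index k.

    ASSUMPTION: the weight decays linearly with distance, so the nearest
    neighbour gets 1.0 and the farthest gets 1/m (left) or 1/n (right).
    """
    m, n = k, len(words) - k - 1          # number of words to the left / right
    weights = {}
    for i in range(1, m + 1):             # i = distance to the left
        # setdefault keeps the nearest occurrence if a word repeats
        weights.setdefault(("L", words[k - i]), (m - i + 1) / m)
    for j in range(1, n + 1):             # j = distance to the right
        weights.setdefault(("R", words[k + j]), (n - j + 1) / n)
    return weights


def average_weights(sentences, keyword):
    """Average each co-occurrence weight over all sentences containing keyword."""
    totals, counts = defaultdict(float), defaultdict(int)
    for words in sentences:
        if keyword not in words:
            continue
        k = words.index(keyword)          # first occurrence of the keyword
        for key, w in cooccurrence_weights(words, k).items():
            totals[key] += w
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


corpus = [["wai guo", "you ke", "lai", "bei jing", "you wan"]]
print(average_weights(corpus, "bei jing"))
```

Keeping left-side and right-side entries under separate keys preserves the position information that the text lists as part of the lingual ontology knowledge.</Paragraph>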
      <Paragraph position="8">  In practical applications, we obtain a different evaluation for each document from the lingual ontology knowledge bank. For natural language processing tasks such as document similarity computing, text re-ranking for information retrieval, and information filtering, the general procedure is as follows.</Paragraph>
      <Paragraph position="9"> Step 1: Pre-process and unify the keyword. The processing is the same as Step 1 and Step 2 in section 2.3.1.</Paragraph>
      <Paragraph position="10"> Step 2: Fetch the average co-occurrence distance from the lingual ontology knowledge bank. We regard each sentence that contains the keyword in document D as a processing unit. First, we apply POS tagging and semantic label tagging to obtain the characteristic string. Then, for every keyword, if it has the same semantic relation pair as in the lingual ontology knowledge bank, i.e. the keyword and its co-occurrence (SemB</Paragraph>
      <Paragraph position="12"> practical document is the same one as lingual  will act as the evaluation value of current document.</Paragraph>
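      <Paragraph> The document evaluation described above can be sketched as a lookup over the knowledge bank. This is a minimal illustration, not the authors' method: the key layout of the bank, the function name score_document, the sample values, and the choice to combine matches by simple summation are all assumptions.

```python
def score_document(sentences, keywords, bank):
    """Sum the knowledge-bank values of every matching pair in a document.

    sentences: list of sentences, each a list of (word, semantic_label) pairs.
    keywords:  set of keywords to evaluate.
    bank:      {(keyword, keyword_sem, co_sem): average co-occurrence distance}
               (this key layout is an assumption).
    ASSUMPTION: matches are combined by simple summation.
    """
    score = 0.0
    for sentence in sentences:
        for k, (word, sem) in enumerate(sentence):
            if word not in keywords:
                continue
            for j, (_, co_sem) in enumerate(sentence):
                if j != k:
                    # unseen semantic relation pairs contribute nothing
                    score += bank.get((word, sem, co_sem), 0.0)
    return score


# Made-up knowledge bank and document for illustration only.
bank = {("bei jing", "place", "human"): 0.5, ("bei jing", "place", "act"): 1.0}
doc = [[("you ke", "human"), ("lai", "act"), ("bei jing", "place")]]
print(score_document(doc, {"bei jing"}, bank))
```

Because only pairs present in the bank contribute, a document whose keywords are used in the same way as in the training corpus receives a higher value than one where the same keywords appear in unrelated contexts.</Paragraph>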
    </Section>
  </Section>
</Paper>