<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1213">
  <Title>Annotating information structures in Chinese texts using HowNet</Title>
  <Section position="3" start_page="0" end_page="86" type="metho">
    <SectionTitle>
2 An Overview of HowNet
</SectionTitle>
    <Paragraph position="0"> HowNet is a bilingual general knowledge base describing relations between concepts and relations between the attributes of concepts. The latest version covers over 65,000 concepts in Chinese and close to 75,000 English equivalents. The relations include hyponymy, synonymy, antonymy, meronymy, attribute-host, material-product, converse, dynamic role and concept co-occurrence. The philosophy behind the design of HowNet is its ontological view that all physical and non-physical matters undergo a continual process of motion and change in a specific space and time. The motion and change are usually reflected by a change in state that, in turn, is manifested by a change in the value of some attributes. The top-most level of classification in HowNet thus includes: entity, event, attribute and attribute value. It is important to point out that the classification is derived in a bottom-up manner. First, a set of sememes, the most basic set of semantic units that are non-decomposable, is extracted from about 6,000 Chinese characters. This is feasible because each Chinese character is monosyllabic and meaning-bearing.</Paragraph>
    <Paragraph position="1"> Similar sememes are grouped. The coverage of the set of sememes is tested against polysyllabic concepts to identify additional sememes.</Paragraph>
    <Paragraph position="2"> Eventually, a total of over 1,400 sememes are found and they are organized hierarchically.</Paragraph>
    <Paragraph position="3"> This is a closed set from which all concepts are defined. The bottom-up approach takes advantage of the fact that all concepts, either current or new, can be expressed using a combination of one or more existing Chinese characters. We have yet to find a new concept that requires the creation of a new Chinese character. Therefore, by deriving the set of sememes in a bottom-up fashion, it is believed that the set of sememes is stable and robust enough to describe all kinds of concepts, whether current or new. The fact that HowNet has verified this thesis on over 65,000 concepts is good evidence of its robustness.</Paragraph>
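As a rough illustration of what a closed sememe set implies, the sketch below checks that a concept definition uses only sememes drawn from a fixed inventory. The inventory is a toy subset of sememe names that appear in the examples of Section 2.1, and the names SEMEME_INVENTORY and undefined_sememes are hypothetical; nothing here is taken from the HowNet software itself.

<![CDATA[
```python
from typing import FrozenSet, List

# Toy inventory built from sememe names used in the Section 2.1 examples.
# The real set holds over 1,400 sememes; this small subset is an assumption.
SEMEME_INVENTORY: FrozenSet[str] = frozenset(
    {"eat", "vegetable", "religion", "part", "computer", "heart"}
)


def undefined_sememes(
    definition: List[str], inventory: FrozenSet[str] = SEMEME_INVENTORY
) -> List[str]:
    """Return the units of a definition that are missing from the inventory."""
    return [sememe for sememe in definition if sememe not in inventory]


if __name__ == "__main__":
    # A definition built only from known sememes leaves nothing undefined.
    print(undefined_sememes(["eat", "vegetable", "religion"]))
    # An unknown unit is reported, which would signal a gap in the inventory.
    print(undefined_sememes(["part", "computer", "cpu"]))
```
]]>

Under this reading, testing coverage against polysyllabic concepts amounts to running such a check over many definitions and extending the inventory whenever something is reported.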
    <Section position="1" start_page="85" end_page="86" type="sub_section">
      <SectionTitle>
2.1 Types of Relation
</SectionTitle>
      <Paragraph position="0"> The definition of a concept in HowNet expresses one or more of the following relations.</Paragraph>
      <Paragraph position="1">  There are a total of 71 dynamic roles defined in HowNet. A dynamic role resembles a case role in case grammar (Fillmore, 1968). However, it differs from a case role in that it is concerned with all probable actants of an event and the roles they play in the event, regardless of whether these actants can be realized grammatically. For example, Concept (1): IJ~g~ (be a vegetarian for religious reasons) DEF=eat|I~, patient=vegetable|~, religion|~J~. At the syntactic level, &quot;1~&quot; is an intransitive verb. According to case grammar, it has only one case role: agent. However, for this word, the patient is contained in one of its constituents (i.e. &quot;~&quot;). HowNet specifies this explicitly and indicates the category ('vegetable') of prototypical concepts that fill this role. Another distinguishing feature of dynamic roles is their use in defining concepts of the 'entity' class. Concept (2): ~\[!~ (writing brush) DEF=PenInk|~l~, *write|~. Through the use of the &quot;*&quot; pointer, the above definition states that the concept being defined (~!~) is the instrument of the event type 'write'.</Paragraph>
      <Paragraph position="2"> HowNet also uses dynamic roles to specify the attributes that a concept contains. For example, Concept (3): ~:~ (arise suddenly) DEF=happen|~Z~ :, manner=suddenly. The definition of Concept (3) specifies that the manner of the event is 'sudden'.</Paragraph>
      <Paragraph position="3">  The 'event' and 'entity' classes in HowNet are organized hierarchically. Each parent class is a hypernym of its child classes. Details of the organization are available from the HowNet site and are therefore omitted here.</Paragraph>
      <Paragraph position="4">  The part-whole relation is expressed by the pointer &quot;%&quot;. For example, Concept (4): ~.~ (CPU) DEF=part|~, %computer|~J~, heart|,~. The class of the concept &quot;t~5~:~&quot; is 'part'. It is a part of the class 'computer'. The function of the part &quot;t:l~SI~&quot; is that of the 'heart' of the whole 'computer'.</Paragraph>
      <Paragraph position="5">  The material-product relation is expressed by the pointer &quot;?&quot;. For example, Concept (5): ~,~ (knitting wool) DEF=material|tf~t, ?clothing|~. &quot;~,~&quot; belongs to the class 'material'. It is a material for the product 'clothing'.</Paragraph>
      <Paragraph position="6">  The attribute-host relation is expressed by the pointer &quot;&amp;&quot;. For example, Concept (6): ~--~ (face) DEF=attribute|J~, reputation|~, &amp;human|),., &amp;organization|~\]~,~. &quot;~:-~:&quot; is an attribute; in particular, it concerns the attribute 'reputation'. The hosts can be 'human' as well as 'organization'.</Paragraph>
      <Paragraph position="7">  Some concepts typically co-occur with certain other concepts. For example, Concept (7): ~:~ (lawless person) DEF=human|)~., fiercely, crime|~l~, #police|~, undesired|~. The typical context in which the concept &quot;~-~ ~t~&quot; is used involves the concept 'police'. This type of relation is expressed by the pointer &quot;#&quot;.</Paragraph>
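The DEF notation used in Concepts (1) to (7) can be read mechanically. The following is a minimal sketch, assuming that a definition is a comma-separated list whose first item names the class, that items of the form role=value carry dynamic roles, and that a leading pointer character (*, %, ?, &amp; or #) marks the relations described above. The names POINTERS, Feature and parse_def are hypothetical, the relation labels are our own, and the Chinese half of each sememe (after the vertical bar) is simply discarded.

<![CDATA[
```python
from dataclasses import dataclass
from typing import List

# Pointer characters and the relations this section associates with them.
# The label strings are illustrative names, not HowNet terminology.
POINTERS = {
    "*": "instrument-of",   # Concept (2): the concept is the instrument of the event
    "%": "part-of",         # Concept (4): part-whole
    "?": "material-of",     # Concept (5): material-product
    "&": "attribute-of",    # Concept (6): attribute-host
    "#": "co-occurs-with",  # Concept (7): co-occurrence
}


@dataclass(frozen=True)
class Feature:
    relation: str  # "class", "feature", a dynamic role name, or a pointer label
    sememe: str    # English half of the sememe


def english_part(item: str) -> str:
    """Keep the English half of an 'english|chinese' sememe."""
    return item.split("|", 1)[0].strip()


def parse_def(definition: str) -> List[Feature]:
    """Split a DEF string into (relation, sememe) features."""
    features: List[Feature] = []
    for raw in definition.split(","):
        item = raw.strip()
        if not item:
            continue
        if "=" in item:
            # Dynamic role, e.g. patient=vegetable
            role, value = item.split("=", 1)
            features.append(Feature(role.strip(), english_part(value)))
        elif item[0] in POINTERS:
            # Pointer relation, e.g. %computer
            features.append(Feature(POINTERS[item[0]], english_part(item[1:])))
        elif not features:
            # The first plain item is taken to be the class of the concept.
            features.append(Feature("class", english_part(item)))
        else:
            features.append(Feature("feature", english_part(item)))
    return features


if __name__ == "__main__":
    for feature in parse_def("part, %computer, heart"):
        print(feature.relation, feature.sememe)
```
]]>

Applied to the definition of Concept (4), the sketch yields the class 'part', a part-of link to 'computer', and the plain feature 'heart', which mirrors the prose reading given for that concept.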
    </Section>
  </Section>
  <Section position="4" start_page="86" end_page="88" type="metho">
    <SectionTitle>
3 Information Structures
</SectionTitle>
    <Paragraph position="0"> Dong (2000) uses the example &quot;~_~l~:.~ \[\]&quot; (narcotic drugs smuggling group) to illustrate what an information structure is.</Paragraph>
    <Paragraph position="1"> Describing the structure of this phrase at the syntactic level, as in the Penn Treebank analysis (Xue, 1999: 72-77), only reveals that it is a noun phrase headed by &quot;~.~\[\]&quot; and modified by a relative clause &quot;~_~.h&quot; that involves operator movement. At the semantic level of description, we would indicate that &quot;~\[\]&quot; (group) is the agent of the event &quot;~_~.L&quot; (smuggle) and &quot;~&quot; (narcotic drugs) is the patient of &quot;~_~&quot; (smuggle).</Paragraph>
    <Paragraph position="2"> The information structure of this example consists of two parts, the dependency relations and the HowNet definitions. The descriptions are as follows. Dependency relations: ~ [patient] &lt;--:~_~--[agent] ~\[\]. Definitions: ~: medicine|~:jqe~J, ?addictive|~; ~.L: transport|~l_I~, manner=secretly, crime|~l~; ~\[\]: community|\[\]~. In this example, the descriptions specify that a 'community' is an agent involved in a 'transport' event transporting the patient 'medicine'. Furthermore, the 'transport' event is a 'crime' and the manner is 'secret'. The 'medicine' is a material of 'addictive' products. The arrow between two concepts is a dependency connection: the concept pointed to by the arrow is the dependent and the concept at the other end is the governor. The name of the dependency relation is enclosed in square brackets and can appear at either the dependent or the governor side.</Paragraph>
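The two-part description above (dependency relations plus HowNet definitions) can be held in a small data structure. The sketch below is only one possible encoding: English glosses stand in for the Chinese words, the class Dependency and the function class_level_view are hypothetical names, and treating 'smuggle' as the governor of both arcs is our reading of the prose, since the arrow on the agent side is not fully legible in the source.

<![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Dependency:
    governor: str
    relation: str
    dependent: str


# Part 1: HowNet definitions, keyed by an English gloss of each word.
DEFINITIONS: Dict[str, List[str]] = {
    "narcotic drugs": ["medicine", "?addictive"],
    "smuggle": ["transport", "manner=secretly", "crime"],
    "group": ["community"],
}

# Part 2: dependency relations (assumed direction: 'smuggle' governs both).
DEPENDENCIES: List[Dependency] = [
    Dependency("smuggle", "patient", "narcotic drugs"),
    Dependency("smuggle", "agent", "group"),
]


def main_class(word: str) -> str:
    """The first item of a definition names the class of the concept."""
    return DEFINITIONS[word][0]


def class_level_view(deps: List[Dependency]) -> List[Tuple[str, str, str]]:
    """Lift each word-level dependency to (dependent class, relation, governor class)."""
    return [
        (main_class(d.dependent), d.relation, main_class(d.governor)) for d in deps
    ]


if __name__ == "__main__":
    for dependent_class, relation, governor_class in class_level_view(DEPENDENCIES):
        print(dependent_class, relation, governor_class)
```
]]>

The class-level triples correspond to the statement that a 'community' is the agent of a 'transport' event whose patient is 'medicine'.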
    <Paragraph position="3"> Currently, over 60 types of information structure have been defined. The pattern of an information structure is specified in the following format: (sememe) [DRel] ~ [DRel] (sememe), where DRel is the name of a dependency relation.</Paragraph>
    <Paragraph position="4"> For the dependency relation to apply, the governor and the dependent must satisfy the requirement of the sememes. Table 1 shows a subset of the information structures. Information structures are derived in a bottom-up fashion by analysing the mechanisms used in the composition of words. This approach is based on the insight that the mechanisms used in word formation are also applicable to phrase and sentence construction in Chinese. For example, the type &quot;(l l ltime) levent)&quot; applies to the formation of the following units at various levels of linguistic ... In the process of annotating the corpus, the coverage of information structure types at the phrase and sentence levels was evaluated and missing types were added. The new types arise mainly from function words. For example, the type &quot;(modality|~ ~) [modality|~ ~] &lt;- (~'f~:levent)&quot; is due to the use of function words such as &quot;~j~,&quot; (must) and &quot;~-~&quot; (must). These are words expressing the attitude of the speaker of an utterance towards an event.</Paragraph>
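The requirement that governor and dependent satisfy the sememes of a pattern can be sketched as a simple lookup. This is an illustrative assumption about how the check might work, not the procedure used by the annotators: each word is mapped to a set of English sememe names, a pattern applies when the dependent and the governor each carry the required sememe, and the names Pattern, PATTERNS, WORD_SEMEMES and applicable_relations are hypothetical. The two patterns correspond to the time and modality types mentioned above, with the event taken to be the governor in both.

<![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Pattern:
    dependent_sememe: str
    relation: str
    governor_sememe: str


# Two of the pattern types mentioned in the text (direction is an assumption).
PATTERNS: List[Pattern] = [
    Pattern("time", "time", "event"),
    Pattern("modality", "modality", "event"),
]

# Toy lexicon: English glosses mapped to the sememes their definitions contain.
WORD_SEMEMES: Dict[str, Set[str]] = {
    "afternoon": {"time"},
    "must": {"modality"},
    "commit suicide": {"event", "suicide"},
    "police": {"entity", "human"},
}


def applicable_relations(dependent: str, governor: str) -> List[str]:
    """Names of dependency relations whose sememe requirements both words meet."""
    dependent_sememes = WORD_SEMEMES.get(dependent, set())
    governor_sememes = WORD_SEMEMES.get(governor, set())
    return [
        p.relation
        for p in PATTERNS
        if p.dependent_sememe in dependent_sememes
        and p.governor_sememe in governor_sememes
    ]


if __name__ == "__main__":
    print(applicable_relations("afternoon", "commit suicide"))
    print(applicable_relations("must", "commit suicide"))
    print(applicable_relations("police", "afternoon"))
```
]]>

A fuller version would consult the sememe hierarchy so that a word whose class is a hyponym of 'event' also satisfies an 'event' requirement; the flat sets here sidestep that step.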
    <Section position="1" start_page="87" end_page="88" type="sub_section">
      <SectionTitle>
3.1 An example
</SectionTitle>
      <Paragraph position="0"> We annotated a 30,000-word subset of the Sinica corpus (version 3.0) with information structures. The subset includes 103 newspaper texts covering the crime domain. The annotation has been completed and is currently under verification. We expect to release the corpus and the annotation guidelines at the end of this year.</Paragraph>
      <Paragraph position="1"> An example of our annotation is shown below and its information structures are shown in Figure 1 at the end of this paper. The difference between this work and the work reported in Gan and Tham (1999) lies in the addition of the dependency relations into the annotation.</Paragraph>
      <Paragraph position="2">  with the Criminal Investigation Department of the Xinhua police branch of Tainan county, committed suicide by shooting himself yesterday afternoon,&quot; The hierarchical structure in Figure 1 is another way to represent the relation between governor and dependent, as illustrated in Figure 2. C1 immediately dominates C2, indicating that C1 is the governor and C2 the dependent. The relation between them is either R1 or R2. R1 is located at the same level as C1 and R2 is located at the same level as C2. These two possibilities can also be represented linearly, as shown in (2).</Paragraph>
      <Paragraph position="3">  (2) C1 [R1] --&gt; [R2] C2. R2 between the two concepts C1 and C2 should be read as &quot;C2 is the R2 of C1&quot;. For example, &quot;T~=&quot; (afternoon) is the time of &quot;~&quot; (raise). R1 between C1 and C2 should be interpreted as &quot;C1 is the R1 of C2&quot;. For example, the &quot;time&quot; between &quot;~&quot; (after) and &quot;~\]~&quot; (suicide) should be interpreted as &quot;'~t&quot; is the time of &quot;~I~&quot;. The HowNet definitions of the concepts in (1) are provided in Table 2. (A string of Chinese characters ending with a punctuation mark is regarded as a unit for information structure annotation.) Table 2. HowNet definitions (sememes only): time, past, day; time, afternoon; weapon, *firing; suicide; time, future; {punc}. The structures in Figure 1 and Table 2 reveal the following information: (a) example (1) is about the time after a 'suicide' event; (b) preceding the 'suicide' event is the event 'raise'; (c) the time of the 'raise' event is &quot;1~'I~&quot;1 r q=&quot;, the agent is &quot;;~k3~&quot; and the patient is a 'weapon'; (d) the 'occupation' of &quot;@~J~&quot; is &quot;1~ :\]~&quot;, which is an 'official' of 'secondary' importance, and &quot;;~hk3~J~&quot; belongs to the 'institution' &quot;-~-,~-~-~lJ~l:~\[&quot;; (e) the location of the 'institution' &quot;~-~J-~qJ ~&quot; is &quot;~'~g~Jr~J6&quot;. This kind of representation enables a computer to analyse texts at a deeper level of understanding. As an English and Chinese bilingual common-sense knowledge system, HowNet can contribute much to better text understanding and machine translation (Dong 1999).</Paragraph>
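The reading conventions for R1 and R2 in (2) can be made explicit in a few lines. The sketch below assumes that an annotated arc records the governor (C1), the dependent (C2), the relation name, and the side on which the relation name is written; the function names and the "governor"/"dependent" side labels are hypothetical, and English glosses stand in for the Chinese words.

<![CDATA[
```python
def read_relation(governor: str, dependent: str, relation: str, side: str) -> str:
    """Paraphrase an arc according to the side on which the relation is written."""
    if side == "dependent":
        # R2 case: the relation sits at the level of C2, so C2 is the R2 of C1.
        return f"{dependent} is the {relation} of {governor}"
    if side == "governor":
        # R1 case: the relation sits at the level of C1, so C1 is the R1 of C2.
        return f"{governor} is the {relation} of {dependent}"
    raise ValueError(f"unknown side: {side!r}")


def linear_form(governor: str, dependent: str, relation: str, side: str) -> str:
    """Render the arc in the linear notation of (2), arrow pointing at the dependent."""
    if side == "governor":
        return f"{governor} [{relation}] --> {dependent}"
    if side == "dependent":
        return f"{governor} --> [{relation}] {dependent}"
    raise ValueError(f"unknown side: {side!r}")


if __name__ == "__main__":
    # R2 example from the text: 'afternoon' is the time of 'raise'.
    print(linear_form("raise", "afternoon", "time", "dependent"))
    print(read_relation("raise", "afternoon", "time", "dependent"))
    # R1 example from the text: 'after' is the time of 'suicide'.
    print(linear_form("after", "suicide", "time", "governor"))
    print(read_relation("after", "suicide", "time", "governor"))
```
]]>

Keeping the side as an explicit field avoids having to guess the direction of the paraphrase from the relation name alone, since the same name ('time' here) occurs in both positions.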
    </Section>
  </Section>
</Paper>