<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2098">
<Title>A Animal; B Female Animal; C Concrete; D Male Animal; E 'S' + 'L'; F Female Human; G Gas; H Human; I Inanimate; J Movable; K Male ('D' + 'M'); L Liquid; M Male Human; N Not Movable; O 'A' + 'H'; P Plant; Q Animate; R Female ('B' + 'F')</Title>
<Section position="3" start_page="460" end_page="462" type="metho">
<SectionTitle> 4 Comparison between Result of Extraction and BOX Code </SectionTitle>
<Paragraph position="0"> The thesaurus produced from LDOCE by the key noun and key verb extraction programs is an approximate one and, obviously, contains several errors. For example, the key noun of abbreviation 1 is shorter in table 5, because the current program ignores words ending in -ing; it should be making. (Even if we changed the extraction algorithm, we would still have the problem that making is not a simple noun but a gerund. We need to define noun-verb semantic relations.) To evaluate the quality of the produced thesaurus, its noun part has been compared with the semantic markers in LDOCE.</Paragraph>
<Paragraph position="2"> [Table 5, definition column only] (esp. formerly) a building in which Christian men (monks) or women (nuns) live shut away from other people and work as a group for God; monastery or convent | the group of people living in such a building | a large church or house that was once such a building | the act of making shorter | a shortened form of a word, often one used in writing</Paragraph>
<Section position="1" start_page="462" end_page="462" type="sub_section">
<SectionTitle> 4.1 Semantic Markers in LDOCE: BOX Code </SectionTitle>
<Paragraph position="0"> The magnetic version of LDOCE has a special field related to semantic markers, called the BOX code field, although it does not appear in the printed version of LDOCE. Some of
the BOX code fields (called BOX1, for instance) express semantic restrictions for a noun governed by a verb or an adjective, and a semantic classification of a noun. For example, the semantic restriction for a subject of the verb to travel is marked as 'Human'; the noun person is classified as 'H'. This shows that the verb to travel may govern the noun person in its subject position. LDOCE uses 34 markers for expressing the restrictions (table 3).</Paragraph>
<Paragraph position="1"> These semantic markers have a hierarchy, as shown in figure 2. For example, 'Human', 'Plant', and 'Animal' are subclassifications of 'Animate (Q)'.</Paragraph>
<Paragraph position="2"> In the following part of this section, the comparison between the semantic markers of LDOCE and the thesaurus constructed from the definitions of nouns in LDOCE is discussed from the viewpoint of the hierarchy, especially for the nouns related to 'Animate'.</Paragraph>
</Section>
</Section>
<Section position="4" start_page="462" end_page="463" type="metho">
<SectionTitle> HW BI KN DF </SectionTitle>
<Paragraph position="0"> [Table 9: Nouns Marked as Q (animate) and V (plant + animal). Legible fragments of the rows: ... developed under the influence of man | less than the usual size | ... mixture of breeds | V, life: the very small forms of plant and animal life that live in water | K, animal: a male ... animal | a female ... | H, mother: the ... of a person] Nouns related to the concept animate have a relatively simple structure in the thesaurus, as animate is often used as an example of the thesaurus-like system.
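The marker hierarchy and the travel/person restriction described in section 4.1 can be sketched as follows. This is only an illustration under stated assumptions, not the LDOCE data format: the links of 'H', 'P', and 'A' to 'Q' are taken from the text, the placement of 'M' and 'F' under 'H' and of 'D' and 'B' under 'A' is an assumption read off the marker names in table 3, and the names PARENT, ancestors, and satisfies are hypothetical.

```python
# Parent links between semantic markers (child: parent).
# H, P, A under Q come from the text; the other links are assumptions
# based on the marker names (e.g. M = Male Human, so M is placed under H).
PARENT = {
    "H": "Q",  # Human is under Animate
    "P": "Q",  # Plant is under Animate
    "A": "Q",  # Animal is under Animate
    "M": "H",  # Male Human is under Human (assumed)
    "F": "H",  # Female Human is under Human (assumed)
    "D": "A",  # Male Animal is under Animal (assumed)
    "B": "A",  # Female Animal is under Animal (assumed)
}


def ancestors(marker):
    """Return the marker itself followed by all of its ancestors."""
    chain = [marker]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def satisfies(noun_marker, restriction):
    """True if a noun carrying noun_marker meets a restriction marker."""
    return restriction in ancestors(noun_marker)


# The subject of "travel" is restricted to 'H'; the noun "person" is 'H'.
print(satisfies("H", "H"))  # True: person may be the subject of travel
print(satisfies("P", "H"))  # False: a plant noun does not meet 'H'
print(satisfies("M", "Q"))  # True: a male-human noun is also animate
```

A child-to-parent table keeps the check to a single upward walk, which is sufficient for a tree-shaped hierarchy; compound markers such as 'K' or 'V' would need a set-valued representation that this sketch does not attempt.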
Examples of the words marked as 'Animate (Q)' and related nouns, especially those marked as 'plant or animal (V)', are shown in table 9.</Paragraph>
<Paragraph position="1"> The produced thesaurus contains more than 60% of the words marked as simple concepts, such as 'plant' (table 10), 'animal', and 'human' (person in definitions), in correct positions. As shown in table 10, for example, 645 words are traversed from plant in the produced thesaurus; 370 words (62.4%) of these words are marked as 'Plant'. However, the produced thesaurus does not capture disjunctive concepts such as 'animal or plant (V)' correctly. In the definition of crossbreed (table 9), the produced thesaurus only uses plant as a key noun, and ignores animal. This is a typical problem in the current produced thesaurus. [Table 10: Nouns Related to (Living) Thing and Plant]</Paragraph>
<Paragraph position="2"> Note that the distinction between 'animate (Q)' and 'animal or plant (V)' (animate without human) seems to be difficult for the lexicographers: breed is marked as Q; crossbreed, however, is marked as V.
4.3 Nouns Marked as 'Abstract' in LDOCE
Many nouns (about 40%, table 8) are marked as 'abstract', and they are not classified into more detailed subclasses. On the other hand, function nouns work as a key for the classification in the produced thesaurus. In section 3.2, some of the function nouns are listed as action, state, amount and degree. These function nouns classify abstract nouns.</Paragraph>
<Paragraph position="3"> For example, there are 597 nouns whose function noun is act, and 584 nouns (97%) of them are marked as 'abstract'; there are 398 nouns whose function noun is state, and 391 nouns (98%) of them are 'abstract'.
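The agreement figures quoted for act and state are simple ratios of counts. As a small check, using only the counts given in the text (the function name agreement is hypothetical), they can be recomputed as:

```python
def agreement(total, matching):
    """Percentage of nouns under a function noun that carry the marker."""
    return 100.0 * matching / total


# (function noun, nouns under it, nouns marked 'abstract'), from the text
counts = [("act", 597, 584), ("state", 398, 391)]

for noun, total, matching in counts:
    print(f"{noun}: {agreement(total, matching):.1f}%")
# act: 97.8%
# state: 98.2%
```

Both values agree with the quoted 97% and 98% once the fractional part is dropped.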
The distinction between 'state' and 'act', for instance, is useful for natural language processing in general.</Paragraph>
<Section position="1" start_page="462" end_page="462" type="sub_section">
<SectionTitle> 4.4 Nouns Marked as 'Inanimate' </SectionTitle>
<Paragraph position="0"> Some 'Inanimate' nouns are correctly identified in the produced thesaurus (table 11). In particular, 39% of nouns under the noun liquid have 'Liquid' markers, and 56% of nouns under the noun gas have 'Gas' markers.</Paragraph>
<Paragraph position="1"> However, many 'Inanimate' nouns are defined by substance in LDOCE. The sub-classification of these nouns is expressed with a compound word (or an adjective), as shown in table 11: coke is a solid substance; fluorine is a non-metallic substance. Since the current extraction program does not handle compound words, the thesaurus cannot express these classifications.</Paragraph>
</Section>
<Section position="2" start_page="462" end_page="463" type="sub_section">
<SectionTitle> 4.5 Other Typical Nouns </SectionTitle>
<Paragraph position="0"> Several typical nouns in the produced thesaurus are also compared with the markers of LDOCE. Because the current system cannot distinguish senses of nouns, nouns which have several different senses cause a problem. A typical example is found in the definitions whose key noun is case. As shown in table 12, attache case and test case are both defined by case; these express completely different concepts.
In 30 nouns whose key noun is case, 16 nouns are 'movable (J)', and 14 nouns are 'abstract'. Difficulty of semantic marking is also found: lexicographers could not mark 'movable (J)' and 'Solid' systematically. For example, some nouns whose key noun is cloth are marked as 'Solid', and others are marked as 'movable (J)' (table 13). This is a problem in the gathering of semantic information itself. [Table 11, definition column only] a gas that is a simple substance (ELEMENT), without colour or smell, that is lighter than air and that burns very easily | the most common liquid, without colour, taste, or smell, which falls from the sky as rain, forms rivers, lakes, and seas, and is drunk by people and animals | the solid substance that remains after gas has been removed from coal by heating | a non-metallic substance, usu. in the form of a poisonous pale greenish-yellow gas</Paragraph>
<Paragraph position="2"> [Table 12] attache case, J, case: a thin hard case with a handle, for carrying papers | test case, T, case: a case in a court of law which establishes a particular principle and is then used as a standard against which other cases can ...</Paragraph>
</Section>
</Section>
<Section position="5" start_page="463" end_page="463" type="metho">
<SectionTitle> L FF DF </SectionTitle>
<Paragraph position="0"> [Table 13, legible rows] strong rough cloth used for tent, sails, bags, etc.</Paragraph>
<Paragraph position="1"> a strong cotton cloth used esp. for jeans | type: a type of strong cloth, usu. woven from wool, and used esp. for suits, coats, and dresses | type: a type of coarse woolen cloth woven from threads of several different colours</Paragraph>
</Section>
</Paper>