<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2032"> <Title>Sentence disambiguation by document oriented preference sets</Title> <Section position="2" start_page="0" end_page="185" type="intro"> <SectionTitle> 1. Introduction 2. The concept of DoPS </SectionTitle> <Paragraph position="0"> Ambiguity in sentence interpretation is a major problem in natural language processing (NLP). Conventional NLP systems often use ad hoc or extremely large knowledgebases (pragmatic / semantic / commonsense) to eliminate ambiguities. Such systems are too slow and sometimes provide incomplete analyses. They have the further handicap that very large knowledgebases are needed. Asking the user for confirmation [Nishida 1982] is a practical solution to get correct parse-trees, but this confirmation is not useful for further computations. A practical NLP system should produce accurate results automatically while using a simple method and simple knowledge.</Paragraph> <Paragraph position="1"> Preference models [Petitpierre 1987, Fass 1983, Schubert 1984], such as preference semantics, scoring, and syntactic preference, are good candidates for a practical NLP system, because these models utilize simple ready-made knowledge like semantic markers or case frame dictionaries. The most difficult problem with preference models is the selection of the most appropriate preference knowledge that will induce a correct interpretation. However, preference knowledge extracted from a large corpus or an on-line dictionary [Jensen 1987] induces preference knowledge conflicts which block complete disambiguation.</Paragraph> <Paragraph position="2"> Syntactic rules are capable of producing many parse-trees for a sentence. These parse-trees are syntactically correct, but most are incorrect from the viewpoints of semantic meaning, contextual meaning, common sense, and specific field knowledge. 
It is necessary to use appropriate knowledge (semantic / contextual / commonsense / specific field) to eliminate the incorrect interpretations. For example, consider passage 1 of Figure 1. There are two possible interpretations for the gerund-phrase attachment.</Paragraph> <Paragraph position="3"> (1) The power supply circuit for charging a battery having a voltage-temperature coefficient ... (Passage 1; beginning of target sentence) ... the voltage-temperature coefficient of the battery being charged ... (Passage 2; part of target sentence)</Paragraph> <Section position="1" start_page="0" end_page="185" type="sub_section"> <SectionTitle> People with electrical-engineering </SectionTitle> <Paragraph position="0"> knowledge know that batteries have voltage-</Paragraph> <Paragraph position="2"> temperature coefficients, not circuits. However, if specific field knowledge is lacking, it is difficult to determine which is correct.</Paragraph> <Paragraph position="3"> The notion of the DoPS is to utilize preference knowledge which can be extracted from other sentences of the target document or from other documents. Documents sometimes contain paraphrases and the same or similar expressions. These expressions can contain several kinds of knowledge (semantic / contextual / commonsense / specific field).</Paragraph> <Paragraph position="4"> Sentence disambiguation can be based on such knowledge. For example, from passage 2 (which was written in another part of the target sentence (1)), it is clear that the voltage-temperature coefficient is a property of the battery; thus the beginning of sentence (1) can be disambiguated.</Paragraph> <Paragraph position="5"> This notion will be useful for any NLP stage, but it will be especially useful for dependency structure analysis. A DoPS is a collection of plausible combinations of phrases or words. 
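As an illustrative sketch only (not the implementation described in this paper), the battery example can be written in Python. It assumes a hypothetical entry format, a pair of words that plausibly belong together, and uses the association found in passage 2 to choose between the two attachment candidates of passage 1. The names Entry, dops, and choose_attachment are assumptions introduced for this example.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    """One hypothetical DoPS entry: a plausible pairing of two words."""
    head: str
    dependent: str

# Entry assumed to be acquired from passage 2, where the coefficient
# is stated to belong to the battery being charged.
dops = {Entry(head="battery", dependent="voltage-temperature coefficient")}

def choose_attachment(candidates, entries) -> Optional[Entry]:
    """Return the first candidate supported by an entry, or None."""
    for candidate in candidates:
        if candidate in entries:
            return candidate
    return None

# Passage 1: the phrase "having a voltage-temperature coefficient"
# could attach to "circuit" or to "battery".
candidates = [
    Entry("circuit", "voltage-temperature coefficient"),
    Entry("battery", "voltage-temperature coefficient"),
]
chosen = choose_attachment(candidates, dops)
print(chosen.head)  # prints: battery
```

When no entry supports any candidate, choose_attachment returns None, which in this sketch simply means the ambiguity is left for some other source of knowledge to resolve.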
To eliminate conflicts of preference knowledge, a hierarchical structure of preference knowledge is adopted in the DoPS.</Paragraph> <Paragraph position="6"> Figure 2 shows a hierarchical structure of a DoPS. The domains are, in order of increasing priority, language, application, field, author, document, paragraph, and sentence.</Paragraph> <Paragraph position="7"> [Figure 2: Hierarchical structure of a DoPS. Priority from high to low: sentence domain, paragraph domain, document domain (from the target document); author domain, field domain, application domain, language domain (from other documents).] several documents in the same field. We consider that the knowledge associations held in the document, paragraph, and sentence domains are more reliable than those in other domains. DoPS entries of the document, paragraph, and sentence domains are acquired from the target document during the analysis; the others can be prepared before analysis. For example, in Figure 3, if the author of document B is the same as that of document A, the same DoPS entries of the author, field, application, and language domains are used in the analysis. The other domains, that is, the sentence, paragraph, and document domains, are acquired during the analysis.</Paragraph> <Paragraph position="8"> By using such domain structured preference knowledge, the system can extract the most plausible interpretation.</Paragraph> <Paragraph position="9"> Figure 4 shows the DoPS system flow diagram. First, the system starts analyzing the dependency structure of the target sentence with conventional syntactic rules. From each confirmed dependency relation, the DoPS system develops a knowledge association or entry.</Paragraph> <Paragraph position="10"> The language domain in a DoPS contains general language preference extracted from a large database, such as a word corpus or on-line dictionary. In the application domain (e.g. 
patent claim sentences, newspapers, editorials, manuals), there exist application-dependent phrases or word relations. In the field domain (e.g. electrical engineering, chemistry, agriculture), there exist field-specific phrases or word relations. The author domain includes the author's characteristics as shown in the author's writing. An author often writes on The DoPS entries are similar to the dependency relationships in dependency grammar, but two expansions have been made: semantic expansion and coordination expansion. In semantic expansion, for efficient use of the DoPS, dependency relationships are expanded into semantic dependency relationships. In passage 3, entry 3 is extracted as a dependency relation between instances. Such relations are semantically expanded by using an ordinary thesaurus (e.g. Roget's thesaurus). For example, the thesaurus category number of &quot;battery&quot; is 160 and the broader word is &quot;POWER&quot;. This means the word &quot;battery&quot; is a member of a word group named &quot;POWER&quot;. This word group contains &quot;power pack&quot;, &quot;charger&quot;, &quot;condenser&quot;, and so on. It is assumed that the same dependency relation will be valid for other members of the same word group.</Paragraph> <Paragraph position="11"> Passage 5 can be validated by entry 3 from</Paragraph> <Paragraph position="13"> &quot;condenser&quot; is in the same word group as &quot;battery&quot;. The other expansion is to exchange the intermediary expressions (usually prepositions or verbs). The transformation rules of intermediary expressions will be written in the DoPS system like</Paragraph> <Paragraph position="15"> Coordination expansion means that DoPS-like preference sets can be constructed using the coordinated relationships between coordinated sentence constituents. 
Using the coordinated constituents of preference sets, ambiguous constituents can be uniquely resolved if the same type of coordinated sentence exists somewhere else in the target document or in other documents.</Paragraph> <Paragraph position="16"> In passage 7, it is clear that &quot;records&quot; and &quot;files&quot; are coordinated constituents. Preference sets for coordinated constituents are extracted as entry 7. Using entry 7, the coordination in passage 8 is disambiguated.</Paragraph> <Paragraph position="17"> Even when the unambiguous dependency relations are semantically expanded, ambiguities sometimes persist. If ambiguous parts remain, the system adds ambiguous entries to the DoPS. In any domain, the execution priority of unambiguous entries is, of course, higher than that of ambiguous entries. Thus the target candidate is analyzed with unambiguous entries first. After that, if ambiguities still persist, the ambiguous entries are used.</Paragraph> <Paragraph position="18"> Finally, deterministic rules, such as right association or minimal attachment, must be used to eliminate any remaining ambiguity. 3. The DoPS system for Japanese dependency analysis In this section, we describe the implementation of the DoPS system for Japanese dependency analysis.</Paragraph> <Paragraph position="19"> A DoPS system was implemented for Japanese dependency analysis and, because patent claim sentences have a tendency to use many similar expressions, the target documents were Japanese patent claim sentences. The implemented system restricted the application domain to patent claim sentences and activated only the application and higher domains. Figure 5 shows the implemented system. If dependency analysis using syntactic rules could resolve all sentence ambiguities, execution was stopped and DoPS entries were not created.</Paragraph> <Paragraph position="20"> The syntactic rules used here were the general dependency rules and affiliated-word rules. 
The general dependency rules are (1) dependency relationships must not cross and (2) no verb takes the same case more than once. The affiliated-word rules are given in Table 1, which represents the connection between the governor and the dependant. In Japanese, the governor is the word unit (BUNSETSU) which modifies another BUNSETSU, called the dependant. The properties of the governor can be determined from the last postpositional word and are dependant on the last independent word in the BUNSETSU.</Paragraph> <Paragraph position="21"> The acquisition of DoPS entries begins after syntactic analysis is completed. The system analyzes the sentence structure within a document, chooses the disambiguated parts as entries, and converts all dependency relationship candidates into ambiguous entries. For example, if the system executes syntactic analysis and finds passage 9 unambiguous, then the acquisition process creates entry 9.</Paragraph> <Paragraph position="22"> &quot;/&quot; indicates that this can be used in reversed relationships.</Paragraph> <Paragraph position="23"> After all entries are extracted from the target document, the system executes coordination analysis. The constituents are picked up using the similarity of constituents and the conjunctions &quot;to&quot;, &quot;ya&quot;, and &quot;mataha&quot; as clues. If the coordination analysis fails to eliminate all ambiguity, constituents are determined from the coordinated constituents of preference sets. After coordination analysis is completed, punctuation BUNSETSU analysis starts. In patent claim sentences, punctuation marks are used mainly to restrict the nearest dependency relation, not for emphasis. Finally, disambiguation of the dependency structure is commenced. In the disambiguation process, the unambiguous entries are first compared against the ambiguous parts of the sentence. The most similar dependency relation is selected as the correct relation. 
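The selection step can be sketched as follows, as a hedged illustration rather than the system's actual procedure. The similarity measure is not specified at this point in the text, so the sketch assumes a hypothetical score: 2 for identical words, 1 for words in the same thesaurus word group (using the &quot;POWER&quot; group from the semantic expansion example), and 0 otherwise. The names WORD_GROUP, word_similarity, relation_similarity, and select_relation are assumptions introduced here.

```python
# Word groups follow the thesaurus example of this section; the scoring
# weights below are assumptions made for illustration.
WORD_GROUP = {
    "battery": "POWER",
    "power pack": "POWER",
    "charger": "POWER",
    "condenser": "POWER",
}

def word_similarity(a, b):
    """2 for identical words, 1 for the same word group, otherwise 0."""
    if a == b:
        return 2
    group_a = WORD_GROUP.get(a)
    if group_a is not None and group_a == WORD_GROUP.get(b):
        return 1
    return 0

def relation_similarity(candidate, entry):
    """Score a (word, word) candidate against one entry.

    A pair scores 0 unless both positions match at least by word group.
    """
    first = word_similarity(candidate[0], entry[0])
    second = word_similarity(candidate[1], entry[1])
    return first + second if first > 0 and second > 0 else 0

def select_relation(candidates, entries):
    """Return the best-scoring candidate, or None if nothing matches."""
    best, best_score = None, 0
    for candidate in candidates:
        score = max((relation_similarity(candidate, e) for e in entries), default=0)
        if score > best_score:
            best, best_score = candidate, score
    return best

entries = [("battery", "coefficient")]
candidates = [("circuit", "coefficient"), ("condenser", "coefficient")]
print(select_relation(candidates, entries))  # prints: ('condenser', 'coefficient')
```

In this sketch ties go to the earlier candidate, and a None result would leave the decision to the deterministic rules.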
During the disambiguation process, disambiguated knowledge associations are added to the DoPS. If there are many candidates of similar relations, the highest scoring candidate is selected. In one domain, first unambiguous and then ambiguous entries are applied. The Japanese deterministic rule is to choose the nearest dependency relation.</Paragraph> <Paragraph position="24"> Using this rule, all ambiguous relations will be disambiguated.</Paragraph> <Paragraph position="25"> 4. System empirical results To test the effectiveness of the implemented DoPS system, we analyzed 10 real Japanese patent claim sentences, a total of nearly 7,000 words. These sentences were randomly selected from the computer and control systems region (International Patent Classification G06F).</Paragraph> <Paragraph position="26"> Only half of the dependency relations were determined before disambiguation by DoPS. After disambiguation by DoPS was performed, we obtained an average accuracy of 93% (accuracy is defined as the number of correct dependency relationships divided by the total number of dependency relationships). Finally, by using the deterministic rule, we obtained an average accuracy of 97%. A simple system using only the deterministic rule obtains an average accuracy of only 84%. Compared to this simple system, the sentence dependency analysis of our DoPS system can disambiguate with a high degree of accuracy, without needing a large knowledgebase.</Paragraph> <Paragraph position="27"> In this experiment, most errors occurred during coordination analysis and disambiguation. Therefore, it is necessary to resolve coordination problems and to achieve more accurate disambiguation with DoPS. A more accurate DoPS system requires the elimination of useless and wrong entries. 
In the DoPS disambiguation process, utilization of dependency relations from case frame dictionaries is also needed.</Paragraph> <Paragraph position="28"> Using a DoPS system for Japanese dependency analysis, we obtained an average accuracy of 97%. Compared to the 84% accuracy of simple analysis, it is clear that DoPS is more accurate. Furthermore, the concept of DoPS can also be applied to other NLP applications such as MT [Tanaka 1990].</Paragraph> </Section> </Section> </Paper>