<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-1026">
  <Title>RECOGNITION OF ABSTRACT OBJECTS - A DECISION THEORY APPROACH WITHIN NATURAL LANGUAGE PROCESSING</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 THE &quot;WAI&quot; AND THE &quot;AIR&quot; PROJECT
</SectionTitle>
    <Paragraph position="0"> The DAISY/ALIBABA system [1], [2], [3], developed at the Technical University Darmstadt, analyses abstracts and describes them according to the coordinate indexing philosophy, using a prescribed set of descriptors. This task requires a domain-dependent dictionary. Because the lack of suitably sized dictionaries was judged to be one of the main problems for research and development in automatic indexing [4], the WAI project started in 1978 with dictionary construction. The two completed dictionaries are FST, covering the scope of food science and technology, and PHYS, covering the scope of physics, a part of INIS (International Nuclear Information System). Different procedures for generating dictionary data were developed and applied. Classifying these procedures and unifying the data they create is one of the main tasks of dictionary construction (described in detail in [3], [4]). This cannot be done without examining their influence on the quality of the resulting indexing. To make indexing tests possible, the development of DAISY and ALIBABA was another important objective of WAI.</Paragraph>
    <Paragraph position="1"> The indexing results reported in [4], [5], [6] are based on consistency tests only, using manual indexing as the standard. To confirm or modify these results, the AIR project is now preparing a retrieval test on the physics data base INKA-PHYS of the Fachinformationszentrum FIZ 4 (Energie, Physik, Mathematik; Karlsruhe) (order of magnitude: 10,000 documents, 200 search requests). The indexing will be based upon the new dictionary PHYS-2, which is to be constructed using about 80,000 documents of the INKA-PHYS data base.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="161" type="metho">
    <SectionTitle>
2 THE BASIC PRINCIPLES UNDERLYING THE &quot;WAI&quot;/&quot;AIR&quot; APPROACH
</SectionTitle>
    <Paragraph position="0"> The WAI/AIR approach represents both a specific solution of the indexing problem  162 G. KNORZ .</Paragraph>
    <Paragraph position="1"> and a general framework for a wide class of problems within natural language processing and other fields.</Paragraph>
    <Paragraph position="2"> This paper only refers to details of the particular solution, which are published elsewhere. The objective of this work is to present the general framework derivable from the basic principles underlying the WAI and the AIR project: (1) Knowledge bases are very important for problem solving. But presupposing knowledge in an automatic system must not call the system's applicability into question because procedures for constructing knowledge bases of the indispensable size do not exist. The main aim is a realistic, appropriate solution rather than a perfect one.</Paragraph>
    <Paragraph position="3"> (2) Controlling the quality of a system and the effort it requires must not wait until the system is put into practice. System development has to be guided by a control derivable from the task to be performed. (3) The algorithms that form the basis of the procedure should not be assumed to be perfect. When they are applied to complex tasks, it is a fundamental fact that they are based on simplified models.</Paragraph>
    <Paragraph position="4"> The principles can be considered a guideline for designing application-oriented systems. It is claimed with good reason that the quality of such a system can be determined only by evaluation in application environments (see for example [7], [8]). This cannot be done without empirical studies of the user-system interaction. The paradigm of recognizing abstract objects presented here is an approach to integrating the evaluation aspect into system development. It is also an approach to problems for which no perfect solutions exist or seem to be applicable.</Paragraph>
  </Section>
  <Section position="4" start_page="161" end_page="161" type="metho">
    <SectionTitle>
3 RECOGNITION OF ABSTRACT OBJECTS
3.1 THE DEFINITION OF THE RECOGNITION TASK
</SectionTitle>
    <Paragraph position="0"> The basic idea is to use the application environment itself to get an implicit description of the problem. Whenever one talks about a particular application environment, there is no other way than to take as a basis a conceptual model M_E, which determines the adequate concepts (see [9], or see also [10]).</Paragraph>
    <Paragraph position="1"> Here, the conceptual model has to be formulated in such a way that it defines (abstract) objects (ξ,k), ξ ∈ X, k ∈ K. ξ denotes those aspects of an object which can be observed directly with regard to the problem; K denotes a set of object classes. A model m_E of the application environment gives an implicit definition of the (recognition) problem by forming a continuous stream of abstract objects.</Paragraph>
    <Paragraph position="2"> Developing a recognition system (RS) is nothing more than finding a suitable mapping e: x → e(x) that recognizes an actual x to be (x, e(x)).</Paragraph>
    <Paragraph position="3"> If the RS-m_E interface is identical to the system-user interface, then m_E may refer to the user's judgement directly to define the co-occurrence of ξ and k.</Paragraph>
  </Section>
  <Section position="5" start_page="161" end_page="161" type="metho">
    <SectionTitle>
RECOGNITION OF ABSTRACT OBJECTS 163
</SectionTitle>
    <Paragraph position="0"> This is also adequate, whenever human cognitive capabilities are to be simulated.</Paragraph>
    <Paragraph position="1"> We give some examples:
 * Information retrieval can be based upon recognition of document-query relationships (described in [6]). The observable part ξ of an object can be represented by (d,f), where d denotes the document and f denotes the query; in the simplest case k is a member of the set {is relevant, is not relevant}, referring to the user's judgement.
 * Expressions, possibly within the scope of a quantifier, as well as hypotheses for inferences can both be regarded as abstract objects. Determining the scope of a quantifier or drawing inferences can be based on the recognition of those objects by simulating human decisions.
 Two other examples are given that avoid the simulation approach:
 * Complex tasks often require the testing of many hypotheses, which can be regarded as abstract objects; m_E may refer to the final results of the processing.</Paragraph>
    <Paragraph position="2"> * In [6] a decision theory approach to optimal retrieval forms a basis for m_E, defining the task of indexing as recognition of document-descriptor relationships.</Paragraph>
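    <Paragraph position="3"> The document-query example above can be made concrete in a short Python sketch. It is only an illustration under stated assumptions: the class name, the function names, the toy word-overlap rule and the three sample objects are hypothetical and do not come from the paper or from the DAISY/ALIBABA system. The sketch shows an abstract object as a pair of an observable part (d,f) and a class k supplied by the environment model, a candidate mapping e, and a comparison of e against the judgements.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# All names below are hypothetical illustrations, not taken from the paper.
RELEVANT = "is relevant"
NOT_RELEVANT = "is not relevant"
K = (RELEVANT, NOT_RELEVANT)  # the set of object classes


@dataclass(frozen=True)
class AbstractObject:
    xi: Tuple[str, str]  # observable part: (document d, query f)
    k: str               # class given by the environment, e.g. a user's judgement


def term_overlap_rule(xi: Tuple[str, str]) -> str:
    """Toy mapping e: call the pair relevant if d and f share a word."""
    d, f = xi
    shared = set(d.lower().split()).intersection(f.lower().split())
    return RELEVANT if shared else NOT_RELEVANT


def agreement(e: Callable[[Tuple[str, str]], str],
              stream: Iterable[AbstractObject]) -> float:
    """Fraction of objects for which e(xi) equals the class given by the environment."""
    objects = list(stream)
    hits = sum(1 for obj in objects if e(obj.xi) == obj.k)
    return hits / len(objects)


# Invented sample standing in for the stream of abstract objects.
sample = [
    AbstractObject(("neutron scattering in crystals", "neutron scattering"), RELEVANT),
    AbstractObject(("food preservation by drying", "neutron scattering"), NOT_RELEVANT),
    AbstractObject(("plasma heating", "heating of food"), NOT_RELEVANT),
]

print(agreement(term_overlap_rule, sample))
```

 The toy rule misjudges the third pair because the shared word "heating" is misleading, which illustrates why a recognition system has to be judged against the stream of objects rather than assumed to be perfect.</Paragraph>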
    <Section position="1" start_page="161" end_page="161" type="sub_section">
      <SectionTitle>
3.2 STRUCTURE OF THE RECOGNITION SYSTEM
</SectionTitle>
      <Paragraph position="0"> The structure of the recognition system as presented here makes evident that the recognition problem arises essentially at the interface of two models: [...] to the recognition task.</Paragraph>
      <Paragraph position="1"> M_I is part of the recognition system (Figure 1). It structures the object using the knowledge base, so that all available aspects that may influence the decision of the RS are included. In many cases it also initiates the recognition process, i.e. it constructs the hypothesis represented by the object.</Paragraph>
      <Paragraph position="2"> According to M_I, a formal description x of ξ (the directly observable aspects of the object) is produced. We do not consider here the nature of M_I, which can be a sophisticated model with a strong theoretical foundation as well as a rather simple, heuristic one. Different models M_I might lead to quite different recognition systems for the same task. The main point is that M_I leads to an object description instead of a decision. Another point is that the two models M_E and M_I are essentially independent. This fact causes every [...] in some cases. That means that an 'optimal recognition system' cannot be defined without taking into consideration the number of cases causing faults or, more precisely, the statistical properties of the application environment represented by m_E. The decision theory approach appropriate to the given situation is described in [5] and [6] with respect to the indexing problem. The approach requires that every single decision of the RS is classified. This task is for the most part anticipated by M_E, which defines the set of object classes K. K determines the scope of possible faults. These can be weighted independently by a loss function c: (e(x), k) → w. With the model m_E given, a particular recognition system will cause an expected value E(w). The optimal system RS_opt for a given M_I is the result of searching for the RS which minimizes E(w). It can be shown that the optimal decision RS_opt(x) can be based on the conditional probabilities p(k|x). The mappings e_k(x) = p(k|x) can be approximated by polynomial functions that are constructed automatically using a sample of objects (ξ,k). This way has been chosen for the ALIBABA system, which uses polynomial classifiers adapted in the mean square sense [11]. The indexing results in [5] and [6] demonstrate that, applied to the indexing problem, the recognition approach and in particular the method of approximation are adequate.</Paragraph>
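      <Paragraph position="3"> The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ALIBABA implementation (for that see [11]): the object description x is reduced to a single hypothetical numeric feature, the polynomial is of degree one, and the class names, sample values and loss values are invented. Each estimator is fitted in the mean square sense to the 0/1 indicator of one class and serves as an approximation of p(k|x); the decision is the class with the smallest estimated expected loss.

```python
from statistics import fmean


def fit_linear_estimator(xs, ks, k):
    """Least-squares fit of a + b*x to the 0/1 indicator of class k."""
    ys = [1.0 if label == k else 0.0 for label in ks]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def decide(x, estimators, loss):
    """Return the decision with the smallest estimated expected loss."""
    def expected_loss(decision):
        return sum(loss[(decision, k)] * est(x) for k, est in estimators.items())
    return min(estimators, key=expected_loss)


# Invented sample: one feature value per object, with the class given by the environment.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ks = ["reject", "reject", "reject", "assign", "assign", "assign"]
estimators = {k: fit_linear_estimator(xs, ks, k) for k in ("assign", "reject")}

# Loss tables keyed by (decision, true class); the values are assumptions.
zero_one = {("assign", "assign"): 0, ("assign", "reject"): 1,
            ("reject", "assign"): 1, ("reject", "reject"): 0}
costly_miss = {("assign", "assign"): 0, ("assign", "reject"): 1,
               ("reject", "assign"): 4, ("reject", "reject"): 0}

print(decide(0.4, estimators, zero_one))     # decision under symmetric loss
print(decide(0.4, estimators, costly_miss))  # decision when a missed assignment costs more
```

 With the same estimators, changing only the loss table moves the decision boundary. This reflects the separation in the text between the object description, the estimated probabilities and the weighting of faults.</Paragraph>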
    </Section>
  </Section>
  <Section position="6" start_page="161" end_page="161" type="metho">
    <SectionTitle>
4 DISCUSSION
</SectionTitle>
    <Paragraph position="0"> The approach of recognizing abstract objects is evaluated using the paradigm of automatic indexing. The model M_E refers - for practical reasons - not to the</Paragraph>
  </Section>
  <Section position="7" start_page="161" end_page="161" type="metho">
    <SectionTitle>
RECOGNITION OF ABSTRACT OBJECTS 165
</SectionTitle>
    <Paragraph position="0"> retrieval process but to the decisions of human indexers. If a consistency factor (comparing manual and automatic indexing) measures the quality of automatic indexing, the set K requires two elements only. If a more sophisticated evaluation is intended, the set K can be increased, according to the kind of faults that should be considered. The classification of faults can for example depend on the descriptor under consideration.</Paragraph>
    <Paragraph position="1"> For the model M_I used, see for example [5] and [12].</Paragraph>
    <Paragraph position="2"> We summarize the essentials of the suggested approach (the first point refers in particular to the indexing paradigm).</Paragraph>
    <Paragraph position="3"> - The recognition problem causes one to regard two independent models: one with respect to retrieval and one with respect to analysis of abstracts. This point of view is important for an approach to optimal indexing [6], but it is not self-evident. In [14] the retrieval-oriented approach of Robertson and the indexing-oriented approach of Harter [13] are brought together. The result is a one-model approach, like other approaches in this field (for example [15]).</Paragraph>
    <Paragraph position="4"> - The internal model M_I is restricted to the basis of the decision to be made. This fact makes it very easy to include additionally a lot of knowledge and heuristic procedures that might play a role only for decision making. There is no risk of causing faults by determining how to compute the decision using this knowledge. Artificial intelligence approaches use a corresponding model M_I to determine the decision [16].</Paragraph>
    <Paragraph position="5"> - The need for a model m_E implies an educational aspect with respect to evaluation. [...] ensures that the gap between the optimal system RS_opt and the ideal system (equivalent to m_E) is under control.</Paragraph>
  </Section>
</Paper>