<?xml version="1.0" standalone="yes"?> <Paper uid="W03-2002"> <Title>Intelligent patent analysis through the use of a neural network: experiment of multi-viewpoint analysis with the MultiSOM model</Title> <Section position="4" start_page="2" end_page="2" type="metho"> <SectionTitle> 3. The MultiSOM model </SectionTitle> <Paragraph position="0"> Communication between self-organizing maps, first introduced in the context of an information retrieval model [10], represents a major improvement of the basic Kohonen SOM model. From a practical point of view, the multi-map display introduces the use of viewpoints into information analysis. Each viewpoint is realized as a map. Each map is a spatial ordering in which the information is represented by nodes (classes) and spatial areas (groups of classes).</Paragraph> <Paragraph position="1"> The multi-map display enables a user to highlight semantic relationships between topics belonging to different viewpoints. Each map represents a particular viewpoint. Figure 4 of section 4 illustrates this.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.1 The viewpoint paradigm </SectionTitle> <Paragraph position="0"> The viewpoint building principle consists in separating the description space of the documents into different subspaces corresponding to different keyword subsets. The set V of all possible viewpoints issued from the description space D of a document set can be defined as: $V = \{v_1, v_2, \ldots, v_n\}$, with $v_i \in P(D)$ and $\bigcup_{i=1}^{n} v_i = D$,</Paragraph> <Paragraph position="2"> where each $v_i$ represents a viewpoint and $P(D)$ represents the set of the parts (subsets) of the description space of the documents D; the union of the different viewpoints constitutes the description space of the documents.</Paragraph> <Paragraph position="3"> The viewpoint subsets issued from V may overlap. 
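As an illustration of the viewpoint definition above, the following minimal Python sketch treats the description space D as a set of keywords and each viewpoint as a subset of it. The keywords, the viewpoint contents and the helper names (is_valid_viewpoint_set, project) are hypothetical and only serve the example; they are not taken from the study.

```python
# Hypothetical description space D: every keyword used to index the documents.
D = {"engine oil", "automatic transmission", "friction stability",
     "sludge control", "molybdenum compound"}

# Hypothetical viewpoints: each one is a subset of D (an element of P(D)).
# "molybdenum compound" belongs to two viewpoints, so these viewpoints overlap.
viewpoints = {
    "Title": {"engine oil", "molybdenum compound"},
    "Use": {"automatic transmission"},
    "Advantages": {"friction stability", "sludge control", "molybdenum compound"},
}


def is_valid_viewpoint_set(space, views):
    """Check that every viewpoint is a part of D and that their union is D."""
    all_subsets = all(v.issubset(space) for v in views.values())
    covers_space = set().union(*views.values()) == space
    return all_subsets and covers_space


def project(document_terms, views):
    """Restrict a document's keywords to the vocabulary of each viewpoint."""
    return {name: document_terms.intersection(v) for name, v in views.items()}


valid = is_valid_viewpoint_set(D, viewpoints)
per_viewpoint = project({"engine oil", "friction stability"}, viewpoints)
```

Projecting a document onto each viewpoint in this way yields one description per viewpoint, possibly empty, which is what allows the same document to be placed on several maps. 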
Moreover, they may also follow the structure of the documents when they correspond to different vocabulary subsets associated with different document subfields, if any. Other viewpoints may also be manually extracted from an overall document description space. Finally, the viewpoint model is flexible enough to accommodate document descriptions belonging to different media, as long as these descriptions can be implemented as description vectors (e.g., an image can be described simultaneously by a keyword vector and by a color histogram vector).</Paragraph> <Paragraph position="4"> The inter-map communication mechanism, described hereafter, directly exploits the viewpoint model described above in order to overcome the low-quality problem inherent in a global classification approach while preserving an overall view of the interactions between the data.</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.2 Inter-map communication mechanism </SectionTitle> <Paragraph position="0"> In MultiSOM, inter-map communication is based on the use of the data that have been projected onto the maps as intermediary nodes, or activity transmitters, between maps. The communication process between maps operates in three successive steps. Figure 1 shows the three steps of this mechanism graphically.</Paragraph> <Paragraph position="1"> In step 1, the original activity is set up directly by the user on the nodes or on the logical areas of a source map through decisions represented by different graded modalities (full acceptance, moderate acceptance, moderate rejection, full rejection) directly associated with node activity levels. This procedure can be interpreted as the user's choice to highlight (positively or negatively) different topics representing centers of interest relative to the viewpoint associated with the source map. 
The original activity can also be set up indirectly by projecting a user's query onto the nodes of a source map. The effect of this process is then to highlight the topics that are more or less related to that query. The activity transmission protocol, which corresponds to steps 2 and 3 of the inter-map communication mechanism, is extensively described in [24].</Paragraph> <Paragraph position="2"> To perform under the best conditions, the inter-map communication process requires that a significant part of the data play the role of activity transmitters between the maps. This condition is easily satisfied if each vector used for map generation indexes a significant part of the bibliographic database.</Paragraph> <Paragraph position="3"> [Figure 1, caption fragments] Source signal: direct user activation or query-matching activation of nodes of the source map. [3] The activity is transmitted through the data nodes to the other maps with which these data are associated. Positive as well as negative activity can be managed in the same process. Note that the data are in this case indexed documents.</Paragraph> </Section> </Section> <Section position="5" start_page="2" end_page="2" type="metho"> <SectionTitle> 4. Application </SectionTitle> <Paragraph position="0"> In the two preceding sections we introduced the SOM algorithm and then MultiSOM. In this section, we use a real example to make some of these notions more concrete. We argue that visualization in the form of a set of maps represents an important added value for analysis in technology watch tasks, as well as in science watch and in knowledge discovery in databases. 
Our example is a set of 1000 patents about oil engineering technology recorded during the year 1999.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.1 The analysis phase </SectionTitle> <Paragraph position="0"> The role of the MultiSOM application was first planned by the domain expert in order to answer various kinds of questions on the patents, such as: 1: &quot;What are the relationships between the patentees?&quot;, 2: &quot;What are the advantages of the different oils?&quot;, 3: &quot;Does a patentee work on a specific engineering technology, for which advantage and for which use?&quot;, 4: &quot;Which technology is used by a given patentee without being used by another one?&quot;, 5: &quot;What are the main advantages of a specific oil component, and have these advantages been mentioned in all the patents using this component?&quot;. An analysis carried out on all the possible types of questions led the expert to define different viewpoints on the patents that could be associated with the different closed semantic domains appearing in these questions. One of the main aims of the expert was to be able to use each viewpoint separately in order to answer single-domain questions (like questions 1 and 2) while maintaining the possibility of multi-viewpoint communication in order to answer multi-domain questions (like questions 3, 4 and 5) that might also contain negation (like question 4). 
The specific viewpoints highlighted by the expert from the set of possible questions are: 1: Patentees, 2: Title (often contains information on the specific components used in the patent), 3: Use, 4: Advantages.</Paragraph> <Paragraph position="1"> A fifth &quot;global viewpoint&quot;, which represents the combination of all the specific ones, is also considered in order to compare a global classification mechanism, of the WEBSOM type, with a pure viewpoint-oriented classification mechanism, of the MultiSOM type.</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.2 The technical realization </SectionTitle> <Paragraph position="0"> This phase consists in mapping the four specific viewpoints highlighted by the domain expert in the preceding phase onto four different maps. A preliminary task consists in obtaining the index set (i.e. the vocabulary set) associated with each viewpoint from the full text of the patents. This task is itself divided into three elementary steps.</Paragraph> <Paragraph position="1"> In step 1, the structure of the patent abstracts is parsed in order to extract the subfields corresponding to the Use and Advantages viewpoints. In step 2, the rough index set of each subfield is constructed using a basic computer-based indexing tool [4]. This tool extracts terms and noun phrases from the subfield content according to a normalized terminology and its syntactic variations. It also eliminates common language templates. In step 3, the rough index set associated with each viewpoint is normalized by the domain expert in order to obtain the final index sets. 
The normalization of the Title, Use and Advantages subfields consists in choosing a single representative among the terms or noun phrases that represent the same concept (e.g., the noun phrases &quot;oil fabrication&quot; and &quot;oil engineering&quot; are both mapped to the single noun phrase &quot;oil engineering&quot;). The normalization of the Patentees viewpoint is carried out in the same way, considering that the same firm can appear under different names in the set of published patents.</Paragraph> <Paragraph position="2"> The Patentees and Title subfields are directly represented in the original patent structure and therefore do not require any extraction. After the construction of the final index sets, the patents are re-indexed separately for each viewpoint using these sets. Figure 2 presents a patent abstract including its generated multi-index.</Paragraph> <Paragraph position="3"> The next task consists in building the maps representing the different viewpoints, using the map algorithm described in section 2. Before this step, a classical IDF-normalization step [27] is applied to the index vectors associated with the patents in order to reduce the influence of the most widespread index terms. For each specific viewpoint, a map of 10x10 nodes (classes) is finally generated. Two global maps representing global unsupervised classifications of the patents, of the WEBSOM type [7], are also constructed. The index sets of these maps are the union of the index sets of all the specific viewpoints. The two maps differ from each other only in their number of classes. The first one (GlobMin) is constrained to have the same number of classes as each viewpoint map (i.e. 100 classes). The second one (GlobMax) is constrained to have the sum of the numbers of classes of all the viewpoint maps (i.e. it is a 20x20 map comprising 400 classes). Table 1 summarizes the results of the patent indexing and the map building. 
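The IDF-normalization step mentioned above can be sketched as follows. This is a minimal illustration assuming the common log(N / df) weighting followed by scaling each vector to unit length; the exact variant defined in [27] is not given here, and the function name idf_normalize and the toy vectors are hypothetical.

```python
import math


def idf_normalize(index_vectors):
    """Weight each term by log(N / df), then scale every vector to unit length.

    Assumption: this is one textbook IDF variant, not necessarily that of [27].
    """
    n_docs = len(index_vectors)
    n_terms = len(index_vectors[0])
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for vec in index_vectors if vec[t] != 0) for t in range(n_terms)]
    # A term present in every document gets a weight of log(1) = 0.
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    normalized = []
    for vec in index_vectors:
        weighted = [x * w for x, w in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        normalized.append([x / norm for x in weighted] if norm else weighted)
    return normalized


# Toy binary index vectors (rows: patents, columns: index terms); made-up values.
vectors = [[1, 1, 0],
           [1, 0, 1],
           [1, 0, 0]]
normalized = idf_normalize(vectors)
```

In this toy case the first term occurs in every patent, so its weight drops to zero and only the rarer terms contribute to the vectors used for map learning. 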
A single viewpoint map resulting from the map building process is presented in figure 4.</Paragraph> <Paragraph position="4"> Some remarks must be made concerning the results shown in table 1. (1) The index count of the Title field is significantly higher than the other ones. An analysis of the indexes shows that the information contained in the patent titles is sparser, more diverse, and more precise than that contained in the Use and Advantages fields.</Paragraph> <Paragraph position="5"> According to the expert, the high level of generality of the Use and Advantages fields, which consequently led to poorer generated indexes, could be explained as a deliberate strategy of the patentees for indirectly protecting their patents. (2) The number of final patentees (i.e. 32) has been significantly reduced by the expert as compared to the number initially generated by the computer-based indexing tool. The main part of this reduction is not due to variations in patentee names. It is related to the fact that the primary goal of the study was to consider the main companies and their relationships. Thus, the patentees corresponding to small companies have been grouped under a single general index: &quot;Divers&quot; (miscellaneous). (3) On the Patentees map, the number of classes is close to the final number of retained patentees. Most of these patentees will then be associated with separate classes on the Patentees map. (4) Only 62% of the patents have an Advantages field and 75% a Use field.</Paragraph> <Paragraph position="6"> Consequently, some of the patents will not be indexed for all the expected viewpoints. The role of the mechanism of communication between viewpoints (see next section) will then be to generate an indirect evaluation of the contents of these patents on their missing viewpoints through their [...] [Figure 2 caption, beginning missing] for the above patent abstract corresponds to the &quot;Final indexation&quot; field. 
The terms of the generated multi-index are prefixed by the name of the viewpoint with which they are associated: &quot;adv.&quot; for the Advantages viewpoint, &quot;titre.&quot; for the Title viewpoint, &quot;use.&quot; for the Use viewpoint, &quot;soc.&quot; for the Patentees viewpoint. [Figure caption, beginning missing] organized as a square 2D grid of nodes. The viewpoint chosen for the map shown is the &quot;Advantages&quot; viewpoint. The names of the classes illustrate the topics (with respect to the chosen viewpoint) that have been highlighted by the learning. After the learning, the nodes related to the same topics have been grouped into coherent areas thanks to the topographic properties of the map. The number of nodes of each area can then be considered a good indicator of the topic weight in the database. Topics or areas close to one another represent related notions. For example, the &quot;extending oil live&quot; area shares some of its borders with the &quot;black sludge control&quot; area on the map. The proximity of these two areas illustrates the fact that oil lifetime strongly depends on maintaining a low level of sludge in the oil. The surrounding circles represent the centers of gravity of the areas. [Figure caption, beginning missing] the area corresponding to the TONEN CORP. company on the Patentees map and to propagate the activity to the thematic maps associated with the Use, Advantages and Title viewpoints corresponds to a &quot;viewpoint-crossing query&quot; whose explicit formulation might look like: &quot;I want to know which are the specific areas of competence (concerning oil use, oil composition and expected advantages) of the TONEN CORP. company, if there are any&quot;. The MultiSOM application lets the user interactively find that the TONEN CORP. 
company is a specialist in the lubrication of automatic transmissions [arrow n°2 on the map] and that it adopted for this kind of lubrication a sulfur-containing organo-molybdenum compound [arrow n°1] whose main advantage is to provide oil with a friction coefficient that is stable over a wide range of temperatures [arrow n°3]. In this case, an inverse propagation from the target topics should also be used to verify that these topics belong only to the areas of competence of TONEN CORP. The whiter the color of a node representing a map class (topic), the higher its resulting activity.</Paragraph> </Section> <Section position="3" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.3 Inter-map communication for analysis </SectionTitle> <Paragraph position="0"> In comparison with standard mapping methods, such as principal component analysis, multidimensional scaling or WEBSOM global SOM analysis, the advantage of the multi-map display is the inter-map communication mechanism that the MultiSOM environment provides to the user. Each map represents a viewpoint. Each viewpoint represents a subject category. The inter-map communication mechanism helps the user to cross information between the different viewpoints. In both cases, the responses of the system are given both through activity profiles on the maps and through patent examples associated with the most active class representatives of these maps. The quality of thematic deduction is estimated through an evaluation of the activity focalization on the target maps (see [13]). Figure 4 illustrates a thematic deduction between the four different viewpoints of the study.</Paragraph> </Section> </Section> </Paper>