Overview of Patent Retrieval Task at NTCIR-3 
Makoto Iwayama 
Tokyo Institute of 
Technology/Hitachi Ltd. 
iwayama@crl.hitachi.co.jp 
Atsushi Fujii 
University of Tsukuba/Japan 
Science and Technology Corp.
fujii@slis.tsukuba.ac.jp 
Noriko Kando 
National Institute of 
Informatics 
kando@nii.ac.jp 
Akihiko Takano 
National Institute of 
Informatics 
aki@acm.org 
 
 
Abstract 
We describe the overview of patent re-
trieval task at NTCIR-3. The main task was 
the technical survey task, where participants 
tried to retrieve relevant patents to news ar-
ticles. In this paper, we introduce the task 
design, the patent collections, the character-
istics of the submitted systems, and the re-
sults overview. We also arranged the free-
styled task, where participants could try 
anything they want as far as the patent col-
lections were used. We describe the brief 
summaries of the proposals submitted to the 
free-styled task. 
1 Introduction 
In the field of information retrieval, there have 
been held successive evaluation workshops, such 
as TREC [8], CREF [1], and NTCIR [5], to build 
and utilize various kinds of test collections. In the 
Third NTCIR Workshop (NTCIR-3), which was 
held from June 2001 to December 2003, a serious 
effort was first made in the “Patent Retrieval Task” 
to explore information retrieval targeting patent 
documents. 
The goal of Patent Retrieval Task is to provide 
test collections for enhancing research on patent 
information processing, from patent retrieval to 
patent mining. Although there exist many com-
mercial patent retrieval systems and services, pat-
ent retrieval has not been paid much attention in 
the research field of information retrieval. One of 
the reasons is the lack of test collection on patent. 
TREC used patent documents as a part of the 
document collections, but there was no treatment 
specially applied to the patent collection. 
In SIGIR2000, the first workshop on patent re-
trieval was held [4] and there were many fruitful 
discussions on the current status and future direc-
tions of patent retrieval. The workshop convinced 
us that there was the need of test collections spe-
cifically for patents. 
We then asked for PATOLIS Co. [7] to provide 
patent collections for the patent retrieval task. Con-
sequently, we could release three kinds of patent 
collections; those were two years’ Japanese full 
texts, five years’ Japanese abstracts, and five 
years’ English abstracts. At the same time, we 
could fortunately have cooperation with JIPA (Ja-
pan Intellectual Property Association) [3] in creat-
ing search topics and assessing the relevance. 
Since each member of JIPA belongs to the intellec-
tual property division in her/his company, they are 
all experts in patent searching. All the above 
contributions enabled us to kick off the first 
evaluation workshop designed for patent 
information processing. 
There are various phases and aspects in patent 
information processing. For example, various 
kinds of users (researchers, patent searchers, busi-
ness managers, and so on) search patents for vari-
ous purposes (technical survey, finding conflicting 
applications, buying/selling patents, and so on). 
Corresponding to each situation, an appropriate 
search model should be developed. The standard of 
the relevance judgments may also depend on each 
situation. In some cases, retrieving relevant patents 
is not enough but further analysis on the retrieved 
patents might be necessary. For example, creating 
a patent map of a product would clarify the patent 
relations between the techniques used to make the 
product. Cross-lingual patent retrieval is also im-
portant when applying patents to foreign countries. 
All of these are within scope of our project and this 
task was the first step toward our goal. 
2 Task Design 
In this workshop, we focused on a simple task of 
technical survey. End-users we assumed in the task 
were novice users, for example, business managers. 
The major reason of adopting such general task 
was that we could only use the two years’ full texts 
that were not enough for trying more patent-
oriented task like finding conflicting applications 
from patents. 
 
 
Figure 1: Scenario of technology survey 
 
To fit the task to a real situation, we used Japa-
nese news articles as the original sources of search 
topics, so the task was conducting cross-database 
retrieval, searching patents by news articles. The 
task assumed the following situation that is de-
picted in Figure 1. When a business manager looks 
through news articles and is interested in one of 
them, she/he clips it out and asks a searcher to find 
related patents to the clipping. The manager passes 
the clipping to the searcher along with her/his 
memorandum, and this clipping with memorandum 
became the search topic in this task. The memo-
randum helps the searcher to have the exact infor-
mation need the manager has, when the clipping 
contains non-relevant topics or the clipping has 
little description on the information need. Task 
participants played the role of the searcher and 
tried to retrieve relevant patents to the clipping. 
Since the purpose of the searching was technical 
survey, the claim part in patent was not treated 
specifically in assessing the relevance. Patent 
documents were treated as if those were technical 
papers. 
Cross-database retrieval itself is so general that 
techniques investigated in the task can be applied 
to various combinations of databases. This is an-
other purpose of the task. 
We prepared search topics in four languages, 
Japanese, English, Korean, and Chinese (both tra-
ditional and simplified). Participants could try 
cross-lingual patent retrieval by using one of the 
non-Japanese topics. Unfortunately, only two 
groups submitted cross-lingual results and both of 
them used English topics. 
In addition to the technical survey task ex-
plained so far, we arranged the optional task, 
where participants could try anything they want as 
far as they used the patent collections provided. 
One of the purposes of this free-styled task is to 
explore next official tasks. 
3 Characteristics of Patent Applications 
In this section, we briefly review the characteristics 
of patent applications (patent documents). 
• There are structures, for example, claims, 
purposes, effects, and embodiments of the 
invention. 
• Although the claim part is the most impor-
tant in patent, it is written in an unusual style 
especially for Japanese patent; all the sub-
topics are written in single sentence. 
• To enlarge the scope of invention, vague or 
general terms are often used in claims. 
• Patents include much technical terminology. 
Applicants may define and use their original 
terms not used in other patents. 
• There are large variations in length. The 
longest patent in our collections contains 
about 30,000 Japanese words! 
• The search models would be significantly 
different between industries, for example, 
between chemical / pharmaceutical indus-
tries and computers / machinery / electric 
industries. 
• Classification exists. IPC (International Pat-
ent Classification) is the most popular one. 
• The criterion of evaluation depends on the 
purpose of searching. For example, high re-
call is required for finding conflicting appli-
cations. 
• In some industries, images are important to 
judge the relevance. 
Our task focused on few of the above character-
istics. We treated patent documents as technical 
documents rather than legal statements, so we did 
not distinguish between the claim part and the oth-
ers in assessing the relevance. High recall was not 
necessary, so we used the standard averaged preci-
sion to evaluate the results. Few groups used struc-
tures and classifications. Images were not included 
in the patent collections provided. 
4 Patent Collections 
PATOLIS Co. provided and we released the fol-
lowing patent collections. 
• kkh: Publication of unexamined patent ap-
plications (1998, 1999) (in Japanese) 
• jsh: JAPIO Patent Abstracts (1995–1999) 
(in Japanese) 
• paj: Patent Abstracts Japan (1995– 1999) (in 
English) 
“Kkh” contains full texts of unexamined patent 
applications in Japanese. Images were eliminated. 
“Jsh” contains human edited abstracts in Japanese. 
Although all the texts in “kkh” have the abstracts 
written by the applicants, experts in JAPIO (Japan 
Patent Information Organization) [2] short-
ened/lengthened about half of them to fit the length 
within about 400 Japanese characters. They also 
normalized technical terms if necessary. “Paj” is 
English translation of “jsh”. 
translation by  
human experts 
 
modification of the original abstracts by 
human experts (JAPIO) 
 
kkh: (98,99) 
Publication of 
unexamined patent 
applications  
(in Japanese) 
 
jsh: (95-99) 
JAPIO Patent 
Abstracts 
(in Japanese) 
 
paj: (95-99) 
Patent Abstracts 
Japan 
(in English) 
 
 
Figure 2: Relationships between the patent col-
lections 
 
Figure 2 shows the relationships between these 
three collections. Here, we see parallel relations, 
for example, full texts vs. abstracts, original ab-
stracts vs. edited abstracts, and Japanese abstracts 
vs. English abstracts. Researchers can use these 
parallel collections for various purposes, for exam-
ple, finding rules of abstracting, creating a term 
normalization dictionary, acquiring translation 
knowledge, and so on. 
Table 1 summarizes the characteristics of the 
three collections. 
 
kkh jsh paj 
Type Full text Abstract Abstract 
Language Japanese Japanese English 
Years 98,99 95-99 95-99 
Number of 
documents
697,262 1,706,154 1,701,339
Bytes 18139M 1883M 2711M 
 
Table 1: Characteristics of the patent collections 
5 Topics 
JIPA members created topics, six for the dry run 
and 25 for the formal run. Since the topics for the 
dry run were substantially revised after the dry run, 
we decided to re-use those in the formal run. In 
consequence, we had the total 31 topics for the 
formal run. 
Figure 3 is an example of the topics in English 
and Table 2 shows the explanations of the fields in 
the topics. In our task, <ARTICLE> and 
<SUPPLEMENT> correspond to the news clipping 
and the memorandum respectively. 
The topics also contain <DESCRIPTION> and 
<NARRATIVE> fields we are familiar with. Since 
many NTCIR tasks already have the results for 
using <DESCRIPTION> and <NARRATIVE> 
fields, we can compare our results of using these 
fields with the results of other tasks. 
Along with the grade of relevance (i.e., “A”, 
“B”, “C”, or “D”), each judged patent has a mark 
(“S”, “J”, or “U”) representing the origin from 
which the patent was retrieved. Table 3 explains 
about the marks. For example, a document with 
“BJ” means that the document was judged as “par-
tially relevant” (i.e. “B-“) and only found by ex-
perts in their preliminary search (i.e., “-J”). 
Here, note that all the submitted runs contrib-
uted to collecting the “S” patents, but only the top 
30 patents for each run were used. Note also that 
we can restore the patent set retrieved by the man-
ual search (i.e., “PJ” set) by collecting “J” and “U” 
patents. 
 
 

 � � � � � �
;,*/5636.@ �:<9=,@ � �
,=0*, �;6 �1<+., �9,3(;0=, �4,90;: �)@ �*647(905. �
*6+,: �:<*/ �(: �)(9*6+,: �>0;/ �,(*/ �6;/,9 � �

 �
 �
 �
 �
 �  � � � � � � �  � �
 �
 � � � �
 �
6*0,;@ � �
 �
 �6 � � �
 � � �  � � �
 �	 �36:; �( �3(>:<0; �-69 �709(*@ �-03,+ �)@ �

 �(; �62@6 �0:;90*; �
6<9; � � �
 � �   � � � � � � � � � �
 �5 �:,;;3,4,5; �6- �;/, �3(>:<0; �-03,+ �)@ �
 �

 � � �;/, �;6@ �4(5<-(*;<9,9 � �(.(05:; �	 �
 � � � � �: �
*647,5:(;065 �6- � � � � �4033065 �-69 �+(4(.,: �-69 �05-905.,4,5; �
6- �( �*(9+ �.(4, �7(;,5; � �;/, �62@6 �0:;90*; �
6<9; �69+,9,+ �
	 �;6 �7(@ �()6<; � � � � �4033065 �65 �;/, � � �;/ � �/, �79,:0+ �
05. �1<+., � �9 � � 6:/0@<20 �690 � �05+0*(;,+ �;/(; �:64, �-<5* �
;065: �05*3<+05. �2,@ �67,9(;065 �-69 �;/, � �<7,9 �	(9*6+, �
(9: � �4050 �.(4, �4(*/05, �4(5<-(*;<9,+ �(5+ �:63+ �)@ �	 �

 � � � � �05 �<3@ � � �   � �;6 �(9*/ � � �   � �-,33 �<5+,9 �;/, �
 �;,*/50*(3 �9(5., �6- �( �7(;,5; �30*,5:,+ �;6 �
 �

 � � � � � �
 � �
 �
 �
 �
,;,9405(;065 �6- �=0*;69@ �69 �+,-,(; �)@ �*64 �
7(905. �,(*/ �6;/,9 �: �=(3<,: �)(:,+ �65 �*6+,: �-964 �)(9*6+, �
9,(+05.: �+6,: �56; �*65-30*; �>0;/ �;/, �7(;,5; � � �

/(; �205+ �6- �+,=0*,: �+,;,9405,: �3,(+,9: �69 �
=0*;69: �)@ �9,(+05. �:,=,9(3 �*6+,: �:<*/ �(: �)(9*6+,: �(5+ �
*647(905. �;/, �=(3<,: �*699,:765+05. �;6 �;/,:, �
*6+,: �
 �
 �<7,9 �	(9*6+, �(9: � �0: �( �;@7, �6- �4050 �.(4, �
4(*/05, �>/,9, �9,*69+,+ �)(9*6+,: �(9, �9,(+ �05 �*(9+: �-,( �
;<905. �*/(9(*;,9: �(5+ �;/, �.(4, �796*,,+: �05 �:,40 �9,(3 �
;04, �)@ �67,9(;05. �6--,5*, �(5+ �+,-,5:, �2,@: � �(473, �*6+,: �
05*3<+, �)(9*6+,: �(5+ �4(.5,;0* �*6+,: � �)<; �:/(33 �56; �), �
+,-05,+ �(: �3040;,+ �653@ �;6 �;/,:, � � �


0.5 � �)(9*6+, � �*6+, � �:<7,90690;@ �69 �05-,9069 �
0;@ � �=0*;69@ �69 �+,-,(; � �*647(90:65 � �1<+.4,5; �

 �
 � � � � � � � � � � � � � �
 �
 �
 
Figure 3: Example of the topics 
 
 
Field Explanation 
<LANG> Language code 
<PURPOSE> Purpose of search 
<TITLE> Concise representation 
of search topic 
<ARTICLE> MAINICHI news article 
in NTCIR format 
<SUPPLEMENT> Supplemental informa-
tion of news article 
<DESCRIPTION> Short description of 
search topic 
<NARRATIVE> Long description of 
search topic 
<CONCEPT> List of keywords 
<PI> Original patents of news 
article 
 
Table 2: Explanations of the fields in topics 
6 
6.1 
Results Overview 
Participants 
Eight groups submitted the 36 runs. One group 
submitted runs only for pooling. We briefly de-
scribe the characteristics of each group. Refer to 
the proceedings of Patent Retrieval Task [6] for 
each detail. 
LAPIN: This group focused on the “term distil-
lation” in cross-database retrieval, where the dif-
ference between the term frequency in source 
database and that in target database was integrated 
into the overall term weighting. 
SRGDU: This group tried several pseudo rele-
vance feedback methods in the context of patent 
retrieval. The proposed method using Taylor for-
mula was compared with the traditional Rocchio 
method. 
daikyo: This group made long gram-based in-
dex from the patent collections. Compared with the 
traditional gram-based indexing, proposed method 
produce more compact index. 
DTEC: This group searched various kinds of 
abstracts rather than full texts, and compared the 
effectiveness of those. The abstracts were JAPIO 
patent abstracts and the combinations of “title”, 
“applicant’s abstract”, and “claims”. Manual and 
automatic runs were compared. 
DOVE: This group also submitted manual and 
automatic runs. In the manual runs, non-relevant 
passages in <ARTICLE> were eliminated manu-
ally. 
IFLAB: This group evaluated their cross-
lingual IR system PRIME through several mono-
lingual runs. They also evaluated their translation 
extraction method by using Japanese-US patent 
families, which were not provided in this task. 
brkly: This group submitted both monolingual 
and cross-lingual runs. In the cross-lingual runs, 
words in English topics were translated into Japa-
nese words by using English-Japanese dictionary 
automatically created by the aligned bilingual cor-
pus (i.e., “paj” and “jsh”). Their method of creating 
the dictionary is based on word co-occurrence with 
the association measure. 
sics: This group also submitted cross-lingual 
runs, where they automatically created a cross-
lingual thesaurus form the aligned bilingual corpus, 
“paj” and “jsh”, and used the thesaurus for word-
based query translation. The Random Indexing 
vector-space technique was used to extract the 
cross-lingual thesaurus. Note that, in both the 
“sics” and the “brkly” groups, there was no mem-
ber who understands Japanese. 
6.2 
6.3 
6.4 
7 
Recall/Precision 
The recall/precision graphs of the mandatory runs 
are shown in Figure 4, and those of the optional 
runs in Figure 5. In each figure, there are both re-
sults for the strict relevance (“A”) and the relaxed 
relevance (“A” + “B”). For each run in the figures, 
brief system description is specified; the descrip-
tion includes the searching mode (automatic or 
manual), the topic fields used in query construction, 
and the topic language. 
Topic-by-topic Results 
Figure 6 shows the median of the average preci-
sions for each topic. Figure 7 shows the breakdown 
of the relevance judgments. Detailed analysis on 
each topic will be given by JIPA, where it will be 
discussed about the reasons why systems could not 
find some patents human experts found and vise 
versa. 
Recall of the relevant patents retrieved in 
the preliminary human search 
Figure 8 shows the recall of the relevant patents 
retrieved in the preliminary human search. In the 
process of making pool, we used only the top 30 
documents for each run. Here, we extracted more 
documents from each run and investigated how 
many human retrieving relevant patents could be 
covered by the systems. 
Optional (Free-styled) Task 
The following two groups applied to the optional 
task. Refer to the proceedings of Patent Retrieval 
Task [6] for each detail. 
CRL: This group investigated the method of 
extracting various rules from the existing align-
ments in patents. The “diff” command of UNIX 
was used to find the alignments between JAPIO 
patent abstracts and the original abstracts by appli-
cants, between claims and embodiments, and be-
tween different claims in an application. 
TIT: This group focused on the unusual style of 
Japanese claims, and tried to automatically struc-
ture the claims to raise the readability of claims. 
Rhetorical structure analysis was applied for this 
purpose. 
8 Summary and Future Directions 
In this paper, we described the overview of patent 
retrieval task at NTCIR-3. We are planning to con-
tinue our effort for the next patent retrieval task 
along with the following directions. 
• Longer range of years will be covered. 
• Purpose of search would shift to more real 
one, for example, searching conflicting ap-
plications.  
Acknowledgements 
 
We are grateful to PATOLIS Co. for providing the 
patent collections of this task. We also thank all the 
members of JIPA who created the topics and as-
sessed the relevance. Without their expertise in 
patent, this task would not be realized. Lastly, we 
thank all the participants for their contributions to 
this task. 

References 
[1] CLEF (Cross Language Evaluation Forum) 
(http://clef.iei.pi.cnr.it/) 
[2] JAPIO (Japan Patent Information Organization) 
(http://www.japio.or.jp/) 
[3] JIPA (Japan Intellectual Property Association) 
(http://www.jipa.or.jp/) 
[4] ACM-SIGIR Workshop on Patent Retrieval, or-
ganized by Mun-Kew Leong and Noriko Kando, 
2000. 
(http://research.nii.ac.jp/ntcir/sigir2000ws/) 
[5] NTCIR (NII-NACSIS Test Collection for IR Sys-
tems) 
(http://research.nii.ac.jp/ntcir/index-en.html) 
[6] Proceedings of the Third NTCIR Workshop on 
Research in Information Retrieval, Automatic Text 
Summarization and Question Answering, 2003. 
[7] PATOLIS Co.                                       
 (http://www.patolis.co.jp/e-index.html) 
[8] TREC (Text Retrieval Conference) 
(http://trec.nist.gov/) 
