Patent Claim Processing for Readability
- Structure Analysis and Term Explanation -
Akihiro SHINMORI
Department of Computational Intelligence and Systems Sciences,
Tokyo Institute of Technology, and
INTEC Web and Genome Informatics Co.
shinmori@isl.intec.co.jp

Manabu OKUMURA
Precision and Intelligence Laboratory, Tokyo Institute of Technology
oku@pi.titech.ac.jp

Yuzo MARUKAWA
Japan Science and Technology Corp., and
National Institute of Informatics
maru@nii.ac.jp

Makoto IWAYAMA
Precision and Intelligence Laboratory, Tokyo Institute of Technology, and
Hitachi, Ltd.
iwayama@pi.titech.ac.jp
Abstract
Patent corpus processing should be centered around patent claim
processing because claims are the most important part of patent
specifications. Claims written in Japanese are commonly described in
one sentence with peculiar style and wording, and are difficult for
ordinary people to understand. The peculiarity is caused by the
structural complexity of the sentences and the many difficult terms
used in the description. We have already proposed a framework to
represent the structure of patent claims and a method to analyze it
automatically. We are currently investigating a method to clarify terms
in patent claims and to find explanatory portions in the detailed
description part of the patent specifications. Through both approaches,
we believe we can improve the readability of patent claims.
1 Introduction
The importance of intellectual property, particularly patents, is
recognized more than ever. In academia, patents are regarded as the
core component of technology transfer to industry. With the upsurge of
business method patents and software patents, more and more business
people are concerned with patents.
A patent is described in a patent specification, which is a kind of
legal document. The most important part of a patent specification is
where the claims are written, because “the claims specify the
boundaries of the legal monopoly created by the patent” (Burgunder,
1995). Therefore, we believe that patent corpus processing should be
centered around patent claim processing.
Japanese patent claims are commonly described in one sentence with
peculiar style and wording, and they are difficult for ordinary people
to read and understand. After surveying the related literature and
investigating the NTCIR3 patent collection (Iwayama et al., 2003), we
found that the difficulty has two aspects: structural difficulty and
term difficulty.
In this paper, we first present the characteristics
of patent claims. Next, we present our work on the
structure analysis of patent claims. Third, we intro-
duce our on-going research on term explanation for
patent claims.
2 Characteristics of Patent Claims
Typical Japanese patent claims taken from two patents are shown in
Figures 1 and 2.
In general, Japanese sentences are punctuated with the touten “、” or
“，” (comma) and end with the kuten “。” or “．” (period). The touten
segments the sentence to disambiguate its meaning and to improve
readability. According to the literature (Maekawa, 1995), the average
length of Japanese sentences is 55.85 characters in newspaper articles
on politics and 75.37 characters in articles on social affairs.
The claims in Figures 1 and 2 are each written in one sentence. Though
they are appropriately punctuated with the touten “、”, they are
unusually long, at 295 characters and 119 characters respectively.
Most Japanese readers who are not accustomed to reading patent claims
have difficulty reading them. In fact, according to Kasuya (1999),
Japanese patent attorneys themselves recognize that Japanese patent
claims are difficult to read.

[Japanese claim text omitted]
Figure 1: A sample Japanese patent claim
(Publication Number=10-011111)

[Japanese claim text omitted]
Figure 2: A sample Japanese patent claim containing a newline
(Publication Number=10-146993)
(Note: <nl> means a newline.)
The salient characteristics of Japanese patent claims from the
viewpoint of readability are as follows:
1. The sentences are long.
2. The structure of the description is complex.
3. There are several terms that are difficult to understand or that
require explanation to be understood.
To examine the first point, we extracted all of the
first claims of the sample data (59,968 patents) in the
NTCIR3 patent collection, and calculated the aver-
age sentence length. We found that it is 242 char-
acters and confirmed that Japanese patent claims are
unusually long.
With regard to the second point, we surveyed several books and articles
written for patent applicants that explain how to draft patent claims
(Kasai, 1999; Kasuya, 1999) and how to translate patent claims (Lise,
2002).
Based on the survey, we classify the description styles into the
following three. [Note: In the following explanation, Japanese phrases
are followed by their romanized reading in [] and their English
translation in ().]
Process sequence style: As in “...し [shi] (does), ...し [shi] (does),
...した [shita] (and does)...”, the sequence of processes is described.
Mainly used in method inventions.
Element enumeration style: As in “...と [to] (and), ...と [to] (and),
...とからなる [to kara naru] (comprising), ...”, the set of elements is
described. Mainly used in product inventions.
Jepson-like style: As in “...において [ni oite] (in), ...を特徴とする
[wo tokuchou to suru] (be characterized by), ...”, the description
consists of a first half and a last half. The first half describes
either the known part or the precondition. The last half describes
either the new part or the main part (see footnote 1).
These patterns are not mutually exclusive. For ex-
ample, the first half part of the Jepson-like style may
be written in the process sequence style or in the el-
ement enumeration style.
With regard to the third point, Figure 1 contains a term meaning “an
actuator” and Figure 2 contains a term meaning “sticky ink”, both of
which require explanation to be understood.
Because of these characteristics, the well-known Japanese parser KNP
(Kurohashi, 2000) incorrectly analyzes or cannot process most Japanese
patent claims.
KNP’s dependency analysis works by detecting parallel structures using
a thesaurus and dynamic programming, but it does not work well for
patent claims because they often include “chain expressions”, in which
one concept is first defined and another concept is then defined using
the first. For the claim in Figure 1, the five elements glossed as “a
load detection method”, “a frequency transfer device no.1”, “a
frequency transfer device no.2”, “a modulation method”, and “an
oscillation generation method” need to be recognized as parallel, but
they cannot be, because of the expressions marked by underlines in the
figure.

Footnote 1: Note that the term “Jepson claim” is rigidly defined and
used in Europe and the USA to describe the kind of claim in which the
known part and the new part are clearly separated. In Japan, that is
not common and the separation is vaguer (Lise, 2002). That is why we
call this the “Jepson-like style”.

Table 1: Relations for Japanese patent claims
Type           Relation      Explanation                Example
Multi-Nuclear  PROCEDURE     Process Sequence Style     [...し、][...し、][...する]X
               [Note: The above means “X which [does ...,] [does ...,] [and does ...].”]
Multi-Nuclear  COMPONENT     Element Enumeration Style  [...と、][...と、][...と]...
               [Note: The above means “[...,] [...,] [and ...].”]
Mono-Nuclear   ELABORATION   S elaborates N.            [XをYした][ZのA]
               [Note: The above means “[A of Z] [which Y X].”]
Mono-Nuclear   FEATURE       Characterization           [XであるY][を特徴とする]
               [Note: The above means “[characterized by] [Y which is X].”]
Mono-Nuclear   PRECONDITION  Jepson-like Style          [Xであって、][YしたZ]
               [Note: The above means “[In X,] [Z which Y].”]
Mono-Nuclear   COMPOSE       Composition                [...と、...と、...と][を備えた]X
               [Note: The above means “X [composed of] [..., ..., and ...].”]
3 Structure Analysis of Patent Claims
3.1 Background
To improve the readability of Japanese patent claims, we argue that the
structure of the description needs to be presented in a readable way.
To do so, the structure needs to be analyzed first.
Japanese patent claims are described in such a way that multiple
sentences are coerced into one sentence (Kasuya, 1999). In other words,
a claim is composed of multiple sentences that have some kind of
relationship with each other. Therefore, we decided to apply RST
(Rhetorical Structure Theory) (Mann, 1999), which was proposed to
analyze discourse structure composed of multiple sentences.
RST was proposed in the 1980s and has been successfully applied to
automatic summarization (Marcu, 2000), automatic layout (Bateman et
al., 2000), and so on. A Tcl/Tk-based interactive tool (O'Donnell,
1997) was developed to support manual editing and visual display of the
structure.
3.2 Framework
For the structure analysis of Japanese patent claims, we defined six
relations, as in Table 1. Two of them are multi-nuclear, where the
composing elements are equally important. Four of them are
mono-nuclear, where one element is the nucleus, the other is the
satellite, and the nucleus is more important than the satellite. In the
“Example” column of Table 1, the regions enclosed in “[” and “]” are
segments or spans, and the underlined ones are nuclei.
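The six relations and the nucleus/satellite distinction can be captured in a small tree structure. The following is a minimal sketch, not the authors' implementation: the names `RELATIONS`, `Span`, and `nuclei` are hypothetical, only the relation names and their multi-/mono-nuclear types are taken from Table 1, and the choice of the last half as the nucleus of PRECONDITION is an assumption based on the description of the Jepson-like style.

```python
from dataclasses import dataclass, field
from typing import List

# Relation name -> nuclearity, as listed in Table 1.
RELATIONS = {
    "PROCEDURE": "multi",
    "COMPONENT": "multi",
    "ELABORATION": "mono",
    "FEATURE": "mono",
    "PRECONDITION": "mono",
    "COMPOSE": "mono",
}


@dataclass
class Span:
    """A segment (leaf) or a span (internal node) of the structure tree."""
    text: str = ""            # surface text, used for leaves
    relation: str = ""        # relation joining the children; empty for leaves
    children: List["Span"] = field(default_factory=list)
    nucleus_index: int = -1   # index of the nucleus child (mono-nuclear only)

    def nuclei(self) -> List["Span"]:
        """Return the children that act as nuclei under this node's relation."""
        if not self.children:
            return []
        if RELATIONS[self.relation] == "multi":
            return list(self.children)   # all elements are equally important
        return [self.children[self.nucleus_index]]


# Usage: a Jepson-like claim; the precondition is assumed to be the satellite.
claim = Span(
    relation="PRECONDITION",
    children=[Span(text="In X,"), Span(text="Z which Y")],
    nucleus_index=1,
)
```

Keeping the nuclearity in a lookup table rather than in the node keeps the tree itself independent of how many relations are defined.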
Given the patent claims in Figure 1 and Figure 2, we can analyze their
structure and present them visually by using RSTTool (O'Donnell, 1997),
as in Figure 3 and Figure 4 (see footnote 2).
3.3 Cue-phrase-based Approach
In designing the algorithm, we took an approach similar to that of
Marcu (2000). We collected cue phrases that can be used for segmenting
long claims and establishing relations among segments or spans.
Footnote 2: Because RSTTool is written in Tcl/Tk, which is an
internationalized language, we did not have to localize it to display
Japanese characters.
Figure 3: A result of structure analysis of patent claim in Figure 1 (using RSTTool v2.7)
Figure 4: A result of structure analysis of patent claim in Figure 2 (using RSTTool v2.7)
Table 2: Description patterns just before the newlines in claims in
which newlines are explicitly inserted
No  Pattern                                        Ratio
1   (Noun|Symbol)と(、|，)                          46.1%
    [Note: “と” means “and”.]
2   (Verb-Cont-Form|AuxVerb-Cont-Form)(、|，)       17.5%
3   (Noun|Symbol)において(、|，)                    16.4%
    [Note: “において” means “in”.]
4   (Noun|Symbol)であって(、|，)                    7.2%
    [Note: “であって” means “in”.]
Cue phrases were first collected manually by reading patent claims. We
then found that about half of the claims contain newlines inserted at
apparent segment boundaries, as in Figure 2.
We investigated all of the extracted first claims of the sample data
and found that 48.5% of them are newline-inserted claims. It seems that
the drafters of these claims explicitly inserted the newlines for their
own readability. We checked the description pattern of the last three
morphemes just before each newline in those claims. The result is shown
in Table 2. In Table 2, “Verb-Cont-Form” means a verb in continuous
form and “AuxVerb-Cont-Form” means an auxiliary verb in continuous
form. Note that the description patterns are expressed in the regular
expression notation of Perl.
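The survey of morphemes just before each newline can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' script: a claim is assumed to be a list of (surface, part-of-speech) pairs, the newline is assumed to appear as a `<nl>` pseudo-morpheme, and the tag name `Particle` and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Morpheme = Tuple[str, str]  # (surface, part-of-speech tag)
TOUTEN = {"、", "，"}


def classify_tail(tail: List[Morpheme]) -> str:
    """Label the (up to three) morphemes before a newline with a pattern name."""
    if len(tail) < 2 or tail[-1][0] not in TOUTEN:
        return "other"
    surface, pos = tail[-2]
    if pos in ("Verb-Cont-Form", "AuxVerb-Cont-Form"):
        return "(Verb-Cont-Form|AuxVerb-Cont-Form)(、|，)"
    if pos == "Particle" and len(tail) >= 3 and tail[-3][1] in ("Noun", "Symbol"):
        return "(Noun|Symbol)" + surface + "(、|，)"
    return "other"


def pattern_ratios(claims: List[List[Morpheme]]) -> Dict[str, float]:
    """Return the share of each pattern over all newlines in the claims."""
    counts: Counter = Counter()
    for morphemes in claims:
        for idx, (surface, _pos) in enumerate(morphemes):
            if surface == "<nl>":
                counts[classify_tail(morphemes[max(0, idx - 3):idx])] += 1
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


# Usage with a toy claim containing two newlines.
toy = [
    ("X", "Noun"), ("と", "Particle"), ("、", "Symbol"), ("<nl>", "Newline"),
    ("Y", "Noun"), ("し", "Verb-Cont-Form"), ("、", "Symbol"), ("<nl>", "Newline"),
]
ratios = pattern_ratios([toy])
```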
Summarizing the above, we arrived at the cue phrases in Table 3. In
Table 3, “Verb-Basic-Form” means a verb in basic form and
“AuxVerb-Basic-Form” means an auxiliary verb in basic form.
3.4 Algorithm and Implementation
We designed an algorithm for analyzing the structure of independent
claims (see footnote 3). Although patent claims are written in natural
language, they are not written in free form; they are restricted in the
sense that there are description styles established in the community.
We therefore designed an algorithm composed of a lexical analyzer and a
parser, as in formal language processors.
Footnote 3: Independent claims are claims that do not refer to any
other claims.
First, the input claim is analyzed with the morphological analyzer
“chasen” (Matsumoto et al., 2002). Because some patent claims
explicitly contain newlines, as in Figure 2, we use the “-j” option and
set the sentence delimiter to the kuten in “.chasenrc”.
Next, the output from chasen is analyzed with the lexical analyzer. The
main point of our algorithm is the context-dependent behavior of the
lexical analyzer, as follows:
• The lexical analyzer outputs two types of tokens: cue phrase tokens
and morpheme tokens.
• Morpheme tokens are output only under certain contextual conditions,
to avoid ambiguities in parsing.
• For other morphemes whose context does not satisfy the above
conditions, an anonymous morpheme token (WORD) is output.
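The context-dependent behavior described above can be illustrated with a toy lexical analyzer. This is a sketch under assumptions and not the authors' analyzer: morphemes are assumed to be (surface, part-of-speech) pairs, each cue phrase is assumed to arrive as a single morpheme, only two of the cue phrases are included, and the underscore-joined token names and the function `lex` are hypothetical.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface, part-of-speech tag)
TOUTEN = {"、", "，"}

# Two illustrative cue phrases; the full list is much longer.
CUE_PHRASES = {
    "において": "JEPSON_CUE",
    "であって": "JEPSON_CUE",
}


def lex(morphemes: List[Morpheme]) -> List[str]:
    """Turn morphemes into cue phrase tokens, morpheme tokens, or WORD."""
    tokens: List[str] = []
    i = 0
    while i < len(morphemes):
        surface, pos = morphemes[i]
        nxt = morphemes[i + 1] if i + 1 < len(morphemes) else ("", "")
        nxt2 = morphemes[i + 2] if i + 2 < len(morphemes) else ("", "")
        if surface in CUE_PHRASES and nxt[0] in TOUTEN:
            # Cue phrase token: the cue together with its touten.
            tokens.append(CUE_PHRASES[surface])
            i += 2
        elif pos in ("Noun", "Symbol") and nxt[0] == "と" and nxt2[0] in TOUTEN:
            # Morpheme tokens are emitted only in this specific context.
            tokens += ["NOUN", "POSTP_TO", "PUNCT_TOUTEN"]
            i += 3
        else:
            # Everything else becomes an anonymous morpheme token.
            tokens.append("WORD")
            i += 1
    return tokens


# Usage with a toy morpheme sequence.
sample = [
    ("装置", "Noun"), ("において", "Particle"), ("、", "Symbol"),
    ("A", "Noun"), ("と", "Particle"), ("、", "Symbol"), ("B", "Noun"),
]
tokens = lex(sample)
```

Collapsing most morphemes into WORD keeps the token vocabulary small, which is what lets a compact grammar handle long claims.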
Next, the output from the lexical analyzer is processed with a parser
generated from a context-free grammar (CFG) by a Bison-compatible
parser generator (Donnelly and Stallman, 1995). The CFG we designed for
Japanese patent claims consists of 57 rules, 11 terminals, and 19
non-terminals.
Finally, a structure tree is constructed in the form of an “.rs2” file,
the format used by RSTTool v2.7. By using RSTTool, the output is
displayed visually, as in Figure 3 and Figure 4.
3.5 Evaluation
The evaluation was done by using the first claims (see footnote 4) of
59,956 patents extracted from the NTCIR3 patent data collection.
The NTCIR3 patent data collection consists of 697,262 patents opened to
the public in 1998 and 1999. For the analysis, the collection of cue
phrases, and the creation of the CFG, we used patents from 1998. For
the evaluation, we used patents from 1999.
We checked the IPC (International Patent Classification) codes of the
59,956 patents and confirmed that their distribution is similar to that
of all patents opened in 1999, as disclosed by the JPO (Japan Patent
Office).
The evaluation was done on the following points:
Accept Ratio: The ratio of claims accepted by the parser generated from
the CFG.
Processing Speed: The time required to process one claim.
Accuracy: The accuracy of the analysis result, evaluated indirectly and
directly.

Footnote 4: First claims are always independent claims.

Table 3: Cue phrases which can be used to analyze patent claims
Token Name    Cue Phrase                                            Gloss
JEPSON CUE    に(お|於)いて(、|，)                                    [ni oite] (in)
              であって(、|，)                                        [de atte] (in)
              にあたり(、|，)                                        [ni atari] (in)
              に当(た)?り(、|，)                                     [ni atari] (in)
FEATURE CUE   を特徴と(した|する)(、|，)?                             [wo tokuchou to (shita|suru)] (characterized by)
COMPOSE CUE   を搭載して構成され(た|る|ている)(、|，)?                 [wo tousaishite kousei sare (ta|ru|teiru)] (comprising)
              を(、|，)?(備|具|そな)え(た|る|ている)(、|，)?           [wo sonae (ta|ru|teiru)] (comprising)
              を(、|，)?具備(した|する|している|してなる)(、|，)?      [wo gubi (shita|suru|shiteiru|shitenaru)] (comprising)
              (で|から)構成され(た|ている)(、|，)?                    [(de|kara) kousei sare (ta|teiru)] (comprising)
              を(、|，)?有(する|した)(、|，)?                         [wo yuu (suru|shita)] (comprising)
              を(、|，)?包含(する|した)(、|，)?                       [wo hougan (suru|shita)] (comprising)
              を(、|，)?含(む|んだ)(、|，)?                           [wo fuku (mu|nda)] (comprising)
              から(、|，)?(なる|なった|なっている)(、|，)?             [kara (naru|natta|natteiru)] (comprising)
              から(、|，)?(成る|成った|成っている)(、|，)?             [kara (naru|natta|natteiru)] (comprising)
              を(、|，)?設け(た|ている)(、|，)?                       [wo mouke (ta|teiru)] (comprising)
              を(、|，)?装備(する|した|している)(、|，)?               [wo soubi (suru|shita|shiteiru)] (comprising)
NOUN, POSTP TO, PUNCT TOUTEN
              The sequence of “(Noun|Symbol)と(、|，)”
VERB RENYOU, PUNCT TOUTEN
              The sequence of “(Verb-Cont-Form|AuxVerb-Cont-Form)(、|，)”
              which exists before
              “(Verb-Basic-Form|AuxVerb-Basic-Form)(Noun|Symbol)”
The accept ratio was more than 99.77%. The processing speed was 0.30
seconds per claim (evaluated on a Linux PC with a 1GHz Pentium III and
512MB of memory), so processing is almost real-time.
3.5.1 Indirect Evaluation on Accuracy
By specifying a command-line switch, our program can be run without
using the originally inserted newlines. The newline insertion positions
can be predicted from the result of structure analysis and some
heuristics. Indirect evaluation was therefore done by comparing the
newline insertion positions in the originally newline-inserted claims
with those in the claims whose newlines were inserted automatically
from the result of structure analysis. The recall (R), the precision
(P), and the F-measure (F) are calculated as follows, where c is the
number of correctly inserted newlines, n is the number of newlines in
the original claim, and i is the number of inserted newlines.

\[ R = \frac{c}{n} \tag{1} \]
\[ P = \frac{c}{i} \tag{2} \]
\[ F = \frac{2 \cdot R \cdot P}{R + P} \tag{3} \]
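Equations (1) to (3) can be computed directly from two sets of newline positions. The sketch below is only an illustration: representing newline positions as sets of character offsets, the function name `newline_scores`, and returning zero for empty denominators are assumptions, and the text does not say whether scores are computed per claim or over the whole collection.

```python
from typing import Set, Tuple


def newline_scores(original: Set[int], predicted: Set[int]) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) for predicted newline positions."""
    c = len(original & predicted)   # correctly inserted newlines
    n = len(original)               # newlines in the original claim
    i = len(predicted)              # newlines inserted automatically
    recall = c / n if n else 0.0
    precision = c / i if i else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure


# Usage: two of the three original newlines are found among four predictions.
r, p, f = newline_scores({10, 25, 40}, {10, 25, 33, 50})
```

Here c = 2, n = 3, and i = 4, so R = 2/3, P = 1/2, and F = 4/7.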
The baseline inserts newlines mechanically at the end of every sequence
of “(NOUN|SYMBOL)(、|，)” and “(Verb-Cont-Form|AuxVerb-Cont-Form)(、|，)”.
Note that newlines are sometimes inserted at positions that are not
segment boundaries in the sense of RST. For example, newlines are often
inserted after “は、” (a postpositional particle representing the
subject, followed by a touten). Our newline-insertion prediction
algorithm therefore has an inherent upper limit on recall of 0.873.
The result is shown in Table 4.
Table 4: Evaluation result (Indirect)
Index          Baseline  Newline Insertion utilizing RST  Upper Limit
Recall(R)      0.478     0.674                            0.8736
Precision(P)   0.374     0.663                            N/A
F-measure      0.420     0.669                            N/A
Table 5: Evaluation result (Direct)
Category           Count  Percentage (Except “No judgment”)
Correct            76     80.85%
Partially Correct  11     11.70%
Incorrect          7      7.45%
No judgment        6      -
3.5.2 Direct Evaluation on Accuracy
The direct evaluation of accuracy was done using 100 randomly selected
claims. All of these claims are first claims. Again, we checked the
distribution of IPC codes and confirmed that it is similar to that of
all patents opened in 1999, as disclosed by the JPO.
The 100 claims were analyzed by our program, and the visually displayed
outputs, like Figures 3 and 4, were presented to a subject who had some
experience in reading patent specifications. The subject evaluated the
result by the following criteria:
• when the claim is in the Jepson-like style,
whether that is correctly recognized.
• when the claim is in the Jepson-like style,
whether the structure is correctly analyzed for
the first half part and for the last half part.
• when the claim is not in the Jepson-like style,
whether the structure is correctly analyzed for
the whole.
The result is shown in Table 5.
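The percentages in Table 5 are taken over the judged claims only, that is, excluding the six “No judgment” cases. The following short check reproduces them from the counts in the table; the variable names are hypothetical.

```python
# Counts from Table 5.
counts = {"Correct": 76, "Partially Correct": 11, "Incorrect": 7, "No judgment": 6}

# Only judged claims enter the denominator.
judged = sum(v for k, v in counts.items() if k != "No judgment")

percentages = {
    k: round(100 * v / judged, 2)
    for k, v in counts.items()
    if k != "No judgment"
}
```

With 94 judged claims, this gives 80.85% correct, 11.70% partially correct, and 7.45% incorrect, matching the table.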
3.6 Application to Patent Claim Paraphrase
Once the structure of a patent claim has been analyzed, we can apply
the result to paraphrase the claim.
To do so, the following actions are incorporated into the lexical
analyzer and the parser.
• The lexical analyzer deletes three words that each correspond to
“the”.
• For the parser, new actions are added that relocate the “noun group”
located at the end to the front. The same is done for the “noun group”
located just before JEPSON CUE in Jepson-like style claims.
• For the process sequence style, the lexical analyzer conjugates verbs
and adverbs from their continuous form to their basic form and replaces
the touten “(、|，)” with the kuten “。”.
• For the element enumeration style, the lexical analyzer converts cue
phrases such as “からなる” (consist of) and “有する” (include) to their
“ている” (“teiru”) form plus “。” and deletes “と(、|，)” (and) at the end
of each element.
• The lexical analyzer converts “こと” (thing) just before “を特徴とする”
(characterized by) to “以下” (the following).
• For the Jepson-like style, the parser separates the first-half part
and the last-half part by inserting a newline.
With the above processing, long patent claim sentences are divided into
multiple sentences. Because some of the generated sentences may still
be too long, sentences longer than the threshold length (75 characters)
are processed recursively.
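The recursive step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the real system splits according to the analyzed structure, whereas the splitter `split_at_first_touten` here is a hypothetical stand-in that simply cuts at the first touten, and the function names are assumptions. Only the 75-character threshold comes from the text.

```python
from typing import Callable, List

THRESHOLD = 75  # characters, as stated in the text


def paraphrase(sentence: str, split: Callable[[str], List[str]]) -> List[str]:
    """Recursively split a sentence until every piece fits the threshold."""
    if len(sentence) <= THRESHOLD:
        return [sentence]
    pieces = split(sentence)
    if len(pieces) <= 1:
        # No further split is possible; stop instead of recursing forever.
        return [sentence]
    result: List[str] = []
    for piece in pieces:
        result.extend(paraphrase(piece, split))
    return result


def split_at_first_touten(sentence: str) -> List[str]:
    """Hypothetical splitter: cut at the first touten and close with a kuten."""
    for pos, ch in enumerate(sentence):
        if ch in "、，" and 0 < pos < len(sentence) - 1:
            return [sentence[:pos] + "。", sentence[pos + 1:]]
    return [sentence]


# Usage with a synthetic 112-character sentence.
long_sentence = "あ" * 50 + "、" + "い" * 50 + "、" + "う" * 10
pieces = paraphrase(long_sentence, split_at_first_touten)
```

Each split yields strictly shorter pieces, so the recursion terminates even when a piece cannot be brought under the threshold.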
An example of paraphrase is shown in Figure 5.
We believe that paraphrasing can not only improve the readability of
patent claims but also work effectively as preprocessing for machine
translation (see footnote 5).

Footnote 5: In fact, there are several commercial machine translation
software packages that perform special preprocessing for patent claims
before translating from Japanese to English.

[Japanese paraphrase text omitted]
Figure 5: A sample paraphrase for Figure 1
4 Term Explanation for Patent Claims
4.1 Background and Motivation
Once the structure of patent claims has been analyzed and presented
visually, the next hurdle for readability is terms.
Many novel terms are used in patent claim descriptions. They can be
classified into the following categories:
Terms specific to the invention: Patent drafters often assign unique
names to the invention, its elements, and its processes in order to
identify them.
Terms specific to the domain: Patent law requires that patents be
written so that those who have ordinary knowledge in the domain can
understand and perform the invention. Technical terms that are
established in the domain are therefore often used. Additionally, there
exist “patent jargons”, created by combining two kanji characters, such
as the words glossed as “put and insert” and “put into the hall”
(Kasai, 1999). They were first created by some patent drafters for the
sake of brevity and have since been widely used in the community. So,
they are terms specific to the inventions of the domain. Those who do
not have enough knowledge of the domain, or who are not accustomed to
reading patent specifications, have difficulty understanding them.
Giving appropriate explanations for these terms would help to improve
the readability of patent claims.
4.2 Approach
First of all, it is necessary to recognize terms to be
explained. There are many research issues in term
extraction in general, but for our purpose we use
the following morphological pattern to extract terms
from patent claims:
(Prefix)*(Noun|Unknown-Words|Symbol|Verb-Cont-Form
|Verb-Compound-With-Indeclinable-Word)+
By using the above pattern, we can extract terms that contain verbs,
such as those glossed as “method to blow heat wind”, “read value”, and
“liquid drop”.
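One way to apply such a morphological pattern is to encode each part-of-speech tag as a single letter and run an ordinary regular expression over the resulting string. The sketch below is a hedged illustration rather than the authors' extractor: the letter codes, the function `extract_terms`, and the toy morpheme sequence (including its Japanese surfaces and segmentation) are assumptions made for the example.

```python
import re
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface, part-of-speech tag)

# One letter per tag so the pattern can be matched with a plain regex.
TAG_CODES = {
    "Prefix": "P",
    "Noun": "N",
    "Unknown-Words": "U",
    "Symbol": "S",
    "Verb-Cont-Form": "V",
    "Verb-Compound-With-Indeclinable-Word": "C",
}

# (Prefix)*(Noun|Unknown-Words|Symbol|Verb-Cont-Form|Verb-Compound-...)+
TERM_PATTERN = re.compile(r"P*[NUSVC]+")


def extract_terms(morphemes: List[Morpheme]) -> List[str]:
    """Return the surface strings of all maximal matches of the term pattern."""
    codes = "".join(TAG_CODES.get(pos, "x") for _, pos in morphemes)
    return [
        "".join(surface for surface, _ in morphemes[m.start():m.end()])
        for m in TERM_PATTERN.finditer(codes)
    ]


# Usage with a toy morpheme sequence (illustrative segmentation).
sample = [
    ("熱風", "Noun"), ("吹き", "Verb-Cont-Form"), ("付け", "Verb-Cont-Form"),
    ("手段", "Noun"), ("を", "Particle"), ("備える", "Verb-Basic-Form"),
]
terms = extract_terms(sample)
```

Because every tag maps to exactly one character, match offsets in the code string are also morpheme indices, which keeps the mapping back to surface text trivial.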
Second, by using the result of structure analysis, we can infer the
categories of the terms as follows:
• If the term appears at the end of the claim, just before the JEPSON
CUE in the Jepson-like style, or just before “と” (and) in the element
enumeration style, it is a term specific to the invention. For example,
the terms glossed as “an operational virtual oscillation generating
device” and “a load detection method” in Figure 1 are terms specific to
this invention.
• If the term appears in the middle of the first half in the
Jepson-like style, it can be a term specific to the domain. For
example, the term glossed as “an actuator” in Figure 1 is a technical
term in the domain.
• If the term consists of two kanji characters and is not listed in
ordinary dictionaries, it can be a patent jargon.
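The three inference rules above can be written as a small decision function. This is a sketch under assumptions: the positional facts are passed in as booleans that would come from the structure analysis, the order in which the rules are tried is a choice made here and is not stated in the text, and the function names, category labels, and example words are hypothetical.

```python
from typing import Set


def is_kanji(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def infer_category(
    term: str,
    at_claim_end: bool,
    before_jepson_cue: bool,
    before_to_in_enumeration: bool,
    in_first_half_middle: bool,
    dictionary: Set[str],
) -> str:
    """Guess the category of a term from its position and its form."""
    if at_claim_end or before_jepson_cue or before_to_in_enumeration:
        return "invention-specific"
    if len(term) == 2 and all(is_kanji(ch) for ch in term) and term not in dictionary:
        return "patent-jargon candidate"
    if in_first_half_middle:
        return "domain-specific candidate"
    return "unknown"


# Usage with illustrative terms and a one-word stand-in dictionary.
ordinary_words = {"装置"}
first = infer_category("制御装置", True, False, False, False, ordinary_words)
second = infer_category("嵌挿", False, False, False, True, ordinary_words)
third = infer_category("アクチュエータ", False, False, False, True, ordinary_words)
```

The results are labeled as candidates because the text says such terms “can be” domain terms or jargon, pending confirmation from the detailed description.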
Finally, by looking at the detailed description of the invention or of
related inventions, we can back up the above inference as follows:
• The terms specific to the invention should be described after the
“means to solve the problem” section in the detailed description of the
invention.
• The terms specific to the domain are widely used in the inventions of
the domain. So, it is highly likely that they occur frequently in
related inventions. We can regard a collection of search results as the
related inventions.
• Some of the technical terms specific to the domain are described in
the “prior art” section of the detailed description of the invention or
of related inventions in the domain.
For those technical terms specific to the domain, explanatory portions
such as the following can be found (English translations shown):
• “... driving the oil pressure cylinder (or the actuator) at the speed
of ...”
• “... the spout (or the orifice) ...”
• “... blowing out ink preliminarily (namely, purging ink) ...”
• “... ink of the hot-melt type (or solid ink) ...”
As can be seen above, explanatory portions can be found by using cue
phrases such as the opening and closing parentheses, “以下” (“in the
following”), and a phrase meaning “or” or “namely”.
4.3 Sample Scenario
From the patent claim in Figure 2, we find many terms that are
candidates for explanation, such as those glossed as “time
measurement”, “the method to measure time”, “measurement result”,
“sticky ink”, “removal of sticky ink”, “removal processing of sticky
ink”, and “the method to remove sticky ink”.
Among the above terms, “the method to measure time” and “the method to
remove sticky ink” are terms specific to the invention because they are
judged to be elements by structure analysis.
By searching the detailed description, we can find the explanatory
portion for “sticky ink” as follows (English translation shown):
• “... the ink of increased stickiness (in the following, we call it
“sticky ink”) ...”
4.4 Further Analysis and Experimentation
We continue to analyze the NTCIR3 patent data collection, specifically
the “Patolis Test Collection”, which is a test collection for patent
retrieval consisting of a set of queries and search results. We use
each search result as “related inventions” and analyze them to collect
cue phrases for finding explanatory portions for technical terms
specific to the domain.
5 Related Work
NLP research on patent claims has already been reported by Kameda
(1995). It is directed toward dependency analysis of patent claims.
Although it is proposed to support “analytic reading” of patent claims,
no evaluation result on large-scale real patent data is reported. Our
approach differs from that of Kameda (1995) in that the top-level
structure is analyzed.
Sheremetyeva and Nirenburg (1996) report research on a system for
authoring patent claims using NLP and knowledge engineering techniques.
6 Concluding Remarks
We have presented a framework to represent the structure of patent
claims and a method to analyze it automatically. The evaluation results
suggest that our approach is robust and practical.
We are currently investigating a method to clarify terms in patent
claims and to find explanatory portions in the detailed description
part of the patent specifications.
This is not only a step toward improving readability; it can also lead
to the more challenging task of automatic patent map generation (Study
group on patent map, 1990).
Acknowledgements
The NTCIR3 patent data collection was used in our research.

References
John Bateman, Thomas Kamps, Jorg Kleinz, and Klaus Reichenberger. 2000.
Toward constructive text, diagram, and layout generation for
information presentation. Computational Linguistics, 27(3):409–449.
Lee B. Burgunder. 1995. Legal Aspects of Managing Technology. South
Western.
Charles Donnelly and Richard Stallman. 1995. Bison: The YACC-compatible
Parser Generator, Version 1.25.
Makoto Iwayama, Atsushi Fujii, Akihiko Takano, and Noriko Kando. 2003.
Overview of patent retrieval task at NTCIR-3. In The Third NTCIR
Workshop on Research in Information Retrieval, Automatic Text
Summarization and Question Answering. National Institute of
Informatics.
Masayuki Kameda. 1995. Support functions for reading Japanese text. In
IPSJ SIGNotes Natural Language, number 110. Information Processing
Society of Japan. (in Japanese).
Yasuji Kasai. 1999. Manual for Drafting Patent Claims. Kougyo
Chosakai. (in Japanese).
Youji Kasuya. 1999. On the description style of patent claims and the
techniques to draft them. Patent, 52(2). (in Japanese).
Sadao Kurohashi. 2000. KNP - Japanese parsing for real. IPSJ MAGAZINE,
41(11). (in Japanese).
William Lise. 2002. An investigation of terminology and syntax in
Japanese and US patents and the implications for the patent
translator. http://www.lise.jp/patsur.html.
Mamoru Maekawa. 1995. Science of Sentences. Iwanami. (in Japanese).
Bill Mann. 1999. An introduction to rhetorical structure theory (RST).
http://www.sil.org/~mannb/rst/rintro99.htm.
Daniel Marcu. 2000. The Theory and Practice of Discourse Parsing and
Summarization. MIT Press.
Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Yoshitaka Hirano,
Hiroshi Matsuda, Kazuma Takaoka, and Masayuki Asahara. 2002.
Morphological Analysis System ChaSen version 2.2.9 Manual. Nara
Institute of Science and Technology.
Michael O'Donnell. 1997. RST-Tool: An RST analysis tool. In The 6th
European Workshop on Natural Language Generation.
Svetlana Sheremetyeva and Sergey Nirenburg. 1996. Knowledge elicitation
for authoring patent claims. IEEE Computer, 57–63.
Study group on patent map, editor. 1990. Patent Map and Information
Strategy. Japan Institute of Invention and Innovation. (in Japanese).
