Interactive Paraphrasing
Based on Linguistic Annotation
Ryuichiro Higashinaka
Keio Research Institute at SFC
5322 Endo, Fujisawa-shi,
Kanagawa 252-8520, Japan
rh@sfc.keio.ac.jp
Katashi Nagao
Dept. of Information Engineering
Nagoya University
Furo-cho, Chikusa-ku,
Nagoya 464-8603, Japan
nagao@nuie.nagoya-u.ac.jp
Abstract
We propose a method, “Interactive Paraphrasing,” which enables users
to interactively paraphrase words in a document by their definitions,
making use of syntactic annotation and word sense annotation.
Syntactic annotation is used for managing smooth integration of word
sense definitions into the original document, and word sense
annotation for retrieving the correct word sense definition for a
word in a document. In this way, documents can be paraphrased so that
they fit into the original context, preserving the semantics and
improving the readability at the same time. No extra layer (window)
is necessary for showing the word sense definition as in conventional
methods, and other natural language processing techniques such as
summarization, translation, and voice synthesis can be easily applied
to the results.
1 Introduction
There is a large number of documents of great diversity on the Web,
which makes some of the documents difficult to understand due to
viewers’ lack of background knowledge. In particular, if technical
terms or jargon are contained in the document, viewers who are
unfamiliar with them might not understand their correct meanings.
When we encounter unknown words in a document, for example scientific
terms or proper nouns, we usually look them up in dictionaries or ask
experts or friends for their meanings. However, if there are lots of
unfamiliar words in a document or there are no experts around, the
work of looking the words up can be very time consuming. To facilitate
the effort, we need (1) machine-understandable online dictionaries,
(2) automated consultation of these dictionaries, and (3) effective
methods to show the lookup results.
There is an application which consults online dictionaries when the
user clicks on a certain word on a Web page, then shows the lookup
results in a popup window. In this case, the application accesses its
internal or online dictionaries, and the consultation process is
automated using the viewer’s mouse click as a cue. Popup windows
correspond to the display method. Other related applications operate
in more or less the same way.
We encounter three big problems with the
conventional method.
First, due to the difficulty of word sense disambiguation, in the
case of polysemic words, applications to date show all possible word
sense candidates for certain words, which forces the viewer to choose
the correct meaning.
Second, the popup window showing the
lookup results hides the area near the clicked
word, so that the user tends to lose the context
and has to reread the original document.
Third, since the document and the dictionary lookup results are shown
in different layers (e.g., windows), other natural language processing
techniques such as summarization, translation, and voice synthesis
cannot be easily applied to the results.
To cope with these problems, we realized a systematic method to
annotate words in a document with word senses in such a way that
anyone (e.g., the author) can easily add word sense information to a
certain word using a user-friendly annotating tool. This operation can
be considered as the creation of a link between a word in the document
and a node in a domain-specific ontology.
The “Interactive Paraphrasing” that we pro-
pose makes use of word sense annotation and
paraphrases words by embedding their word
sense deﬁnitions into the original document to
generate a new document.
Embedding occurs at the user’s initiative, which means that the user
decides when and where to embed the definition. The generated document
can also be the target for another embedding operation, which can be
iterated until the document is understandable enough for the user.
One of the examples of embedding a doc-
ument into another document is quotation.
Transcopyright (Nelson, 1997) proposes a way
for quoting hypertext documents.
However, quoting means importing other documents as they are. Our
approach is to convert other documents so that they fit into the
original context, preserving the semantics and improving the
readability at the same time.
As the result of embedding, there are no windows hiding any part of
the original text, which makes the context easy to follow, and the new
document is ready to be used for further natural language processing.
2 Example
In this section, we present how our system per-
forms using screenshots.
Figure 1 shows an example of a Web document (footnote 1) after
automatic dictionary lookup. Words marked with a different background
color have been successfully looked up.
Figure 1: Example of a web document showing
dictionary lookup results
The conventional method, such as showing the definition of a word in a
popup window, hides the neighboring text (Figure 2).
Figure 2: Example of a conventional method
popup window for showing the deﬁnition
Footnote 1: This text, slightly modified here, is from “Internet
Agents: Spiders, Wanderers, Brokers, and Bots,” Fah-Chun Cheong,
New Riders Publishing, 1996.
Figure 3 shows the result of paraphrasing the word “agent.” It was
successfully paraphrased using its definition “personal software
assistants with authority delegated from their users.” The word
“deployed” was also paraphrased by the definition “to distribute
systematically.” The paraphrased area is marked by a different
background color.
Figure 3: Example of the results after para-
phrasing “agents” and “deployed”
Figure 4 shows the result of paraphrasing a word in an area that has
already been paraphrased. The word “authority” was paraphrased by its
definition “power to make decisions.”
Figure 4: Example of incremental paraphrasing
3 Linguistic Annotation
Semantically embedding word sense deﬁnitions
into the original document without changing
the original context is much more diﬃcult than
showing the deﬁnition in popup windows.
For example, replacing some word in a sen-
tence only with its word sense deﬁnition may
cause the original sentence to be grammatically
wrong or less cohesive.
This is due to the fact that word sense definitions are usually
incapable of simply replacing original words because of their fixed
forms.
For appropriately integrating the word sense
deﬁnition into the original context, we employ
syntactic annotation (described in the next sec-
tion) to both original documents and the word
sense deﬁnitions to let the machine know their
contexts.
Thus, we need two types of annotations for Interactive Paraphrasing.
One is the word sense annotation to retrieve the correct word sense
definition for a particular word, and the other is the syntactic
annotation for managing smooth integration of word sense definitions
into the original document.
In this paper, linguistic annotation covers
syntactic annotation and word sense annota-
tion.
3.1 Syntactic Annotation
Syntactic annotation is very useful to make online documents more
machine-understandable on the basis of a new tag set, and to develop
content-based presentation, retrieval, question-answering,
summarization, and translation systems with much higher quality than
is currently available. The new tag set was proposed by the GDA
(Global Document Annotation) project (Hasida,
http://www.etl.go.jp/etl/nl/gda/). It is based on XML, and designed to
be as compatible as possible with TEI (The Text Encoding Initiative,
http://www.uic.edu:80/orgs/tei/) and CES (Corpus Encoding Standard,
http://www.cs.vassar.edu/CES/). It specifies modifier-modifiee
relations, anaphor-referent relations, etc.
An example of a GDA-tagged sentence is as
follows:
<su><np rel="agt">Time</np>
<v>flies</v><adp rel="eg">
<ad>like</ad><np>an <n>arrow</n></np>
</adp>.</su>
The tag <su> refers to a sentential unit. The other tags above, <n>,
<np>, <v>, <ad> and <adp>, mean noun, noun phrase, verb, adnoun or
adverb (including preposition and postposition), and adnominal or
adverbial phrase, respectively.
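As a rough illustration of what such tagging makes available to a
program, the sketch below reads the example sentence with Python's
standard XML parser and lists each element with its rel attribute and
text. The function name list_elements is our own and is not part of
GDA; the whitespace of the example is reproduced as printed.

```python
import xml.etree.ElementTree as ET

# The GDA-tagged example sentence from the text, joined into one string.
GDA_SENTENCE = (
    '<su><np rel="agt">Time</np>'
    '<v>flies</v><adp rel="eg">'
    '<ad>like</ad><np>an <n>arrow</n></np>'
    '</adp>.</su>'
)


def list_elements(xml_text):
    """Return (tag, rel attribute or None, text content) for every
    element of the tagged sentence, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.get("rel"), "".join(el.itertext()))
        for el in root.iter()
    ]


if __name__ == "__main__":
    for tag, rel, text in list_elements(GDA_SENTENCE):
        print(tag, rel, repr(text))
```

The point is only that phrase boundaries and relations such as
rel="agt" become directly addressable once the sentence is annotated.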
Syntactic annotation is generated by auto-
matic morphological analysis and interactive
sentence parsing.
Some research issues concerning syntactic annotation are related to
how the annotation cost can be reduced within some feasible levels. We
have been developing some machine-guided annotation interfaces that
conceal the complexity of annotation. Machine learning mechanisms also
contribute to reducing the cost because they can gradually increase
the accuracy of automatic annotation.
3.2 Word Sense Annotation
In the field of computational linguistics, word sense disambiguation
has been one of the biggest issues. For example, to have a better
translation of documents, disambiguation of certain polysemic words is
essential. Even if an estimation of the word sense is achieved to some
extent, incorrect interpretation of certain words can lead to
irreparable misunderstanding.
To avoid this problem, we have been pro-
moting annotation of word sense for polysemic
words in the document, so that their word
senses can be machine-understandable.
For this purpose, we need a dictionary of concepts, for which we use
existing domain ontologies. An ontology is a set of descriptions of
concepts - such as things, events, and relations - that are specified
in some way (such as specific natural language) in order to create an
agreed-upon vocabulary for exchanging information. Annotating a word
sense is therefore equal to creating a link between a word in the
document and a concept in a certain domain ontology. We have made a
word sense annotating tool for this purpose, which has been integrated
with the annotation editor described in the next section.
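The link between a word and a concept can be pictured as a small
record. The following is a minimal sketch under our own assumptions:
the class names, the field names, and the concept ID format
"agent#software" are hypothetical, and a plain dictionary stands in
for a domain ontology. The definition string is the one used for
“agent” in Section 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    """One node of a (toy) domain ontology."""
    concept_id: str
    label: str
    definition: str


@dataclass(frozen=True)
class WordSenseAnnotation:
    """A link from a word occurrence in a document to a concept."""
    locator: str      # where the word is, e.g. an XPointer-like string
    surface: str      # the word as it appears in the document
    concept_id: str   # hypothetical ID of the linked concept


# Hypothetical one-entry ontology used only for illustration.
ONTOLOGY: Dict[str, Concept] = {
    "agent#software": Concept(
        "agent#software",
        "agent",
        "personal software assistants with authority delegated "
        "from their users",
    ),
}


def definition_for(annotation: WordSenseAnnotation,
                   ontology: Dict[str, Concept]) -> Optional[str]:
    """Follow the link and return the definition, or None if the
    concept ID is not present in the ontology."""
    concept = ontology.get(annotation.concept_id)
    return concept.definition if concept else None
```

Because the annotation stores a concept ID rather than a word, a
polysemic word resolves to exactly one definition without any
disambiguation at lookup time.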
3.3 Annotation Editor
Our annotation editor, implemented as a Java
application, facilitates linguistic annotation of
the document. An example screen of our anno-
tation editor is shown in Figure 5.
Figure 5: Annotation editor
The left window of the editor shows the document object structure of
the HTML document. The center window shows some text that was selected
on the Web browser as shown on the right top of the figure. The
selected area is automatically assigned an XPointer (i.e., a location
identifier in the document) (World Wide Web Consortium,
http://www.w3.org/TR/xptr/).
The right bottom window shows the linguistic structure of the sentence
in the selected area. In this window, the user can modify the results
of the automatically-analyzed sentence structure.
Using the editor, the user annotates text
with linguistic structure (syntactic and seman-
tic structure) and adds a comment to an ele-
ment in the document. The editor is capable of
basic natural language processing and interac-
tive disambiguation.
The tool also supports word sense annotation as shown in Figure 6. The
ontology viewer appears in the right middle of the figure. The user
can easily select a concept in the domain ontology and assign a
concept ID to a word in the document as a word sense.
Figure 6: Annotation editor with ontology
viewer
4 Interactive Paraphrasing
Using the linguistic annotation (syntactic and word sense annotation),
Interactive Paraphrasing offers a way to paraphrase words in the
document on user demand.
4.1 Interactivity
One of the objectives of this research is to make online documents
more understandable by paraphrasing unknown words using their word
sense definitions.
Users can interactively select words to paraphrase by casual movements
like mouse clicks. The paraphrase history is stored for later use such
as profile-based paraphrasing (yet to be developed), which
automatically selects words to paraphrase based on the user’s
knowledge.
The resulting sentence can also be a target
for the next paraphrase. By allowing incremen-
tal operation, users can interact with the doc-
ument until there are no paraphrasable words
in the document or the document has become
understandable enough.
Interactive Paraphrasing is divided into click paraphrasing and region
paraphrasing according to user interaction type. The former
paraphrases a single word specified by mouse click, and the latter,
one or more paraphrasable words in a specified region.
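The difference between the two interaction types can be sketched as
two selection functions over a token list. This is only an
illustration under our own assumptions: a document is reduced to a
list of (surface form, concept ID) pairs, a token counts as
paraphrasable when it carries a concept ID, and the function names
and the IDs "c1" and "c2" are hypothetical.

```python
from typing import List, Optional, Tuple

# (surface form, concept ID or None); the IDs are hypothetical.
Token = Tuple[str, Optional[str]]


def click_targets(tokens: List[Token], index: int) -> List[int]:
    """Click paraphrasing: the single clicked word, if paraphrasable."""
    return [index] if tokens[index][1] is not None else []


def region_targets(tokens: List[Token], start: int, end: int) -> List[int]:
    """Region paraphrasing: every paraphrasable word in the half-open
    token range [start, end)."""
    return [i for i in range(start, end) if tokens[i][1] is not None]


if __name__ == "__main__":
    doc = [("agents", "c1"), ("are", None), ("deployed", "c2")]
    print(click_targets(doc, 1))        # clicked word has no annotation
    print(region_targets(doc, 0, 3))    # all annotated words in the region
```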
4.2 Paraphrasing Mechanism
As described in previous sections, the original document and the word
sense definitions are annotated with linguistic annotation, which
means they have graph structures. A word corresponds to a node, and a
phrase or sentence to a subgraph.
Our paraphrasing is an operation that replaces
a node with a subgraph to create a new graph.
Linguistic operations are necessary for creating
a graph that correctly ﬁts the original context.
We have made some simple rules (principles)
for replacing a node in the original document
with a node representing the word sense deﬁni-
tion.
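The basic replace-a-node operation can be sketched on an XML tree. The
sketch below makes simplifying assumptions of our own: the annotated
structures are treated as trees rather than general graphs, the helper
name replace_node is hypothetical, the sample sentence is made up, and
none of the linguistic adjustments described next are applied.

```python
import copy
import xml.etree.ElementTree as ET


def replace_node(root, target, replacement):
    """Replace the element `target` (a descendant of `root`) with a
    copy of the subtree `replacement`. Returns True if a replacement
    was made, False if `target` was not found under `root`."""
    for parent in root.iter():
        for i, child in enumerate(list(parent)):
            if child is target:
                new = copy.deepcopy(replacement)
                # Keep the text that followed the original word.
                new.tail = target.tail
                parent[i] = new
                return True
    return False


if __name__ == "__main__":
    sentence = ET.fromstring('<su><np>agents</np> <v>act</v>.</su>')
    definition = ET.fromstring('<np>software <n>assistants</n></np>')
    replace_node(sentence, sentence.find("np"), definition)
    print("".join(sentence.itertext()))
```

Copying the definition subtree leaves the dictionary entry untouched,
so the same definition can be embedded again elsewhere.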
There are two types of rules for paraphrasing. One is a “global rule,”
which can be applied to any pair of nodes; the other is a “local
rule,” which takes syntactic features into account.
Below is the description of the paraphrasing rules (principles) that
we used this time. Org stands for the node in the original document to
be paraphrased by Def, which represents the word sense definition
node. Global rules are applied first, followed by local rules. Pairs
to which rules cannot be applied are left as they are.
- Global Rules -
1. If the word Org is included in Def, paraphrasing is not
performed, to avoid the loop of Org.
2. Ignore the area enclosed in parentheses in Def. The area is
usually used for making Def an independent statement.
3. Avoid double negation, which increases the complexity of the
sentence.
4. To avoid redundancy, remove from Def the same case-marked
structure found both in Org and Def.
5. Other phrases expressing contexts in Def are ignored, since
similar contexts are likely to be in the original sentence already.
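Global rules 1 and 2 are simple enough to sketch on plain strings. The
code below is an approximation under our own assumptions: it works on
surface text rather than annotated nodes, matches whole words without
any morphological analysis (so “agent” does not match “agents”), does
not handle nested parentheses, and uses hypothetical function names.
Rules 3 to 5 need syntactic information and are not covered.

```python
import re
from typing import Optional


def contains_word(word: str, definition: str) -> bool:
    """Rule 1 check: does Def contain Org as a whole word
    (case-insensitive, no morphological analysis)?"""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, definition, re.IGNORECASE) is not None


def strip_parentheses(definition: str) -> str:
    """Rule 2: drop areas enclosed in (non-nested) parentheses."""
    without = re.sub(r"\s*\([^()]*\)", "", definition)
    return re.sub(r"\s{2,}", " ", without).strip()


def apply_global_rules(org: str, definition: str) -> Optional[str]:
    """Return the text of Def to embed, or None when paraphrasing
    should be skipped because Def contains Org (rule 1)."""
    if contains_word(org, definition):
        return None
    return strip_parentheses(definition)


if __name__ == "__main__":
    # The parenthesized note is a made-up example, not dictionary data.
    print(apply_global_rules("deployed",
                             "to distribute systematically (of troops)"))
    print(apply_global_rules("agent", "an agent that acts for a user"))
```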
- Local Rules -
The left column shows the pair of linguistic features (footnote 2)
corresponding to Org and Def (e.g., N-N signifies the rule to be
applied between nodes having noun features).

N-N    Replace Org with Def agreeing in number.
N-V    Nominalize Def and replace Org
       (e.g., explain → the explanation of).
V-N    If there is a verbal phrase modifying Def, conjugate Org
       using Def’s conjugation and replace Org.
V-V    Apply Org’s conjugation to Def and replace Org.
AD-N   Replace Org with any adverbial phrase modifying Def.
AJ-N   Replace Org with any adjective phrase modifying Def.

Footnote 2: N stands for the noun feature, and V, AJ and AD for
verbal, adjective and adverbial features respectively.
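Selecting a local rule amounts to a lookup on the feature pair. The
sketch below shows only that dispatch step; the table name, the
function name, and the use of plain strings for features are our own
assumptions, and the linguistic operations themselves (nominalization,
conjugation, number agreement) are not implemented.

```python
from typing import Dict, Optional, Tuple

# (feature of Org, feature of Def) -> rule description from the table.
LOCAL_RULES: Dict[Tuple[str, str], str] = {
    ("N", "N"): "Replace Org with Def agreeing in number.",
    ("N", "V"): "Nominalize Def and replace Org.",
    ("V", "N"): ("If there is a verbal phrase modifying Def, conjugate "
                 "Org using Def's conjugation and replace Org."),
    ("V", "V"): "Apply Org's conjugation to Def and replace Org.",
    ("AD", "N"): "Replace Org with any adverbial phrase modifying Def.",
    ("AJ", "N"): "Replace Org with any adjective phrase modifying Def.",
}


def select_local_rule(org_feature: str, def_feature: str) -> Optional[str]:
    """Return the rule for this feature pair, or None when no rule
    applies, in which case the pair is left as it is."""
    return LOCAL_RULES.get((org_feature, def_feature))


if __name__ == "__main__":
    print(select_local_rule("N", "V"))
    print(select_local_rule("AJ", "V"))  # no rule for this pair
```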
4.3 Implementation
We have implemented a system to realize Inter-
active Paraphrasing. Figure 7 shows the basic
layout of the system. The proxy server in the
middle deals with user interactions, document
retrievals, and the consultation of online dictio-
naries.
Figure 7: System architecture
The paraphrasing process follows the steps
described below.
1. On a user’s request, the proxy server retrieves a document,
through which it searches for words with word sense annotations.
If found, the proxy server changes their background color to
notify the user of the paraphrasable words.
2. The user speciﬁes a word in the document
on the browser.
3. Receiving the word to be paraphrased, the
proxy server looks it up in online dictio-
naries using the concept ID assigned to the
word.
4. Using the retrieved word sense deﬁnition,
the proxy server attempts to integrate it
into the original document using linguistic
annotation attached to both the deﬁnition
and the original document.
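The four steps can be sketched end to end with in-memory stand-ins.
This is not the implemented system: the dictionary below replaces the
online dictionaries, the concept IDs "c-agent" and "c-deploy" and all
function names are hypothetical, and step 4 is reduced to plain
substitution without the linguistic integration that the proxy server
performs.

```python
from typing import Dict, List, Optional, Tuple

# (surface form, concept ID or None); the IDs are hypothetical.
Token = Tuple[str, Optional[str]]

# Stand-in for the online dictionaries, keyed by concept ID.
DICTIONARY: Dict[str, str] = {
    "c-agent": ("personal software assistants with authority "
                "delegated from their users"),
    "c-deploy": "to distribute systematically",
}


def paraphrasable_indices(tokens: List[Token]) -> List[int]:
    """Step 1: find annotated words that have a dictionary entry
    (the words whose background color would be changed)."""
    return [i for i, (_, cid) in enumerate(tokens) if cid in DICTIONARY]


def paraphrase_at(tokens: List[Token], index: int) -> List[Token]:
    """Steps 2-4: given the word chosen by the user, look up its
    definition by concept ID and substitute it for the word. Returns
    a new token list; the input list is not modified."""
    _, concept_id = tokens[index]
    definition = DICTIONARY.get(concept_id) if concept_id else None
    if definition is None:
        return list(tokens)
    # Simplification: plain substitution, no syntactic adjustment.
    replacement = [(word, None) for word in definition.split()]
    return tokens[:index] + replacement + tokens[index + 1:]


if __name__ == "__main__":
    doc = [("Agents", "c-agent"), ("are", None), ("deployed", "c-deploy")]
    print(paraphrasable_indices(doc))
    print(" ".join(word for word, _ in paraphrase_at(doc, 2)))
```

Returning a new token list each time mirrors the idea that every
paraphrase generates a new document, which can itself be paraphrased
again.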
5 Related Work
Recently there have been some activities to add semantics to the Web
(Nagao et al., 2001) (SemanticWeb.org, http://www.semanticweb.org/)
(Heflin and Hendler, 2000), enabling computers to better handle online
documents. As for paraphrasing rules concerning structured data, Inui
et al. are developing Kura (Inui et al., 2001), which is a
Transfer-Based Lexico-Structural Paraphrasing Engine.
6 Conclusion and Future Plans
We have described a method, “Interactive Paraphrasing,” which enables
users to interactively paraphrase words in a document by their
definitions, making use of syntactic annotation and word sense
annotation.
By paraphrasing, no extra layer (window) is
necessary for showing the word sense deﬁnition
as in conventional methods, and other natural
language processing techniques such as summa-
rization, translation, and voice synthesis can be
easily applied to the results.
Our future plans include: reduction of
the annotation cost, realization of profile-based
paraphrasing using personal paraphrasing his-
tory, and retrieval of similar pages for semanti-
cally merging them using linguistic annotation.

References

Jeﬀ Heﬂin and James Hendler. 2000. Semantic In-
teroperability on the Web. In Proceedings of Ex-
treme Markup Languages 2000. Graphic Commu-
nications Association, 2000. pp. 111-120.

Kentaro Inui, Tetsuro Takahashi, Tomoya Iwakura,
Ryu Iida, and Atsushi Fujita. 2001. KURA:
A Transfer-Based Lexico-Structural Paraphrasing
Engine. In Proceedings of the 6th Natural Language
Processing Pacific Rim Symposium, Workshop on
Automatic Paraphrasing: Theories and Applications.

Katashi Nagao, Yoshinari Shirai, and Kevin Squire.
2001. Semantic annotation and transcoding:
Making Web content more accessible. IEEE
MultiMedia. Vol. 8, No. 2, pp. 69–81.

Theodor Holm Nelson. 1997. Transcopyright: Deal-
ing with the Dilemma of Digital Copyright.
Educom Review, Vol. 32, No. 1, pp. 32-35.
