A web application using RDF/RDFS for metadata navigation
Xi S. Guo, Mark Chaudhary, Christopher Dozier
Yogi Arumainayagam, Venkatesan Subramanian
Research & Development
Thomson Legal & Regulatory
610 Opperman Drive
Eagan, MN 55123, USA
xi.guo@thomson.com
Abstract
This paper describes using RDF/RDFS/XML to cre-
ate and navigate a metadata model of relationships
among entities in text. The metadata we create
is roughly an order of magnitude smaller than the
content being modeled, it provides the end-user
with context sensitive information about the hyper-
linked entities in focus. These entities at the core
of the model are originally found and resolved us-
ing a combination of information extraction and
record linkage techniques. The RDF/RDFS meta-
data model is then used to  look ahead and navi-
gate to related information. An RDF aware front-
end web application streamlines the presentation of
information to the end user.
1 Introduction
As an information provider, Thomson West stores
vast quantities of documents that are served up in
response to user queries. Determining the relation-
ships between entities of interest in these documents
can be a complex and time consuming part of end-
user research. Nor is this sort of information always
explicitly presented in the documents retrieved by
searches. Automating the process of discovery is
complicated by the need to uniquely identify and
resolve ambiguities and co-references between en-
tities.
Our system relies on various NLP techniques and
name/entity taggers to identify attorney and judge
names in news articles on WestlawTM. These
names are then tagged with unique reference iden-
ti ers that link them to their records in our legal di-
rectory. The relationships between these individuals
and other entities like their  rm (or court name for
judges), and title of the document in which they are
found are stored as RDF metadata.
A simple representation of relationships among
these entities is shown in Figure 1. Documents
make references to attorneys. Using NLP tech-
niques, each occurrence is resolved to a unique ref-
erence identi cation. The metadata then allows us
Figure 1: Relationships between entities
to expose meaningful relationships among entities
in text. Storing this information as metadata in
the UI allows us to look ahead. Hovering over a
name, the end user is able to see which  rms they
are af liated with. The user is also able to look
ahead to see all the other documents that the per-
son occurs in. In addition, we also know which
 rm each attorney works for and this relationship
allows us to see all the other attorneys who work for
the same  rm. This information is not present in
any of the documents retrieved but is inferred from
our RDF/RDFS (Lassila, 2000), (Klyne and Carroll,
2004), (W3C, 1999), (W3C, 2004) metadata model.
The RDF/RDFS metadata model helps to dynami-
cally resolve relationship among entities during the
time of front end rendering. This system could be
extended to incorporate additional relationships be-
tween other kinds of data.
2 Architecture
Content in our architecture consists of plain text
news documents and RDF metadata. Both are
stored in an XML content repository. In addition
we also store Thomson West’s legal database of at-
torney pro les in the same repository as well. With
the content stored, we use a name/entity tagger
in combination with methods described in (Dozier
and Haschart, 2000) to link occurrences of attorney
Figure 2: High Level Architecture
names within the plain text news documents to their
database pro le record.
There are several reasons that motivate us to build
this web application using RDF/RDFS. Firstly, our
existing data model put metadata and content in the
same data repository, the relationships or links are
embedded inside content. This makes it very dif -
cult to build new business products since developers
have to write programs to look at content  rst, ex-
tract information out of it and then put this extracted
information somewhere to enable front-end render-
ing. The disadvantage of this approach is being able
to dynamically maintain the integrity of both data
repository and relationship repository in a rapidly
changing environment. Both of these repositories
need to be updated whenever any relationships get
updated. The use of RDF/RDFS separates relation-
ships from content so manipulation of metadata is
easier and less expensive.
RDF/RDFS’s ability to provide a data infrastruc-
ture for entities, relationships extracted from NLP
applications is the second reason for choosing it as
our data model. In our domain, we have different
kinds of entities embedded in news articles, law re-
views, legal cases etc. These entities include attor-
ney name, judge name, and law  rm names. We
are interested in not only identifying them in con-
tent but also  nding their relationships and linking
them together. RDF/RDFS allows us to accomplish
this.
Architecture for this application uses MVC
(Model View Controller) design pattern for separat-
ing graphical interface of one application from its
backend artifacts such as code and data. This classic
architectural design pattern provided the  exibility
to maintain multiple views of backend data.
2.1 RDF/RDFS/XML Data Model
Using the MVC design pattern, our data model rep-
resents data used by the application and the rules
for accessing this data. A RDF/RDFS/XML model
is created to represent the data and a set of APIs is
provided for data accessing purpose.
Our prototype contains 911274 legal profession-
als’ pro les from West’s Legal Directory and 2000
news documents. The news documents are pre-
processed using our name entity tagger. The tagging
process is able to generate a list of people templates
that are then fed into an entity reference resolution
program. This allows us to resolve each extracted
name template to its speci c record from West’s Le-
gal Directory.
Our data model environment contains separate
metadata and content repositories, the XML content
repository and the RDF metadata repository. We
convert the news articles to XML and load them to
XML content repository. Our search API features of
this repository allow us to perform full text search-
ing inside content. Each news article takes the form
of one XML document identi ed by a unique refer-
ence number. Names found inside these documents
by the name tagger are identi ed with xml elements.
Besides 2000 news articles, WLD legal profession-
als’ pro les are also loaded to this content reposi-
tory with each pro le also associated with a unique
identifying number.
Our RDF metadata repository employs on
RDF/RDFS model. A simple RDF schema formally
speci es groups of related resources and the rela-
tionships between these resources. Figure 3 demon-
strates three major RDF resources; Document, Peo-
ple and Organization. The Attorney and Judge re-
sources are subclasses of the People resource. Each
instance of these resources has a URI associated
with it. Resource related properties are also de-
 ned in this schema. The ranges of some properties
of resources are themselves resources from other
domains. For example, resource Document has a
property PeopleInDocument. This property has its
domain in Document but its range is in the People
domain. The schema allows us to specify the data
model so our metadata navigation application could
follow relationship links speci ed in it. More de-
tails about this schema can be found in Appendix
A.
Based on this schema, the RDF metadata repos-
itory is built to represent the relationships among
Figure 3: RDF schema of the application
news articles, attorneys, judges, courts and law
 rms. The metadata building process involves sev-
eral steps that are entity and relation extraction from
the tagged XML content repository, RDF metadata
generation, and RDF metadata loading. The end
result is an RDF metadata repository with full text
search capability. Figure 4 shows samples of a por-
tion of the metadata model depicting the occurrence
of two attorneys in a Wall Street Journal document.
During the time the metadata repository was
built, our schema was only used for data validation
purpose. Currently we are exploring one approach
that leverages the expressive power of logic pro-
gramming tool such as Prolog to navigate the RDF
schema graph; this schema navigation should be
able to enable automatic metadata collection about
particular concepts and then build corresponded
RDF metadata based upon.
Note that in this application, URIs (unique ref-
erence identi cation) are used extensively. Each
document in both content and metadata reposito-
ries has a unique number associated with it. This
unique number works as a unique resource link and
is utilized by the RDF documents in the metadata
repository. With this unique number, the RDF docu-
ment can then be linked to any xml or rdf document,
and even to elements inside these documents using
Figure 4: Sample RDF metadata
XPATH.
In the sample of the RDF data presented in Table
1, the WSJ document with URI  WSJ210572229 
entitled  Market on a High Wire contains ref-
erences to two attorneys; Froehlich and Madden.
Figure 5: Small RDF Graph of one metadata sample
Froehlich has URI  WLD0293087701 and Mad-
den has URI  WLD0293086676 . The metadata
also contains the XPATH of the attorney names in-
side this WSJ document as well as the XPATH to
other properties of the document such as news title
and news content.
Figure 5 shows a small RDF graph gener-
ated from samples in Table 1. In this graph,
 WSJ210572229 and  WLD0293087701 are two
major resources from two different domains. The
RDF properties of both resources point to each other
through predicates. These pointing edges represent
relationships among multiple entities and they form
the infrastructure for our navigational map that will
eventually be presented to end-user.
Besides metadata and content storage, the data
model in MVC also provides a set of APIs for ac-
cessing both metadata and content. In XML content
repository, APIs exist for single XML document re-
trieval by URI and full text search by user queries.
In the RDF metadata repository, APIs exist for sin-
gle RDF document retrieval by URI, RDF resource
link retrieval using ARP, an RDF parser from HP
and RDF metadata full text search.
2.2 Application Controller
The Controller in our MVC patterned application
contains our metadata navigation logic. The pur-
pose of this layer is to capture all requests from the
front view and to interact with the data model to
provide the data wanted by the end user.
The general scenario of our application starts out
with a user typing in queries. These queries are then
passed to the XML content repository which re-
turns matched search results with navigation meta-
data embedded inside. All of this metadata is gener-
ated through the controller layer that interacts with
both RDF and XML repository. The results then are
presented to the user who can click on entities of in-
terest (which are RDF resources) and thus navigate
through our metadata repository.
2.3 Front View
All information rendering happens in the front view
layer. This layer interacts with end users and speci-
 es how  nal data can be represented. Since back-
end data is either RDF or XML, we use XSLT to
convert this to HTML/JSP pages that work in the
front end browser.
Appendix B shows a snapshot of our application
depicting a single Wall Street Journal article con-
taining attorney names. The end user can roll over
this name link and using the pop-up menu, navigate
to other corresponding entities such as other news
documents that mention the same name, or law  rm
this attorney is working in. This metadata-based
navigation is described in detail in next section.
3 Metadata based Navigation
By tagging entity information and resolving cross
document co-references for attorneys and judges,
we were able to identify all the documents a partic-
ular attorney or judge appeared in. The RDF meta-
data model goes a step further weaving together
the relationships between attorneys, judges,  rms,
courts and the documents that reference them.
With the metadata model it now becomes easier
for the user to see all related information from any
particular node. The combination of information
extracted from documents with information from
authority  les, gives us a dynamic view of rela-
tionships in the content that can answer questions
such as  What other attorneys were mentioned in
the same article? and  Who else works at the same
 rm as this attorney? These relationships facilitate
navigation between related entities. Figure 6 shows
how the metadata model allows the user to navigate
from one related node to the next. Not only are we
able to tell the  rm an attorney belongs to even if
that wasn’t speci cally mentioned in the text of the
document, but we can also use the metadata model
to shift our focus onto the  rm node and immedi-
ately see a list of other attorneys related to that  rm.
Switching to any one of those nodes (attorneys) im-
mediately shows us articles related to the next attor-
ney. In a similar fashion we can move from judges
to courts and articles and back.
4 Conclusion
This application utilizes RDF/RDFS to build a data
model that allows for easy maintenance of reference
links embedded in content. This data model also fa-
cilitates development of metadata navigation. By
just looking through metadata repository, the appli-
cation can decide the best way to utilize rich infor-
mation buried inside content repository.
We feel that this application can be extended to
provide inferencing capability. The hard wiring of
the logic inside the metadata repository does not
currently provide any formalism to infer hidden re-
lationships from the facts. Implementing this infer-
encing mechanism would bring us closer to our se-
mantic web goal.

References
Christopher Dozier and Robert Haschart. 2000.
Automatic extraction and linking of person
names in legal text. Proceedings of RIAO-2000:
Recherche d’Informations Assiste par Ordina-
teur.
Graham Klyne and Jeremy J. Carroll.
2004. Resource description framework
Figure 6: Navigation between related metadata
(rdf): Concepts and abstract syntax.
http://www.w3.org/TR/2004/REC-rdf-concepts-
20040210/.
Ora Lassila. 2000. The resource description frame-
work. IEEE Intelligent Systems, 15(6):67 69.
W3C. 1999. Resource description
framework (rdf) model and syntax.
http://www.w3.org/TR/1999/REC-rdf-syntax-
19990222/.
W3C. 2004. Rdf vocabulary descrip-
tion language 1.0: Rdf schema.
http://www.w3.org/TR/2004/REC-rdf-schema-
20040210/.
