<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0603">
  <Title>A web application using RDF/RDFS for metadata navigation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Architecture
</SectionTitle>
    <Paragraph position="0"> Content in our architecture consists of plain-text news documents and RDF metadata, both stored in an XML content repository. We also store Thomson West's legal database of attorney profiles in the same repository. Once the content is stored, we use a named-entity tagger, in combination with the methods described in (Dozier and Haschart, 2000), to link occurrences of attorney names within the plain-text news documents to their database profile records.</Paragraph>
    <Paragraph position="1"> Several reasons motivated us to build this web application using RDF/RDFS. First, our existing data model puts metadata and content in the same data repository, with relationships or links embedded inside the content. This makes it very difficult to build new business products, since developers have to write programs that first inspect the content, extract information from it, and then store the extracted information somewhere to enable front-end rendering. The disadvantage of this approach is the difficulty of dynamically maintaining the integrity of both the data repository and the relationship repository in a rapidly changing environment: both repositories must be updated whenever any relationship changes. RDF/RDFS separates relationships from content, so manipulating metadata is easier and less expensive.</Paragraph>
    <Paragraph position="2"> The second reason for choosing RDF/RDFS as our data model is its ability to provide a data infrastructure for the entities and relationships extracted by NLP applications. In our domain, different kinds of entities are embedded in news articles, law reviews, legal cases, etc. These entities include attorney names, judge names, and law firm names. We are interested not only in identifying them in content but also in finding their relationships and linking them together. RDF/RDFS allows us to accomplish this.</Paragraph>
    <Paragraph position="3"> The architecture of this application uses the MVC (Model-View-Controller) design pattern to separate the application's graphical interface from its back-end artifacts, such as code and data. This classic architectural pattern gave us the flexibility to maintain multiple views of the back-end data.</Paragraph>
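The flexibility claimed for MVC, several views over one back end, can be illustrated with a minimal Python sketch. All class names, view names, and the sample URI and title below are hypothetical; this is not the application's actual code, only an example of the pattern: the controller picks a view, and the model never knows how its data is rendered.

```python
import json
from typing import Callable, Dict


class Model:
    """Back-end data; knows nothing about how it is displayed."""

    def __init__(self, titles: Dict[str, str]) -> None:
        self._titles = titles

    def title(self, uri: str) -> str:
        return self._titles[uri]


def text_view(uri: str, title: str) -> str:
    """One view: a plain-text line."""
    return f"{uri}: {title}"


def json_view(uri: str, title: str) -> str:
    """A second view of the same data, serialized as JSON."""
    return json.dumps({"uri": uri, "title": title})


class Controller:
    """Receives a request, asks the model for data, and hands it to the chosen view."""

    def __init__(self, model: Model, views: Dict[str, Callable[[str, str], str]]) -> None:
        self.model = model
        self.views = views

    def handle(self, uri: str, view: str) -> str:
        return self.views[view](uri, self.model.title(uri))


controller = Controller(
    Model({"doc-1": "Example headline"}),
    {"text": text_view, "json": json_view},
)
print(controller.handle("doc-1", "text"))  # doc-1: Example headline
```

Adding a new presentation only means registering another view function; the model and controller stay unchanged.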
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 RDF/RDFS/XML Data Model
</SectionTitle>
      <Paragraph position="0"> Following the MVC design pattern, our data model represents the data used by the application and the rules for accessing this data. An RDF/RDFS/XML model is created to represent the data, and a set of APIs is provided for data access.</Paragraph>
      <Paragraph position="1"> Our prototype contains 911,274 legal professionals' profiles from West's Legal Directory and 2,000 news documents. The news documents are pre-processed using our named-entity tagger. The tagging process generates a list of person templates, which are then fed into an entity reference resolution program. This allows us to resolve each extracted name template to its specific record in West's Legal Directory.</Paragraph>
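To make the resolution step concrete, here is a deliberately simple Python sketch of matching a name template to a directory record. It is not the method of Dozier and Haschart (2000), which the text cites but does not detail here. It assumes hypothetical template fields (first name, last name, optional city), a hand-tuned evidence score, and a rule that weak or tied matches stay unresolved. The directory entries and URIs are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameTemplate:
    """A person mention produced by the tagger (hypothetical fields)."""
    first: str
    last: str
    city: Optional[str] = None


@dataclass
class Profile:
    """A directory record (hypothetical fields)."""
    uri: str
    first: str
    last: str
    city: str


def score(t: NameTemplate, p: Profile) -> int:
    """Crude evidence score; 0 means the last names disagree."""
    if t.last.lower() != p.last.lower():
        return 0
    s = 1
    first = t.first.lower().rstrip(".")
    if first == p.first.lower():
        s += 2
    elif first and p.first.lower().startswith(first):
        s += 1  # an initial such as "J." is weaker evidence
    if t.city and t.city.lower() == p.city.lower():
        s += 1
    return s


def resolve(t: NameTemplate, directory: List[Profile], min_score: int = 3) -> Optional[str]:
    """URI of the single best-scoring profile, or None if the match is weak or tied."""
    ranked = sorted(directory, key=lambda p: score(t, p), reverse=True)
    if not ranked:
        return None
    top = score(t, ranked[0])
    unique = len(ranked) == 1 or score(t, ranked[1]) != top
    return ranked[0].uri if top >= min_score and unique else None


directory = [
    Profile("WLD-0001", "Jane", "Doe", "Denver"),
    Profile("WLD-0002", "John", "Doe", "Boston"),
    Profile("WLD-0003", "Ann", "Roe", "Denver"),
]
print(resolve(NameTemplate("J.", "Doe", "Denver"), directory))  # WLD-0001
```

A bare "J. Doe" would tie between the two Does and return None. Refusing to guess in that case is a design choice of this sketch.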
      <Paragraph position="2"> Our data model environment contains separate content and metadata repositories: the XML content repository and the RDF metadata repository. We convert the news articles to XML and load them into the XML content repository, whose search API allows full-text search inside the content. Each news article takes the form of one XML document identified by a unique reference number. Names found inside these documents by the name tagger are marked up with XML elements.</Paragraph>
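The full-text search the content repository provides can be sketched as an inverted index keyed by reference number. This is a toy stand-in, not the repository product's API. The class and method names are hypothetical, documents are kept as plain text rather than XML, and a query matches only documents that contain every query term.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ContentRepository:
    """Toy stand-in for the XML content repository, keyed by reference number."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)  # token -> reference numbers

    def load(self, ref: str, text: str) -> None:
        self.docs[ref] = text
        for token in tokenize(text):
            self.index[token].add(ref)

    def search(self, query: str) -> List[str]:
        """Reference numbers of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


repo = ContentRepository()
repo.load("doc-1", "Attorney Jane Doe argued the merger case.")
repo.load("doc-2", "The merger was approved on Friday.")
print(repo.search("merger"))           # ['doc-1', 'doc-2']
print(repo.search("merger attorney"))  # ['doc-1']
```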
      <Paragraph position="3"> Besides the 2,000 news articles, the legal professionals' profiles from West's Legal Directory (WLD) are also loaded into this content repository, each profile associated with a unique identifying number.</Paragraph>
      <Paragraph position="4"> Our RDF metadata repository employs an RDF/RDFS model. A simple RDF schema formally specifies groups of related resources and the relationships between these resources. Figure 3 shows three major RDF resources: Document, People, and Organization. The Attorney and Judge resources are subclasses of the People resource. Each instance of these resources has a URI associated with it. Resource-related properties are also defined in this schema. The ranges of some properties are themselves resources from other domains. For example, the resource Document has a property PeopleInDocument, whose domain is Document but whose range is People. The schema specifies the data model so that our metadata navigation application can follow the relationship links defined in it. More details about this schema can be found in Appendix A.</Paragraph>
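The schema described above can be modeled as plain triples to show how a navigator might learn which links it can follow from a resource. In this Python sketch, Document, People, Organization, Attorney, Judge, and PeopleInDocument come from the text. The WorksFor property is a hypothetical addition for illustration. Properties declared on People are inherited by Attorney and Judge through rdfs:subClassOf.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Schema fragment: classes and PeopleInDocument come from the text; WorksFor is hypothetical.
SCHEMA: Set[Triple] = {
    ("Attorney", "rdfs:subClassOf", "People"),
    ("Judge", "rdfs:subClassOf", "People"),
    ("PeopleInDocument", "rdfs:domain", "Document"),
    ("PeopleInDocument", "rdfs:range", "People"),
    ("WorksFor", "rdfs:domain", "People"),
    ("WorksFor", "rdfs:range", "Organization"),
}


def superclasses(cls: str, schema: Set[Triple]) -> Set[str]:
    """cls itself plus every class reachable via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in schema:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def links_from(cls: str, schema: Set[Triple]) -> Dict[str, str]:
    """Properties a navigator may follow from an instance of cls, mapped to their range."""
    classes = superclasses(cls, schema)
    domain = {s: o for s, p, o in schema if p == "rdfs:domain"}
    rng = {s: o for s, p, o in schema if p == "rdfs:range"}
    return {prop: rng[prop] for prop, d in domain.items() if d in classes and prop in rng}


print(links_from("Attorney", SCHEMA))  # {'WorksFor': 'Organization'}
print(links_from("Document", SCHEMA))  # {'PeopleInDocument': 'People'}
```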
      <Paragraph position="5"> Based on this schema, the RDF metadata repository is built to represent the relationships among news articles, attorneys, judges, courts, and law firms. The metadata building process involves several steps: entity and relation extraction from the tagged XML content repository, RDF metadata generation, and RDF metadata loading. The end result is an RDF metadata repository with full-text search capability. Figure 4 shows a portion of the metadata model depicting the occurrence of two attorneys in a Wall Street Journal document.</Paragraph>
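The three building steps (extraction, generation, loading) can be sketched as three small Python functions chained together. The sketch makes several assumptions. The tagged document is represented as a dictionary rather than XML. Triples are bare string tuples rather than serialized RDF. The store is an in-memory set, so reloading the same data adds nothing. The document and attorney URIs are the ones given for Table 1. The "Unresolved Name" mention is invented to show that unresolved names are skipped.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def extract(tagged_doc: Dict) -> List[Tuple[str, str]]:
    """Step 1: (document URI, person URI) pairs for every resolved name mention."""
    return [(tagged_doc["uri"], m["uri"]) for m in tagged_doc["mentions"] if m.get("uri")]


def generate(pairs: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Step 2: one rdf:type triple per document plus one PeopleInDocument link per person."""
    triples: List[Triple] = []
    typed: Set[str] = set()
    for doc, person in pairs:
        if doc not in typed:
            typed.add(doc)
            triples.append((doc, "rdf:type", "Document"))
        triples.append((doc, "PeopleInDocument", person))
    return triples


def load(store: Set[Triple], triples: Iterable[Triple]) -> int:
    """Step 3: add triples to the store; return how many were new."""
    before = len(store)
    store.update(triples)
    return len(store) - before


tagged = {
    "uri": "WSJ210572229",
    "mentions": [
        {"name": "Froehlich", "uri": "WLD0293087701"},
        {"name": "Madden", "uri": "WLD0293086676"},
        {"name": "Unresolved Name", "uri": None},  # no directory match, skipped
    ],
}
store: Set[Triple] = set()
print(load(store, generate(extract(tagged))))  # 3
```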
      <Paragraph position="6"> While the metadata repository was being built, our schema was used only for data validation. We are currently exploring an approach that leverages the expressive power of logic programming tools such as Prolog to navigate the RDF schema graph. Such schema navigation should enable the automatic collection of metadata about particular concepts and the construction of the corresponding RDF metadata.</Paragraph>
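The validation role of the schema can be sketched in Python (not Prolog) as a closed-world check of each statement against the declared domain and range. Strictly speaking, RDFS semantics treat rdfs:domain and rdfs:range as licensing type inferences rather than as constraints that reject data. Treating them as constraints is an assumption of this sketch, chosen to match the validation use described above. It also assumes one rdf:type per resource and a cycle-free, single-inheritance class hierarchy.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

SUBCLASS_OF: Dict[str, str] = {"Attorney": "People", "Judge": "People"}  # child -> parent
DOMAIN: Dict[str, str] = {"PeopleInDocument": "Document"}
RANGE: Dict[str, str] = {"PeopleInDocument": "People"}


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or inherits from it (single inheritance, no cycles assumed)."""
    while cls:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls, "")
    return False


def validate(data: List[Triple]) -> List[str]:
    """Closed-world check of every non-type statement against its declared domain and range."""
    types = {s: o for s, p, o in data if p == "rdf:type"}  # one type per resource assumed
    errors = []
    for s, p, o in data:
        if p == "rdf:type":
            continue
        if p in DOMAIN and not is_a(types.get(s, ""), DOMAIN[p]):
            errors.append(f"{s} {p} {o}: subject is not a {DOMAIN[p]}")
        if p in RANGE and not is_a(types.get(o, ""), RANGE[p]):
            errors.append(f"{s} {p} {o}: object is not a {RANGE[p]}")
    return errors


data = [
    ("WSJ210572229", "rdf:type", "Document"),
    ("WLD0293087701", "rdf:type", "Attorney"),
    ("WSJ210572229", "PeopleInDocument", "WLD0293087701"),
]
print(validate(data))  # []
```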
      <Paragraph position="7"> Note that URIs (Uniform Resource Identifiers) are used extensively in this application. Each document in both the content and metadata repositories has a unique number associated with it. This unique number serves as a unique resource link and is used by the RDF documents in the metadata repository. With it, an RDF document can be linked to any XML or RDF document, and even to elements inside these documents using XPath. In the sample RDF data presented in Table 1, the WSJ document with URI WSJ210572229, entitled "Market on a High Wire", contains references to two attorneys: Froehlich, whose URI is WLD0293087701, and Madden, whose URI is WLD0293086676. The metadata also contains the XPath of the attorney names inside this WSJ document, as well as the XPath to other properties of the document, such as the news title and news content.</Paragraph>
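The sketch below shows how an element-level link might be recorded and resolved. It uses Python's standard xml.etree.ElementTree, which supports only a limited XPath subset, including positional predicates such as name[2]. The markup (the doc, title, body, and name tags and the uri attribute) is hypothetical, since the text does not give the repository's actual element names. The URIs, surnames, and title are the ones given for Table 1.

```python
import xml.etree.ElementTree as ET


def xpath_of(root: ET.Element, target: ET.Element) -> str:
    """Positional path from root down to target, e.g. 'body[1]/name[2]'."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "/".join(reversed(steps))


# Hypothetical markup for a tagged news document.
doc = ET.Element("doc", uri="WSJ210572229")
ET.SubElement(doc, "title").text = "Market on a High Wire"
body = ET.SubElement(doc, "body")
for uri, surname in [("WLD0293087701", "Froehlich"), ("WLD0293086676", "Madden")]:
    ET.SubElement(body, "name", uri=uri).text = surname

madden = body[1]
path = xpath_of(doc, madden)
print(path)                 # body[1]/name[2]
print(doc.find(path).text)  # Madden
```

Storing such a path next to the document URI in the metadata is enough to jump from an RDF statement to the exact element that mentions the attorney.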
      <Paragraph position="8"> Figure 5 shows a small RDF graph generated from the samples in Table 1. In this graph, WSJ210572229 and WLD0293087701 are two major resources from two different domains. Through predicates, the RDF properties of each resource point to the other. These edges represent relationships among multiple entities, and they form the infrastructure for the navigational map that is eventually presented to the end user.</Paragraph>
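A breadth-first walk shows how such mutually pointing edges turn into a navigational map. The URIs are the ones given for Table 1, and PeopleInDocument comes from the schema. The inverse property name MentionedIn is a hypothetical placeholder, since the text does not name the predicate that points from a person back to a document. This Python sketch only illustrates hop-limited traversal over triples.

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# PeopleInDocument is from the text; MentionedIn is a hypothetical inverse property.
TRIPLES: List[Triple] = [
    ("WSJ210572229", "PeopleInDocument", "WLD0293087701"),
    ("WLD0293087701", "MentionedIn", "WSJ210572229"),
    ("WSJ210572229", "PeopleInDocument", "WLD0293086676"),
    ("WLD0293086676", "MentionedIn", "WSJ210572229"),
]


def reachable(start: str, triples: List[Triple], max_hops: int) -> Dict[str, int]:
    """Breadth-first walk along predicate edges; maps each reached resource to its hop count."""
    adjacency: Dict[str, List[str]] = {}
    for s, _, o in triples:
        adjacency.setdefault(s, []).append(o)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


print(reachable("WLD0293087701", TRIPLES, 2))
# {'WLD0293087701': 0, 'WSJ210572229': 1, 'WLD0293086676': 2}
```

Two hops are enough to go from one attorney, through the article, to the other attorney mentioned in it.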
      <Paragraph position="9"> Besides metadata and content storage, the data model in MVC also provides a set of APIs for accessing both metadata and content. In the XML content repository, APIs exist for retrieving a single XML document by URI and for full-text search by user query. In the RDF metadata repository, APIs exist for retrieving a single RDF document by URI, for retrieving RDF resource links using ARP (an RDF parser from HP), and for full-text search over the RDF metadata.</Paragraph>
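The shape of these model-layer APIs can be sketched as one in-memory Python class. All method names are hypothetical, and document bodies are kept as plain text. The sketch also assumes the RDF has already been parsed into triples. It does not use ARP, so resource-link retrieval here is just a lookup over those triples. Full-text search is a naive case-insensitive substring match.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class DataModel:
    """In-memory stand-in for the model layer's content and metadata APIs."""

    def __init__(self, content: Dict[str, str], metadata: List[Triple]) -> None:
        self.content = content    # URI -> document text (markup omitted)
        self.metadata = metadata  # already-parsed RDF statements

    # XML content repository
    def get_document(self, uri: str) -> str:
        return self.content[uri]

    def search_content(self, query: str) -> List[str]:
        q = query.lower()
        return sorted(u for u, text in self.content.items() if q in text.lower())

    # RDF metadata repository
    def get_rdf(self, uri: str) -> List[Triple]:
        """All statements whose subject is uri."""
        return [t for t in self.metadata if t[0] == uri]

    def resource_links(self, uri: str) -> Set[str]:
        """Everything connected to uri by a statement, in either direction (literals included)."""
        return {o for s, _, o in self.metadata if s == uri}.union(
            s for s, _, o in self.metadata if o == uri)

    def search_metadata(self, query: str) -> List[Triple]:
        q = query.lower()
        return [t for t in self.metadata if q in t[2].lower()]


model = DataModel(
    {"doc-1": "Merger talks stall", "doc-2": "Merger approved"},
    [("doc-1", "title", "Merger talks"),
     ("doc-1", "PeopleInDocument", "att-1"),
     ("doc-2", "PeopleInDocument", "att-1")],
)
print(sorted(model.resource_links("att-1")))  # ['doc-1', 'doc-2']
```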
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Application Controller
</SectionTitle>
      <Paragraph position="0"> The Controller in our MVC-patterned application contains the metadata navigation logic. The purpose of this layer is to capture all requests from the front view and to interact with the data model to provide the data requested by the end user.</Paragraph>
      <Paragraph position="1"> In the general scenario, a user types in a query. The query is passed to the XML content repository, which returns the matching search results with navigation metadata embedded inside. All of this metadata is generated by the controller layer, which interacts with both the RDF and the XML repositories. The results are then presented to the user, who can click on entities of interest (which are RDF resources) and thus navigate through our metadata repository.</Paragraph>
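The request flow above (query, search results with embedded links, click on an entity) can be sketched as a small controller over two in-memory stores. The class and method names, the (direction, predicate, resource) link format, and the sample URIs are hypothetical. Search is naive substring matching. This Python sketch shows only the flow, not the application's implementation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Link = Tuple[str, str, str]  # (direction, predicate, other resource)


class NavigationController:
    """Mediates between the view and the two repositories (both in-memory here)."""

    def __init__(self, content: Dict[str, str], metadata: List[Triple]) -> None:
        self.content = content
        self.metadata = metadata

    def links(self, uri: str) -> List[Link]:
        """Navigation links for a resource, outgoing and incoming."""
        out = [("out", p, o) for s, p, o in self.metadata if s == uri]
        inc = [("in", p, s) for s, p, o in self.metadata if o == uri]
        return sorted(out + inc)

    def search(self, query: str) -> List[Dict]:
        """Full-text query; each hit carries its embedded navigation links."""
        q = query.lower()
        hits = sorted(u for u, text in self.content.items() if q in text.lower())
        return [{"uri": u, "links": self.links(u)} for u in hits]

    def click(self, entity: str) -> Dict:
        """The user clicked an entity (an RDF resource): return its links."""
        return {"uri": entity, "links": self.links(entity)}


ctrl = NavigationController(
    {"doc-1": "Merger talks stall", "doc-2": "Merger approved", "doc-3": "Weather report"},
    [("doc-1", "PeopleInDocument", "att-1"), ("doc-2", "PeopleInDocument", "att-1")],
)
print([hit["uri"] for hit in ctrl.search("merger")])  # ['doc-1', 'doc-2']
```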
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Front View
</SectionTitle>
      <Paragraph position="0"> All information rendering happens in the front view layer. This layer interacts with end users and specifies how the final data is presented. Since the back-end data is either RDF or XML, we use XSLT to convert it to HTML/JSP pages that are rendered in the front-end browser.</Paragraph>
      <Paragraph position="1"> Appendix B shows a snapshot of our application depicting a single Wall Street Journal article containing attorney names. The end user can roll over a name link and, using the pop-up menu, navigate to other corresponding entities, such as other news documents that mention the same name or the law firm this attorney works for. This metadata-based navigation is described in detail in the next section.</Paragraph>
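The pop-up menu entries described here amount to two lookups over the metadata. This Python sketch assumes a hypothetical WorksFor property linking an attorney to a law firm; the text does not name that predicate. The menu labels and all URIs are also invented for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def popup_menu(attorney: str, current_doc: str, triples: List[Triple]) -> Dict[str, List[str]]:
    """Menu entries for an attorney link shown inside current_doc."""
    other_docs = sorted(s for s, p, o in triples
                        if p == "PeopleInDocument" and o == attorney and s != current_doc)
    firms = sorted(o for s, p, o in triples if s == attorney and p == "WorksFor")
    return {"Other news mentioning this attorney": other_docs, "Law firm": firms}


triples = [
    ("doc-1", "PeopleInDocument", "att-1"),
    ("doc-2", "PeopleInDocument", "att-1"),
    ("doc-3", "PeopleInDocument", "att-2"),
    ("att-1", "WorksFor", "firm-1"),
]
print(popup_menu("att-1", "doc-1", triples))
# {'Other news mentioning this attorney': ['doc-2'], 'Law firm': ['firm-1']}
```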
    </Section>
  </Section>
</Paper>