<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-4005">
  <Title>the Semantic Web</Title>
  <Section position="4" start_page="269" end_page="270" type="metho">
    <SectionTitle>
2 Linguistic Component
</SectionTitle>
    <Paragraph position="0"> The Linguistic Component's task is to map the NL input query to the Query-Triple. GATE (Cunningham, 2002) infrastructure and resources (e.g. processing resources like ANNIE) are part of the Linguistic Component.</Paragraph>
    <Paragraph position="1"> After the execution of the GATE controller, a set of syntactic annotations associated with the input query is returned. These annotations include information about sentences, tokens, nouns and verbs. For example, we get voice and tense for the verbs and categories for the nouns, such as determinant, singular/plural, conjunction, possessive, determiner, preposition, existential, wh-determiner, etc. When developing AquaLog we extended the set of annotations returned by GATE by identifying terms, relations, question indicators (which/who/when, etc.) and patterns or types of questions. This is achieved through the use of Jape grammars. A Jape grammar consists of a set of phases that run sequentially; each phase is defined as a set of pattern rules, which allow us to recognize regular expressions over previous annotations in documents.</Paragraph>
    <Paragraph position="2"> Although we can still only deal with a subset of NL, this architecture, which takes advantage of the Jape grammars, makes it relatively easy to extend that subset by updating the regular expressions in the grammars. This ensures the easy portability of the system with respect to both ontologies and natural languages.</Paragraph>
    <Paragraph position="3"> Currently, the linguistic component, through the Jape grammars, dynamically identifies around 14 different linguistic categories or intermediate representations. These include basic queries requiring an affirmation/negation or a description as an answer, and the large set of queries constituted by a wh-question (such as the ones starting with what, who, when, where, are there any, does anybody/anyone or how many, as well as imperative commands like list, give, tell, name, etc.). Examples are &quot;are there any PhD students in dotkom?&quot;, where the relation is implicit or unknown, and &quot;which is the job title of John?&quot;, where no information about the type of the expected answer is provided.</Paragraph>
    <Paragraph position="4"> Categories not only tell us the kind of solution that needs to be achieved, but also indicate the most likely problems that the system will need to deal with to understand a particular NL query; consequently, they guide the process of creating the equivalent intermediate representation. Categories are the driving force to generate an answer by combining the triples in an appropriate way. For example, in &quot;who are the academics involved in the semantic web?&quot; the triple will be of the form &lt;generic term, relation, second term&gt;, i.e. &lt;academics, involved, semantic web&gt;. A query with an equivalent triple representation is &quot;which technologies has KMi produced?&quot;, where the triple will be &lt;technologies, has produced, KMi&gt;. However, a query like &quot;are there any PhD students in akt?&quot; has a different representation, in which the relation is implicit or unknown: &lt;phd students, ?, akt&gt;. Other queries may provide little information about the type of the expected answer, e.g. &quot;what is the job title of John?&quot;, or they can be just a generic enquiry about someone or something, e.g. &quot;who is Vanessa?&quot; or &quot;what is an ontology?&quot;. At this stage we do not have to worry about getting the representation completely right, as the interpretation is completely domain independent.</Paragraph>
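The triple examples in the paragraph above can be sketched as a small data structure. This is a minimal illustration only: the class name QueryTriple, its field names and the render method are assumptions made for this sketch and are not taken from AquaLog itself. An implicit or unknown relation is stored as None and printed as "?".

```python
from typing import NamedTuple, Optional


class QueryTriple(NamedTuple):
    """Hypothetical container for one intermediate triple (not AquaLog's own type)."""
    term: str                 # generic term the question asks about
    relation: Optional[str]   # None when the relation is implicit or unknown
    second_term: str

    def render(self) -> str:
        # Show an unknown relation as "?", as in the examples in the text.
        relation = self.relation if self.relation is not None else "?"
        return f"({self.term}, {relation}, {self.second_term})"


# Triples copied from the example queries in the text.
EXAMPLES = {
    "who are the academics involved in the semantic web?":
        QueryTriple("academics", "involved", "semantic web"),
    "which technologies has KMi produced?":
        QueryTriple("technologies", "has produced", "KMi"),
    "are there any PhD students in akt?":
        QueryTriple("phd students", None, "akt"),
}

for query, triple in EXAMPLES.items():
    print(query, "->", triple.render())
```

Keeping the unknown relation as an explicit empty slot, rather than guessing it, mirrors the point made in the text that the representation does not need to be completely right at this domain-independent stage.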
    <Paragraph position="5"> The role of the triple-based intermediate representation is simply to provide an easy way to represent the NL query and to manipulate the input for the RSS. Consider the request &quot;List all the projects in the knowledge media institute about the semantic web&quot;, where both &quot;in knowledge media institute&quot; and &quot;about semantic web&quot; are modifiers (i.e. they modify the meaning of other syntactic constituents). The problem here is to identify the constituent to which each modifier has to be attached. The RSS is responsible for resolving this ambiguity through the use of the ontology, or by interacting with the user. The linguistic component's task is therefore to pass the ambiguity problem to the RSS through the intermediate representation.</Paragraph>
    <Paragraph position="6"> Nevertheless, a query can be a composition of two basic queries. In this case, the intermediate representation usually consists of two triples, one triple per relationship. There are different ways in which queries can be combined. Firstly, queries can be combined by using an &quot;and&quot; or &quot;or&quot; conjunction operator, as in &quot;which projects are funded by epsrc and are about semantic web?&quot;. This query will generate two Query-Triples, &lt;projects, funded, epsrc&gt; and &lt;projects, ?, semantic web&gt;, and the subsequent answer will be a combination of both lists obtained after resolving each triple. Secondly, a query may be conditioned on a second query, as in &quot;which researchers wrote publications related to social aspects?&quot;, which generates the Query-Triples &lt;researchers, wrote, publications&gt; and &lt;which are, related, social aspects&gt;, where the second clause modifies one of the terms in the first triple. In this example, the ambiguity cannot be solved by linguistic procedures; therefore the term to be modified by the second clause remains uncertain.</Paragraph>
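The first kind of composition above can be sketched as follows. The text only says that the answer is "a combination of both lists"; treating "and" as an intersection and "or" as a union is an assumption of this sketch, as are the function name combine_answers and the made-up answer lists.

```python
from typing import List, Optional, Tuple

# (term, relation, second term); the relation is None when implicit or unknown.
Triple = Tuple[str, Optional[str], str]


def combine_answers(operator: str, first: List[str], second: List[str]) -> List[str]:
    """Combine two answer lists (assumed semantics: "and" intersects, "or" unites)."""
    if operator == "and":
        return [item for item in first if item in second]
    if operator == "or":
        return first + [item for item in second if item not in first]
    raise ValueError(f"unsupported operator: {operator}")


# Query-Triples from the text for
# "which projects are funded by epsrc and are about semantic web?"
triples: List[Triple] = [
    ("projects", "funded", "epsrc"),
    ("projects", None, "semantic web"),
]

# Made-up answer lists standing in for the result of resolving each triple.
funded_by_epsrc = ["project A", "project B"]
about_semantic_web = ["project B", "project C"]

print(combine_answers("and", funded_by_epsrc, about_semantic_web))
```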
  </Section>
  <Section position="5" start_page="270" end_page="270" type="metho">
    <SectionTitle>
3 Relation Similarity Service
</SectionTitle>
    <Paragraph position="0"> The RSS is the backbone of the QA system. It is invoked after the NL query has been transformed into a term-relation form and classified into the appropriate category. Essentially, the RSS tries to make sense of the input query by looking at the structure of the ontology, string metrics, WordNet, and a domain-dependent lexicon obtained by the Learning Mechanism.</Paragraph>
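As a rough illustration of how a string metric can relate a query term to ontology vocabulary, the sketch below picks the closest candidate name by similarity ratio. The text does not say which string metrics the RSS uses, so Python's difflib.SequenceMatcher is a stand-in here; the function name best_match, the 0.6 threshold and the relation names are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def best_match(term: str, candidates: List[str], threshold: float = 0.6) -> Optional[str]:
    """Return the candidate most similar to term, or None if none reaches the threshold."""

    def score(candidate: str) -> float:
        # Similarity ratio between 0.0 and 1.0, ignoring case.
        return SequenceMatcher(None, term.lower(), candidate.lower()).ratio()

    if not candidates:
        return None
    best = max(candidates, key=score)
    return best if score(best) >= threshold else None


# Hypothetical ontology relation names, invented for this example.
ONTOLOGY_RELATIONS = ["has-funding-source", "works-in", "has-job-title"]

print(best_match("job title", ONTOLOGY_RELATIONS))
```

Returning None when no candidate is close enough leaves room for the other resources mentioned above (ontology structure, WordNet, the learned lexicon) or for asking the user.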
    <Paragraph position="1"> In any non-trivial NL system, it is important to deal with the various sources of ambiguity. Some sentences are structurally ambiguous and, although general world knowledge does not resolve this ambiguity, within a specific domain it may happen that only one of the interpretations is possible. The key issue here is to determine some constraints derived from the domain knowledge and to apply them in order to resolve ambiguity. When the ambiguity cannot be resolved by domain knowledge, the only reasonable course of action is to get the user to choose between the alternative readings. Moreover, since every item in the onto-triple is an entry point into the KB or ontology, the user has the possibility to navigate through them. In fact, to ensure user acceptance of the system, justifications are provided for every step of the user interaction.</Paragraph>
  </Section>
</Paper>