<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1705">
  <Title>Indexing Student Essays Paragraphs using LSA over an Integrated Ontological Space</Title>
  <Section position="3" start_page="1" end_page="1" type="metho">
    <SectionTitle>
3 Indexing Essay Paragraphs
</SectionTitle>
    <Paragraph position="0"> An index of relations within the ontologies related to the semantic space is obtained for each binary relation derived from the essay question.</Paragraph>
    <Paragraph position="1"> Then a subset containing the higher ranked relations is selected and the similarity between each of the relations in the subset and all the documents containing essays paragraphs is also calculated by applying LSA. Finally, an average similarity value is obtained for the paragraph over the number of relations in the subset.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 An Ontology Integration Method to Build
the Semantic Space
</SectionTitle>
      <Paragraph position="0"> A collection of &amp;quot;pseudo&amp;quot; documents is created for each of the classes within the ontologies describing the domains tackled in the essay. The ontologies are described quantitatively using probabilistic knowledge (Florescu et al., 1997).</Paragraph>
      <Paragraph position="1"> Each of these documents contains information (name, properties and relations) about a class. The documents are represented by a vector space model (Salton et al., 1971) where each column in the term-to-document matrix represents the ontological classes and the rows represent terms occurring in the pseudo documents describing those knowledge entities.</Paragraph>
      <Paragraph position="2"> Relations within the available ontologies are also represented by a vector space model where the columns in the term-to-document matrix are a combination of two or more vectors from the term-to-document matrix representing classes. Each column represents the relation held between the combined classes. A new column representing the binary relation derived from the essay question is added to the new matrix: this new column contains the weighted frequencies of terms appearing as arguments within the relation. For each essay question, one or more binary relations are derived through parsing. For instance: given the query &amp;quot;Do koalas live in the jungle?&amp;quot; the binary relation is live_in (koala, jungle). In the case of this example, the vector representing the question contains a frequency of one in the rows corresponding to the terms koala and jungle.</Paragraph>
      <Paragraph position="3"> LSA is applied to the term-to-document matrix representing the ontological relations, the vector space model is reduced and the cosine similarity is calculated to obtain the semantic similarity between the vectors of the reduced space model.</Paragraph>
      <Paragraph position="4"> For each column, a ranking of similarity with the rest of the columns will be obtained.</Paragraph>
      <Paragraph position="5">  Given the term-to-document matrix containing a frequency f ij , the occurrence of a term in all the pseudo documents j is weighted to obtain matrix. The entries of the matrix are defined as,</Paragraph>
      <Paragraph position="7"> is the local weight for term i in the pseudo document j, g j is the global weight for term i in the collection and d ij is a normalisation factor. Then, as defined by Guo and Berry (Guo and</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1" end_page="1" type="metho">
    <SectionTitle>
4 Experiments on Semantic Similarity
</SectionTitle>
    <Paragraph position="0"> In order to evaluate how well LSA captures similarity, this section will describe three preliminary experiments for measuring semantic similarity between knowledge entities (i.e. binary relations and classes) of three different ontologies, the Aktive Portal Ontology (APO), the Koala Ontology (KO) and the Newspaper Ontology (NO).</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.1 Experiment 1
</SectionTitle>
      <Paragraph position="0"> The aim of this experiment is to evaluate how well LSA captures similarity between classes that belong to different ontologies. Eight classes have been selected randomly from within three ontologies and described in &amp;quot;pseudo&amp;quot; documents. The words included in each of the documents correspond to the names of the classes and slots related to the class described. The terms have been stemmed and stop words deleted before applying LSA to the term-to-document matrix built using the weighted frequencies of the term occurring within the eight documents describing the classes.</Paragraph>
      <Paragraph position="1"> Terms have been weighted according to the weighting scheme presented in section 2.1.1 with d j =1, the only difference being that terms corresponding to classes names have been multiplied by two. The similarity measures for the eight classes are obtained (See Table 1) after applying LSA with a rank of two and the cosine similarity measure to the term-to-document matrix.</Paragraph>
      <Paragraph position="2"> The results from this experiment show that, in terms of the cosine similarity measure, the class &amp;quot;Researcher&amp;quot; appears to be very similar to the class &amp;quot;Student&amp;quot; in a different ontology. The same results also show that the two classes &amp;quot;Newspaper&amp;quot; belonging to two different ontologies are very similar to each other.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.2 Experiment 2
</SectionTitle>
      <Paragraph position="0"> The aim of this experiment is to evaluate the ability of LSA to measure similarity between a predicate argument and different classes. The query is represented as an added column in the term-to-document matrix which already contains as columns the same documents representing the eight classes used in the first experiment. The column representing the query argument contains only one term corresponding to the name of one of the classes within the ontologies used in this experiment. The frequency of this term is the entry in the added column with a frequency of one multiplied by two as all the other terms representing names of classes. The results for the cosine similarity measure between the eight classes plus the query containing the term &amp;quot;student&amp;quot;, &amp;quot;newspaper&amp;quot; and &amp;quot;animal&amp;quot; after applying LSA with a rank of four (see Table 2) indicate that the most similar classes for the query containing the term &amp;quot;student&amp;quot; are the following classes: &amp;quot;Student&amp;quot; from KO, &amp;quot;Researcher&amp;quot; from APO, and &amp;quot;Parent&amp;quot; from KO.</Paragraph>
      <Paragraph position="1">  and classes belonging to different ontologies For the query containing the term &amp;quot;newspaper&amp;quot; the results shows that the most similar classes are &amp;quot;Newspaper&amp;quot; from APO, &amp;quot;Newspaper&amp;quot; from NO and &amp;quot;Standard Advertising&amp;quot; also from NO.</Paragraph>
      <Paragraph position="2"> Finally, for the query containing the term &amp;quot;animal&amp;quot;, the most similar classes in order of similarity closeness are &amp;quot;Parent&amp;quot; from KO and &amp;quot;Koala&amp;quot; also from KO.</Paragraph>
      <Paragraph position="3"> The results of this experiment indicate that LSI may be accurately used as a measure of similarity between a keyword representing a query predicate argument and a set of documents representing classes that belong to a set of different available ontologies.</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.3 Experiment 3
</SectionTitle>
      <Paragraph position="0"> The aim of this experiment is to evaluate the cosine similarity measure as a measure of semantic similarity between binary relations derived from a question or query and relations held between two classes. This measure is based on the same methodology and procedures applied to both experiments described above. For this experiment, eighteen classes have been selected arbitrarily from the three available ontologies (see Table 3).</Paragraph>
      <Paragraph position="1"> The binary relations held among the selected classes are represented as documents in a term-to-document matrix that is the union of the two pseudo documents describing the related classes.</Paragraph>
      <Paragraph position="2"> Following the same procedure as in the previous experiment, a new column representing the binary relation derived from a question is added to the matrix, but in this case it contains the terms describing the two arguments of the binary relation.</Paragraph>
      <Paragraph position="3">  used in Experiment 3 The cosine similarity between fifteen predicates and the available relations after applying LSA with a rank of four (see Table 4) show that, in eight of the fifteen cases, the similarity value is higher for the relations held between classes than between predicate arguments. In the rest of the cases, the similarity values are very close for two or more relations including the one held between classes that are the same as the predicate arguments.</Paragraph>
      <Paragraph position="4"> Another interesting observation is that, Question Binary Relation 3 (QBR3) has a cosine value more similar to Ontological Binary Relation 9 (OBR9), OBR3 and OBR4. In the case of QBR5, the cosine value is higher when measuring similarity with OBR11 and OBR12 than, for example, the cosine value when measuring similarity with OBR3 and OBR4. Similar results were obtained for QBR6 where, apart from OBR6, OBR9 has the cosine value closest to one. Other similar results are obtained for QBR11 and QBR12 where OBR5 is closer to a value of one than OBR7, OBR8 and OBR9.</Paragraph>
    </Section>
    <Section position="4" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
Table 4: Cosine similarity between Question Binary Relations (QBR) and Ontological Binary Relations (OBR).
</SectionTitle>
      <Paragraph position="0"> The results of this experiment indicate that the presented methodology is able to detect similarity between compact representations (binary relation arguments) and more expanded representations such as the pseudo documents representing the binary relations within the three available ontologies.</Paragraph>
    </Section>
    <Section position="5" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.4 Discussion of the Experiments
</SectionTitle>
      <Paragraph position="0"> We expect that using LSA together with the cosine similarity measure, we will be able to pick up semantic similarity between the compacted and expanded representations of the binary relation and paragraphs from student essays. The main difference between our approach and other essay scoring approaches (e.g. The Intelligent Essay Assessor; Laundauer et al., 2000) where the scores are calibrated using LSA with pre-scored essay examples, is that our approach scores paragraphs using LSA and the cosine similarity with ontologies describing the essay domain. The experiment results in the previous sections validate our view showing that the cosine similarity may be used as a reliable score for semantic similarity between knowledge entities belonging to different data sources (i.e. terms, classes and binary relations).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>