<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0905">
  <Title>Evaluating the Performance of the OntoSem Semantic Analyzer</Title>
  <Section position="4" start_page="0" end_page="1" type="metho">
    <SectionTitle>
3 Automated Evaluation of Ontological Semantic Analyses
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> Once the gold standard TMRs are produced, the evaluation of OntoSem proceeds fully matically. For each in &amp;quot;runs&amp;quot; as follows: * as is: we simply input the text and evaluate the outputs; analyzer to use the first lexical sense of each word; the typically the most central and frequent ones; baseline 2: same as baseline 1, except we use the first sense that has the correct part of speech (as specified in the gold standar preprocessor results); correct preprocessor output: we use the gold standard preprocessor output as input to the syntactic and semantic * corrected syntax output: we use the gold standard syntax (and gold standard preprocessor output) as the in semantic analyzer.</Paragraph>
      <Paragraph position="1"> each run, we produce four output files: rocessor results; performed by automatically comparing the actual preprocessor, syntax or semantic results to the corresponding gold standard outputs. The evaluation produces statistics and/or measurements as follows.</Paragraph>
      <Paragraph position="2"> General text-level statistics are collected from the golden stan w that are not in the OntoSem lexicon; c) the syntactic ambiguity count, which is the number of phrases and clauses in the syntactic output; d) the semantic ambiguity count, which is the product of the number of senses of each word, which provides an estimate of the overall theoretical complexity of semantic analysis; and e) the word sense ambiguity count, which is the number of semantic combinations the analyzer actually needed to examine to produce the result; this number provides an estimate for the actual complexity of semantic analysis: syntactic clues often help prune many spurious analyses and the efficient semantic analysis algorithm (Beale, et. al. 1995) reduces the total number of combinations that have to be examined while maintaining accuracy.</Paragraph>
      <Paragraph position="3"> For this evaluation, the lexicon provided almost complete lexical coverage of the input texts (in fact only one word was missing). We will use the are ollected for each evaluation run.</Paragraph>
      <Paragraph position="4"> ches between an ctual run and the gold standard, n is the number of of hrases, and phrase attachment.</Paragraph>
      <Paragraph position="5"> turned for each phrase, with 1.0 reflecting a perfect match.  gstart is the gold standard word number ber at the start of the phrase being b) ach phrase in the gold standard syntax, it is determined if there exists a phrase with the same part of speech c) ure looks for a phrase that overlaps with it that has the same part of A s as Ana ll Score is then the average score of  , b and c.</Paragraph>
      <Paragraph position="6"> (SD) determination. For WSD, three measures are computed.</Paragraph>
      <Paragraph position="7"> on is marked with the word number from the input text from which it  ontologically &amp;quot;close&amp;quot; to the correct sense is results of this first evaluation as a baseline for future evaluation of the degradation of the results due to incompleteness of the static knowledge. Results from the operation of the preprocessor, syntactic analysis and semantic analysis c The preprocessor statistics are recorded as follows (m is the number of mat a mismatches): a) abbreviations, time, date and number recognition (m/n); b) named entity recognition (m/n); c) part of speech tagging (m/n). The overall score of the preprocessor is calculated as the average of m/m+n for all three measures. Syntactic analysis statistics measure the quality of the determination of phrase boundaries, heads p a) For phrase boundaries, an overall score between 0.0 and 1.0 is re Each phrase in the gold standard syntax output is compared to its closest match in the output under consideration.The output phrase that has the same label (NP, CL, etc.), the same head word, and the closest matching starting and ending points is used for the comparison. Each phrase is given the score: - (|gstart - start |+ |gend - end|)/(gend - gstart) where at the start of the phrase and start is the word num evaluated. Thus, if the gold standard phrase began at word 10 and ended at word 16, and the closest matching phrase in the output being evaluated began at word 9 and ended at word 17, then the score for this phrase would be 1 (|10 - 9 |+ |16 - 17|) / (16 - 10) = (1 - (2 / 6)) = 2/3. If no matching phrase could be found (i.e. no overlapping phrase could be found with the same phrase label and head word), then a score of 0.0 is assigned. The score for the whole sentence under evaluation is the average of the scores for each of the phrases.</Paragraph>
      <Paragraph position="8"> For phrase head determination, the standard (m/n) measure is used. For e and head word that overlaps with the gold standard phrase.</Paragraph>
      <Paragraph position="9"> Attachment is also measured as (m/n). For each phrase in the gold standard syntax, the evaluation proced speech, the same head word and the same constituents. For example, if the gold standard output has a PP attached to a NP, it will be shown to be a constituent of that NP. If the output being evaluated attaches the PP at a different constituent, then a mismatch will be identified.</Paragraph>
      <Paragraph position="10"> core between 0.0 and 1.0 is assigned for b and c follows: Score = m/(m+n). The Syntactic</Paragraph>
      <Paragraph position="0"> a Semantic analysis statistics measure the quality of word sense disambiguation (WSD) and semantic dependency A) First, the standard match/mismatch (m/n) is used. Each TMR element in the gold standard semantic representati arose. The TMR element in the semantic representation being evaluated that corresponds to that same word number is then compared with it.</Paragraph>
      <Paragraph position="1"> Second, the evaluation system produces a weighted score for WSD complexity. An overall score betwe penalized less than a mismatch of a word with fewer senses. The score for each mismatch is 1 - (2 / number-of-senses), if the word has more than 2 senses, and 0.0 if it has less than or equal to 2 senses. An exact match is given a score of 1.0. The overall score for the sentence is the average score for each TMR element. The system also computes a weighted score for WSD &amp;quot;distance.&amp;quot; An overall score between 0.0 and 1.0 is returned. A mismatch that penalized less than a mismatch that is ontologically &amp;quot;far&amp;quot; from the correct semantics. The ontological distance is computed using the Ontosearch algorithm (Onyshkevych 1997) that returns a score between 0.0 and 1.0 reflecting how close the two concepts are in the ontology, with a score of 1.0 indicating a</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
Example Semantic Evaluation
</SectionTitle>
      <Paragraph position="0"> sentence is the average score of each TMR element.</Paragraph>
      <Paragraph position="1"> The quality of semantic dependency determination is computed using the standard (m/n) measur We will now exemplify the evaluation of the semantic analysis of the sample sentence in 1:  D) e. Each TMR element in the gold standard is compared to the corresponding 1. Hall is scheduled to embark on the 12 hour  TMR element in the semantics being evaluated. Each property modifying the gold standard TMR element that is also in the evaluation TMR element increments the m count, each property in the gold standard TMR element that is not in the evaluation TMR element increments the n count. The fillers of matching properties are also compared. If the filler of the gold standard property is another TMR element (as opposed to being a literal), then the filler is also matched against the corresponding filler in the semantic representation being evaluated, incrementing the m and n counters as appropriate. The relations between TMR elements is one of the central aspects of Ontological Semantics which goes beyond simple word sense disambiguation. This score reflects how well the dependency determination was performed. The analyzer produces the syntactic analysis shown in Figure 3. This analysis contains many spurious parses (along with the correct ones). The gold standard parse of this sentence is shown in  the number of edges can be visually compared. In order to make an interesting evaluation example, we forced the semantic analyzer to misinterpret capital. The analyzer actually chose the correct sense, CAPITAL-CITY, but here we will force it to select the monetary sense, CAPITAL.</Paragraph>
      <Paragraph position="2"> We will now demonstrate the calculation and significance of the semantic evaluation parameters. A) Match/mismatch of TMR elements. In this example, there will be six matches and one mismatch - the CAPITAL concept that should be CAPITAL-CITY. A score of 6/7 = 0.86 is also calculated for use in the overall semantic score.</Paragraph>
      <Paragraph position="3"> B) Weighted score for WSD complexity. The word capital has three senses in our English lexicon, corresponding to the CAPITAL-CITY, CAPITAL (i.e. monetary) and CAPITAL-EQUIPMENT meanings. It will receive a score of 1 - 2/number-of-senses = 1 - 2/3 = 0.33. If there were two or less senses, it would have received a score of 0.0. If there were many senses of capital, its score would have been higher, reflecting the fact that there was a more complex disambiguation problem. The other six TMR elements receive a score of 1.0. The total score for the sentence is therefore 6.33/7 =0.90.</Paragraph>
      <Paragraph position="4">  A normalized score between 0.0 and 1.0 is calculated for a and d as follows: Score = m/(m+n).</Paragraph>
      <Paragraph position="5"> C) Weighted score for WSD distance. We determine the distance between the chosen eaning, -CITY, by submitting the concept pair (ontosearch capital capital-city) - 0.525</Paragraph>
      <Paragraph position="6"> D) Semantic dependency determination. In the example input, there are six links between TMR elements. Thus, the instance of SCHEDULE-EVENT has as its THEME the instance of TRAVEL-EVENT, which has an instance of CAPITAL as its DESTINATION, an instance of HUMAN as its AGENT and an instance of HOUR as its DURATION. CAPITAL is linked to NATION and CITY. Each link is checked against the gold standard. In this case, all six links match, which increments the dependency match counter by six. The fillers of the links, i.e. the TMR elements that they point to, are also checked. For this example, the DESTINATION of the TRAVEL-EVENT should be CAPITAL-CITY, but it is CAPITAL; this increments the mismatch counter by one. The other five fillers match the gold standard, so the match counter is incremented by 5. For the whole sentence, the dependency matches will be 11 and the mismatches will be 1. In this case, the mismatched dependency was caused by the misanalysis of capital. In other cases, mismatched dependencies can arise from incorrect linking between syntactic and semantic structures. A score of 11/12 = 0.92 is calculated for use in the overall score.</Paragraph>
      <Paragraph position="7"> Our first evaluation run returned the results summarized in Tables 1 and 2. The motivation for the individual evaluation runs was given above. In future evaluations, other components could be evaluated in the same framework, for instance the corresponding components of the Stanford Lexicalized Parser (accessible from http://nl u/).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>