<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1104"> <Title>Semantic Indexing using WordNet Senses</Title> <Section position="8" start_page="41" end_page="42" type="evalu"> <SectionTitle> 7 Results </SectionTitle> <Paragraph position="0"> The system was tested on the Cranfield collection, which comprises 1400 SGML-formatted documents. From the 225 questions provided with this collection, we randomly selected 50 and used them to create a benchmark against which we performed the three runs described in the previous sections: R_WNStem, R_WNOffset and R_WNHyperOffset. For each of these questions, the system forms three types of queries, as described above. Below, we present 10 of these questions and show the results obtained in Table 2.</Paragraph> <Paragraph position="1"> 1. Has anyone investigated the effect of surface mass transfer on hypersonic viscous interactions? 2. What is the combined effect of surface heat and mass transfer on hypersonic flow? 3. What are the existing solutions for hypersonic viscous interactions over an insulated flat plate? 4. What controls leading-edge attachment at transonic velocities? 5. What are wind-tunnel corrections for a two-dimensional aerofoil mounted off-centre in a tunnel? 6. What is the present state of the theory of quasi-conical flows? 7. References on the methods available for accurately estimating aerodynamic heat transfer to conical bodies for both laminar and turbulent flow.</Paragraph> <Paragraph position="2"> 8. What parameters can seriously influence natural transition from laminar to turbulent flow on a model in a wind tunnel? 9. Can a satisfactory experimental technique be developed for measuring oscillatory derivatives on slender sting-mounted models in supersonic wind tunnels? 10. Recent data on shock-induced boundary-layer separation. Three measures are used in the evaluation of the system performance: (1) precision, defined as the number of relevant documents retrieved over the total number of documents retrieved; (2) recall, defined as the number of relevant documents retrieved over the total number of relevant documents found in the collection; and (3) F-measure, which combines precision and recall into a single formula:</Paragraph> <Paragraph position="3"> $F = \frac{(\beta^{2} + 1) \, P \, R}{\beta^{2} P + R}$ </Paragraph> <Paragraph position="4"> where $P$ is the precision, $R$ is the recall and $\beta$ is the relative importance given to recall over precision. In our case, we consider precision and recall to be of equal importance, and thus the factor $\beta$ in our evaluation is 1.</Paragraph> <Paragraph position="5"> The tests over the entire set of 50 questions led to 0.22 precision and 0.25 recall when the WordNet stemmer is used, and to 0.23 precision and 0.29 recall when using combined word-based and synset-based indexing. The use of hypernym synsets led to a recall of 0.32 and a precision of 0.21.</Paragraph> <Paragraph position="6"> The relative gain of the combined word-based and synset-based indexing with respect to the basic word-based indexing was a 16% increase in recall and a 4% increase in precision. When using the hypernym synsets, there is a 28% increase in recall, with a 9% decrease in precision.</Paragraph> <Paragraph position="7"> The conclusion of these experiments is that indexing by synsets, in addition to the classic word-based indexing, can actually improve IR effectiveness.
Moreover, to our knowledge this is the first time that a WSD algorithm for open text has been used to automatically disambiguate a collection of texts prior to indexing, with a disambiguation accuracy high enough to increase the recall and precision of an IR system.</Paragraph> <Paragraph position="8"> An issue that can be raised here is the efficiency of such a system: we have introduced a WSD stage into the classic IR process, and it is well known that WSD algorithms are usually computationally intensive. On the other hand, the disambiguation of a text collection is a process that can be highly parallelized, and thus this no longer constitutes a problem.</Paragraph> </Section> </Paper>