<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2016">
  <Title>Investigating Cross-Language Speech Retrieval for a Spontaneous Conversational Speech Collection</Title>
  <Section position="6" start_page="63" end_page="63" type="concl">
    <SectionTitle>
5 Conclusions and Further Investigation
</SectionTitle>
    <Paragraph position="0"> The system described in this paper obtained the best results among the seven teams that participated in the CLEF 2005 CL-SR track. We believe that this results from our use of the 38 training topics to find a term weighting scheme that is particularly suitable for this collection. Relevance judgments are typically not available for training until the second year of an IR evaluation; using a search-guided process that does not require system results to be available before judgments can be performed made it possible to accelerate that timetable in this case. Table 2 shows that performance varies markedly with the choice of weighting scheme. Indeed, some of the classic weighting schemes yielded much poorer results than the one we ultimately selected. In this paper we presented results on the test queries, but we observed similar effects on the training queries.</Paragraph>
    <Paragraph position="1"> On combined manual and automatic data, the best MAP score we obtained for English topics is 0.4647. On automatic data, the best MAP is 0.2176.</Paragraph>
    <Paragraph position="2"> This difference could result from ASR errors or from terms added by human indexers that were not available to the ASR system to be recognized. In future work we plan to investigate methods of removing or correcting some of the speech recognition errors in the ASR transcripts using semantic coherence measures.</Paragraph>
    <Paragraph position="3"> In ongoing further work we are exploring the relationship between properties of the collection and the weighting schemes in order to better understand the underlying reasons for the demonstrated effectiveness of the mpc.ntn weighting scheme.</Paragraph>
    <Paragraph position="4"> The challenges of CLEF CL-SR task will continue to expand in subsequent years as new collections are introduced (e.g., Czech interviews in 2006). Because manually assigned segment boundaries are available only for English interviews, this will yield an unknown topic boundary condition that is similar to previous experiments with automatically transcribed broadcast news the Text Retrieval Conference (Garafolo et al, 2000), but with the additional caveat that topic boundaries are not known for the ground truth relevance judgments.</Paragraph>
  </Section>
class="xml-element"></Paper>