<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1032"> <Title>Extracting Key Semantic Terms from Chinese Speech Query for Web Searches</Title> <Section position="5" start_page="2" end_page="2" type="evalu"> <SectionTitle> 5 Experiments and analysis </SectionTitle> <Paragraph position="0"> As our system aims to correct the errors in, and extract the CSS components from, spoken queries, it is important to demonstrate that it can handle queries of different characteristics. To this end, we devised two sets of test queries as follows.</Paragraph> <Paragraph position="1"> a) Corpus with short queries We devised 10 queries, each containing a CSS with only one basic component. This is the typical type of query posed by users on the web. We asked 10 different people to &quot;speak&quot; the queries, and used IBM ViaVoice 98 to perform the speech-to-text conversion. This gives rise to a collection of 100 spoken queries. The test queries contain a total of 1,340 Chinese characters, with a speech recognition error rate of 32.5%.</Paragraph> <Paragraph position="2"> b) Corpus with long queries In order to test on queries used in standard test corpora, we adopted the query topics (1-10) employed in the TREC-5 Chinese-Language track. Here each query contains more than one key semantic component. We rephrased the queries into natural language query format, and asked twelve subjects to &quot;read&quot; the queries. We again used IBM ViaVoice 98 to perform speech recognition on the resulting 120 different spoken queries, giving rise to a total of 2,354 Chinese characters with a speech recognition error rate of 23.75%.</Paragraph> <Paragraph position="3"> We devised two experiments to evaluate the performance of our techniques. The first experiment was designed to test the effectiveness of our query model in extracting CSSs.
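The error rates reported above (32.5% and 23.75%) are character-level figures. A minimal sketch of how such a character error rate can be computed; the Levenshtein alignment used here is an assumption, as the paper does not specify its scoring method:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference transcript and a
    recognizer hypothesis, computed character by character."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance for ref[:i], hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def cer(pairs):
    """Corpus-level character error rate: total edit operations
    divided by total reference characters, over all query pairs."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return edits / chars
```

For example, a 4-character query recognized with one substituted character contributes an error rate of 0.25 on its own; aggregating over the whole corpus yields figures comparable to those above.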
The second was designed to test the accuracy of our overall system in extracting basic query components.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 5.1 Test 1: Accuracy of extracting CSSs </SectionTitle> <Paragraph position="0"> The test results show that by using our query model, we could correctly extract 99% and 96% of CSSs from the spoken queries for the short and long query categories, respectively. The errors are mainly due to the wrong tagging of some query components, which caused the query model to miss the correct query structure, or to match a wrong structure.</Paragraph> <Paragraph position="1"> For example, consider a query whose recognized form is a nonsensical sentence (the Chinese characters are not recoverable in this copy). Since the probabilities of occurrence of both candidate query structures [0,9,7] and [7,9,10] are 0, we could not find the CSS at all. This error is mainly due to the mis-recognition of the last query component, the word for &quot;news&quot;, as the word for &quot;afternoon&quot;. This confused the query model, which could not find the correct CSS.</Paragraph> <Paragraph position="2"> The overall results indicate that there are fewer errors in short queries, as such queries contain only one CSS component. This is encouraging, as in practice most users issue only short queries.</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 5.2 Test 2: Accuracy of extracting basic query components </SectionTitle> <Paragraph position="0"> In order to test the accuracy of extracting basic query components, we asked one subject to manually divide the CSS into basic components, and used that as the ground truth. We compared the following two methods of extracting CSS components: a) As a baseline, we simply performed standard stop word removal and divided the query into components with the help of a dictionary.
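A minimal sketch of such a dictionary-based baseline; the forward-maximum-matching strategy and the toy word lists are illustrative assumptions, not the authors' implementation (and Latin strings stand in for the Chinese character sequences of the actual queries):

```python
# Baseline (a) sketch: segment an unspaced query with a dictionary and
# drop stop words, yielding a bag of words with no error correction.
# DICTIONARY, STOP_WORDS, and forward maximum matching are illustrative.

DICTIONARY = {"web", "search", "engine", "news", "the", "for", "latest"}
STOP_WORDS = {"the", "for", "a", "of"}

def forward_max_match(text, dictionary, max_len=10):
    """Greedily take the longest dictionary word at each position;
    fall back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if word in dictionary or length == 1:
                tokens.append(word)
                i += length
                break
    return tokens

def baseline_components(text):
    """Bag of words: segmented tokens minus stop words."""
    return [t for t in forward_max_match(text, DICTIONARY)
            if t not in STOP_WORDS]
```

On the toy input "thelatestnews", segmentation yields ["the", "latest", "news"], and stop word removal leaves ["latest", "news"]; any mis-recognized characters in the input survive untouched, which is exactly the weakness this baseline exhibits.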
However, no attempt is made to correct the speech recognition errors in these components. Here we assume that the natural language query is a bag of words with stop words removed (Ricardo, 1999). Currently, most search engines are based on this approach.</Paragraph> <Paragraph position="1"> b) We applied our query model to extract the CSS, and employed the multi-tier mapping approach to extract and correct the errors in the basic CSS components.</Paragraph> <Paragraph position="2"> Tables 3 and 4 give the comparisons between Methods (a) and (b), which clearly show that our method outperforms the baseline method by 20.2% and 20% in F1 measure for the short and long queries, respectively.</Paragraph> <Paragraph position="3"> The improvement is largely due to the use of our approach to extract the CSS and correct the speech recognition errors in the CSS components. More detailed analysis of the long queries in Table 4 reveals that our method performs worse than the baseline method in recall. This is mainly due to errors in extracting the CSS and breaking it into basic components. Although we used the multi-tier mapping approach to reduce the errors from speech recognition, the improvement is insufficient to offset the loss in recall due to errors in extracting the CSS. On the other hand, for the short query cases, which do not incur errors from breaking the CSS apart, our system is more effective than the baseline in recall. It is noted that in both cases, our system performs significantly better than the baseline in terms of the precision and F1 measures.</Paragraph> </Section> </Section> </Paper>