<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1040">
  <Title>Dialogue Management in Vector-Based Call Routing</Title>
  <Section position="6" start_page="259" end_page="261" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="259" end_page="260" type="sub_section">
      <SectionTitle>
5.1 The Routing Module
</SectionTitle>
      <Paragraph position="0"> We performed an evaluation of the routing module of our call router on a fresh set of 389 calls to a human operator. Out of the 389 requests, 307 were unambiguous and routed to their correct destinations, and 82 were ambiguous and annotated with a list of candidate destinations.</Paragraph>
      <Paragraph position="1"> Unfortunately, in this test set, only the caller's first utterance was transcribed. Thus we have no information about where the ambiguous calls were eventually routed.</Paragraph>
      <Paragraph position="2"> The routing decision made for each call is classified into one of 8 groups, as shown in Figure 3. For instance, group 1a contains those calls which are 1) actually unambiguous, 2) considered unambiguous by the router, and 3) routed to the correct destination. On the other hand, group 3b contains those calls which are 1) actually ambiguous, 2) considered by the router to be unambiguous, and 3) routed to a destination which is not one of the potential destinations.</Paragraph>
      <Paragraph position="3"> We evaluated the router's performance on three subsets of our test data: unambiguous requests alone, ambiguous requests alone, and all requests combined. For each set of data, we calculated a lowerbound performance, which measures the percentage of calls that are correctly routed, and an upperbound performance, which measures the percentage of calls that are either correctly routed or have the potential to be correctly routed. Table 3 shows how the upperbounds and lowerbounds are computed based on the classification in Figure 3 for each of the three data sets. For instance, for unambiguous requests (classes 1 and 2), the lowerbound is the number of calls actually routed to the correct destination (1a) divided by the total number of unambiguous requests, while the upperbound is the number of calls actually routed to the correct destination (1a) plus the number of calls which the router finds to be ambiguous between the correct destination and some other destination(s) (2a), divided by the total number of unambiguous requests. The calls in category 2a are considered to be potentially correct because it is likely that the call will be routed to the correct destination after disambiguation.</Paragraph>
      <Paragraph position="4"> Table 4 shows the upperbound and lowerbound performance for each of the three test sets. These results show that the system's overall performance will fall somewhere between 75.6% and 97.2%. The actual performance of the system is determined by two factors: 1) the performance of the disambiguation module, which determines the correct routing rate of the 16.6% of the unambiguous calls that were considered ambiguous by the router (class 2a), and 2) the percentage of calls that were routed correctly out of the 40.4% of ambiguous calls that were considered unambiguous and routed by the router (class 3a). Note that the performance figures in Table 4 are the result of 100% automatic routing, since no request in our test set failed to evoke at least one candidate destination. In the next sections, we discuss the performance of the disambiguation module, which determines the overall system performance, and show how allowing calls to be punted to operators affects the system's performance.</Paragraph>
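      <Paragraph position="5"> As a minimal sketch of the bound computation described in this section for unambiguous requests, the Python function below takes the counts for classes 1a and 2a and the total number of unambiguous requests. The function name and the counts in the example call are hypothetical placeholders chosen for illustration; they are not the figures behind Table 3 or Table 4.

```python
from typing import Tuple


def unambiguous_bounds(n_1a: int, n_2a: int, n_unambiguous: int) -> Tuple[float, float]:
    """Return (lowerbound, upperbound) as fractions of the unambiguous requests.

    n_1a: calls routed to the correct destination (class 1a).
    n_2a: calls the router found ambiguous between the correct destination
          and some other destination(s) (class 2a).
    n_unambiguous: total number of unambiguous requests (classes 1 and 2).
    """
    if n_unambiguous == 0:
        raise ValueError("n_unambiguous must be positive")
    lower = n_1a / n_unambiguous            # correctly routed only
    upper = (n_1a + n_2a) / n_unambiguous   # correctly or potentially correctly routed
    return lower, upper


# Hypothetical counts, for illustration only (not taken from the paper).
lower, upper = unambiguous_bounds(n_1a=80, n_2a=15, n_unambiguous=100)
print(f"lowerbound = {lower:.1%}, upperbound = {upper:.1%}")
```

The gap between the two values is exactly the share of class 2a calls, which is why the disambiguation module determines where the actual performance falls within the interval.</Paragraph>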
    </Section>
    <Section position="2" start_page="260" end_page="261" type="sub_section">
      <SectionTitle>
5.2 The Disambiguation Module
</SectionTitle>
      <Paragraph position="0"> To evaluate our disambiguation module, we needed dialogues which satisfy two criteria: 1) the caller's first utterance is ambiguous, and 2) the operator asked a follow-up question to disambiguate the query and subsequently routed the call to the appropriate destination. We used 157 calls that meet these two criteria as our test set. Note that this test set is disjoint from the test set used in the evaluation of the router (Section 5.1), since none of the transcribed calls in the latter test set satisfied criterion (2).</Paragraph>
      <Paragraph position="1"> For each ambiguous call, the first user utterance was given to the router as input. The outcome of the router was classified as follows: 1. Unambiguous: in this case the call was routed to the selected destination. This routing was considered correct if the selected destination was the same as the actual destination and incorrect otherwise.</Paragraph>
      <Paragraph position="2"> 2. Ambiguous: in this case the router attempted to initiate disambiguation. The outcome of the routing of these calls was determined as follows: (a) Correct, if a disambiguation query was generated which, when answered, led to the correct destination.</Paragraph>
      <Paragraph position="3"> (b) Incorrect, if a disambiguation query was generated which, when answered, could not lead to a correct destination.</Paragraph>
      <Paragraph position="4"> (c) Reject, if the router could not form a sensible query or was unable to gather sufficient information from the user after its queries and routed the call to an operator.</Paragraph>
      <Paragraph position="5"> Table 5 shows the number of calls that fall into each of the 5 categories. Out of the 157 calls, the router automatically routed 115 of them either with or without disambiguation (73.2%). Furthermore, 87.0% of these routed calls were routed to the correct destination. Notice that out of the 52 ambiguous calls that the router considered unambiguous, 40 were routed correctly (76.9%). This is simply because our vector-based router is able to distinguish between cases where an ambiguous query is equally likely to be routed to more than one destination, and situations where the likelihood of one potential destination overwhelms that of the other(s). In the latter case, the router routes the call to the most likely destination instead of initiating disambiguation, which has been shown to be an effective strategy; not surprisingly, human operators are also prone to guess the destination based on likelihood and route callers without disambiguation.</Paragraph>
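      <Paragraph position="6"> The percentages quoted in this section can be rechecked from the counts given in the text. The short Python sketch below does so; the variable and function names are hypothetical, and the count of correctly routed calls is an inference from the stated 87.0% rate rather than a number read from Table 5.

```python
# Counts stated in the text of this section.
TOTAL_CALLS = 157                      # ambiguous calls in the test set
AUTO_ROUTED = 115                      # routed with or without disambiguation
CONSIDERED_UNAMBIGUOUS = 52            # ambiguous calls the router treated as unambiguous
CONSIDERED_UNAMBIGUOUS_CORRECT = 40    # of those, routed correctly
CORRECT_RATE_OF_ROUTED = 0.870         # stated rate among automatically routed calls


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


auto_routed_pct = pct(AUTO_ROUTED, TOTAL_CALLS)
unambiguous_correct_pct = pct(CONSIDERED_UNAMBIGUOUS_CORRECT, CONSIDERED_UNAMBIGUOUS)

# Inferred, not stated in the text: approximate number of correctly routed calls.
approx_correct_calls = round(CORRECT_RATE_OF_ROUTED * AUTO_ROUTED)

print(f"automatically routed: {auto_routed_pct}%")                          # 73.2%
print(f"correct among 'considered unambiguous': {unambiguous_correct_pct}%")  # 76.9%
print(f"approx. correctly routed calls: {approx_correct_calls}")
```
</Paragraph>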
    </Section>
    <Section position="3" start_page="261" end_page="261" type="sub_section">
      <SectionTitle>
5.3 Overall Performance
</SectionTitle>
      <Paragraph position="0"> Combining results from Section 5.2 for ambiguous calls with results from Section 5.1 for unambiguous calls leads to the overall performance of the call router in Table 6.</Paragraph>
      <Paragraph position="1"> The table shows the number of calls that will be correctly routed, incorrectly routed, and rejected, if we apply the performance of the disambiguation module (Table 5) to the calls that fall into each class in the evaluation of the routing module (Section 5.1). Our results show that out of the 389 calls in our test set, 89.8% of the calls will be automatically routed by the call router. Of these calls, 93.8% (which constitutes 84.2% of all calls) will be routed to their correct destinations. This is substantially better than the results obtained by Gorin et al., who report an 84% correct routing rate with a 10% false rejection rate (routed to an operator when the call could have been automatically routed) on 14 destinations (Gorin et al., to appear).</Paragraph>
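      <Paragraph position="2"> The relationship between the three overall figures above is a product of rates: the share of all calls routed correctly equals the share routed automatically times the share of those routed correctly. A minimal Python sketch of that check follows; the function name is a hypothetical one introduced for illustration.

```python
def overall_correct_rate(routed_rate: float, correct_given_routed: float) -> float:
    """Fraction of all calls that are automatically routed to the correct destination."""
    return routed_rate * correct_given_routed


# Rates stated in the text: 89.8% of calls routed, 93.8% of those routed correctly.
rate = overall_correct_rate(0.898, 0.938)
print(f"correctly routed share of all calls: {rate:.1%}")  # 84.2%
```
</Paragraph>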
    </Section>
  </Section>
</Paper>