<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1022">
  <Title>Message Classification in the Call Center</Title>
  <Section position="3" start_page="0" end_page="158" type="metho">
    <SectionTitle>
2 Data Characteristics
</SectionTitle>
    <Paragraph position="0"> A closer look at the data the ICC-MAIL system processes will clarify the task further. We carried out experiments with unmodified e-mail data accumulated over a period of three months in the call center database. The total amount was 4777 e-mails.</Paragraph>
    <Paragraph position="1"> We used the 47 categories that contained at least 30 documents each. This minimum number of documents turned out to render a category sufficiently distinguishable for the SML tools. The database contained 74 categories with at least 10 documents, but the selected ones covered 94% of all e-mails, i.e. 4490 documents.</Paragraph>
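The selection of categories by a minimum document count, and the resulting coverage, can be sketched as follows. This is only an illustration: the function name, the toy labels and their counts are invented here and are not taken from the call center data.

```python
from collections import Counter

def select_categories(labels, min_docs=30):
    """Return the categories with at least min_docs documents and their coverage."""
    counts = Counter(labels)
    selected = {cat for cat, n in counts.items() if n >= min_docs}
    covered = sum(n for cat, n in counts.items() if cat in selected)
    coverage = covered / len(labels) if labels else 0.0
    return selected, coverage

# Toy labels (invented for illustration, not the call center corpus).
labels = ["login"] * 40 + ["install"] * 35 + ["billing"] * 5
selected, coverage = select_categories(labels, min_docs=30)
print(sorted(selected), coverage)
```

In the toy data the rare category falls below the threshold, so the selected categories cover only part of the documents, just as the 47 selected categories cover 94% of the e-mails.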
    <Paragraph position="2"> It has not yet generally been investigated how the type of data influences the learning result (Yang, 1999), or under which circumstances which kind of preprocessing and which learning algorithm is most appropriate. Several aspects must be considered: Length of the documents, morphological and syntactic well-formedness, the degree to which a document can be uniquely classified, and, of course, the language of the documents.</Paragraph>
    <Paragraph position="3"> In our application domain the documents differ very much from documents generally used in benchmark tests, for example the Reuters corpus 1. First of all, we have to deal with German, whereas the Reuters data are in English. The average length of our e-mails is 60 words, whereas for documents of Reuters-21578 it is 129 words. The number of categories we used compares to the top 47 categories of the Reuters TOPICS category set. While we have 5008 documents, TOPICS consists of 13321 instances 2. The Reuters documents usually are morphologically and syntactically well-formed. As e-mails are a more spontaneously created and informal type of document, they require us to cope with a large amount of jargon, misspellings and grammatical errors. Their poor compliance with grammatical standards was a major argument in favor of STP instead of in-depth syntactic and semantic analysis.</Paragraph>
    <Paragraph position="4"> The degree to which a document can be uniquely classified is hard to verify and can only be inferred from the results in general terms. 3 It is, however, dependent on the ability to uniquely distinguish the classes. In our application we encounter overlapping and non-exhaustive categories as the category system develops over time.</Paragraph>
  </Section>
  <Section position="4" start_page="158" end_page="159" type="metho">
    <SectionTitle>
3 Integrating Language Technology
With Machine Learning
</SectionTitle>
    <Paragraph position="0"> STP and SML correspond to two different paradigms. STP tools used for classification tasks promise very high recall/precision or accuracy values. Usually human experts define one or several template structures to be filled automatically by extracting information from the documents (cf. e.g. (Ciravegna et al., 1999)).</Paragraph>
    <Paragraph position="1"> Afterwards, the partially filled templates are classified by hand-made rules. The whole process brings about high costs in analyzing and modeling the application domain, especially if it is to take into account the problem of changing categories in the present application. (Footnote fragment, truncated in the source: "... only be treated manually, as described in Section 5.")</Paragraph>
    <Paragraph position="2"> SML promises low costs both in analyzing and modeling the application at the expense of a lower accuracy. It is independent of the domain on the one hand, but does not consider any domain specific knowledge on the other.</Paragraph>
    <Paragraph position="3"> By combining both methodologies in ICC-MAIL, we achieve high accuracy and can still preserve a useful degree of domain-independence. STP may use both general linguistic knowledge and linguistic algorithms or heuristics adapted to the application in order to extract information from texts that is relevant for classification. The input to the SML tool is enriched with that information. The tool builds one or several categorizers 4 that will classify new texts. In general, SML tools work with a vector representation of data. First, a relevancy vector of relevant features for each class is computed (Yang and Pedersen, 1997). In our case the relevant features consist of the user-defined output of the linguistic preprocessor. Then each single document is translated into a vector of numbers isomorphic to the defining vector. Each entry represents the occurrence of the corresponding feature. More details will be given in Section 4. The ICC-MAIL architecture is shown in Figure 1.</Paragraph>
    <Paragraph position="4"> The workflow of the system consists of a learning step carried out off-line (the light gray box) and an on-line categorization step (the dark gray box). In the off-line part, categorizers are built by processing classified data first by an STP and then by an SML tool. In this way, categorizers can be replaced by the system administrator whenever she wants to include new categories or remove expired ones. The categorizers are used on-line in order to classify new documents after they have passed the linguistic preprocessing. In our application, the resulting category is associated with a standard text that the call center agent uses in her answer. The on-line step provides new classified data that is stored in a dedicated ICC-MAIL database (not shown in Figure 1). The relearning step is based on data from this database.</Paragraph>
    <Section position="1" start_page="158" end_page="159" type="sub_section">
      <SectionTitle>
3.1 Shallow Text Processing
</SectionTitle>
      <Paragraph position="0"> Linguistic preprocessing of text documents is carried out by re-using SMES, an information extraction core system for real-world German text processing (Neumann et al., 1997). The fundamental design criterion of SMES is to provide a set of basic, powerful, robust, and efficient STP components and generic linguistic knowledge sources that can easily be customized to deal with different tasks in a flexible manner. SMES includes a text tokenizer, a lexical processor and a chunk parser. The chunk parser itself is subdivided into three components. In the first step, phrasal fragments like general nominal expressions and verb groups are recognized. Next, the dependency-based structure of the fragments of each sentence is computed using a set of specific sentence patterns. Third, the grammatical functions are determined for each dependency-based structure on the basis of a large subcategorization lexicon.</Paragraph>
      <Paragraph position="1"> The present application benefits from the high modularity of the usage of the components. Thus, it is possible to run only a subset of the components and to tailor their output. The experiments described in Section 4 make use of this feature.</Paragraph>
    </Section>
    <Section position="2" start_page="159" end_page="159" type="sub_section">
      <SectionTitle>
3.2 Statistics-Based Machine Learning
</SectionTitle>
      <Paragraph position="0"> Several SML tools representing different learning paradigms have been selected and evaluated in different settings of our domain:
Lazy Learning: Lazy learners are also known as memory-based, instance-based, exemplar-based, case-based, experience-based, or k-nearest neighbor algorithms. They store all documents as vectors during the learning phase.</Paragraph>
      <Paragraph position="1"> In the categorization phase, the new document vector is compared to the stored ones and is assigned to the same class as its k nearest neighbors. The distance is measured by computing e.g. the Euclidean distance between the vectors.</Paragraph>
      <Paragraph position="2"> By changing the number of neighbors k or the kind of distance measure, the amount of generalization can be controlled.</Paragraph>
      <Paragraph position="3"> We used IB (Aha, 1992), which is part of the MLC++ library (Kohavi and Sommerfield, 1996).</Paragraph>
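The lazy learning scheme described above can be sketched in a few lines. This is a minimal illustration of k-nearest-neighbor voting with Euclidean distance, not the IB implementation from MLC++; the function names and the toy vectors are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(stored, query, k=3):
    """stored: list of (vector, label) pairs kept from the learning phase."""
    neighbors = sorted(stored, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy document vectors (invented for illustration).
stored = [
    ([1, 1, 0, 0], "install"),
    ([1, 0, 0, 0], "install"),
    ([0, 0, 1, 1], "billing"),
    ([0, 1, 1, 1], "billing"),
    ([0, 0, 0, 1], "billing"),
]
label = knn_classify(stored, [1, 1, 0, 0], k=3)
print(label)
```

Raising k or swapping the distance function changes how strongly the decision generalizes beyond the single closest stored document.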
      <Paragraph position="4"> Symbolic Eager Learning: This type of learners constructs a representation for document vectors belonging to a certain class during the learning phase, e.g. decision trees, decision rules or probability weightings. During the categorization phase, the representation is used to assign the appropriate class to a new document vector. Several pruning or specialization heuristics can be used to control the amount of generalization. We used ID3 (Quinlan, 1986), C4.5 (Quinlan, 1992) and C5.0, RIPPER (Cohen, 1995), and the Naive Bayes inducer (Good, 1965) contained in the MLC++ library. ID3, C4.5 and C5.0 produce decision trees, RIPPER is a rule-based learner and the Naive Bayes algorithm computes conditional probabilities of the classes from the instances.</Paragraph>
      <Paragraph position="5"> Support Vector Machines (SVMs): SVMs are described in (Vapnik, 1995). SVMs are binary learners in that they distinguish positive and negative examples for each class. Like eager learners, they construct a representation during the learning phase, namely a hyperplane supported by vectors of positive and negative examples. For each class, a categorizer is built by computing such a hyperplane. During the categorization phase, each categorizer is applied to the new document vector, yielding the probabilities of the document belonging to a class. The probability increases with the distance of the vector from the hyperplane. A document is said to belong to the class with the highest probability.</Paragraph>
      <Paragraph position="6"> We chose SVM_Light (Joachims, 1998).</Paragraph>
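The one-categorizer-per-class decision can be illustrated as follows. The sketch uses the signed distance from each hyperplane directly as the score instead of a probability, and the hyperplanes are hand-picked rather than trained; it does not reproduce SVM_Light, and all names and numbers are assumptions for illustration.

```python
import math

def signed_distance(weights, bias, vector):
    """Signed distance of vector from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, vector)) + bias) / norm

def classify(categorizers, vector):
    """categorizers: dict mapping a class name to its (weights, bias) pair."""
    scores = {name: signed_distance(w, b, vector)
              for name, (w, b) in categorizers.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hand-picked hyperplanes (hypothetical, not learned from data).
categorizers = {
    "install": ([1.0, 1.0, -1.0], -0.5),
    "billing": ([-1.0, 0.0, 1.0], -0.5),
}
best, scores = classify(categorizers, [1, 1, 0])
print(best, scores)
```

The document is assigned to the class whose categorizer places it farthest on the positive side of its hyperplane.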
      <Paragraph position="7"> Neural Networks: Neural Networks are a special kind of "non-symbolic" eager learning algorithm.</Paragraph>
      <Paragraph position="9"> The neural network links the vector elements to the document categories. The learning phase defines thresholds for the activation of neurons. In the categorization phase, a new document vector leads to the activation of a single category. For details we refer to (Wiener et al., 1995).</Paragraph>
      <Paragraph position="10"> In our application, we tried out the Learning Vector Quantization (LVQ) (Kohonen et al., 1996). LVQ has been used in its default configuration only. No adaptation to the application domain has been made.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="159" end_page="161" type="metho">
    <SectionTitle>
4 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> We describe the experiments and results we achieved with different linguistic preprocessing and learning algorithms and provide some interpretations.</Paragraph>
    <Paragraph position="1"> We start out from the corpus of categorized e-mails described in Section 2. In order to normalize the vectors representing the preprocessing results of texts of different length, and to concentrate on relevant material (cf. (Yang and Pedersen, 1997)), we define the relevancy vector as follows. First, all documents are preprocessed, yielding a list of results for each category. From each of these lists, the 100 top-ranked results according to a TF/IDF measure are selected. The relevancy vector consists of all selected results, with duplicates eliminated.</Paragraph>
    <Paragraph position="2"> Its length was about 2500 for the 47 categories; it slightly varied with the kind of preprocessing used.</Paragraph>
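A minimal sketch of the relevancy vector construction is given below. The text does not state the exact TF/IDF formula, so the weighting used here (term frequency within a category times log(1 + number of categories / number of categories containing the term)) is an assumption, as are the function name and the toy data.

```python
import math
from collections import Counter

def relevancy_vector(docs_by_category, top_n=100):
    """docs_by_category: dict mapping a category to a list of token lists."""
    # Term frequency of each preprocessing result per category.
    tf = {cat: Counter(tok for doc in docs for tok in doc)
          for cat, docs in docs_by_category.items()}
    # Number of categories in which each result occurs.
    df = Counter(tok for counts in tf.values() for tok in counts)
    n_cats = len(docs_by_category)
    features, seen = [], set()
    for cat in sorted(tf):
        ranked = sorted(
            tf[cat].items(),
            key=lambda item: (-item[1] * math.log(1 + n_cats / df[item[0]]), item[0]),
        )
        for tok, _ in ranked[:top_n]:
            if tok not in seen:  # duplicates across categories are kept once
                seen.add(tok)
                features.append(tok)
    return features

# Toy preprocessing results (invented for illustration).
docs = {
    "install": [["reinstall", "version", "aol"], ["reinstall", "cd", "aol"]],
    "billing": [["invoice", "aol"], ["invoice", "payment", "aol"]],
}
features = relevancy_vector(docs, top_n=2)
print(features)
```

Because results selected for several categories are kept only once, the vector is shorter than the number of categories times 100, which is consistent with a length of about 2500 for 47 categories.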
    <Paragraph position="3"> During the learning phase, each document is preprocessed. The result is mapped onto a vector of the same length as the relevancy vector. For every position in the relevancy vector, it is determined whether the corresponding result has been found. In that case, the value of the result vector element is 1, otherwise it is 0.</Paragraph>
    <Paragraph position="4"> In the categorization phase, the new document is preprocessed, and a result vector is built as described above and handed over to the categorizer (cf. Figure 1).</Paragraph>
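The mapping of a preprocessed document onto a binary result vector can be sketched as follows. The function name and the example tokens are hypothetical; only the 1/0 encoding against the relevancy vector follows the description above.

```python
def to_result_vector(results, relevancy):
    """Return a 0/1 vector with one entry per position of the relevancy vector."""
    present = set(results)
    return [1 if feature in present else 0 for feature in relevancy]

# Hypothetical relevancy vector and preprocessing results of one e-mail.
relevancy = ["invoice", "aol", "reinstall", "version"]
vector = to_result_vector(["aol", "version", "cd", "aol"], relevancy)
print(vector)
```

Results that do not occur in the relevancy vector, such as "cd" in the example, are simply ignored, and repeated results still yield a single 1.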
    <Paragraph position="5"> While we tried various kinds of linguistic preprocessing, systematic experiments have been carried out with morphological analysis (MorphAna), shallow parsing heuristics (STP-Heuristics), and a combination of both (Combined).</Paragraph>
    <Paragraph position="6"> MorphAna: Morphological Analysis provided by SMES yields the word stems of nouns, verbs and adjectives, as well as the full forms of unknown words. We are using a lexicon of approx. 100000 word stems of German (Neumann et al., 1997).</Paragraph>
    <Paragraph position="7"> STP-Heuristics: Shallow parsing techniques are used to heuristically identify sentences containing relevant information. The e-mails usually contain questions and/or descriptions of problems. The manual analysis of a sample of the data suggested some linguistic constructions frequently used to express the problem. We expected that content words in these constructions should be particularly influential to the categorization. Words in these constructions are extracted and processed as in MorphAna, and all other words are ignored. 5 The heuristics were implemented in ICC-MAIL using SMES.</Paragraph>
    <Paragraph position="8"> The constructions of interest include negations at the sentence and the phrasal level, yes-no and wh-questions, and declaratives immediately preceding questions. Negations were found to describe a state to be changed or to refer to missing objects, as in I cannot read my email or There is no correct date. We identified them through negation particles. Questions most often refer to the problem at hand, either directly, e.g. How can I start my email program? or indirectly, e.g. Why is this the case?. The latter most likely refers to the preceding sentence, e.g. My system drops my e-mails. Questions are identified by their word order, i.e. yes-no questions start with a verb and wh-questions with a wh-particle.</Paragraph>
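A rough sketch of how such heuristics might select sentences is shown below. It is not the SMES-based implementation: the particle lists are short assumed samples, verbs are recognized by lookup in a caller-supplied set instead of by morphological analysis, and the function name is hypothetical.

```python
NEGATION_PARTICLES = {"nicht", "kein", "keine", "nie"}        # assumed sample list
WH_PARTICLES = {"wie", "warum", "was", "wo", "wann", "wer"}   # assumed sample list

def relevant_sentences(sentences, verbs):
    """sentences: list of lower-cased token lists; verbs: set of known verb forms.
    Returns the sorted indices of the sentences selected by the heuristics."""
    selected = set()
    for i, tokens in enumerate(sentences):
        if not tokens:
            continue
        # Word order test: wh-particle or verb in first position.
        is_question = tokens[0] in WH_PARTICLES or tokens[0] in verbs
        is_negated = any(tok in NEGATION_PARTICLES for tok in tokens)
        if is_negated:
            selected.add(i)
        if is_question:
            selected.add(i)
            if i > 0:
                selected.add(i - 1)  # sentence immediately preceding a question
    return sorted(selected)

sentences = [
    ["mein", "system", "verliert", "meine", "e-mails"],
    ["warum", "ist", "das", "so"],
    ["danke"],
    ["ich", "kann", "meine", "e-mail", "nicht", "lesen"],
]
indices = relevant_sentences(sentences, verbs={"ist", "kann", "verliert"})
print(indices)
```

A verb-first test also matches imperatives, so a real implementation would need more linguistic context than this sketch uses.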
    <Paragraph position="9"> Combined: In order to emphasize words found relevant by the STP heuristics without losing other information retrieved by MorphAna, the previous two techniques are combined. Emphasis is represented here by doubling the number of occurrences of the tokens in the normalization phase, thus increasing their TF/IDF value. Call center agents judge the performance of ICC-MAIL most easily in terms of accuracy: In what percentage of cases does the classifier suggest the correct text block? In Table 1, detailed information about the accuracy achieved is presented. All experiments were carried out using 10-fold cross-validation on the data described in Section 2.</Paragraph>
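The emphasis step of the Combined setting can be sketched as a simple doubling of occurrence counts before the TF/IDF weighting is computed. The function name and the example stems are assumptions; the sketch shows only the doubling idea, not the full normalization.

```python
from collections import Counter

def combined_counts(morphana_stems, heuristic_stems):
    """Count all stems from MorphAna and double the counts of those that
    were also found by the STP heuristics."""
    counts = Counter(morphana_stems)
    for stem in set(heuristic_stems):
        if stem in counts:
            counts[stem] *= 2
    return counts

# Hypothetical stems of one e-mail.
counts = combined_counts(
    ["programm", "email", "lesen", "email"],
    ["email", "lesen"],
)
print(counts)
```

Stems outside the emphasized constructions keep their original counts, so no information from MorphAna is lost.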
    <Paragraph position="10"> In all experiments the SVM_Light system outperformed the other learning algorithms, which confirms the results of (Yang and Liu, 1999) for SVMs fed with Reuters data. The k-nearest neighbor algorithm IB performed surprisingly badly although different values of k were used. For IB, ID3, C4.5, C5.0, Naive Bayes, RIPPER and SVM_Light, linguistic preprocessing increased the overall performance. In fact, the method performing best, SVM_Light, gained 3.5% by including the task-oriented heuristics. However, the boosted RIPPER and LVQ scored a decreased accuracy value there. For LVQ the decrease may be due to the fact that no adaptations to the domain were made, such as adapting the number of codebook vectors, the initial learning parameters or the number of iterations during training (cf. (Kohonen et al., 1996)). Neural networks are rather sensitive to misconfigurations. The boosting for RIPPER seems to run into problems of overfitting. We noted that in six trials the accuracy could be improved in Combined compared to MorphAna, but in four trials, boosting led to deterioration. This effect is also mentioned in (Quinlan, 1996). (Caption fragment, truncated in the source: "... results, allowing to measure the accuracy of the top five alternatives (Best5).")</Paragraph>
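Accuracy over the best-ranked alternatives, as in the Best5 measure mentioned in the caption fragment, can be computed along the following lines. This is a generic sketch; the function name and the toy rankings are assumptions and the numbers are unrelated to Table 1.

```python
def top_k_accuracy(ranked_predictions, gold, k=5):
    """Fraction of documents whose correct category is among the first k
    categories proposed by the classifier."""
    hits = sum(1 for ranked, label in zip(ranked_predictions, gold)
               if label in ranked[:k])
    return hits / len(gold) if gold else 0.0

# Toy rankings for three documents (invented for illustration).
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "a"]
best1 = top_k_accuracy(ranked, gold, k=1)
best3 = top_k_accuracy(ranked, gold, k=3)
print(best1, best3)
```

With k=1 this reduces to plain accuracy, while larger k rewards a classifier that places the correct text block among the alternatives offered to the agent.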
    <Paragraph position="11"> These figures are slightly lower than the ones reported by (Neumann and Schmeier, 1999) that were obtained from a different data set. Moreover, these data did not contain multiple queries in one e-mail.</Paragraph>
    <Paragraph position="12"> It would be desirable to provide explanations for the behavior of the SML algorithms on our data. As we have emphasized in Section 2, general methods of explanation do not exist yet. In the application at hand, we found it difficult to account for the effects of e.g. ungrammatical text or redundant categories. For the time being, we can only offer some speculative and inconclusive assumptions: Some of the tools performing badly - IB, ID3, and the Naive Bayes inducer of the MLC++ library - have no or little pruning ability. With rarely occurring data, this leads to very low generalization rates, which again is a problem of overfitting. This suggests that a more canonical representation for the many ways of expressing a technical problem should be sought. Would more extensive linguistic preprocessing help? Other tests not reported in Table 1 looked at improvements through more general and sophisticated STP such as chunk parsing. The results were very discouraging, leading to a significant decrease compared to MorphAna. We explain this with the poor compliance of e-mail texts with grammatical standards (cf. the example in Figure 2).</Paragraph>
    <Paragraph position="13"> However, the practical usefulness of chunk parsing or even deeper language understanding such as semantic analysis may be questioned in general: In a moving domain, the coverage of linguistic knowledge will always be incomplete, as it would be too expensive for a call center to have language technology experts keep pace with the occurrence of new topics. Thus the preprocessing results will often differ for e-mails expressing the same problem and hence not be useful for SML.</Paragraph>
    <Paragraph position="14"> As a result of the tests in our application domain, we identified a favorite statistical tool and found that task-specific linguistic preprocessing is encouraging, while general STP is not.</Paragraph>
  </Section>
  <Section position="6" start_page="161" end_page="163" type="metho">
    <SectionTitle>
5 Implementation and Use
</SectionTitle>
    <Paragraph position="0"> In this section we describe the integration of the ICC-MAIL system into the workflow of the call center of AOL Bertelsmann Online GmbH &amp; Co. KG, which answers requests about the German version of AOL software. A client/server solution was built that allows the call center agents to connect as clients to the ICC-MAIL server, which implements the system described in Section 3. For this purpose, it was necessary to
* connect the server module to AOL's own Sybase database that delivers the incoming mail and dispatches the outgoing answers, and to ICC-MAIL's own database that stores the classified e-mail texts;
* design the GUI of the client module in a self-explanatory and easy to use way (cf. Figure 2).
The agent reads in an e-mail and starts ICC-MAIL using GUI buttons. She verifies the correctness of the suggested answer, displaying and perhaps selecting alternative solutions. If the agent finds the appropriate answer within these proposals, the associated text is filled in at the correct position of the answer e-mail. If, on the other hand, no proposed solution is found to be adequate, the ICC-MAIL tool can still be used to manually select any text block from the database.</Paragraph>
    <Paragraph position="1"> [Figure 2, screenshot text as far as recoverable:] ... and copy them into a backup folder. Then remove the AOL software using the Windows Control Panel and reinstall it from your CD.</Paragraph>
    <Paragraph position="2"> After reinstallation please copy the data from the backup folder into the right destinations.</Paragraph>
    <Paragraph position="3"> [Figure 2 caption, beginning missing in the source:] ... input is based on the following original text, which is similarly awkward though understandable: Wie mache ich zurn mein Programm total deinstalieren, und wieder neu instalierem, mit, wen Sic mir senden Version 4.0 ??????????????. The suggested answer text is associated with the category named "Delete &amp; Reinstall AOL 4.0". Four alternative answers can be selected using the tabs. The left-hand side window displays the active category in context.</Paragraph>
    <Paragraph position="4"> The ICC-MAIL client had to provide the functionality of the tool already in use, since an additional tool was not acceptable to the agents, who are working under time pressure.</Paragraph>
    <Paragraph position="5"> In the answer e-mail window, the original e-mail is automatically added as a quote. If an e-mail contains several questions, the classification process can be repeated by marking each question and iteratively applying the process to the marked part. The agent can edit the suggested texts before sending them off.</Paragraph>
    <Paragraph position="6"> In each case, the classified text together with the selected category is stored in the ICC-MAIL database for use in future learning steps.</Paragraph>
    <Paragraph position="7"> Other features of the ICC-MAIL client module include a spell checker and a history view. The latter displays not only the previous e-mails of the same author but also the solutions that have been proposed and the elapsed time before an answer was sent.</Paragraph>
    <Paragraph position="8"> The assumed average time for an agent to answer an e-mail is a bit more than two minutes with AOL's own mail processing system. (Footnote: This system does not include automatic analysis of mails.) With the ICC-MAIL system the complete cycle of fetching the mail, checking the proposed solutions, choosing the appropriate solutions, inserting additional text fragments and sending the answer back can probably be achieved in half the time. Systematic tests supporting this claim are not completed yet, but the following preliminary results are encouraging:
* A test under real-time conditions at the call center envisaged the use of the ICC-MAIL system as a mail tool only, i.e. without taking advantage of the system's intelligence. It showed that the interface and the look-and-feel are accepted and that the functionality corresponds to the real-time needs of the call center agents, as users were slightly faster than within their usual environment.
* A preliminary test of the throughput achieved by using the STP and SML technology in ICC-MAIL showed that experienced users take about 50-70 seconds on average for one cycle, as described above. This figure was obtained through experiments with three users over a duration of about one hour each.</Paragraph>
    <Paragraph position="9"> Using the system with a constant set of categories will improve its accuracy after repeating the off-line learning step. If a new category is introduced, the accuracy will slightly decline until 30 documents are manually classified and the category is automatically included into a new classifier. Relearning may take place at regular intervals. The definition of new categories must be fed into ICC-MAIL by a "knowledge engineer", who maintains the system. The effects of new categories and new data have not been tested yet.</Paragraph>
    <Paragraph position="10"> The optimum performance of ICC-MAIL can be achieved only with a well-maintained category system. For a call center, this may be a difficult task to achieve, especially under severe time pressure, but it will pay off. In particular, all new categories should be added, outdated ones should be removed, and redundant ones merged. Agents should only use these categories and no others. The organizational structure of the team should reflect this by defining the tasks of the "knowledge engineer" and her interactions with the agents.</Paragraph>
  </Section>
</Paper>