<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1015">
  <Title>A Comparative Study of the Application of Different Learning Techniques to Natural Language Interfaces</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> As a case study for investigating the feasibility of the implemented machine learning algorithms, we use a multilingual natural language interface to a production planning and control system (PPC). The PPC performs the medium-term scheduling of products and resources involved in the manufacturing processes, i.e. material, machines, and labor. The resulting master production schedule forms the basis for the coordination of related business services such as engineering, manufacturing, and finance. The modeled enterprise makes precision tools by using job order production and serial manufacture as basic strategies. The efficient realization of the high demands of the application exceeds the power of relational database technology. Therefore, it represents an excellent choice for deriving full advantage of the extended functionality of deductive object-oriented database systems. Furthermore, the sophisticated functionality justifies the effective use of a natural language interface.</Paragraph>
    <Paragraph position="1"> During previous research (Winiwarter, 1994) we developed a German natural language interface based on 1000 input sentences that had been collected from users by means of questionnaires. The input sentences were then mapped to 100 command classes (10 for each class). The mapping was performed by elaborate semantic analysis; for the development of the underlying rule base we spent several man-months.</Paragraph>
    <Paragraph position="2"> Therefore, we were eager to see if we could replace this extensive effort by a machine learning component that learns the same linguistic knowledge automatically. For this purpose we divided the 1000 sentences into 900 training cases and 100 test cases. In addition, we collected 100 Japanese and 100 English test sentences to check whether the learned knowledge really operates at a semantic level, independently of language-specific phenomena.</Paragraph>
    <Paragraph position="3"> As a result of the encoding of the training set (see Sect. 2), we obtained 316 features in total, 289 for the DFL and 27 for the UVL. For the evaluation of the different machine learning algorithms we used two performance measures: the success rate, i.e. the proportion of correctly classified test cases, and the top-3 rate. The latter indicates the proportion of cases where the correct classification is among the first three predicted classes. For the model-based approaches we had to produce additional candidate classes. This was achieved by applying approximate methods that allow one incorrect edge along the traversal of decision trees or one divergent literal for the test of rules (see Sect. 3). Our first experiment was the comparison of the four instance-based algorithms IB1, IB1-IG, BIN-CAT, and BIN-PRO. As can be seen from the results in Table 1, BIN-CAT clearly outperforms IB1 and IB1-IG. Concerning the method BIN-PRO, which uses prototypes of classes, we achieved results at the same quality level as for BIN-CAT. This is remarkable if one considers the much more condensed representation of the learned knowledge.</Paragraph>
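The two performance measures can be sketched in a few lines; the function names and the toy ranked predictions below are illustrative and not taken from the paper:

```python
def success_rate(predictions, gold):
    """Proportion of test cases whose first-ranked prediction is correct."""
    correct = sum(1 for ranked, g in zip(predictions, gold) if ranked[0] == g)
    return correct / len(gold)

def top_k_rate(predictions, gold, k=3):
    """Proportion of cases where the correct class is among the first k predictions."""
    hits = sum(1 for ranked, g in zip(predictions, gold) if g in ranked[:k])
    return hits / len(gold)

# Toy data: each entry is a ranked list of predicted command classes.
preds = [["c1", "c2", "c3"], ["c5", "c4", "c2"], ["c2", "c1", "c4"]]
gold = ["c1", "c4", "c4"]
print(success_rate(preds, gold))   # only the first case is correct at rank 1
print(top_k_rate(preds, gold, 3))  # the gold class is within the top 3 in every case
```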
    <Paragraph position="4"> The comparison between the results for the individual languages shows that there is no advantage for the German test sentences. On the contrary, the test results for German are inferior to those for English or Japanese. This may be partly due to a greater deviation of the German expressions and phrases used in the test set from the ones used in the training set. Besides this, restricting the features extracted during the encoding of the test set to those learned from the training set certainly performs an important filtering function. It removes language-specific syntactic particles that do not contribute to the meaning of the input. This is especially true for Japanese sentences, which possess a completely different syntactic structure in comparison with English or German, including many particles with no equivalent words in the other two languages. The second part of the evaluation was the comparison of the four algorithms for building decision trees: IGTree, BS-tree, C4.5, and BD-tree. Besides this, we also included the SE-tree constructed by a hybrid approach (see Sect. 3). The test results in Table 2 indicate that the trees with dynamic splitting are superior to those with static splitting and that C4.5, BD-tree, and SE-tree produce results of similar quality. Table 3 compares the number of nodes, leaves, and levels for the individual trees. The two trees with dynamic splitting are much more compact than those with static splitting, with C4.5 clearly outperforming BD-tree. Finally, the hybrid SE-tree is much flatter than C4.5 but possesses a larger number of nodes and leaves.</Paragraph>
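The approximate tree traversal used to produce additional candidate classes (allowing one incorrect edge, see Sect. 3) can be sketched as follows; the nested-dict tree encoding and the class labels are hypothetical illustrations, not the paper's data structures:

```python
def candidates(node, features, budget=1):
    """Collect leaf classes in traversal order, diverging from a feature
    test at most `budget` times per path; the first element is the
    prediction of an exact traversal, the rest are extra candidates."""
    if "label" in node:  # leaf node
        return [node["label"]]
    branch = "yes" if node["test"] in features else "no"
    other = "no" if branch == "yes" else "yes"
    result = candidates(node[branch], features, budget)
    if budget > 0:  # spend the budget on one divergent edge
        result += candidates(node[other], features, budget - 1)
    return result

# Hypothetical decision tree over binary word features.
tree = {"test": "update",
        "yes": {"test": "price",
                "yes": {"label": "update_price"},
                "no": {"label": "update_other"}},
        "no": {"label": "query"}}

print(candidates(tree, {"update", "price"}))
```

Ranking the collected leaves by how late the divergence occurs would yield a candidate list suitable for computing a top-3 rate.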
    <Paragraph position="5"> As the last part of our comparative study we tested the rule-based techniques FOIL, BIN-rules, and the hybrid approach C4.5-RULES. As Table 4 shows, FOIL produces the most compact representation of learned knowledge, followed by C4.5-RULES and BIN-rules. However, according to Table 5 both BIN-rules and C4.5-RULES outperform FOIL with almost identical results.</Paragraph>
    <Paragraph position="6">  An advantage of rule-based learning in comparison with other methods is that the learned knowledge can be easily presented to the user in a clear and understandable form. The derived rules allow a transparent knowledge representation that one can use for explaining decisions of the system to the user. Figure 5 gives some examples of rule sets learned by BIN-rules for several command classes.</Paragraph>
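This kind of transparent, explainable classification can be sketched as below; the rule format (a rule body as a set of binary word features that must all be present) and the class names are assumptions for illustration, not the rules actually learned by BIN-rules:

```python
# Hypothetical learned rule base: each command class maps to a list of
# rule bodies; a body fires when all of its features occur in the input.
RULES = {
    "update_purchase_price": [{"update", "purchase", "price"}],
    "liquidate_stock": [{"liquidation", "stock"}, {"liquidate", "stock"}],
}

def classify_with_explanation(features, rules=RULES):
    """Return (class, firing rule) for the first rule body fully contained
    in the input's feature set; the body itself serves as the explanation."""
    for cls, bodies in rules.items():
        for body in bodies:
            if body <= features:  # subset test: all conditions hold
                return cls, body
    return None, None

cls, why = classify_with_explanation({"please", "update", "the", "purchase", "price"})
print(cls, sorted(why))  # the firing rule doubles as the explanation
```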
    <Paragraph position="7"> If we take a final look at Table 1, Table 2, and Table 5, we can see that, independently of the applied machine learning paradigm, the achieved results reached satisfactory quality for all three groups. Considering the three best representatives BIN-CAT, C4.5, and BIN-rules, we obtain an average success rate over all three languages of 94.3 % and a top-3 rate of 98.8 %. This result is surprisingly high if one considers the complexity of the task at hand. Unfortunately, a direct comparison with the results of the hand-engineered interface was not possible because the previous interface had been developed only for German, based on the complete collection of 1000 sentences, and using different software. In any case, we could show that machine learning represents a sound alternative to manual knowledge acquisition for the application in natural language interfaces.</Paragraph>
    <Paragraph position="8"> Winiwarter &amp; Kambayashi, p. 132, Learning and NL Interfaces. [Figure 5 placeholder: the figure's class descriptions include "update of purchase price for material", "liquidation of stock for product", "update of salary for operator", "list of product orders grouped by status", and "query of master data for ..."]</Paragraph>
  </Section>
</Paper>