<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1307">
  <Title>Learning Finite-State Models for Language Understanding*</Title>
  <Section position="6" start_page="75" end_page="76" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this work, we have presented some successful experiments on a non-trivial, useful task in natural language understanding. Finite-state models were learnt with the OSTIA-DR algorithm.</Paragraph>
    <Paragraph position="1"> Our attention has centered on the possibility of reducing the demand for training data by categorizing the corpus. The experiments show a substantial difference in performance between the categorized and plain training procedures: in this task, useful results are obtained only when categories are used.</Paragraph>
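The categorization step can be pictured as a simple preprocessing pass: content words are replaced by category labels before learning, so many distinct sentences collapse into the same training pattern. The following is only an illustrative sketch; the category names and lexicon are invented, not the paper's actual category set.

```python
# Hypothetical category lexicon; in the real task the categories would
# cover dates, numbers, names, etc. appearing in the corpus.
CATEGORIES = {
    "monday": "$WEEKDAY", "tuesday": "$WEEKDAY",
    "january": "$MONTH", "february": "$MONTH",
    "one": "$NUMBER", "two": "$NUMBER",
}

def categorize(sentence):
    """Replace each known content word with its category label."""
    return [CATEGORIES.get(w, w) for w in sentence.lower().split()]

print(categorize("I leave on Monday the two of January"))
# ['i', 'leave', 'on', '$WEEKDAY', 'the', '$NUMBER', 'of', '$MONTH']
```

Because every weekday, number, or month now maps to one symbol, the transducer learner sees far fewer distinct sentence patterns, which is what reduces the demand for training data.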
    <Paragraph position="2"> The Error-Correcting technique for translation also permits reducing the size of the corpora while still obtaining useful error rates. In our task, we obtained a 3% semantic-symbol error rate with a training set of approximately 6,000 pairs, whereas reaching the same level of performance with the standard Viterbi algorithm required some 10,000 training pairs. This 3% error rate corresponds to a full-sentence matching rate of 90%.</Paragraph>
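The idea behind error-correcting analysis can be sketched as a shortest-path search over pairs (model state, input position), where insertions, deletions and substitutions of input symbols are allowed at some cost. This is a minimal toy version, not the authors' implementation; the automaton, costs, and symbols are illustrative assumptions.

```python
import heapq

def ec_parse(delta, start, finals, w):
    """Minimum edit cost of matching word w (a list of symbols) against a
    finite-state acceptor, via Dijkstra over (state, position) pairs.
    delta: dict mapping state -> {symbol: next_state}."""
    pq = [(0, start, 0)]
    seen = set()
    while pq:
        cost, q, i = heapq.heappop(pq)
        if (q, i) in seen:
            continue
        seen.add((q, i))
        if i == len(w) and q in finals:
            return cost  # cheapest accepting analysis found
        if i < len(w):
            # insertion error: skip the spurious input symbol w[i]
            heapq.heappush(pq, (cost + 1, q, i + 1))
        for a, q2 in delta.get(q, {}).items():
            # deletion error: take a model transition without consuming input
            heapq.heappush(pq, (cost + 1, q2, i))
            if i < len(w):
                # exact match (cost 0) or substitution (cost 1)
                heapq.heappush(pq, (cost + (0 if a == w[i] else 1), q2, i + 1))
    return None  # no accepting analysis

# Toy acceptor for the sentences "a b c" and "a b d"
delta = {0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3, "d": 3}}
print(ec_parse(delta, 0, {3}, ["a", "x", "c"]))  # 1 (one substitution)
```

A corrupted or previously unseen sentence still receives the analysis of its nearest sentence in the model, which is why the technique tolerates smaller training sets than exact Viterbi decoding.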
    <Paragraph position="3"> On-going work on these techniques is aimed at obtaining additional training data from native speakers, so as to improve the system by following a bootstrapping procedure: the system will be trained on this additional natural or spontaneous data, whose acquisition is driven by the system itself, guided by given task-relevant semantic stimuli. This process can be repeated until the resulting system exhibits satisfactory performance. On the other hand,</Paragraph>
    <Paragraph position="5"> transducers generated by the embedding procedure described in this paper may turn out to be ambiguous. Work is also being done on applying stochastic extensions of transducers, so as to deal with these ambiguities by reflecting the probability distribution of the sentences in the training corpus. These distributions are being estimated by Maximum-Likelihood, Conditional Maximum-Likelihood, or Maximum Mutual Information Estimation \[18\]. The results of this work will be useful as a subtask of the so-called &amp;quot;Tourist Task&amp;quot;, a hotel-reservation task introduced in the EuTrans project \[1, 25\].</Paragraph>
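Of the estimation schemes mentioned, plain Maximum-Likelihood is the simplest to illustrate: each transition probability is the relative frequency of that transition among all transitions leaving the same state, counted over the training paths. The counts below are invented for illustration only.

```python
from collections import defaultdict

def ml_estimate(counts):
    """Maximum-Likelihood transition probabilities for a stochastic
    finite-state model. counts[(state, symbol)] is how often that
    transition was used by the training paths."""
    totals = defaultdict(int)
    for (q, _a), c in counts.items():
        totals[q] += c  # total outgoing uses of state q
    return {(q, a): c / totals[q] for (q, a), c in counts.items()}

counts = {(0, "a"): 3, (0, "b"): 1, (1, "c"): 2}
probs = ml_estimate(counts)
print(probs[(0, "a")])  # 0.75
```

With such probabilities attached, an ambiguous transducer can rank its competing analyses of a sentence by path probability, which is the role the stochastic extensions play here.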
  </Section>
</Paper>