<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0109">
  <Title>Integrating Symbolic and Statistical Approaches in Speech and Natural Language Applications</Title>
  <Section position="2" start_page="0" end_page="69" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Symbolic and statistical approaches have both made significant contributions in speech and natural language processing. However, they have traditionally been kept separate and applied to very different kinds of problems.</Paragraph>
    <Paragraph position="1"> Most speech recognition systems use statistical techniques exclusively, whereas natural language (NL) systems are mostly symbolic. We are seeing more integration of statistical methods in NL, but usually in some well defined component, such as a statistically based part of speech tagger as a preprocessor to parsing.</Paragraph>
    <Paragraph position="2"> In this paper, we have two goals: first, to characterize the kinds of problems that are most amenable to each of these approaches and, second, to show how we have integrated the approaches in our work in information extraction from speech, topic classification, and word and phrase spotting.</Paragraph>
    <Paragraph position="3"> We begin with a brief overview characterizing the two approaches, then discuss in more detail how we have integrated these two approaches in our work.</Paragraph>
    <Section position="1" start_page="0" end_page="69" type="sub_section">
      <SectionTitle>
1.1 Characterizing symbolic and statistical approaches
</SectionTitle>
      <Paragraph position="0"> Symbolic approaches have dominated work in NL. By writing rules, we can take advantage of what we already know about a language or domain and we can apply a theoretical framework or study of selected examples to leverage and extend our knowledge. Most symbolic approaches also have meaningful intermediate structures that indicate what steps a system goes through in processing. Furthermore, since in a rule based approach the system either works or fails (as opposed to being more or less likely as is the case in a statistical approach), we generally have a clearer understanding of what a system is capable of and where its weaknesses lie. However, this feature is also the greatest flaw of this kind of approach, as it makes a system brittle.</Paragraph>
      <Paragraph position="1"> Statistical approaches begin with a model and estimate the parameters of the model based on data. Since decisions are more or less likely (rather than right or wrong), systems using these approaches are more robust in the face of unseen data. In particular, statistical modeling approaches provide the conditional probability of an event, which combines both prior knowledge of the distribution of events and the distribution learned from a training set, which can take into account both how often an event is seen and the context in which it occurs. There are two important considerations in choosing to use a statistical approach: (1) the output must be representable in a model--that is, we need to understand the problem well enough to represent output and specify its relationship to the input. This can presently be done for part of speech tags, for example, but not for discourse; (2) there must be sufficient data (paired I/O) and/or prior statistical knowledge to estimate the parameters.</Paragraph>
      <Paragraph position="2">  While these approaches have been kept separate, they have influenced each other. Statistical techniques have brought to NL a clearer notion of evaluation: that there are separate training and testing corpuses and a &amp;quot;fair&amp;quot; test is on data you haven't seen before. Symbolic techniques have brought the notion of understanding a problem by looking closely at the places where it performs poorly. For example, we're seeing a renewed emphasis on tools in speech processing work.</Paragraph>
    </Section>
    <Section position="2" start_page="69" end_page="69" type="sub_section">
      <SectionTitle>
1.2 Integrating symbolic and statistical techniques
</SectionTitle>
      <Paragraph position="0"> In determining how to most effectively combine these approaches, it is useful to view them not as a dichotomy, but rather as a continuum of approaches. Kanal and Chandrasekan (1972) take this view in their analysis of pattern recognition techniques, which they characterize as, at one end, purely &amp;quot;linguistic&amp;quot;, with generative grammars representing syntactic structure, and at the other &amp;quot;geometric&amp;quot; approaches, which are purely statistical-patterns are represented as points in a multidimensional feature space, where the &amp;quot;features&amp;quot; are left undefined in the model. In the middle are &amp;quot;structural&amp;quot; approaches, where patterns are defined as relations among a set of primitives which may or may not be associated with probabilities.</Paragraph>
      <Paragraph position="1"> Kanal and Chandrasekan argue that rather than select a linguistic or geometric solution for a particular problem, one should divide the problem into subproblems hierarchically, deciding at each level whether to apply a solution from the range between linguistic and geometric or to &amp;quot;further subdivide. In this view the various methods are complementary, rather than rivals. Important considerations in making the choice of what approach to use is how much and what kind of a priori information is available and where information is noisy or uncertain.</Paragraph>
      <Paragraph position="2"> In fact, nearly all &amp;quot;statistical&amp;quot; approaches used in NL and speech fall somewhere in this continuum, rather than at the extreme. Purely statistical topic classification techniques use words as the primitives, which are features that have some meaning and relationships to one another, even though these relationships may be exploited only through statistical correlations. The states in a hidden Markov model for speech form phonemes, which is conceptual rather than acoustic phenomenon and specific to a particular language, and the expansion of phoneme states into networks are based on a dictionary. Therefore, even in a null grammar there is a great deal of a priori knowledge being brought to bear.</Paragraph>
      <Paragraph position="3"> In the work described here, we have attempted a close integration of statistical and symbolic methods that leverages the a priori knowledge that can be represented in phrase grammars with the knowledge that can be acquired using statistical methods. For example, a classification algorithm can select which key words can be used to discriminate a topic. By adding semantic features to a text using a parser and semantic grammar, we can increase the amount of domain specific information available for the classification algorithm to operate over. Another example is in language modeling for recognition: a statistical N-gram language model provides information on the fikefihood of one word to follow another; by adding phrase grammars, we can also learn the likelihood of particular domain specific phrases, and then we can use that same grammar to actually interpret those phrases and extract the information being communicated. The body of this paper describes in detail where we have chosen to integrate linguistic and structural knowledge into our statistical algorithms.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>