<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2097">
  <Title>Compiling Language Models from a Linguistically Motivated Unification Grammar</Title>
  <Section position="2" start_page="0" end_page="670" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Construction of speech recognizers for medium-vocabulary dialogue tasks has now become an important practical problem. The central task is usually building a suitable language model, and a number of standard methodologies have become established. Broadly speaking, these fall into two main classes. One approach is to obtain or create a domain corpus, and from it induce a statistical language model, usually some kind of N-gram grammar; the alternative is to manually design a grammar which specifies the utterances the recognizer will accept. There are many theoretical reasons to prefer the first course if it is feasible, but in practice there is often no choice. Unless a substantial domain corpus is available, the only method that stands a chance of working is hand-construction of an explicit grammar based on the grammar-writer's intuitions. [Footnote 1: The majority of the research reported was performed at RIACS under NASA Cooperative Agreement Number NCC 2-1006. The research described in Section 3 was supported by the Defense Advanced Research Projects ...]</Paragraph>
    <Paragraph position="1"> If the application is simple enough, experience shows that good grammars of this kind can be constructed quickly and efficiently using commercially available products like ViaVoice SDK (IBM 1999) or the Nuance Toolkit (Nuance 1999). Systems of this kind typically allow specification of some restricted subset of the class of context-free grammars, together with annotations that permit the grammar-writer to associate semantic values with lexical entries and rules. This kind of framework is fully adequate for small grammars. As the grammars increase in size, however, the limited expressive power of context-free language notation becomes increasingly burdensome. The grammar tends to become large and unwieldy, with many rules appearing in multiple versions that constantly need to be kept in step with each other. It represents a large development cost, is hard to maintain, and does not usually port well to new applications.</Paragraph>
    <Paragraph position="2"> It is tempting to consider the option of moving towards a more expressive grammar formalism, like unification grammar, writing the original grammar in unification grammar form and compiling it down to the context-free notation required by the underlying toolkit. At least one such system (Gemini; Moore et al 1997) has been implemented and used to build successful and non-trivial applications, most notably CommandTalk (Stent et al 1999). Gemini accepts a slightly constrained version of the unification grammar formalism originally used in the Core Language Engine (Alshawi 1992), and compiles it into context-free grammars in the GSL formalism supported by the Nuance Toolkit. The Nuance Toolkit compiles GSL grammars into sets of probabilistic finite state graphs (PFSGs), which form the final language model.</Paragraph>
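The idea of compiling a unification grammar down to context-free notation can be illustrated with a minimal sketch. It is not Gemini's actual algorithm, which the text above does not spell out; it only assumes that every feature ranges over a small finite set of values, so that a rule with shared feature variables can be expanded into one atomic context-free rule per consistent assignment. The feature names, the toy rule, and the helper names `atom` and `expand` are all hypothetical.

```python
from itertools import product

# Hypothetical finite feature domains (illustrative, not from the paper).
FEATURES = {"num": ["sg", "pl"], "per": ["1", "2", "3"]}

# A toy unification rule: each category is (name, {feature: variable}).
# Categories that share a variable name must agree on that feature's value.
RULE = (
    ("s", {}),
    [("np", {"num": "N", "per": "P"}), ("vp", {"num": "N", "per": "P"})],
)


def atom(name, feats, binding):
    """Build an atomic context-free symbol such as np_sg_3."""
    parts = [name] + [binding[var] for _, var in sorted(feats.items())]
    return "_".join(parts)


def expand(rule):
    """Emit one context-free rule per consistent variable binding."""
    lhs, rhs = rule
    # Record which feature domain each variable ranges over.
    var_feature = {}
    for _, feats in [lhs] + rhs:
        for feat, var in feats.items():
            var_feature[var] = feat
    variables = sorted(var_feature)
    domains = [FEATURES[var_feature[v]] for v in variables]
    cfg_rules = []
    for values in product(*domains):
        binding = dict(zip(variables, values))
        left = atom(lhs[0], lhs[1], binding)
        right = [atom(name, feats, binding) for name, feats in rhs]
        cfg_rules.append((left, right))
    return cfg_rules


rules = expand(RULE)
for left, right in rules:
    print(left, "->", " ".join(right))
```

In this sketch a single rule with two agreement features yields six context-free rules. The count grows multiplicatively with each additional feature, which suggests why the size of the compiled grammar is a practical concern.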
    <Paragraph position="3"> The relative success of the Gemini system suggests a new question. Unification grammars have been used many times to build substantial general grammars for English and other natural languages, but the language model oriented grammars so far developed for Gemini (including the one for CommandTalk) have all been domain-specific. One naturally wonders how feasible it is to take yet another step in the direction of increased generality; roughly, what we want to do is start with a completely general, linguistically motivated grammar, combine it with a domain-specific lexicon, and compile the result down to a domain-specific context-free grammar that can be used as a language model. If this programme can be realized, it is easy to believe that the result would be an extremely useful methodology for rapid construction of language models. It is important to note that there are no obvious theoretical obstacles in our way. The claim that English is context-free has been respectable since at least the early 80s (Pullum and Gazdar 1982; see footnote 2), and the idea</Paragraph>
    <Paragraph position="4"> of using unification grammar as a compact way of representing an underlying context-free language is one of the main motivations for GPSG (Gazdar et al 1985) and other formalisms based on it. The real question is whether the goal is practically achievable, given the resource limitations of current technology.</Paragraph>
    <Paragraph position="5"> In this paper, we describe work aimed at the target outlined above, in which we used the Gemini system (described in more detail in Section 2) to attempt to compile a variety of linguistically principled unification grammars into language models. Our first experiments (Section 3) were performed on a large pre-existing unification grammar. These were unsuccessful, for reasons that were not entirely obvious; in order to investigate the problem more systematically, we then conducted a second series of experiments (Section 4), in which we incrementally built up a smaller grammar. By monitoring the behavior of the compilation process and the resulting language model as the grammar's</Paragraph>
    <Paragraph position="6"> coverage was expanded, we were able to identify the point at which serious problems began to emerge (Section 5). In the final section, we summarize and suggest further directions. [Footnote 2: We are aware that this claim is most probably not true for natural languages in general (Bresnan et al 1987), but further discussion of this point is beyond the scope of the paper.]</Paragraph>
  </Section>
</Paper>