<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1149"> <Title>Another Stride Towards Knowledge-Based Machine Translation</Title> <Section position="4" start_page="633" end_page="634" type="metho"> <SectionTitle> 3. System Overview </SectionTitle> <Paragraph position="0"> Figure 3-1 shows the architecture of our current system. As mentioned in the previous section, we modularize domain-specific semantic knowledge and domain-independent (but language-specific) syntactic knowledge. We precompile semantic entities and LFG-style grammars into a single large grammar which is less perspicuous but more efficient. This merged grammar is further precompiled into a yet larger parsing table for added efficiency, enabling the run-time system to parse input text very efficiently using the parsing algorithm recently introduced by Tomita \[26, 25\]. More on this issue is discussed in section 6.</Paragraph> <Paragraph position="2"> The entity-oriented approach to restricted-domain parsing was first proposed by Hayes \[16\] as a method of organizing semantic and syntactic information about all domain concepts around a collection of various entities (objects, events, commands, states, etc.) that a particular system needs to recognize. An entity definition contains information about the internal structure of the entities, about relations to other entities, about the way the entities will be manifested in the natural language input, and about the correspondence between the internal structure and multiple surface forms for each entity.</Paragraph> <Paragraph position="3"> Let us consider the domain of doctor-patient conversations; in particular, the patient's initial complaint about some ailment.</Paragraph> <Paragraph position="4"> Entities in this domain include an event entity PATIENT-COMPLAINT-ACT and object entities PAIN, HUMAN and so on.
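The shape of such an entity definition can be sketched as follows (an illustrative Python rendering under assumed names, not Hayes' actual notation; the field names EntityName, Components, and SurfaceRepresentation are hypothetical stand-ins):

```python
# Hypothetical sketch of two entity definitions: each groups internal
# structure, relations to other entities, and surface forms in one place.
PATIENT_COMPLAINT_ACT = {
    "EntityName": "PATIENT-COMPLAINT-ACT",
    "Type": "Event",
    "Components": {                 # internal structure / relations
        "agent": "PERSON",          # the complaining patient
        "pain": "PAIN",             # points at the PAIN object entity
    },
    "SurfaceRepresentation": [      # how the entity appears in input
        ["$agent", "have", "a", "$pain"],
        ["$agent", "have", "a", "$pain", "in", "$body-part"],
    ],
}

PAIN = {
    "EntityName": "PAIN",
    "Type": "Object",
    "Components": {"location": "BODY-PART", "pain-kind": "ADJECTIVE"},
    "SurfaceRepresentation": [["$pain-kind", "ache"], ["$pain-kind", "pain"]],
}
```

Because everything about PATIENT-COMPLAINT-ACT is in one structure, a language definer can check completeness, and add a surface pattern, without touching any other part of the grammar.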
A fragment of an entity-oriented grammar is shown in figure 4-1.</Paragraph> <Paragraph position="5"> The notation here is slightly simplified from that of Hayes. Sentences of different surface form that should be recognized as instantiations of this entity include: &quot;I have a headache&quot; and &quot;I have a burning pain in the chest.&quot;</Paragraph> <Paragraph position="6"> The final semantic representation of the sentence &quot;I have a dull ache in my chest&quot; produced by instantiating entities is shown in figure 4-2.</Paragraph> <Paragraph position="7"> \[cfname: MEDICAL-COMPLAINT-ACT type: SENTIENT agent: \[cfname: PERSON name: *speaker*\] ; the &quot;I&quot; who has the chest ache pain: \[cfname: PAIN location: \[cfname: BODY-PART name: CHEST\] pain-kind: DIFFUSE\]\] Figure 4-2: Sample Semantic Representation: Instantiated Entities The 'SurfaceRepresentation' parts of an entity guide the parsing by providing syntactic structures tied to the semantic portion of the entity. As the result of parsing a sentence (see figure 4-2), a composition of the semantic portions of the instantiated entities is produced. This knowledge structure may be given to any backend process, whether it be a language generator (for the target language), a paraphraser, a database query system, or an expert system.</Paragraph> <Paragraph position="8"> The primary advantage of the entity-oriented grammar formalism hinges on the clarity of its sub-language definition (see Kittredge \[20\] for a discussion of sub-languages). Since all information relating to an entity is grouped in one place, a language definer will be able to see more clearly whether a definition is complete and what would be the consequences of any addition or change to the definition.
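The instantiated-entity structure of figure 4-2 can be rendered as a nested structure, and any backend process can traverse it. A minimal sketch (the Python rendering and the toy `paraphrase` consumer are assumptions, not part of the paper's system):

```python
# Assumed nested-dict rendering of the figure 4-2 frame for
# "I have a dull ache in my chest".
frame = {
    "cfname": "MEDICAL-COMPLAINT-ACT",
    "agent": {"cfname": "PERSON", "name": "*speaker*"},
    "pain": {
        "cfname": "PAIN",
        "location": {"cfname": "BODY-PART", "name": "CHEST"},
        "pain-kind": "DIFFUSE",
    },
}

def paraphrase(f):
    """Toy backend consumer: a generator, paraphraser, or database
    query system can all walk the same language-independent frame."""
    loc = f["pain"]["location"]["name"].lower()
    return "The speaker reports a pain in the " + loc + "."
```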
Similarly, since syntactic and semantic information about an entity are grouped together, the former can refer to the latter in a clear and coherent way, both in the grammar productions and in the run-time system. This advantage is even more valuable in the application to multi-lingual machine translation. Because the semantic portions of the entities are totally language-independent, we can use one set of entity definitions for all languages -- merely requiring that each entity have multiple surface forms: one or more for each language. In this way, one can ensure that semantic coverage is consistent across all languages.</Paragraph> <Paragraph position="9"> In addition to clarity and its multi-lingual extensibility, another advantage of the entity-oriented approach is robustness in dealing with extragrammatical input. Robust recovery from ill-formed input is a crucial feature for practical interactive language systems, but is beyond the immediate scope of this paper. See Carbonell and Hayes \[7\] for a full discussion of entity-based robust parsing.</Paragraph> <Paragraph position="10"> The major limitation of entity-oriented grammars arises from the very same close coupling of syntax and semantics: all syntactic knowledge common across domains (or across entities within one domain) must be replicated by hand for each and every entity definition. Syntactic generalities are not captured. This problem is not merely an aesthetic one; it takes prodigious effort for grammar developers to build and perfect each domain grammar, with little cross-domain transfer. How, then, can one overcome this central limitation and yet retain all the advantages of semantic analysis in general and the entity-oriented approach in particular?
The answer lies in decoupling the syntactic information at grammar development time -- thus having a general grammar for each language -- and integrating it via an automated precompilation process to produce highly coupled structures for the run-time system. Such an approach has been made possible through the advent of unification and functional grammars.</Paragraph> </Section> <Section position="5" start_page="634" end_page="635" type="metho"> <SectionTitle> 5. The Functional Grammar Formalism </SectionTitle> <Paragraph position="0"> Functional grammars, as presented by Kay \[18\], provide the key to automated compilation of syntactic and semantic knowledge.</Paragraph> <Paragraph position="1"> In essence, they define syntax in a functional manner based on syntactic roles, rather than by positions of constituents in the surface string. The functional framework has clear advantages for languages such as Japanese, where word order is of much less significance than in English, but case markings take up the role of providing the surface cues for assigning syntactic and semantic roles to each constituent. Moreover, functional structures integrate far more coherently into case-frame based semantic structures such as entity definitions.</Paragraph> <Paragraph position="2"> Two well-known functional grammar formalisms are Functional Unification Grammar (FUG) \[19\] and Lexical Functional Grammar (LFG) \[4\]. In this paper, however, we do not distinguish between them and refer to both by the term &quot;functional grammar&quot;.</Paragraph> <Paragraph position="3"> Application of the functional grammar formalism to machine translation is discussed in \[19\]. Attempts have been made to implement parsers using these grammars, most notably in the PATR-II project at Stanford \[22, 24\].
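The unification operation at the core of these formalisms can be sketched as follows (a simplified illustration in Python, not the PATR-II implementation; it omits reentrancy and variables, and the toy Japanese case-marking features are assumptions):

```python
def unify(a, b):
    """Recursively unify two feature structures (plain dicts); return
    the merged structure, or None on a feature clash. Simplified: no
    shared (reentrant) substructures, no variables."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None     # atomic values must match
    out = dict(a)
    for key, bval in b.items():
        if key in out:
            merged = unify(out[key], bval)
            if merged is None:
                return None              # clash: unification fails
            out[key] = merged
        else:
            out[key] = bval              # new feature: just add it
    return out

# Toy illustration of role assignment by case marking rather than
# position: the particle "ga" licenses the subject role wherever the
# constituent occurs in the string.
np = {"case": "ga", "head": {"lex": "kanja"}}
rule = {"case": "ga", "role": "subject"}
```

Unifying `np` with `rule` succeeds and adds the `subject` role; unifying a "ga"-marked constituent with an "o"-marked requirement fails, which is exactly how the surface cue constrains role assignment.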
However, these efforts have not been integrated with external semantic knowledge bases, and have not been applied in the context of KBMT systems.</Paragraph> <Paragraph position="4"> There are two main advantages of using the functional grammar formalism in practical machine translation systems: * A system implemented strictly within the functional grammar formalism will be reversible, in the sense that if the system maps from A to B then, to the same extent, it maps from B to A. Thus, we do not need to write separate grammars for parsing and generation.</Paragraph> <Paragraph position="5"> We merely compile the same grammar into an efficient uni-directional structure for parsing, and a different uni-directional structure for generation into that language.</Paragraph> <Paragraph position="6"> * Functional grammar formalisms such as FUG and LFG are well-known among computational linguists, who therefore need not be trained (with some justifiable resistance) to write grammars in arcane system-specific formalisms.</Paragraph> <Paragraph position="7"> The general problem in parsing with functional grammars is implementation inefficiency for any practical application.</Paragraph> <Paragraph position="8"> Although much work has been done to enhance efficiency \[24, 22\], the functional grammar formalisms are considered far less efficient than formalisms like ATNs \[28\] or (especially) context-free phrase structure grammars. We resolve this efficiency problem by precompiling a grammar written in the functional grammar formalism (together with a separate domain semantics specification) into an augmented context-free grammar, as described in the following section.</Paragraph> <Paragraph position="9"> 6.
Grammar Precompilation and Efficient On-Line Parsing The previous two sections have described two kinds of knowledge representation methods: the entity-oriented grammar formalism for domain-specific but language-general semantic knowledge, and the functional grammar formalism for domain-independent but language-specific syntactic knowledge. In order to parse a sentence in real time using these knowledge bases, we precompile the semantic and syntactic knowledge, as well as morphological rules and a dictionary, into a single large morph/syn/sem grammar. This morph/syn/sem grammar is represented by a (potentially very large) set of context-free phrase structure rules, each of which is augmented with a Lisp program for test and action as in ATNs. A simplified fragment of a morph/syn/sem grammar is shown in figure 6-1.</Paragraph> <Paragraph position="11"> (setvalue '(x0: semcase:) (getvalue '(x2: semcase:))) (setvalue '(x0: semcase: agent:) (getvalue '(x1: semcase:))) (setvalue '(x0: syncase:) (getvalue '(x2: syncase:))) (setvalue '(x0: syncase: subj:) (getvalue '(x1:))) (return (getvalue '(x0:)))) complaint-act-1-VP --> complaint-act-1-V ((setvalue '(x0: semcase:) (getvalue '(x1: semcase:))) (setvalue '(x0: syncase: pred:) (getvalue '(x1:))) (setvalue '(x0: agr:) (getvalue '(x1: agr:))) (setvalue '(x0: form:) (getvalue '(x1: form:)))</Paragraph> <Paragraph position="13"> ((setvalue '(x0: semcase: cfname:) 'PATIENT-COMPLAINT-ACT) (setvalue '(x0: agr:) (getvalue '(x1: agr:))) (setvalue '(x0: form:) (getvalue '(x1: form:))) (return (getvalue '(x0:)))) Figure 6-1: A Compiled Grammar Fragment Once we have a grammar in this form, we can apply efficient context-free parsing algorithms; whenever the parser reduces constituents into a higher-level nonterminal using a phrase structure rule, the Lisp program associated with the rule is evaluated.
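The reduce-time behavior of such an augmented rule can be sketched like this (Python standing in for the compiled Lisp of figure 6-1; the rule name, feature names, and agreement values are illustrative assumptions):

```python
# Sketch of one augmented context-free rule: a production paired with
# an action procedure that runs when the parser reduces NP VP to S.
# x1 and x2 are the feature structures of the reduced constituents;
# x0 is the structure built for the new nonterminal.
def s_action(x1, x2):
    if x1["agr"] != x2["agr"]:                 # subject-verb agreement test
        return None                            # constraint fails: reject
    x0 = {}
    x0["semcase"] = dict(x2["semcase"])        # copy the VP's semantics up
    x0["semcase"]["agent"] = x1["semcase"]     # NP's semantics fills agent
    x0["syncase"] = {"subj": x1}               # record the syntactic subject
    return x0

RULES = {("complaint-act-1-S", ("NP", "VP")): s_action}

np = {"semcase": {"cfname": "PERSON", "name": "*speaker*"}, "agr": "1sg"}
vp = {"semcase": {"cfname": "PATIENT-COMPLAINT-ACT"}, "agr": "1sg"}
```

At each reduction the parser looks up the rule's action, passes in the daughter structures, and either obtains the composed frame or rejects the reduction, which is how semantic construction and constraint checking ride along with ordinary context-free parsing.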
The Lisp program handles such aspects as construction of a semantic representation of the input sentence, passing attribute values among constituents at different levels, and checking semantic and syntactic constraints such as subject-verb agreement. Recall that these Lisp programs are generated automatically by the grammar precompiler from LFG f-structures and semantic entities. Note also that the Lisp programs can be further compiled into machine code by the Lisp compiler. We adopt the algorithm introduced by Tomita \[25, 26\] as our context-free parsing algorithm to parse a sentence with the morph/syn/sem grammar. The Tomita algorithm can be viewed as an extended LR parsing algorithm \[1\]. We further compile the morph/syn/sem grammar into a table called the augmented LR parsing table, with which the algorithm works very efficiently.</Paragraph> <Paragraph position="14"> The Tomita algorithm has three major advantages in application to real-time machine translation systems: * The algorithm is fast, due to the LR table precompilation; in several tests it has proven faster than any other general context-free parsing algorithm presently in practice. For instance, timings indicate a 5 to 10 fold speed advantage over Earley's algorithm in several experiments with English grammars and sample sets of sentences.</Paragraph> <Paragraph position="15"> * The efficiency of the algorithm is not affected by the size of its grammar, once the LR parsing table is obtained.
This characteristic is especially important for our system, because the size of the morph/syn/sem grammar will be very large in practical applications.</Paragraph> <Paragraph position="16"> * The algorithm parses a sentence strictly from left to right, providing all the on-line parsing advantages described below.</Paragraph> <Paragraph position="17"> The on-line parser starts parsing as soon as the user types in the first word of a sentence, without waiting for the end of a line or a sentence boundary. There are two main benefits from on-line parsing: * The parser's response time can be reduced significantly. When the user finishes typing a whole sentence, most of the input sentence has already been processed by the parser. * Any errors, such as mis-typing and ungrammatical usages, can be detected almost as soon as they occur, and the parser can warn the user immediately without waiting for the end of the line.</Paragraph> <Paragraph position="18"> Thus, on-line parsing provides major advantages for interactive applications (such as real-time parsing, immediate translation of telex messages, and eventual integration with speech recognition and synthesis systems), but is transparent when operating in batch-processing mode for long texts. More discussion of on-line parsing can be found in Chapter 7 of Tomita \[25\].</Paragraph> </Section> </Paper>