File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/90/c90-2072_metho.xml
Size: 10,893 bytes
Last Modified: 2025-10-06 14:12:26
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2072"> <Title>An Efficient Implementation of PATR for Categorial Unification Grammar</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Grammar Formalism 2.1 PATR-II as implemented in C-PATR </SectionTitle> <Paragraph position="0"> PATR-II is a formalism for describing grammars in terms of feature structures. C-PATR supports two equivalent notations for feature structures: path equations and attribute-value matrices.</Paragraph> <Paragraph position="1"> Path equations can be used to define a hierarchical system of templates [section 4] that encode linguistic generalizations. Internally, feature structures are represented as directed graphs (DGs). PATR-style feature structures can describe a wide variety of unification-based grammars. The present version of C-PATR, however, supports only pure categorial grammars. Because it does not support explicit phrase structure rules, C-PATR is not a complete implementation of PATR.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Categorial grammars as feature structures </SectionTitle> <Paragraph position="0"> A categorial grammar represents syntactic relations in a completely lexical fashion, i.e. without explicit phrase structure rules. Lexical items belong to basic or functor categories. A basic category is inert: it does not seek to combine with other categories. Functor categories perform the bulk of the work by actively seeking to combine with other categories. A functor category specifies the category of its argument, the direction in which to search for the argument, and the category of the result produced by applying the functor to its argument. With only this simple machinery, it is possible to describe a wide range of syntactic phenomena. 
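The division of labour between basic and functor categories, and function application itself, can be sketched as follows. This is a minimal illustration only, assuming categories are Python dicts rather than C-PATR's directed graphs. It borrows the attribute names argument, direction, and result and the atomic value NONE from C-PATR, but the helper names (is_basic, matches, apply_functor) are hypothetical, and the flat attribute check in matches stands in for full unification.

```python
# Sketch only: categories as Python dicts instead of C-PATR's directed graphs.
NONE = "NONE"  # atomic value marking a basic (inert) category


def is_basic(cat):
    """A basic category has NONE as the value of its argument attribute."""
    return cat.get("argument") == NONE


def matches(spec, cat):
    """Flat stand-in for unification: every attribute the functor requires
    of its argument must appear in cat with the same atomic value."""
    return all(cat.get(key) == value for key, value in spec.items())


def apply_functor(functor, arg, arg_side):
    """Return the functor's result category, or None if application fails.

    arg_side is "left" or "right": the side of the functor on which arg occurs.
    """
    if is_basic(functor):
        return None  # basic categories never seek arguments
    if functor["direction"] != arg_side:
        return None  # argument found in the wrong direction
    if not matches(functor["argument"], arg):
        return None  # argument is of the wrong category
    return functor["result"]


np_cat = {"cat": "NP", "argument": NONE}  # a basic category
v_intrans = {                             # a functor: seeks an NP to its left
    "argument": {"cat": "NP"},
    "direction": "left",
    "result": {"cat": "S", "argument": NONE},
}

print(apply_functor(v_intrans, np_cat, "left"))   # {'cat': 'S', 'argument': 'NONE'}
print(apply_functor(v_intrans, np_cat, "right"))  # None
```

The three checks in apply_functor correspond to the three pieces of information a functor specifies: that it seeks an argument at all, in which direction, and of what category.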
In C-PATR, basic categories are those with NONE as the value of the argument attribute. (NONE is a regular atomic value that is given special status by the parser.) Functor categories must have values specified for the argument, direction, and result attributes (see Figure 1).</Paragraph> <Paragraph position="1"> The parsing algorithm manages the formation of constituents through the application of functors to their arguments [see section 3]. The argument and result attributes can contain information other than simple category designations. For example, the sample grammar in the appendix uses these slots to place constraints on the argument, to pass information from the argument to the functor, and to construct a semantic representation.</Paragraph> <Paragraph position="3"> Figure 1 (partially recovered): ... of Noun (basic) and V-intrans (a functor). V-intrans: [argument: [cat: NP], direction: left, result: [cat: S]]</Paragraph> </Section> </Section> <Section position="3" start_page="0" end_page="420" type="metho"> <SectionTitle> 3 Unification and Parsing </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Algorithms </SectionTitle> <Paragraph position="0"> C-PATR offers two varieties of unification. A standard unification algorithm (adapted from D-PATR [1]) is used to build the internal representation of a grammar, while a more complex algorithm featuring list unification [see below] is used by the parser. 
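Plain feature-structure unification of the kind used when compiling a grammar can be sketched as below. This is a textbook-style, non-destructive version assuming nested Python dicts with atomic strings at the leaves. It is not the D-PATR algorithm, and it ignores reentrancy (shared substructure), which a real DG unifier must handle. Atoms are compared with ==, standing in for C-PATR's comparison of symbol addresses.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with string atoms).

    Returns a new structure, or None if the structures are incompatible.
    Inputs are not modified. Reentrancy is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, b_value in b.items():
            if key in merged:
                sub = unify(merged[key], b_value)
                if sub is None:
                    return None  # clash under a shared attribute
                merged[key] = sub
            else:
                merged[key] = b_value  # attribute only in b: just add it
        return merged
    if isinstance(a, dict) or isinstance(b, dict):
        return None  # an atom never unifies with a complex value
    # Atoms unify only if identical (C-PATR compares symbol addresses).
    return a if a == b else None


print(unify({"cat": "NP", "agr": {"num": "sg"}}, {"agr": {"per": "3"}}))
# {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
print(unify({"cat": "NP"}, {"cat": "S"}))  # None
```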
The parser itself is a fairly standard active chart parser (also adapted from D-PATR).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Optimizing parsing and unification </SectionTitle> <Paragraph position="0"> Function application is the only compositional technique used by C-PATR's parser.</Paragraph> <Paragraph position="1"> More powerful techniques such as functional composition and type-raising are not used.</Paragraph> <Paragraph position="2"> Parsing a non-trivial sentence involves hundreds of unification attempts, so the data types and algorithms that C-PATR uses during unification must be optimized for parsing to be efficient. To allow quick comparisons while keeping symbol names readily available, each symbol in C-PATR is represented by the memory address of its print name. Print names are stored in a letter tree (a trie) in which each unique symbol name has exactly one entry, so two symbols are equal exactly when their addresses are equal.</Paragraph> </Section> <Section position="3" start_page="0" end_page="420" type="sub_section"> <SectionTitle> 3.2 List unification </SectionTitle> <Paragraph position="0"> Merging partial information by unification is not sufficient to describe all the correspondences between syntactic and semantic representations. A case in point is the semantics of conjoined noun phrases [2].</Paragraph> <Paragraph position="1"> An appropriate semantic representation for a sentence like "b and c are small" is a conjoined formula, small(b) ∧ small(c). Such representations cannot be derived by pure unification, because two instances of the logical predicate small with different arguments must be produced from a single instance of the word small. The same difficulty arises with reciprocal pronouns (each other) and numeral determiners. C-PATR solves this problem by extending unification to list values, with an effect similar to abstraction and lambda conversion in logic. 
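C-PATR's list-unification rules can be sketched as an extension of plain unification. This is a minimal illustration assuming Python lists stand in for the first/rest encoding that C-PATR actually uses, and that dicts and strings are the non-list values. The semantic attribute names pred and arg in the example are hypothetical. Each rule is marked in a comment: equal-length lists unify pairwise, unequal lengths fail, a non-list is coerced into n independent copies (made with copy.deepcopy), and any failed sub-unification fails the whole.

```python
import copy


def unify(a, b):
    """Unification extended to list values; returns None on failure."""
    a_list, b_list = isinstance(a, list), isinstance(b, list)
    if a_list and b_list:
        if len(a) != len(b):
            return None  # rule (2): lists of unequal length
        result = []
        for x, y in zip(a, b):
            sub = unify(x, y)
            if sub is None:
                return None  # rule (4): one failure fails the whole
            result.append(sub)
        return result  # rule (1): all corresponding pairs unified
    if a_list or b_list:
        lst, other = (a, b) if a_list else (b, a)
        # rule (3): coerce the non-list into n independent copies
        return unify(lst, [copy.deepcopy(other) for _ in lst])
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            if key in merged:
                sub = unify(merged[key], value)
                if sub is None:
                    return None
                merged[key] = sub
            else:
                merged[key] = value
        return merged
    if isinstance(a, dict) or isinstance(b, dict):
        return None
    return a if a == b else None  # atoms: identical or fail


# Hypothetical semantics: "b and c" is list-valued, "are small" is not.
np_sem = [{"arg": "b"}, {"arg": "c"}]
vp_sem = {"pred": "small"}
print(unify(vp_sem, np_sem))
# [{'arg': 'b', 'pred': 'small'}, {'arg': 'c', 'pred': 'small'}]
```

The single predicate small is copied once per conjunct, which is the effect the text compares to lambda conversion.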
For example, a conjoined noun phrase such as "b and c" may require that the verb phrase it combines with have a list-valued semantic representation. If the verb phrase (e.g. "are small") is not of that type, the unifier simply coerces the argument to a list value, thereby producing two copies of its semantic translation.</Paragraph> <Paragraph position="2"> The algorithm for list unification is quite straightforward. (1) Two lists can be unified if they have the same number of elements and each corresponding pair of elements is unifiable. (2) Two lists of unequal lengths are not unifiable. (3) To unify a list of length n with a simple DG (a non-list), coerce the non-list into a list by making n copies of it, and unify each copy with the corresponding element of the list. (4) If any single sub-unification fails, the whole unification fails. In our system, list values are represented as feature structures using the special attributes first and rest (analogous to CAR and CDR in Lisp).</Paragraph> </Section> <Section position="4" start_page="420" end_page="420" type="sub_section"> <SectionTitle> 3.3 Chart Parser </SectionTitle> <Paragraph position="0"> C-PATR's chart parser is a simplified version of the general chart-parsing algorithm. In a categorial grammar, every constituent is formed from two pieces (a functor and an argument), so the parser need only consider binary rules.</Paragraph> <Paragraph position="1"> The parser includes a subsumption filter [1]. Just before an edge is added to the chart, the filter checks whether any identical edge already spans the same nodes as the candidate edge. If one does, the duplicate edge is not placed on the chart. Subsumption checking eliminates redundant analyses and improves parsing efficiency for grammars that have many different ways to reach the same analysis. 
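The duplicate-edge check can be sketched as follows, assuming an edge is identified by the pair of nodes it spans plus a category stored as a dict. The Chart class, its add_edge method, and the filter_duplicates flag (which models toggling the filter off) are hypothetical names. As described, the check tests for identical edges, modelled here as structural equality. A general subsumption filter would also reject a candidate that is subsumed by an existing edge on the same span.

```python
from collections import defaultdict


class Chart:
    """Edges indexed by the (start, end) pair of nodes they span."""

    def __init__(self, filter_duplicates=True):
        self.filter_duplicates = filter_duplicates  # user-toggleable filter
        self.edges = defaultdict(list)

    def add_edge(self, start, end, category):
        """Add an edge unless an identical one already spans start..end.

        Returns True if the edge was added, False if it was filtered out.
        """
        existing = self.edges[(start, end)]
        if self.filter_duplicates and category in existing:
            return False  # redundant analysis: keep it off the chart
        existing.append(category)
        return True


chart = Chart()
print(chart.add_edge(0, 3, {"cat": "S"}))  # True
print(chart.add_edge(0, 3, {"cat": "S"}))  # False: identical edge, same span
print(chart.add_edge(1, 3, {"cat": "S"}))  # True: different span
```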
When a more complete parsing record is desired, the subsumption filter can be toggled off.</Paragraph> </Section> </Section> <Section position="4" start_page="420" end_page="420" type="metho"> <SectionTitle> 4 Special Features </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="420" end_page="420" type="sub_section"> <SectionTitle> 4.1 Hierarchical lexicon design </SectionTitle> <Paragraph position="0"> C-PATR allows the user to specify a grammar in terms of a hierarchical system of templates. The grammar is divided into two parts: a set of templates and a set of lexical entries. Each template consists of a name (marked by an @-sign) followed by a set of explicit path equations and references to other templates [see Appendix A]. The path equations are compiled into directed graphs.</Paragraph> <Paragraph position="1"> When a template is referred to within another template definition, the latter inherits the path equations of the former. The sample grammar uses template inheritance in the entries for @Vtrans, @Ga, and @O [see Appendix]. A template can also be used in a path equation (as in the sample grammar's entries for @V\Vstem and @Particle) to define a complex value.</Paragraph> <Paragraph position="2"> The format of the lexicon file is identical to that of the template file, except that the labels of lexical entries do not begin with @-signs. While the body of a template usually consists of a number of path equations, a typical lexical entry contains few explicit path equations. If a set of templates is well constructed, the list of template names mentioned in a lexical entry constitutes a meaningful high-level description of the word [see Appendix B]. Path equations in a lexical entry should describe only the idiosyncratic properties of the word. 
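Template expansion with inheritance can be sketched as a recursive walk over template bodies. Everything here is a hypothetical illustration. The template names @Verb and @Transitive and their equations are invented and are not the sample grammar's templates. Bodies are Python lists that mix (path, value) equations with @-name references. Conflicting atomic values raise ValueError, whereas a real compiler would unify directed graphs. The lexical_entry helper also fills in the lex attribute from the entry's form when the entry does not set it.

```python
# Hypothetical templates: bodies mix path equations (path, value)
# with references to other templates (names beginning with "@").
TEMPLATES = {
    "@Verb": [(("cat",), "V")],
    "@Transitive": ["@Verb", (("argument", "cat"), "NP")],
}


def set_path(fs, path, value):
    """Install an atomic value at path, creating intermediate nodes."""
    *prefix, last = path
    for attr in prefix:
        fs = fs.setdefault(attr, {})
    if last in fs and fs[last] != value:
        raise ValueError(f"conflicting values at {'/'.join(path)}")
    fs[last] = value


def expand(body, templates, fs=None):
    """Expand a template or lexical-entry body into one feature structure."""
    fs = {} if fs is None else fs
    for item in body:
        if isinstance(item, str):  # template reference: inherit its equations
            expand(templates[item], templates, fs)
        else:                      # explicit path equation
            path, value = item
            set_path(fs, path, value)
    return fs


def lexical_entry(form, body, templates):
    """Expand an entry; its form goes to lex unless the entry sets lex."""
    fs = expand(body, templates)
    fs.setdefault("lex", form)
    return fs


print(lexical_entry("read", ["@Transitive", (("sem",), "read")], TEMPLATES))
# {'cat': 'V', 'argument': {'cat': 'NP'}, 'sem': 'read', 'lex': 'read'}
```

In this sketch, the entry itself names only one template plus one idiosyncratic equation. Everything else is inherited, which is the economy the hierarchical design aims for.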
The form of the entry is automatically assigned to the attribute lex unless specified otherwise.</Paragraph> </Section> <Section position="2" start_page="420" end_page="420" type="sub_section"> <SectionTitle> 4.2 Interactive grammar debugging and lexicon compiling </SectionTitle> <Paragraph position="0"> In designing a grammar, the user specifies templates or expanded lexical entries in a text file. C-PATR then compiles the text into an internal representation for the parser. This compilation step has been optimized to allow reasonably interactive grammar development and debugging on small personal computers. On a Sun-4, a 100K source grammar compiles into a 140K binary form in 5 seconds; on a Mac-II, the same task takes 30 seconds. To speed up grammar loading on the Macintosh, C-PATR provides a facility for pre-compiling the grammar. The Mac resource file created by pre-compilation loads in less than 2 seconds.</Paragraph> </Section> <Section position="3" start_page="420" end_page="420" type="sub_section"> <SectionTitle> 4.3 Services provided by C-PATR </SectionTitle> <Paragraph position="0"> C-PATR is driven by single-character commands, summarized in Figure 2.
Figure 2: C-PATR commands
Type a sentence to parse, or:
n  to see the contents of edge number n
b  to run a batch test
f  to toggle the subsumption filter
1  to view lexical entries for a word
m  to view a micro-dump of the chart
1  to load a new lexicon
o  to specify an output file
p  to review the phrase that was parsed
q  to quit
t  to toggle the result print format
s  to view a short dump of the chart
t  to view logical translation(s)
u  to unify two arbitrary edges
v  to toggle variable style
w  to list words
x  to view an extra-long chart dump
z  to zap the expanded lexicon to a file</Paragraph> </Section> </Section></Paper>