<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3096"> <Title>Lehrstuhl für Computerlinguistik, Universität des Saarlandes</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> SIMPLE PARSER FOR AN HPSG-STYLE GRAMMAR IMPLEMENTED IN PROLOG </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract: </SectionTitle> <Paragraph position="0"> This paper describes the basic ideas of a parser for HPSG-style grammars without an LP component. The parser works bottom-up, using the left-corner method and a chart to improve efficiency. Attention is paid to the format of grammar rules required by the parser, to the possibilities of implementing the principles of the grammar directly, and to solutions to the problems of storing partly specified categories in the chart.</Paragraph> <Paragraph position="1"> 1. Representation of Grammar Rules for the Parser Head-driven Phrase Structure Grammar (HPSG) blurs the distinction between the rules of the grammar and the structures they generate. In short, &quot;structures&quot; and &quot;rules&quot; in HPSG differ solely in the level of abstraction over the linguistic material they describe. A &quot;structure&quot; describes a very concrete piece of this material (e.g., a sentence) and hence embodies no abstraction; a &quot;rule&quot;, on the other hand, is a prototype of a set of structures. Since categories in HPSG are understood as bundles of features (&quot;attribute&quot;=&quot;value&quot; pairs), the &quot;structure&quot;/&quot;rule&quot; dichotomy is reflected in two ways: rules can contain variables as values of attributes, while structures must always be fully specified, and rules can omit some (otherwise possibly obligatory) features altogether.
Constraints restricting or binding together the permitted values of the attributes can be associated with the rules. Naturally, different levels of abstraction can be introduced among the rules as well, which allows different levels of generalization over the linguistic data to be captured.</Paragraph> <Paragraph position="2"> On the highest level of abstraction, the parser can deal with two types of rules: in the first type, the values of variables occurring in the rules are bound by constraints; in the second type, no constraints occur. To support both an easily legible notation and a reasonable computer implementation of these two rule types, two Prolog operators are defined, each describing one rule type.</Paragraph> <Paragraph position="3"> :- op(1200,xfx,is_a_rule_if).</Paragraph> <Paragraph position="4"> :- op(1200,xf,is_a_rule). The first of the two is an infix operator describing rules that contain additional constraints; the rule itself stands in front of the operator and the constraints follow it, separated from each other by commas (&quot;,&quot;). The second is a postfix operator describing rules without any constraints.</Paragraph> <Paragraph position="5"> The inventory of rule types may be broadened arbitrarily. All that is necessary is to add operator declarations and, possibly, to implement the feature inheritance principles corresponding to the newly introduced rule type(s).
This is important because it allows the application of the principles to be bound to whole rule types and thus makes obsolete the explicit stipulation of feature sharing among the respective categories in each rule, which is still required in many current parsers.</Paragraph> <Paragraph position="6"> Two examples of the rule format for the parser are shown below. Recall that in HPSG, as in all other theories accepting the X-bar convention, the head daughter plays a central role among the daughters in a rule; because of this, the head daughter is specially marked, which provides, e.g., for the application of the Head Feature Principle.</Paragraph> <Paragraph position="7"> Ex.1: the standard &quot;S ---> NP VP&quot; rule can appear in the following form (with obvious meanings of the predicates &quot;concatenation&quot; and &quot;agreement&quot;): [phonology=S_Phonology, d_trs=[dtr=[cat=n, bar=two, phonology=NP_Phonology,</Paragraph> <Paragraph position="9"> Ex.2: the rule &quot;NP ---> Det NP&quot;. Note that the phonology of the mother can be expressed without invoking the &quot;concatenation&quot; predicate (since determiners consist of one word only) and that agreement is expressed directly in the rule by coindexing the &quot;number&quot; features of both daughters: [bar=two, phonology=[Det_Phonology|NP_Phonology], d_trs=[dtr=[cat=det, bar=zero, phonology=[Det_Phonology], morphology=[number=Number]], head_dtr=[cat=n, bar=one, phonology=NP_Phonology, morphology=[number=Number]]]] is_a_rule .</Paragraph> <Paragraph position="10"> 2. Representation of Categories in the Parser As the examples show, the notation adopted for categories in rules describes them as (Prolog) lists of features. Keeping this representation in the underlying mechanism of the parser as well would, however, be a rather infelicitous decision.
The main problem is that a parser working bottom-up may discover certain features of already parsed (sub)structures only later in the parsing process (so to speak, only when it gets &quot;higher in the tree&quot; with regard to the way the parsing proceeds). These features must then be incorporated into the already parsed structures. An elegant solution to this problem was proposed in (Eisele and Dörre, 1986) and adopted in the parser described here. Syntactic categories are represented internally in the parser in a way slightly different from their representation in the grammar: all categories (including those used as values of features of other categories) are represented as &quot;open-ended lists&quot;, i.e. each internal representation of a category is a list with a certain number of instantiated elements at its beginning and an uninstantiated &quot;tail&quot;. The main idea behind this representation is that any feature discovered (and added to the category) only later in the parsing process can be added as the &quot;first member&quot; of the uninstantiated &quot;tail&quot;, a task that is easy to perform provided that the &quot;tail&quot; is still accessible (e.g., if the free &quot;tails&quot; of categories subject to feature inheritance principles are shared logical variables). Conversion of categories from one representation to the other is performed by a two-argument predicate &quot;perestroika&quot; (used below).</Paragraph> <Paragraph position="11"> The representation described also supports a simple implementation of unification of categories (see Eisele and Dörre, op. cit., for more detail); in the following, unification of two categories is assumed to be performed by a two-argument predicate &quot;unify&quot;.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. 
The Parser </SectionTitle> <Paragraph position="0"> The main idea of the parsing method used in the BUP parser (Matsumoto et al., 1983), which is the starting point of the system described, is that a rule is triggered only after its left corner has been found (i.e. it has been supplied by lexical scan, in the case of lexical categories, or it has been properly parsed). The left corner of a rewriting rule is the leftmost symbol on its right-hand side; the name stems from depicting the rule as the local tree it generates. After a category has been parsed or supplied by lexical scan, one of the grammar rules having this category as its left corner is selected and the sisters of the left corner in this rule are tried. If all of them are successfully parsed, the mother category of the rule is declared to be parsed and the whole process is repeated &quot;on a higher level&quot;, using the mother as a left corner. If any failure occurs, backtracking is invoked. The parsing process is thus data-driven: the rules of the grammar are selected in accordance with the symbols scanned in the input. From the viewpoint of efficiency, this is important mainly for the so-called &quot;free-word-order&quot; languages. It should also be recalled that the performance of BUP is further improved by storing information about all subtasks that have already been tried (successfully or unsuccessfully), which avoids recomputing parses that have already been performed or that have been proved impossible in the preceding steps of the analysis.</Paragraph> <Paragraph position="1"> For the purposes of the implementation of the parsing process, it is necessary to extend the notion of the &quot;left corner&quot; to its reflexive and transitive closure.
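Such a closure can be sketched in Prolog roughly as follows. This is an illustration under simplifying assumptions, not the parser's own code: the facts left_corner/2, the toy categories and the name left_corner_star/2 are hypothetical, categories are plain atoms rather than feature lists, and the toy grammar has no left-recursive rules (with cyclic left-corner facts a failing query might not terminate).

```prolog
% Hypothetical toy facts: left_corner(X, Y) holds when X is the
% leftmost daughter of some rule whose mother is Y.
left_corner(det, np).   % NP ---> Det N
left_corner(np, s).     % S  ---> NP VP
left_corner(v, vp).     % VP ---> V NP

% Reflexive case: every category is a left corner of itself.
left_corner_star(X, X).
% Transitive case: climb one rule upwards and continue from the mother.
left_corner_star(X, Z) :-
    left_corner(X, Y),
    left_corner_star(Y, Z).
```

With these facts, the query left_corner_star(det, s) succeeds (via np), whereas left_corner_star(v, s) fails.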
The transitive closure states inductively that for all triples of categories X, Y, Z such that X is a left corner of Y and Y is a left corner of Z, X is also a left corner of Z. The reflexive closure completes the picture by saying that any category is a left corner of itself.</Paragraph> <Paragraph position="2"> Given the basic philosophy of parsing described above, the process can be implemented in Prolog by means of two predicates performing the two tasks informally mentioned in the preceding paragraphs: - the predicate &quot;parse&quot;, parsing a given (expected) category from (a prefix of) the input string; - the predicate &quot;is_a_left_corner&quot;, linking the left-corner category with the goal (expected) category in the parsing process.</Paragraph> <Paragraph position="3"> However, before these predicates can be explained in more detail, several remarks are needed on the way the processing of complex categories has been built into the system.</Paragraph> <Paragraph position="4"> First, the usual equality (&quot;=&quot;) of two categories was replaced by their unification, i.e.
wherever equality of two categories occurred in the original BUP, whether expressed directly in the form of an equation or indirectly by variable sharing or otherwise, it had to be replaced by a call to the predicate &quot;unify&quot;.</Paragraph> <Paragraph position="5"> Second, in the predicates storing or retrieving information about the (un)successfully performed parsing subtasks, the categories must be &quot;frozen&quot; exactly in the state they were in when the subtask was started. Problems would occur if the &quot;stored&quot; categories included free variables (&quot;information holes&quot;) as values of some features: such variables could be matched by any real values at the moment of searching for information about previously performed parsing tasks. Such a match would be incorrect, since what is required is a real identity of the subtasks. (The same holds the other way round, i.e. problems of exactly the same nature would occur if the stored value were instantiated and the current one were a free variable.) The identity of subtasks, however, requires only the identity of (some of) the stored categories, not the identity of the lists representing them; what really matters is the identity of the features, not their order. This identity of &quot;frozen&quot; categories (represented as &quot;usual&quot; Prolog lists) is checked by the predicate &quot;identical_categories&quot;. Now, at last, the definitions of the predicates &quot;parse&quot; and &quot;is_a_left_corner&quot; can be given; the supporting predicates are either explained in the preceding text or given (hopefully) self-explanatory names, which should also hold for their arguments.
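As an aside on the supporting predicates, the order-insensitive identity check attributed to identical_categories might be sketched as follows. The body is an assumption made for illustration only; the actual definition is not given here. The sketch relies on the standard built-ins sort/2 and ==/2, handles only flat feature lists (categories nested as feature values would need recursive normalization), and ignores duplicate features because sort/2 removes duplicates.

```prolog
% Assumed sketch: two frozen categories count as identical when they
% contain the same features, regardless of the order in the list.
% ==/2 tests identity without binding variables, so a free variable
% never matches an instantiated value (or a different variable).
identical_categories(Category1, Category2) :-
    sort(Category1, Sorted1),   % normalize the order of the features
    sort(Category2, Sorted2),
    Sorted1 == Sorted2.
```

Under this sketch, identical_categories([cat=n, bar=two], [bar=two, cat=n]) succeeds, while identical_categories([cat=n, bar=_], [cat=n, bar=two]) fails, because the free variable is not allowed to match the value two.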
The difference between the &quot;frozen&quot; categories represented as usual Prolog lists and those represented as &quot;open-ended&quot; lists is reflected in the names of the variables standing for the respective types: the &quot;open-ended&quot; categories are always marked as &quot;Real&quot; categories, while the others never bear this marking.</Paragraph> <Paragraph position="6"> tegory from some prefix of the Input String has been tried (either successfully or not) in the preceding steps of the parsing process. */ parse(Goal_Category,[Real_Goal_Category,Structure], where the &quot;RESULT&quot; is the only output argument, namely the resulting structure of the parse; the TOPMOST CATEGORY is a skeleton category (represented as a &quot;usual&quot; Prolog list) of the expected result (most often something like &quot;[cat=sentence]&quot; or &quot;[cat=v,bar=two]&quot;), i.e. a category that is expected to unify with any result of the parse; the INPUT STRING is the input string represented as a Prolog list of word forms; and the empty string &quot;[]&quot; is the expected rest of the input string after the parsing process has finished.</Paragraph> </Section> </Paper>