File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/86/p86-1013_metho.xml
Size: 19,167 bytes
Last Modified: 2025-10-06 14:11:55
<?xml version="1.0" standalone="yes"?> <Paper uid="P86-1013"> <Title>PARSING CONJUNCTIONS DETERMINISTICALLY</Title> <Section position="4" start_page="0" end_page="78" type="metho"> <SectionTitle> OVERVIEW OF THE PARSER </SectionTitle> <Paragraph position="0"> For the sake of a name, we will call the parser NEXUS since it is the syntactic component of a larger system called NEXUS. This system is being developed to study the problem of learning technical</Paragraph> <Paragraph position="1"> concepts from expository text. The acronym stands for Non-Expert Understanding System.</Paragraph> <Paragraph position="2"> NEXUS is a direct descendant of READER, a parser written by Ginsparg at Stanford in the late 1970s [6]. Like all wait-and-see parsers, it incorporates a stack to hold constituent structures being built, some variables that record the state of the parse, and a set of transition rules that control the parsing process. The stack structures and state variables in NEXUS are almost the same as in READER, but the rules have been rewritten to make them cleaner, more transparent, and more complete.</Paragraph> <Paragraph position="3"> There are two categories of rules. Segmentation rules are responsible for finding the boundaries of constituents and creating stack structures to store these results. Recombination rules are responsible for attaching one structure to another in syntactically valid ways. Segmentation operations are separate from, and always precede, recombination operations. All the rules are encoded in Lisp; there is no separate rule interpreter.</Paragraph> <Paragraph position="4"> Segmentation rules take as input a word from the input sentence</Paragraph> <Paragraph position="5"> and a partial-parse of the sentence up to that word. 
The rules are organized into procedures such that each procedure implements those rules that apply to one syntactic word class.</Paragraph> <Paragraph position="6"> When a rule's conditions are met, it adds the input word to the partial-parse, in a way specified in the rule, and returns the new partial-parse as output.</Paragraph> <Paragraph position="7"> A partial-parse has three parts: 1. The stack: A stack (not a tree) of the data structures which encode constituents. There are two types of structures in the stack, one type representing clause nuclei (the verb group, noun phrase arguments, and adverbs of a clause), and the other representing prepositional phrases. Each structure consists of a collection of slots to be filled with constituents as the parse proceeds.</Paragraph> <Paragraph position="8"> 2. The message (MSG): A symbol specifying the last action performed on the stack. In general, this symbol will indicate the type of slot the last input word was inserted in.</Paragraph> <Paragraph position="9"> 3. The stack-message (MSG1): A list of properties of the stack as a whole (e.g. the sentence is imperative).</Paragraph> <Paragraph position="10"> The various types of slots comprising stack structures are defined in Figure 1. VERB, PREP, ADV, NOTE, and FUNCTION slots are filled during segmentation, while CASES and MEASURE slots are added during recombination. NP slots are filled with noun phrases during segmentation but may subsequently be augmented by post-modifiers during recombination.</Paragraph> </Section> <Section position="5" start_page="78" end_page="78" type="metho"> <SectionTitle> CLAUSES PREPOSITION STRUCTURES </SectionTitle> <Paragraph position="0"> Hypothesized role of the clause in the sentence, e.g. 
main, relative clause, infinitive adjunct, etc.</Paragraph> <Section position="1" start_page="78" end_page="78" type="sub_section"> <SectionTitle> Notes </SectionTitle> <Paragraph position="0"> Segmentation rules can leave notes about a structure that will be used in later processing.</Paragraph> </Section> <Section position="2" start_page="78" end_page="78" type="sub_section"> <SectionTitle> Rating </SectionTitle> <Paragraph position="0"> A numerical measure of the syntactic and semantic acceptability of the structure, to be used in choosing between competing possible parses.</Paragraph> </Section> <Section position="3" start_page="78" end_page="78" type="sub_section"> <SectionTitle> Adjuncts </SectionTitle> <Paragraph position="0"> The prepositional phrases and subordinate clauses that turn out to be adjuncts to this clause.</Paragraph> <Paragraph position="1"> Figure 1: Stack Structures.
An English rendering of some segmentation rules for various word classes is given in the Appendix. The tests in a rule depend on the current word, the messages, and various properties of structures in the stack at the time the tests are made. As each word is taken from the input stream, all rules in its syntactic class(es) are tried, in order, using the current partial-parse. All rules that succeed are executed. However, if the execution of some rule stipulates a return, subsequent rules for that class are ignored.</Paragraph> <Paragraph position="2"> The actions a rule can take are of five main types. 
For a given input word W, a rule can:
* continue filling a slot in the top stack structure by inserting W
* begin filling a new slot in the top structure
* push a new structure onto the stack and begin filling one of its slots
* collapse the stack so that a structure below the top becomes the new top
* modify a slot in the top structure based on the information provided by W
In addition, a rule will generally change the MSG variable, and may insert or delete items in the list of stack messages. The way the rules work is best shown by example. Suppose the input is: The children wore the socks on their hands.</Paragraph> <Paragraph position="3"> The segmentation NEXUS performs appears in Fig. 2a. On the left are the words of the sentence and their possible syntactic classes. The contribution each word makes to the development of the parse is shown to the right of the production symbol &quot;=>&quot;. We will draw the stack upside down so that successive parsing states are reached as one reads down the page. The contents of a stack structure are indicated by the accumulation of slot values between the dashed-line delimiters (&quot;-----&quot;). Empty slots are not shown.</Paragraph> </Section> <Section position="4" start_page="78" end_page="78" type="sub_section"> <SectionTitle> Input Word </SectionTitle> <Paragraph position="0">
            Word Class      MSG1  MSG    Stack
--          --              nil   BEGIN  FUNCTION: MAIN
the         A          =>   nil   NOUN   NP1: the
children    N          =>   nil   NOUN   NP1': the children
wore        V          =>   nil   VERB   VERB: wore
the         A          =>   nil   NOUN   NP2: the
socks       N,V        =>   nil   NOUN   NP2': the socks
on          P          =>   nil   PREP   PREP: on
their       N          =>   nil   NOUN   NP: their
hands       N,V        =>   nil   NOUN   NP': their hands
Before parsing begins, the three parts of a partial-parse are initialized as shown on the first line. One structure is prestored in the stack (it will come to hold the main clause of the input sentence), the message is BEGIN, and MSG1 is empty. 
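As an illustration only (NEXUS itself is written in Lisp), the initial partial-parse just described can be sketched in Python. The class and field names used here (Structure, PartialParse, kind, slots) are assumptions made for this sketch and do not come from the NEXUS code.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """One stack structure: a clause nucleus or a preposition structure."""
    kind: str                                   # "clause" or "prep" (labels assumed here)
    slots: dict = field(default_factory=dict)   # filled slots only, e.g. {"FUNCTION": "MAIN"}


@dataclass
class PartialParse:
    """The three parts of a partial-parse (field names are hypothetical)."""
    stack: list   # Structure objects, bottom of the stack first
    msg: str      # MSG: the last action performed on the stack
    msg1: list    # MSG1: properties of the stack as a whole


def initial_partial_parse():
    """State before the first word: one prestored main clause, MSG = BEGIN, MSG1 empty."""
    main_clause = Structure(kind="clause", slots={"FUNCTION": "MAIN"})
    return PartialParse(stack=[main_clause], msg="BEGIN", msg1=[])


start = initial_partial_parse()
print(start.msg, start.stack[0].slots)
```

Empty slots are simply absent from the slots dictionary, which mirrors the convention of not showing empty slots in the figure.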
The parsing itself is performed by applying the word class rules for each input word to the partial-parse left after processing the previous word. For example, before the word wore is processed, MSG = NOUN, MSG1 is empty, and the stack contains one clause with FUNCTION = MAIN and NP1 = the children. Wore is a verb and so the Verb rules are tried. The third rule is found to apply since there is a clause in the stack meeting the conditions. This clause is the top one so there is no collapse. (Collapse performs recombination and is described below.) The word wore is inserted in the VERB slot, MSG is set to VERB, and the rule returns the new partial-parse.</Paragraph> <Paragraph position="1"> It is possible for the segmentation process to yield more than one new partial-parse for a given input word. This can occur in two ways. First, a word may belong to several syntactic classes and when this is so, NEXUS tries the rules for each class. If rules in more than one class succeed, more than one new partial-parse is produced. As it happens, the two words in the example that are both nouns and verbs do not produce more than one partial-parse because the Verb rules don't apply when they are processed. Second, a word in a given class can often be added to a partial-parse in more than one way. The third and fifth Verb rules, for example, may both be applicable and hence can produce two new partial-parses. In order to keep track of the possibilities, all active partial-parses are kept in a list and NEXUS adds new words to each in parallel. The main segmentation control loop therefore has the following form:
For each word w in the input sentence do
  For each word class C that w belongs to do
    For each partial-parse P in the list do
      Try the C rules given w and P
In contrast to segmentation rules, which add structures to a partial-parse stack, recombination rules reduce a stack by joining structures together. 
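The segmentation control loop given above can be sketched in Python under some simplifying assumptions. A rule is assumed here to be a function that returns None when its conditions are not met, and otherwise a pair consisting of the new partial-parse and a flag saying whether the rule stipulates a return. The names segment, lexicon, and rules, the two toy rules, and the encoding of a partial-parse as a tuple of (slot, word) pairs are all hypothetical; they are not the NEXUS rules.

```python
def segment(words, lexicon, rules, initial_parse):
    """Extend every active partial-parse with each word, one word at a time."""
    active = [initial_parse]
    for word in words:
        extended = []
        for word_class in lexicon[word]:
            for parse in active:
                for rule in rules.get(word_class, []):
                    outcome = rule(word, parse)
                    if outcome is None:       # conditions not met, try the next rule
                        continue
                    new_parse, stop = outcome
                    extended.append(new_parse)
                    if stop:                  # the rule stipulated a return
                        break
        active = extended
    return active


# Toy rules, for illustration only.
def noun_rule(word, parse):
    return parse + (("NP", word),), True


def verb_rule(word, parse):
    # Toy condition: a verb is accepted only directly after a noun phrase.
    if not parse or parse[-1][0] != "NP":
        return None
    return parse + (("VERB", word),), True


lexicon = {"children": ["N"], "wore": ["V"], "socks": ["N", "V"]}
rules = {"N": [noun_rule], "V": [verb_rule]}
result = segment(["children", "wore", "socks"], lexicon, rules, ())
print(result)
```

With these toy rules the word socks, which is both a noun and a verb, yields only one partial-parse, because the toy verb rule does not apply directly after a verb. Recombination rules, by contrast, operate on the stacks that segmentation produces.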
These rules specify the types of attachment that are possible, such as the attachment of a post-modifier to a noun phrase or the attachment of an adjunct to a clause. The successful execution of a rule produces a new structure, with the attachment made, and a rating of the semantic acceptability of the attachment. The ratings are used to choose among different attachments if more than one is syntactically possible.</Paragraph> <Paragraph position="2"> There are three rating values -- perfect, acceptable, and unacceptable -- and these are encoded as numbers so that there can be degrees of acceptability. When one structure is attached to another, its rating is added to the rating of the attachment and the sum becomes the rating of the new (recombined) structure. A structure's rating thus reflects the ratings of all its component constituents. Although NEXUS is designed to call upon an interpreter module to supply the ratings, currently they must be supplied by interaction with a human interpreter. Eventually, we expect to use the procedures developed by Hirst [7]. There is also a 'no-interpreter' switch which can be set to give perfect ratings to clause attachment of right-neighbor prepositional phrases, and noun phrase (&quot;low&quot;) attachment of all other post-modifiers. The order in which attachments are attempted is controlled by the collapse procedure. Collapse is responsible for assembling an actual parse tree from the structures in a stack. After the root of the tree is initialized to the bottom stack structure, the remaining structures are considered in reverse stack order so that the constituents will be added to the tree in the order they appeared (left to right). For each structure, an attempt is made to attach it to some structure on the right frontier of the tree, starting at the lowest point and proceeding to the highest. (Looking only at the right frontier enforces the no-crossing condition of English grammar. 
See footnote 1.) If a perfect attachment is found, no further possibilities are considered. Otherwise, the highest-rated attachment is selected and collapse goes on to attach the next structure. If no attachment is found, the input is ungrammatical with respect to the specifications in the recombination rules.</Paragraph> <Paragraph position="3"> Footnote 1: The no-crossing condition says that one constituent cannot be attached to a non-neighboring constituent without attaching the neighbor first. For instance, if constituents are ordered A, B, and C, then C cannot be attached to A unless B is attached to A first. Furthermore, this implies that if B and C are both attached to A, B is closed to further attachments.</Paragraph> <Paragraph position="4"> After a stack has been collapsed, a formatting procedure is called to produce the final output. This procedure is primarily responsible for labeling the grammatical roles played by NPs and for computing the tense of VERBs. It is also responsible for inserting dummy nouns in NP slots to mark the position of &quot;wh-gaps&quot; in questions and relative clauses.</Paragraph> <Paragraph position="5"> Figure 2b shows the tree NEXUS would derive for the example. The code PN indicates past tense, and the role names should be self-explanatory. During collapse, the interpreter would be asked to rate the acceptability of each noun phrase by itself, the acceptability of the clause with the noun phrases in it, and the acceptability of the attachment. The former ratings are necessary to detect mis-segmented constituents, e.g., to downgrade &quot;time flies&quot; as a plausible subject for the sentence Time flies like an arrow. 
By Hirst's procedure, the last rating should be perfect for the attachment of the on-phrase to the clause as an adjunct since, without a discourse context, there is no referent for the socks on their hands and the verb wear expects a case marked by on.</Paragraph> </Section> </Section> <Section position="6" start_page="78" end_page="80" type="metho"> <SectionTitle> CONJUNCTION PARSING </SectionTitle> <Paragraph position="0"> To process &quot;and&quot; and &quot;or&quot;, we need to add a coordinate conjunction word class (C) and three segmentation rules for it. 2 1. If MSG = BEGIN, push a clause with FUNCTION = w onto the stack. Set MSG = CONJ and return.</Paragraph> <Paragraph position="1"> 2. If the topmost nonconjunct clause in the stack has VERB filled, push a clause with FUNCTION = w onto the stack.</Paragraph> <Paragraph position="2"> Set MSG = CONJ and return.</Paragraph> <Paragraph position="3"> 3. Otherwise, push a preposition structure with PREP = w onto the stack. Set MSG = PREP and return.</Paragraph> <Paragraph position="4"> The first rule is for sentence-initial conjunctions, the second is for potential clausal conjuncts, and the third is for cases where the conjunction cannot join clauses. This last case arises when noun phrases are conjoined in the subject of a sentence: John and Mary wore socks. Note that the stack structure for a noun phrase conjunct is identical to that for a prepositional phrase. To handle gaps, we also need to add one rule each to the Noun and Verb procedures. For Verb, the rule is: 4. If MSG = CONJ, set NP1 = !sub and VERB = w in the top structure. Set MSG = VERB and return.</Paragraph> <Paragraph position="5"> For Noun: 5. If the top structure S is a clause conjunct with NP1 filled but no VERB and there is another clause C in the stack with VERB filled and more than one NG filled, clause. 
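Rules 1 to 3 for the conjunction class can be sketched in Python as follows. This is an illustration under assumptions, not the NEXUS Lisp code: the stack is taken to be a list of dictionaries with the top at the end, the keys type, FUNCTION, VERB, and PREP are a hypothetical encoding of the slots, and a conjunct clause is recognized by having a conjunction as its FUNCTION.

```python
CONJUNCTIONS = {"and", "or"}


def is_conjunct(structure):
    """Assumed test: a clause pushed by rule 1 or 2 has the conjunction as FUNCTION."""
    return structure.get("FUNCTION") in CONJUNCTIONS


def conjunction_rules(w, stack, msg):
    """Apply rules 1-3 to the conjunction w and return the new (stack, MSG) pair."""
    # Rule 1: sentence-initial conjunction.
    if msg == "BEGIN":
        return stack + [{"type": "clause", "FUNCTION": w}], "CONJ"
    # Rule 2: the topmost nonconjunct clause already has its VERB slot filled.
    for structure in reversed(stack):
        if structure["type"] == "clause" and not is_conjunct(structure):
            if "VERB" in structure:
                return stack + [{"type": "clause", "FUNCTION": w}], "CONJ"
            break
    # Rule 3: the conjunction cannot join clauses; push a preposition structure.
    return stack + [{"type": "prep", "PREP": w}], "PREP"


# "John and ..." : no verb has been seen yet, so rule 3 applies.
subject_only = [{"type": "clause", "FUNCTION": "MAIN", "NP1": "John"}]
print(conjunction_rules("and", subject_only, "NOUN")[1])   # PREP
```

The function builds a new list rather than modifying the stack it is given, so several partial-parses can share earlier structures without interfering with one another.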
Rule 4 is for verbs that appear directly after a conjunction and rule 5 is for transitive or ditransitive conjuncts with a gapped verb.</Paragraph> <Paragraph position="6"> To specify attachments for conjuncts, we need some recombination rules. In general, elements to be conjoined must have very similar syntactic structure. They must be of the same type (noun phrase, clause, prepositional phrase, etc.). If clauses, they must serve the same function (top level assertion, infinitive, relative clause, etc.), and if non-finite clauses, any ellipsed elements (wh-gaps) must be the same. If these conditions are met, an attachment is proposed.</Paragraph> <Paragraph position="7"> Additionally, in three situations, a recombination rule may also modify the right conjunct: 1. A clause conjunct without a verb can be proposed as a noun phrase conjunct.</Paragraph> <Paragraph position="8"> 2. A clause conjunct without a verb may also be proposed as a gapped verb, as in: Bob saw Sue in Paris and [Bob saw] Linda in London.</Paragraph> <Paragraph position="9"> 3. When constituents from the left conjunct are ellipsed, they may have to be taken from the right conjunct, as in the famous sentence: John drove through and completely demolished a plate glass window. This transformation is actually implemented in the final formatting procedure since all of the trailing cases in the right conjunct must be moved over to the left conjunct if any such movement is warranted.</Paragraph> <Paragraph position="10"> Since all these situations are structurally ambiguous, the interpreter is always called to rate the modifications. In situation 2, for instance, it may be that there is no gap: Bob saw Sue in [Paris and London] in the spring of last year. In situation 3, the gapped element might come from context, rather than the right conjunct: Ignoring the stop sign at the intersection, John drove through and completely demolished his reputation as a safe driver. 
Hence, only interpretation can determine which choice is most appropriate. Let us now examine how these rules operate by tracing through a few examples. First, suppose the sentence from the previous section were to continue with the words &quot;and their feet&quot;.
and         C          =>   nil   CONJ   FUNCTION: AND
their       N          =>   nil   NOUN   NP1: their
feet        N          =>   nil   NOUN   NP1': their feet
Thus, the noun rules would do what they normally do in filling the first NP slot in a clause structure. If the sentence ended here, recombination would conjoin the last two noun phrases, &quot;their hands&quot; and &quot;their feet&quot;, as the complement of on, producing:
If, instead, the sentence did not end but continued with a verb -- &quot;froze&quot;, say -- the segmentation would continue by adding this word to the VERB slot in the top structure, which is open. As before, the rules would do what they normally do to fill a slot. Recombination would yield conjoined clauses:</Paragraph> <Paragraph position="12"> Notice that the second clause is inserted as just another case adjunct of the first clause. There is really no need to construct a coordinate structure (wherein both clauses would be dominated by the conjunction) since it adds nothing to the interpretation.</Paragraph> <Paragraph position="13"> Moreover, as Dahl &amp; McCord point out [4], it is actually better to preserve the subordination structure because it provides essential information for scoping decisions.</Paragraph> <Paragraph position="14"> Now we move on to gaps. 
Consider a new right conjunct for our original example sentence in which the subject is ellipsed: The children wore the socks on their hands and froze their feet.</Paragraph> <Paragraph position="15"> The appearance of !sub in the second SUB slot tells the interpreter that the subject of the right conjunct is coreferential with the subject of the left conjunct.</Paragraph> <Paragraph position="16"> Finally, to illustrate rule 5, consider the sentence: The children wore the socks on their hands and John a lampshade on his head.</Paragraph> <Paragraph position="17"> When the parser comes to &quot;a&quot;, rule 5 applies, the verb wore is copied over to the second conjunct, and &quot;a&quot; is inserted into NP2. Thus, the segmentation of the conjunct clause looks like this:
and         C          =>   nil   CONJ   FUNCTION: AND
John        N          =>   nil   NOUN   NP1: John
a           A          =>   nil          VERB: wore
                                  NOUN   NP2: a
lampshade   N          =>   nil   NOUN   NP2': a lampshade
on          P          =>   nil   PREP   PREP: on
his         N          =>   nil   NOUN   NP: his
head        N,V        =>   nil   NOUN   NP': his head
Recombination would produce the conjunction of two complete clauses with no shared material.</Paragraph> </Section> </Paper>