<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1047"> <Title>Entity-Oriented Parsing</Title> <Section position="3" start_page="214" end_page="216" type="metho"> <SectionTitle> 4. Example Parses </SectionTitle> <Paragraph position="0"> Let us examine first how a simple data base command like: Enrol Susan Smith in CS 101 might be parsed with the control structure and language definitions presented in the two previous sections. We start off with the top-level parsing strategy, RecognizeAnyEntity. This strategy first tries to identify a top-level domain entity (in this case a data base command) that might account for the entire input. It does this in a bottom-up manner by indexing from words in the input to those entities that they could appear in. In this case, the best indexer is the first word, 'enrol', which indexes EnrolCommand. In general, however, the best indexer need not be the first word of the input and we need to consider all words, thus raising the potential of indexing more than one entity. In our example, we would also index CollegeStudent, CollegeCourse, and CollegeDepartment. However, these are not top-level domain entities and are subsumed by EnrolCommand, and so can be ignored in favour of it.</Paragraph> <Paragraph position="1"> Once EnrolCommand has been identified as an entity that might account for the input, RecognizeAnyEntity initiates an attempt to recognize it. Since EnrolCommand is listed as an imperative case frame, this task is handled by the ImperativeCaseFrame recognizer strategy. In contrast to the bottom-up approach of RecognizeAnyEntity, this strategy tackles its more specific task in a top-down manner using the case frame recognition algorithm developed for the CASPAR parser [8]. In particular, the strategy will match the case frame header and the preposition 'in', and initiate recognitions of fillers of its direct object case and its case marked by 'in'. 
These subgoals are to recognize a CollegeStudent to fill the Enrollee case on the input segment &quot;Susan Smith&quot; and a CollegeCourse to fill the EnrolIn case on the segment &quot;CS 101&quot;. Both of these recognitions will be successful, hence causing the ImperativeCaseFrame recognizer to succeed and hence the entire recognition. The resulting parse would be: Note how this parse result is expressed in terms of the underlying structural representation used in the entity definitions without the need for a separate semantic interpretation step.</Paragraph> <Paragraph position="2"> The last example was completely grammatical and so did not require any flexibility. After an initial bottom-up step to find a dominant entity, that entity was recognized in a highly efficient top-down manner. For an example involving input that is ungrammatical (as far as the parser is concerned), consider: Place Susan Smith in computer science for freshmen There are two problems here: we assume that the user intended 'place' as a synonym for 'enrol', but that it happens not to be in the system's vocabulary; the user has also shortened the grammatically acceptable phrase, 'the computer science course for freshmen', to an equivalent phrase not covered by the surface representation for CollegeCourse as defined earlier. Since 'place' is not a synonym for 'enrol' in the language as presently defined, the RecognizeAnyEntity strategy cannot index EnrolCommand from it and hence cannot (as it did in the previous example) initiate a top-down recognition of the entire input.</Paragraph> <Paragraph position="3"> To deal with such eventualities, RecognizeAnyEntity executes a split statement specifying two continuations immediately after it has found all the entities indexed by the input. The first continuation has a zero flexibility level increment. It looks at the indexed entities to see if one subsumes all the others. 
If it finds one, it attempts a top-down recognition as described in the previous example. If it cannot find one, or if it does and the top-down recognition fails, then the continuation itself fails. The second continuation has a positive flexibility increment and follows a more robust bottom-up approach described below. This second continuation was established in the previous example too, but was never activated since a complete parse was found at the zero flexibility level. So we did not mention it. In the present example, the first continuation fails since there is no subsuming entity, and so the second continuation gets a chance to run.</Paragraph> <Paragraph position="4"> Instead of insisting on identifying a single top-level entity, this second continuation attempts to recognize all of the entities that are indexed in the hope of later being able to piece together the various fragmentary recognitions that result. The entities directly indexed are CollegeStudent by &quot;Susan&quot; and &quot;Smith&quot;, 2 CollegeDepartment by &quot;computer&quot; and &quot;science&quot;, and CollegeClass by &quot;freshmen&quot;. So a top-down attempt is made to recognize each of these entities. We can assume these goals are fulfilled by simple top-down strategies, appropriate to the SurfaceRepresentation of the corresponding entities, and operating with no flexibility level increment.</Paragraph> <Paragraph position="5"> Having recognized the low-level fragments, the second continuation of RecognizeAnyEntity now attempts to unify them into larger fragments, with the ultimate goal of unifying them into a description of a single entity that spans the whole input. To do this, it takes adjacent fragments pairwise and looks for entities of which they are both components, and then tries to recognize the subsuming entity in the spanning segment. 
The two pairs here are CollegeStudent and CollegeDepartment (subsumed by CollegeStudent) and CollegeDepartment and CollegeClass (subsumed by CollegeCourse).</Paragraph> <Paragraph position="6"> To investigate the second of these pairings, RecognizeAnyEntity would try to recognize a CollegeCourse in the spanning segment 'computer science for freshmen' using an elevated level of flexibility. This goal would be handled, just like all recognitions of CollegeCourse, by the NominalCaseFrame recognizer. With no flexibility increment, this strategy fails because the head noun is missing. However, with another flexibility increment, the recognition can go through with the CollegeDepartment being treated as an adjective and the CollegeClass being treated as a postnominal case -- it has the right case marker, &quot;for&quot;, and the adjective and post-nominal are in the right order. This successful fragment unification leaves two fragments to unify -- the old CollegeStudent and the newly derived CollegeCourse.</Paragraph> <Paragraph position="7"> There are several ways of unifying a CollegeStudent and a CollegeCourse -- either could subsume the other, or they could form the parameters to one of three database modification commands: EnrolCommand, WithdrawCommand, and TransferCommand (with the obvious interpretations). Since the commands are higher level entities than CollegeStudent and CollegeCourse, they would be preferred as top-level fragment unifiers. We can also rule out TransferCommand in favour of the first two because it requires two courses and we only have one. In addition, a recognition of EnrolCommand would succeed at a lower flexibility increment than WithdrawCommand, 3 since the preposition 'in' that marks the CollegeCourse in the input is the correct marker of the EnrolIn case of EnrolCommand, but is not the appropriate marker for WithdrawFrom, the course-containing case of WithdrawCommand. Thus a fragment unification based on EnrolCommand would be preferred. 
Also, the alternate path of fragment amalgamation -- combining CollegeStudent and CollegeDepartment into CollegeStudent and then combining CollegeStudent and CollegeCourse -- that we left pending above cannot lead to a complete instantiation of a top-level database command. So RecognizeAnyEntity will be in a position to assume that the user really intended the EnrolCommand.</Paragraph> <Paragraph position="8"> Since this recognition involved several significant assumptions, we would need to use focused interaction techniques [7] to present the interpretation to the user for approval before acting on it. Note that if the user does approve it, it should be possible (with further approval) to add 'place' to the vocabulary as a synonym for 'enrol' since 'place' was an unrecognized word in the surface position where 'enrol' should have been.</Paragraph> <Paragraph position="9"> For a final example, let us examine an extragrammatical input that involves continuations at several different flexibility levels: Transfer Smith from Compter Science 101 Economics 203 The problems here are that 'Computer' has been misspelt and the preposition 'to' is missing from before 'Economics'. The example is similar to the first one in that RecognizeAnyEntity is able to identify a top-level entity to be recognized top-down, in this case, TransferCommand. Like EnrolCommand, TransferCommand is an imperative case frame, and so the task of recognizing it is handled by the ImperativeCaseFrame strategy. This strategy can find the preposition 'from', and so can initiate the appropriate recognitions for fillers of the OutOfCourse and Student cases. 
The recognition for the student case succeeds without trouble, but the recognition for the OutOfCourse case requires a spelling correction.</Paragraph> <Paragraph position="10"> 2 We assume we have a complete listing of students and so can index from their names.</Paragraph> <Paragraph position="11"> Whenever a top-down parsing strategy fails to verify that an input word is in a specific lexical class, there is the possibility that the word that failed is a misspelling of a word that would have succeeded. In such cases, the lexical lookup mechanism executes a split statement. 4 A zero increment branch fails immediately, but a second branch with a small positive increment tries spelling correction against the words in the predicted lexical class. If the correction fails, this second branch fails, but if the correction succeeds, the branch succeeds also. In our example, the continuation involving the second branch of the lexical lookup is highest on the agenda after the primary branch has failed. In particular, it is higher than the second branch of RecognizeAnyEntity described in the previous example, since the flexibility level increment for spelling correction is small. This means that the lexical lookup is continued with a spelling correction, thus resolving the problem. Note also that since the spelling correction is only attempted within the context of recognizing a CollegeCourse -- the filler of OutOfCourse -- the target words are limited to course names. This means spelling correction is much more accurate and efficient than if correction were attempted against the whole dictionary.</Paragraph> <Paragraph position="12"> After the OutOfCourse and Student cases have been successfully filled, the ImperativeCaseFrame strategy can do no more without a flexibility level increment. But it has not filled all the required cases of TransferCommand, and it has not used up all the input it was given, so it splits and fails at the zero-level flexibility increment. 
However, in a continuation with a positive flexibility level increment, it is able to attempt recognition of cases without their marking prepositions. Assuming the sum of this increment and the spelling correction increment is still less than the increment associated with the second branch of RecognizeAnyEntity, this continuation would be the next one run.</Paragraph> <Paragraph position="13"> In this continuation, the ImperativeCaseFrameRecognizer attempts to match unparsed segments of the input against unfilled cases. There is only one of each, and the resulting attempt to recognize 'Economics 203' as the filler of IntoCourse succeeds straightforwardly. Now all required cases are filled and all input is accounted for, so the ImperativeCaseFrame strategy and hence the whole parse succeeds with the correct result.</Paragraph> <Paragraph position="14"> For the example just presented, obtaining the ideal behaviour depends on careful choice of the flexibility level increments.</Paragraph> <Paragraph position="15"> There is a danger here that the performance of the parser as a whole will be dependent on iterative tuning of these increments, and may become unstable with even small changes in the increments. It is too early yet to say how easy it will be to manage this problem, but we plan to pay close attention to it as the parser comes into operation.</Paragraph> <Paragraph position="16"> 3 This relatively fine distinction between EnrolCommand and WithdrawCommand, based on the appropriateness of the preposition 'in', is problematical in that it assumes that the flexibility level would be incremented in very fine-grained steps. If that were impractical, the final outcome of the parse would be ambiguous between an EnrolCommand and a WithdrawCommand and the user would have to be asked to make the discrimination.</Paragraph> </Section></Paper>