<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0604"> <Title>Using NOMLEX to Produce Nominalization Patterns for Information Extraction</Title> <Section position="3" start_page="25" end_page="26" type="metho"> <SectionTitle> 2 Considerations for Choosing a </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="25" end_page="26" type="sub_section"> <SectionTitle> Dictionary Encoding </SectionTitle> <Paragraph position="0"> The primary information in a NOMLEX entry is a description of a nominalization's argument structure. This information can be quite complex. There are several potential argument positions, including both pre-nominal (the bomb explosion, the bomb's explosion) and post-nominal (the explosion of the bomb), and a given verbal argument may appear in one of several positions. 3 In general, individual arguments may be omitted, although there are some co-occurrence constraints, which we shall consider below. Furthermore, whether or not one position is filled may affect the interpretation of other positions; thus, in Rome's destruction, Rome is the object, whereas in Rome's destruction of Carthage, Rome is the subject.</Paragraph> <Paragraph position="1"> 3 These positions may be filled by non-arguments as well. For example, the prenominal positions may be filled by temporal NPs (NTIME1 and NTIME2 in COMLEX) like Yesterday (but not John), e.g., Yesterday's appointment of the Prime Minister and The June 1, 1987 appointment of the Prime Minister. These positions correspond to temporal modifiers of clauses, e.g., X appointed the Prime Minister Yesterday/June 1, 1987. For further discussion, see Section 5.</Paragraph> <Paragraph position="2"> In seeking an appropriate representation for this information, one can compare the situation with that of English verbal complements, which have been extensively studied and recorded. In English, verbal complements are relatively fixed in composition and order. 
As a result, the common practice (adopted, for example, in OALD (Hornby, 1980), LDOCE (Proctor, 1978), and COMLEX Syntax (Macleod et al., forthcoming)) is to enumerate and name the possible subcategorizations, where in general each subcategorization represents a fixed sequence of syntactic elements. 4 For example, in COMLEX (Wolff et al., 1994), NP-PP consists of a Noun Phrase followed by a Prepositional Phrase, as in put the milk in the refrigerator, where *put the milk, *put in the refrigerator and *put in the refrigerator the milk are not acceptable. Such an approach would be unwieldy for nominalizations, where an argument may appear in several positions and may also be omitted. As a result, even a simple nominalization may entail a large number of subcategorization frames. If these were listed explicitly, the entry would be difficult to create and to read.</Paragraph> <Paragraph position="3"> On the other hand, a representation which separately listed all the complement structures which could occur with a nominalization, assuming they could freely co-occur, would fail to capture many crucial constraints between complements. For example, the nominalization confirmation has both THAT-S (His confirmation THAT HE WOULD GO) and WH-S complements (His confirmation of WHETHER HE WOULD GO). However, these complements cannot co-occur (*His confirmation THAT HE WOULD GO of WHETHER HE WOULD GO). Also, in the case where the associated verb has an NP-AS-NP complement (She treated them as inferiors) and no AS-NP complement (She emerged as their main competitor), the nominalization cannot have a bare AS-NP. Thus we have The consideration of HIM AS A CANDIDATE and HIS consideration AS A CANDIDATE but not *The consideration AS A CANDIDATE. 
Guided by these considerations, we chose an approach in which we first determine which COMLEX verbal complements can correspond</Paragraph> </Section> </Section> <Section position="4" start_page="26" end_page="28" type="metho"> <SectionTitle> 4 In COMLEX Syntax, some symbols designate sets of </SectionTitle> <Paragraph position="0"> alternative complement structures, e.g., the ditransitive alternation.</Paragraph> <Paragraph position="1"> to phrases containing nominalizations and then we specify how these complements can be mapped to arguments of the nominalizations.</Paragraph> <Paragraph position="2"> The resulting COMLEX-based encoding does not permit incompatible complement phrases to co-occur, e.g., confirmation would not simultaneously take both THAT-S and WH-S complements. Optionality, obligatoriness and alternative positions of phrases are stated in a simple notation. For example, it can be stated in the entry for consideration that the verbal object for the NP-AS-NP complement of consider maps to either the DET-POSS position (HIS consideration as a candidate) or the PP-OF position (The consideration OF HIM as a candidate) and that this object is obligatory for mappings of NP-AS-NP; that is, if the object is not present in the phrase containing consideration, then the phrase cannot be mapped to the NP-AS-NP complement, although other complements are possible.</Paragraph> <Paragraph position="3"> Our representation also accounts for the difference in behavior of the core arguments (the subject, the object, and the indirect object) and the other arguments, which we shall refer to as oblique complements. The core arguments, as we have noted, can appear in several positions in the nominalization, and may be independently omitted or included. 
The oblique complements of the verb, on the other hand, generally translate directly into nominalization complements, either unchanged or occasionally with the addition of a preposition or a &quot;that&quot; complementizer.</Paragraph> <Paragraph position="4"> 3 What is a NOMLEX Entry NOMLEX entries are organized as typed feature structures and written in a Lisp-like notation (Figures 1 and 2). Each entry lists the nominalization (:ORTH) and the associated verb (:VERB). The :NOM-TYPE feature specifies the type of nominalization: VERB-NOM for nominalizations which represent the action (destruction) or state (knowledge) of the associated verb; VERB-PART for nominalizations which incorporate a verbal particle (takeover); SUBJECT for nominalizations which represent the subject of the verb (teacher); and OBJECT for nominalizations which represent the object of the verb (appointee). The :NOUN keyword includes information about whether the word has non-nominalization noun senses (and may include some frequency information). For example, appointment has a sense which means something like &quot;date to do something&quot;. We are only interested in the nominalization sense in this paper. 5 The heart of the entry is a list of verb subcategorizations, :VERB-SUBC, taken from COMLEX Syntax. The name for each subcategorization is prefixed by NOM- (such as NOM-NP or NOM-NP-AS-NP) and, for subcategorizations involving prepositions, :PVAL specifies those prepositions. 
The COMLEX complements in these lexical entries include: * NP, a noun phrase complement, e.g., IBM appointed Mary * NP-PP, a complement consisting of a noun phrase and prepositional phrase, e.g., IBM appointed Mary for the vice presidency * NP-TO-INF-OC, a complement consisting of a noun phrase object and an infinitive clause, where the subject of the infinitive corresponds to the object of the main clause, e.g., IBM appointed Mary to do the job * NP-AS-NP, a complement consisting of a noun phrase object, the word &quot;as&quot; and a second noun phrase, e.g., IBM appointed Mary as vice president For each verb complement, the entry lists the associated nominalized structure, if different from the verbal complement. The entry also lists the positions in which the object (:OBJECT) may appear. For appointment, these positions include the following for most complements: the appointment of Alice Smith 5 When two or more argument positions are filled, the semantic classes of the arguments in the examples limit our patterns to the nominalization senses. However, patterns in which only one argument position is filled may match phrases that are ambiguous, e.g., Alice's appointment can refer to either a dental appointment or an appointment to the vice presidency. These cases are handled by other modules of Proteus, such as inference rules or reference resolution.</Paragraph> <Paragraph position="5"> However, NOM-NP-AS-NP does not allow the N-N-MOD position (*the Alice Smith appointment as vice president). The :OBJECT is not indicated for the :VERB-SUBC of appointee because the nominalization itself corresponds to the verbal object (it is :NOM-TYPE ((OBJECT))).</Paragraph> <Paragraph position="6"> Because a subject argument can appear with any verbal complement, we include, at the top level, a list of positions for the subject (:VERB-SUBJ). 
This list can be further restricted for a particular complement by including a :SUBJECT feature under that complement in the NOMLEX entry. As a default, it is assumed that subjects can always map to prepositional by phrases. Exceptions are marked with NOT-PP-BY, as in the entry for appointee (*the appointee by IBM (for vice president)).</Paragraph> <Paragraph position="7"> Typically, a nominalization will list multiple positions for each core argument. This doesn't mean, however, that all combinations of the positions are possible. Several constraints limit the possible role assignments; some of these constraints are general, and some are based on particular lists in each entry: * The uniqueness constraint says that any verbal role may be filled only once. For example, in Leslie's appointment of Alice, the PP-OF position filled by Alice must map into the object role. As a result, Leslie cannot fill the object role and therefore must fill the subject role. 6 and :OBJ-ATTRIBUTE features, selectional constraints which are useful in selecting role assignments. In Figures 1 and 2, the attributes COMMUNICATOR (organization, person, or other entity capable of communicating) and NHUMAN (a human) are used.</Paragraph> <Paragraph position="8"> * Obligatoriness constraints are assumed for the mappings associated with each complement in each entry. As a default, it is assumed that only subject and object are optional. Therefore, Mary Smith's appointment would be associated with the NOM-NP complement, but not the NOM-NP-AS-NP complement. Furthermore, objects are obligatory for a particular complement NP-X for a particular nominalization N, if N takes both NP-X and X as complements, where NP-X includes all the phrases in X plus an object (e.g.,</Paragraph> <Paragraph position="10"> NOM-THAT-S, etc.). These defaults can be overridden in the dictionary with attributes on specific complements specifying which roles are :OPTIONAL or :REQUIRED. 
For example, appointment takes a NOM-NP complement, but no corresponding NOM-INTRANS. The object is obligatory contrary to our defaults, e.g., John's appointment must have the interpretation that John is the object of appoint. Thus, our entry for appointment is marked :REQUIRED ((OBJECT)). 8</Paragraph> </Section> <Section position="5" start_page="28" end_page="30" type="metho"> <SectionTitle> 4 Our Procedure for Generating </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="28" end_page="30" type="sub_section"> <SectionTitle> Nominalization Patterns </SectionTitle> <Paragraph position="0"> ~This ordering constraint is assumed to hold for all nominalizations in English.</Paragraph> <Paragraph position="1"> Figure 3 diagrams how nominalization patterns are derived. The rectangles are modules of our algorithm, the ovals are data structures passed between modules and the dotted lines connect the ovals with examples of what the data structures might contain given the sample active clause IBM appointed Alice Smith. Due to space limitations, the figure does not include patterns with constituents for the temporal adverbial NP positions (Section 5), e.g., Mary Smith's June 1, 1998 appointment by IBM and Yesterday's appointment of Mary Smith by IBM. There are an additional five such mappings for appointee and an additional twenty-three for appointment.</Paragraph> <Paragraph position="2"> First PET analyzes the sample sentence and identifies the main verb and its arguments (e.g., subject, direct object, etc.). Then it searches NOMLEX for any nominalizations which correspond to the main verb. Our example verb appoint has at least two nominalizations: appointment and appointee. Next the procedure examines the set of :VERB-SUBC classes in each nominalization entry (Figures 1 and 2) and identifies the set of classes which are compatible with the set of arguments in the input. 
A class is compatible if it allows all the input arguments, and none of its required arguments are missing.</Paragraph> <Paragraph position="3"> For the example sentence, only NOM-NP is chosen for each nominalization. The other phrases all require some phrase in addition to the object, e.g., NOM-NP-AS-NP requires an AS-NP phrase (e.g., as vice president). Next the permissible role mappings are generated. By default, the subject and object are optional, but the object is obligatory for appointment due to the :REQUIRED feature of its NOMLEX entry. Prepositional phrases are assumed to occur in all orders, so that both (object: of, subject: by) and (subject: by, object: of) are listed in Figure 3. The uniqueness constraint must be obeyed (we cannot have two subjects or two objects). Finally, the syntax of noun phrases only permits the N-N-MOD slot to be filled more than once (The IBM Alice Smith Appointment), and in that case our ordering constraint would have to be obeyed. This rules out an interpretation of The Alice Smith IBM Appointment where Alice Smith is the appointee and IBM is the appointer. 
9 9 Given a clausal pattern for an example sentence like They appointed Alice Smith to IBM, the NOM-NP-PP class would be matched and nominalization patterns would be generated in which IBM (the indirect object) would follow Alice Smith (the direct object). These patterns would match The Alice Smith IBM appointment (or Alice Smith's IBM appointment) and give it a very similar interpretation to one in which IBM is the appointer. [Figure 3, recoverable nominalization patterns with example phrases: (pattern not recovered): The Alice Smith appointment by IBM; Det n(C-appointment) of np(C-person) by np(C-company): The appointment of Alice Smith by IBM; Det n(C-appointment) by np(C-company) of np(C-person): The appointment by IBM of Alice Smith; np(C-company)'s n(C-appointment): IBM's appointment; np(C-person)'s n(C-appointment): Alice Smith's appointment; Det np(C-person) n(C-appointment): The Alice Smith appointment; Det n(C-appointment) of np(C-person): The appointment of Alice Smith; Det n(C-appointment) by np(C-company): The appointment by IBM; Det np(C-company) n(C-appointment): The IBM appointment.] PET can then use these mappings to generate patterns, as it does for the various types of clauses. Using pattern matching and dictionary look-up, PET associates the verbal arguments with semantic classes. In our example, the subject is a company and the object is a person.</Paragraph> <Paragraph position="4"> This information can be applied to each mapping to produce a pattern. The nominalization patterns in Figure 3 are generated from the role mappings listed using this semantic information and interpreting the nominal role labels. For example, the mapping (SUBJECT: DET-POSS,</Paragraph> </Section> </Section> <Section position="6" start_page="30" end_page="30" type="metho"> <SectionTitle> 5 Adjunct Mappings </SectionTitle> <Paragraph position="0"> The preceding section gave a simplified account of mapping nominalization patterns. We must also handle certain adjuncts. Temporal PPs that can occur in clauses can usually occur in nominalizations as well. The positions DET-POSS and N-N-MOD may be occupied by temporal NPs (Yesterday's appointment of Alice Smith by IBM, The January 3, 1998 appointment by IBM of Alice Smith). 
When an NP is temporal and occupies either of these positions, it may fill a temporal slot in an IE pattern. Since temporal NPs are neither companies nor people, they will not fill the object or subject slots in the IE patterns above. Therefore, the possibility of filling temporal slots from DET-POSS and N-N-MOD positions should cause no conflicts for appointment.</Paragraph> </Section> <Section position="7" start_page="30" end_page="31" type="metho"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> Other computational linguistics work on decoding nominalizations includes (Hobbs and Grishman, 1976), (Dahl et al., 1987) and (Hull and Gomez, 1996). (Hull and Gomez, 1996) is the most similar to our own in that their ultimate goal is to extract information from the World Book Encyclopedia. That task is defined differently from our MUC-related work. The lexical entries created by Hull and Gomez include selectional constraints tied to WordNet classes.</Paragraph> <Paragraph position="1"> Their procedure for converting nominalizations into predicate argument structure relies on this semantic information, which they use to distinguish nominalizations from nouns and arguments from adjuncts. Their coverage of arguments is limited to subjects, objects, and prepositional phrases, whereas NOMLEX provides detailed coverage of all core and all oblique arguments.</Paragraph> </Section> </Paper>