<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1019"> <Title>Rule-based Acquisition and Maintenance of Lexical and Semantic Knowledge *</Title> <Section position="5" start_page="150" end_page="151" type="metho"> <SectionTitle> 3 Automated Knowledge Acquisition </SectionTitle> <Paragraph position="0"> A-COOL automates the acquisition of lexical and semantic knowledge in ESTRATO. For each entry in a Spanish lexical feature file, A-COOL creates a new semantic concept frame for the central semantic database, a Spanish lexical frame for the Spanish central lexical database, and a skeletal entry for the English lexical feature file. Once the editor has filled out the entry in the English lexical feature file, A-COOL also creates a lexical frame for the English central lexical database. A-COOL automatically creates the word-to-concept mappings for the Spanish and English words in order to ensure consistency. A-COOL accomplishes all of this by means of easily modified if-then rules.</Paragraph> <Paragraph position="1"> When A-COOL creates a new concept, it automatically links it to a more general semantic class. The top-level hierarchy we are currently using was created at Carnegie Mellon University [Carlson and Nirenburg, 1990]. The insertion of semantic concepts into a hierarchy does not depend on the specific top-level. The rules specify where new concepts are linked into the semantic hierarchy based on features (such as ACTION for verbs and ANIMACY for nouns) in the lexical feature files. These rules can be modified easily for adding concepts to a different top-level.</Paragraph> <Paragraph position="2"> What follows is a description of the A-COOL process using the entry for the Spanish verb &quot;funcionar&quot; (&quot;to work&quot;). 
The verb feature ACTION in the lexical acquisition phase is designed such that the user is</Paragraph> <Paragraph position="4"> prompted for a response to a question about the type of action the verb represents (if any at all). With this information, A-COOL can produce the preliminary value of IS-A for a semantic frame when it creates the semantic frame from the verb entry.</Paragraph> <Paragraph position="5"> The &quot;if&quot; or &quot;LHS&quot; (left-hand-side) part of an A-COOL rule specifies properties of lexical features which must be true for the rule to apply. If the rule does apply, the &quot;then&quot; or &quot;RHS&quot; (right-hand-side) part specifies which slots of the central database frame to create.</Paragraph> <Paragraph position="6"> For example, figure 3 shows entries for the Spanish verb &quot;funcionar&quot; and its corresponding English verb &quot;work&quot; from the lexical feature files.</Paragraph> <Paragraph position="7"> In order to convert these entries into central database frames, the following rules apply. First, rule1 inserts the default information that the value of the CLASS feature for &quot;funcionar&quot; is AGENT, because the reflexive value is unknown (see figure 4). It also inserts the word into the lexical hierarchy under +W-SPANISH-INTRANS-VERB and copies the TRANS information to the new frame.</Paragraph> <Paragraph position="8"> Similarly, rule2 (see figure 5) helps to convert &quot;work&quot; by guessing at the value of the TRANS slot and setting the CLASS to AGENT-THEME.</Paragraph> <Paragraph position="9"> Finally, rule3 (see figure 6) helps to generate the template semantic frame corresponding to the meaning of &quot;funcionar&quot; and &quot;work&quot; by placing the frame under PHYSICAL-EVENT in the semantic IS-A hierarchy. A-COOL works by using the following algorithm: 1. Read in the (Spanish or English) lexical feature file.</Paragraph> <Paragraph position="10"> 2. 
For each lexical item, generate a frame by applying all relevant rules to that lexical item.</Paragraph> <Paragraph position="11"> 3. Write that frame to the central frame file.</Paragraph> <Paragraph position="12"> With &quot;funcionar&quot; and &quot;work&quot; as the input lexical items, the rules generate the central frames shown in figure 7.</Paragraph> </Section> <Section position="6" start_page="151" end_page="153" type="metho"> <SectionTitle> 4 Automated Knowledge Maintenance </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="151" end_page="151" type="sub_section"> <SectionTitle> 4.1 Introduction </SectionTitle> <Paragraph position="0"> M-COOL allows the linguist to keep just one source for Spanish lexical information and one source for English lexical information (the central lexical frame databases).</Paragraph> <Paragraph position="1"> Thus, the lexical information is not spread out over several files and can be modified easily. Each language's lexicon can also be organized hierarchically.</Paragraph> <Paragraph position="2"> Using a set of if-then rules, M-COOL automatically produces the necessary run-time lexical and semantic knowledge sources for the various NLP modules. These rules specify which features are needed for the different modules. The rules also create some lexical knowledge that can be extracted from the lexical and semantic hierarchies. This information need not be specified in the lexical entries.2 Since the various run-time lexical and semantic knowledge sources now come from common central databases, consistency is maintained and human error is minimized. Both the semantic knowledge and the lexical knowledge are stored in a standard frame-based format. 
This allows the linguist and domain expert to view or modify the knowledge with a frame-based editor. [Footnote 2: E.g., the linking of syntactic arguments to semantic roles.]</Paragraph> <Paragraph position="3"> The rest of this section describes the M-COOL program and the lexical and semantic frames used by M-COOL, and then gives an annotated example to illustrate how M-COOL works.</Paragraph> </Section> <Section position="2" start_page="151" end_page="152" type="sub_section"> <SectionTitle> 4.2 Program Description </SectionTitle> <Paragraph position="0"> In order to make the knowledge maintenance cycle faster, M-COOL can work incrementally as well as in batch mode.</Paragraph> <Paragraph position="1"> If the linguist only modifies or adds a small number of lexical or semantic items, the incremental version of M-COOL will only update the run-time knowledge sources which are affected by the changes, instead of re-generating all of the run-time knowledge sources. This saves considerable time over the non-incremental method.</Paragraph> <Paragraph position="2"> M-COOL works by first determining which run-time knowledge sources need to be updated. For each such knowledge source, it then applies all rules which are relevant to that knowledge source. Each rule is associated with a specific knowledge source.</Paragraph> <Paragraph position="3"> To extend M-COOL to generate the run-time knowledge source for a new NLP module, two steps are taken: 1. Define the properties of the new knowledge source in the file-type table.</Paragraph> <Paragraph position="4"> 2. Write a new set of rules for generating the entries which comprise the new knowledge source. These rules specify the lexical features to be used for the entry as well as the format of the entry.</Paragraph> <Paragraph position="5"> The file-type table simply tells M-COOL whether the given knowledge source is lexical or semantic, and whether it is for generation or analysis. 
It also supplies miscellaneous information such as the name of the file where the run-time entries are kept and whether it can be compiled using the LISP compile command. For example, our Spanish-lexical-analysis file-type is defined with this entry: &quot;Spanish/Mappings/lex-map.lisp&quot; :lexical :analysis The rule language used by M-COOL is called FRULEKIT [Shell and Carbonell, 1986]. FRULEKIT is an efficient Common Lisp pattern matcher with several extensions over OPS-5. The most relevant extension is that it allows rules to flexibly match against and modify frames in a hierarchy. Having such a frame-based rule language makes it easy for us to write rules to update the ESTRATO run-time knowledge sources.</Paragraph> </Section> <Section position="3" start_page="152" end_page="153" type="sub_section"> <SectionTitle> 4.3 Lexical and Semantic Frame Description </SectionTitle> <Paragraph position="0"> Let us briefly discuss the lexical and semantic database files which are the input to M-COOL. The lexical frames are the repository of all lexical knowledge for the ESTRATO system. These frames contain structural, grammatical and some semantic encoding information for words or phrases. They can be easily extended to include other lexical information (e.g., definitions or synonyms) for display to a human translator. For the purposes of ESTRATO, each lexical entry contains a part of speech (CAT), a lexical mapping rule (HEAD or SEM-MAP), a root form (ROOT) and a link (IS-A) to its location in the lexical hierarchy. Nouns (CAT N) contain agreement and features for syntactic-semantic argument linking (CLASS, MAPPINGS). CLASS here refers to the type of linking rules a verb or adjective [Levin and Rappaport, 1987] will use for its syntactic arguments (SUBJ, OBJ, OBJ2, XCOMP, and COMP [Kaplan and Bresnan, 1982]). 
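The lexical frame slots just described can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the slot names CAT, ROOT, IS-A and CLASS come from the text, but the Frame class, the get_slot helper, the example frame names, and the idea that a missing slot is looked up through the IS-A parent are hypothetical and are not taken from the ESTRATO implementation.

```python
# Hypothetical sketch of a lexical frame with an IS-A link.
# Slot names (CAT, ROOT, IS-A, CLASS) follow the text; the class, helper,
# frame names, and the inheritance lookup are assumptions for illustration.
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    slots: dict = field(default_factory=dict)


def get_slot(frames, name, slot):
    """Return a slot value, following IS-A links to parents when it is missing."""
    seen = set()  # guards against accidental cycles in the hierarchy
    while name in frames and name not in seen:
        seen.add(name)
        frame = frames[name]
        if slot in frame.slots:
            return frame.slots[slot]
        name = frame.slots.get("IS-A")
    return None


frames = {
    "EXAMPLE-INTRANS-VERB-CLASS": Frame(
        "EXAMPLE-INTRANS-VERB-CLASS", {"CAT": "V", "CLASS": "AGENT"}
    ),
    "EXAMPLE-FUNCIONAR": Frame(
        "EXAMPLE-FUNCIONAR",
        {"ROOT": "funcionar", "IS-A": "EXAMPLE-INTRANS-VERB-CLASS"},
    ),
}

print(get_slot(frames, "EXAMPLE-FUNCIONAR", "ROOT"))   # stored on the entry itself
print(get_slot(frames, "EXAMPLE-FUNCIONAR", "CLASS"))  # found on the IS-A parent
```

Keeping class-level slots on a parent frame is one way entries that need similar linking rules could share them, which is the motivation the text gives for arranging the lexicon hierarchically.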
Semantic knowledge about the world is stored in a domain model organized in an is-a hierarchy using frames that correspond to the various events (PHYSICAL-EVENT *ASSEMBLE-MONTAR), objects (PHYSICAL-OBJECT *TRANSFORMER-TRANSFORMADOR), relations (AGENT, THEME) between these objects and events, and properties (COLOR, SHAPE) in the specific domain [Carlson and Nirenburg, 1990]. The name of each lexical frame represents a single word sense [Meyer et al., 1992].</Paragraph> <Paragraph position="1"> Examples of lexical frames are shown in figure 7.</Paragraph> <Paragraph position="2"> Each frame specifies a link to a parent in the lexical hierarchy or the domain model hierarchy (IS-A). This allows lexical entries to be arranged into classes which require similar &quot;mapping rules&quot; [Mitamura, 1989].</Paragraph> <Paragraph position="3"> Each semantic knowledge database frame in the domain model also specifies the roles which a given concept may have as well as specific restrictions on the fillers of those roles. An example of a semantic frame is shown in figure 7. The information in the databases is used in different forms and combinations depending on the NLP component's needs.</Paragraph> <Paragraph position="4"> Figure 8 shows a frame which is an alternative English lexical entry for the concept *WORK- [...]. The value of the PATTERN slot in this frame (AGENT (IS-A *ALARM-ALARMA)) is used so that when the AGENT role is filled with an &quot;alarm&quot;, the English word selected for generation is &quot;go off&quot; rather than &quot;work&quot;.</Paragraph> </Section> <Section position="4" start_page="153" end_page="153" type="sub_section"> <SectionTitle> 4.4 Example </SectionTitle> <Paragraph position="0"> Now we will illustrate how M-COOL rules automatically generate various types of run-time knowledge from the frames shown in figure 7. Figure 9 shows a rule for generating lexical mapping information. 
This rule applies to the lexical frame Tw-sP-FUNCIONAR-V-2 in order to generate the run-time lexical analysis mapping data depicted in figure 10.</Paragraph> <Paragraph position="1"> Next we have a rule for generating the run-time Ontology database, which we call &quot;framettes&quot; (figure 11). This rule applies to the semantic frame *WORK-FUNCIONAR (shown in figure 7) to generate the framette shown in figure 12.</Paragraph> <Paragraph position="2"> The two previous rules were fairly simple, but M-COOL can perform much more complex computations. For example, in order to generate efficient run-time knowledge which allows the translator to map from interlingua into English feature-structures, M-COOL must find, for each semantic frame, every English lexical frame which corresponds to it. It then combines this correspondence information into a single LISP function which will efficiently perform the mapping at run-time. One of the M-COOL rules responsible for constructing this knowledge is shown in figure 13. In this example, it applies to the semantic frame *WORK-FUNCIONAR. It finds two lexical [...] shown in figure 14.</Paragraph> </Section> </Section> <Section position="7" start_page="153" end_page="153" type="metho"> <SectionTitle> 5 Related Work </SectionTitle> <Paragraph position="0"> Most of the effort in developing software tools for NLP has focused on user interfaces and acquisition of lexical databases from text corpora, but there are very few rule-based systems for knowledge maintenance. [Pin-Ngern et al., 1989] go beyond corpus analysis by augmenting the lexical databases with knowledge supplied by human editors. The Word Manager [Domenig, 1988] is a system for both acquisition and maintenance of morphological knowledge, but its main strength is its user interface. 
LUKE [Knight, 1991] is an interactive system which uses several heuristics exploiting the relationship between linguistic and world knowledge to partially automate the acquisition process.</Paragraph> <Paragraph position="1"> More effort has gone into the acquisition and maintenance of knowledge for expert systems. The focus of such efforts is to acquire smaller amounts of problem-solving knowledge, which is more complex than the semantic and lexical knowledge used in ESTRATO.</Paragraph> </Section> <Section position="8" start_page="153" end_page="154" type="metho"> <SectionTitle> 6 Future Work </SectionTitle> <Paragraph position="0"> We intend to extend COOL in three directions: by supporting the acquisition and maintenance of lexical and semantic information for new languages, by adding rules for completely automating the acquisition of semantic classes and lexical argument alternations [Bresnan, 1982; Perlmutter, 1983], and by improving the functionality of the underlying system itself.</Paragraph> <Paragraph position="1"> Because it is easy to extend M-COOL to generate run-time knowledge sources for new modules, we plan to add, for example: English-analysis lexical tables, Spanish-generation lexical tables, and lexical tables for an external machine-translation system.</Paragraph> <Paragraph position="2"> We also plan to integrate the various acquisition and maintenance tools we use in the ESTRATO system (which include A-COOL and M-COOL) into a single incremental lexical acquisition and maintenance program with a user-friendly interface for both experts and non-experts. The interface will prompt the non-expert for information about a word without the user needing to know linguistics. For example, determining the countability of a noun can be done by prompting the user with examples of the word being used in a countable context and non-countable context. 
This will allow non-experts to add most of the lexical and semantic knowledge. Currently the process of adding or modifying database entries and running A-COOL and M-COOL requires the user to understand both the internal representation of the lexical items and how to run the various programs. An interactive knowledge editor which hides all of the details from the user will make the user's work simpler and much more productive.</Paragraph> </Section> <Section position="9" start_page="154" end_page="154" type="metho"> <SectionTitle> 7 Conclusions </SectionTitle> <Paragraph position="0"> Our idea of developing a program to help automate the task of lexical and semantic knowledge acquisition and maintenance has been very fruitful for us. We have realized the following benefits:</Paragraph> </Section> <Section position="10" start_page="154" end_page="155" type="metho"> <SectionTitle> * A-COOL and M-COOL make knowledge acquisition and maintenance easier, faster and more robust. </SectionTitle> <Paragraph position="0"> By automatically generating template lexical and semantic database entries from the lexical feature files, A-COOL accelerates the acquisition process and eliminates many sources of human error. Similarly, M-COOL eliminates the need to manually update a large number of run-time knowledge sources each time a new lexical entry is added. By using a powerful and efficient frame-matching rule-based system to automatically generate the correct run-time knowledge sources, M-COOL makes knowledge maintenance faster.</Paragraph> <Paragraph position="1"> * M-COOL allows us to integrate generation and analysis lexical knowledge. Because M-COOL can generate both analysis and generation lexical knowledge sources from the same central database, it is very easy to create Spanish generation and English analysis knowledge sources. 
This solves the problem of having to maintain separate versions of knowledge for the analysis and generation of the same language.</Paragraph> <Paragraph position="2"> * It is easy to extend M-COOL to new modules.</Paragraph> <Paragraph position="3"> Although we didn't anticipate it, we were able to use M-COOL to generate and maintain a wide variety of additional knowledge sources (for example, a custom glossary and a phrasal-lexicon file). M-COOL's design makes this easy.</Paragraph> <Paragraph position="4"> Given the complexity and size of our machine-translation system, COOL has become an indispensable part of our knowledge acquisition environment.</Paragraph> </Section> </Paper>