<?xml version="1.0" standalone="yes"?>
<Paper uid="P87-1005">
  <Title>AN ENVIRONMENT FOR ACQUIRING SEMANTIC INFORMATION</Title>
  <Section position="4" start_page="0" end_page="32" type="metho">
    <SectionTitle>
2 Kinds of Knowledge
</SectionTitle>
    <Paragraph position="0"> One kind of knowledge that must be acquired is lexical information. This includes morphological information, syntactic categories, complement structure (if any), and pointers to semantic information associated with individual words. Acquiring lexical information may proceed by prompting a user, as in TEAM \[13\], IRUS \[7\], and JANUS \[9\]. Alternatively, efforts are underway to acquire the information directly from on-line dictionaries \[3, 16\].</Paragraph>
    <Paragraph position="1"> Semantic knowledge includes at least two kinds of information: selectional restrictions or case frame constraints which can serve as a filter on what makes sense semantically, and rules for translating the word senses present in an input into an underlying semantic representation. Acquiring such selectional restriction information has been studied in TEAM, the Linguistic String Parser \[12\], and our system. Acquiring the meaning of the word senses has been studied by several individuals, including \[11, 17\]. This paper  focuses on acquiring such semantic knowledge using IRACQ.</Paragraph>
    <Paragraph position="2"> Basic facts about the domain must be acquired as well. This includes at least taxonomic information about the semantic categories in the domain and binary relationships holding between semantic categories. For instance, in the domain of Navy decision-making at a US Reet Command Center, such basic domain facts include: All submarines are vessels.</Paragraph>
    <Paragraph position="3"> All vessels are units.</Paragraph>
    <Paragraph position="4"> All units are organizational entities.</Paragraph>
    <Paragraph position="5"> All vessels have a major weapon system.</Paragraph>
    <Paragraph position="6"> All units have an overall combat readiness rating. Such information, though not linguistic in nature, is clearly necessary to understand natural language, since, for instance, &amp;quot;Enterprise's overall rating&amp;quot; presumes that there is such a readiness rating, which can be verified in the axioms mentioned above about the domain. However, this is cleady not a class of knowledge peculiar to language comprehension or generation, but is in fact essential in any intelligent system. General tools for acquiring such knowledge are emerging; we are employing KREME \[1\] for acquiring and maintaining the domain knowledge.</Paragraph>
    <Paragraph position="7"> Knowledge that relates the predicates in the domain to their representation and access in the underlying systems is certainly necessary. For instance, we may have the unary predicates vessel and harpoon.capable; nevertheless, the concept (i.e., unary predicate) corresponding to the logical expression ( X x) \[vessel(x) &amp; harpoon.capable(x)\] may correspond to the existence of a &amp;quot;y* in the &amp;quot;harp* field of the &amp;quot;uchar&amp;quot; relation of a data base. TEAM allows for acquisition of this mapping by building predicates &amp;quot;bottom-up&amp;quot; starting from database fields. We know of no general acquisition approach that will work with different kinds of underlying systems (not just databases). However, maintaining a distinction between the concepts of the domain, as the user would think of those concepts, separate from the organization of the database structure or of some other underlying system, is a key characteristic of the design and transportability of IRUS.</Paragraph>
    <Paragraph position="8"> Finally, a fifth kind of knowledge is a set of domain plans. Though no extensive set of such plans has been developed yet, there is growing agreement that such a library of plans is critical for understanding narrative \[20\], a user's needs \[22\], ellipsis \[8, 2\]. and ill-formed input \[28\], as well as for following the structure of discourse \[14, 15\]. Tools for acquiring a large collection of domain plans from a domain expert, rather than an AI expert, have not yet appeared.</Paragraph>
    <Paragraph position="9"> However, inferring plans from textual examples is under way \[17\].</Paragraph>
  </Section>
  <Section position="5" start_page="32" end_page="33" type="metho">
    <SectionTitle>
3 Dimensions of Acquiring Semantic Knowledge
</SectionTitle>
    <Paragraph position="0"> We discuss in this section several dimensions available in designing a tool for acquiring semantic knowledge within the overall context of an NLI. In presenting a partial description of the space of possible semantic acquisition tools, we describe where our work and the work of several other significant, recently reported systems fall in that space of possibilities. null</Paragraph>
    <Section position="1" start_page="32" end_page="32" type="sub_section">
      <SectionTitle>
3.1 Class of underlying systems.
</SectionTitle>
      <Paragraph position="0"> One could design tools for a specific subclass of underlying systems, such as database management systems, as in TEAM \[13\] and TELl \[4\]. The special nature of the class of underlying systems may allow for a more tailored acquisition environment, by having special-purpose, stereotypical sequences of questions for the user, and more powerful special-purpose inferences. For example, in order to acquire the variety of lexical items that can refer to a symbolic field in a database (such as one stating whether a mountain is a volcano), TEAM asks a series of questions, such as &amp;quot;Adjectives referencing the positive value?&amp;quot; (e.g., volcanic), and &amp;quot;Abstract nouns referencing the positive value?&amp;quot; (e.g., volcano). The fact that the field is binary allows for few and specific questions to be asked.</Paragraph>
      <Paragraph position="1"> The design of IRACQ is intended to be general purpose so that any underlying system, whether a data base, an expert system, a planning system, etc., is a possibility for the NLI. This is achieved by having a level of representation for the concepts, actions, and capabilities of the domain, the domain model, separate from the model of the entities in the underlying system. The meaning representation for an input, a logical form, is given in terms of predicates which correspond to domain model concepts and roles (and are hence referred to as domain mode/ predicates). IRules define the mappings from English to these domain model predicates. In our NLI, a separate component then translates from the meaning representation to the specific representation of the underlying system \[24, 25\]. IRACQ has been used to acquire semantic knowledge for access to both a relational database management system and an ad hoc application system for drawing maps, providing calculations, and preparing summaries; both systems may be accessed from the NLI without the user being particularly aware that there are two systems rather than one underneath the NLI.</Paragraph>
    </Section>
    <Section position="2" start_page="32" end_page="33" type="sub_section">
      <SectionTitle>
3.2 Meaning representation.
</SectionTitle>
      <Paragraph position="0"> Another dimension in the design of a semantic knowledge acquisition tool is the style of the underlying semantic representation for natural language input. One could postulate a unique predicate for almost every word sense of the language. TEAM  seems to represent this approach. At some later level of processing than the initial semantic acquisition, a level of inference or question/answering must be provided so that the commonalities of very similar word senses are captured and appropriate inferences made. A second approach seems to be represented in TELl, where the meaning of a word sense is translated into a boolean composition of more primitive predicates. IRACQ represents a related approach, but we allow a many-to-one mapping between word senses and predicates of the domain, and use a more constraining representation for the meaning of word senses. Following the analysis of Davidson \[10\] we represent the meaning of events (and also of states of affairs) as a conjunction of a single unary predicate and arbitrarily many binary predicates. Objects are represented by unary predicates and are related through binary relations. Using such a representation limits the kind and numbers of questions that have to be asked of the user by the semantic acquisition component. The representation dovetails well with using NIKL \[18, 21\], a taxonomic knowledge representation system with a formal semantics, for stating axioms about the domain.</Paragraph>
    </Section>
    <Section position="3" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.3 Model of the domain
</SectionTitle>
      <Paragraph position="0"> One may choose to have an explicit, separate representation for concepts of the domain, along with axioms relating them. Both IRUS and TEAM have explicit models. Such a representation may be useful to several components of a system needing to do some reasoning about the domain. The availability of such information is a dimension in the design of semantic acquisition systems, since domain knowledge can streamline the acquisition process.</Paragraph>
      <Paragraph position="1"> For example, knowing what relations are allowable between concepts in the domain, aids in determing what predicates can hold between concepts mentioned in an English expression, and therefore, what are valid semantic mappings (IRules, in our case).</Paragraph>
      <Paragraph position="2"> Our NIKL representation of the domain knowledge, the domain model, forms the semantic backbone of our system. Meaning is represented in terms of domain model predicates; its hierarchy is used for enforcing selectional restrictions and for IRule inheritance; and some limited inferencing is done based on the model. After semantic interpretation is complete, the NIKL classification algorithm is used in simplifying and transforming high level meaning expressions to obtain the underlying systems' commands \[25\]. Due to its importance, the domain model is developed carefully in consultation with domain experts, using tools to assure its correctness.</Paragraph>
      <Paragraph position="3"> This approach of developing a domain model independently of linguistic considerations or of the type of underlying system is to be distinguished from other approaches where the domain knowledge is shaped mostly as a side effect of other processes such as lexical acquisition or database field specification.</Paragraph>
    </Section>
    <Section position="4" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.4 Assumptions about the user of the acquisition tool.
</SectionTitle>
      <Paragraph position="1"> If one assumes a human in the semantic acquisition process, as opposed to an automatic approach, then expectations regarding the training and background of that user are yet another dimension in the space of possible designs. The acquisition component of TELl is designed for users with minimal training. In TEAM, database administrators or those capable of designing and structuring their own database use the acquisition tools. Our approach has been to assume that the user of the acquisition tool is sophisticated enough to be a member of the support staff of the underlying system(s) involved, and is familiar with the way the domain is conceived by the end users of the NLI. More particularly, we assume that the individual can become comfortable with logic so that he/she may recognize the correctness of logical expressions output by the semantic interpreter, but need not be trained in AI techniques. A total environment is provided for that class of user so that the necessary knowledge may be acquired, maintained, and updated over the life cycle of the NLI. We have trained such a class of users at the Naval Ocean Systems Center (NOSC) who have been using the acquisition tools for approximately a year and a half.</Paragraph>
    </Section>
    <Section position="5" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.5 Scope of utilities provided.
</SectionTitle>
      <Paragraph position="0"> It would appear that most acquisition systems have focused on the inference problem of acquiring knowledge initially and have paid relatively little attention to explaining to the user what knowledge has been acquired, providing sophisticated editing facilities above the level of the internal data structures themselves, or providing consistency checks on the database of knowledge acquired. Providing such a complete facility is a goal of our effort; feedback from non-AI staff using the tool has already yielded significant direction along those lines. The tool currently has a very sophisticated, flexible debugging environment for testing the semantic knowledge acquired independently of the other components of the NLI, can present the knowledge acquired in tables, and uses the set of domain facts as a way of checking the consistency of what the user has proposed and suggesting alternatives that are consistent with what the system already knows. Work is also underway on an intelligent editing tool guaranteeing consistency with the model when editing, and on an English paraphraser to express the content of a semantic rule.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="33" end_page="35" type="metho">
    <SectionTitle>
4 IRACQ
</SectionTitle>
    <Paragraph position="0"> The original version of IRACQ was conceived by R. Bobrow and developed by M. Moser \[19\]. From sample noun phrases or clauses supplied by the user, it inferred possible selectional restrictions and let the user choose the correct one. The user then had to supply the predicates that should be used in the interpretation of the sample phrase, for inclusion in the IRule.</Paragraph>
    <Paragraph position="1">  From that original foundation, as IRUS evolved to use NIKL. IRACQ was modified to take advantage of the NIKL knowledge representation language and the form we have adopted for representing events and states of affairs. For example, now IRACQ is able to suggest to the user the predicates to be used in the interpretation, assuring consistency with the model. Following a more compositional approach, IRules can now be defined for prepositional phrases and adjectives that have a meaning of their own, as opposed to just appearing in noun IRules as modifiers of the head noun. Thus possible modifiers of a head noun (or nominal semantic class) include its complements (if any), and only prepositional phrases or other modifiers that do not have an independent meaning (as in the case of idioms). Analogously, modifiers of a head verb (or event class) include its complements.</Paragraph>
    <Paragraph position="2"> Adjective and prepositional phrase IRules specify the semantic class of the nouns they can modify.</Paragraph>
    <Paragraph position="3"> Also, maintenance facilities were added, as discussed in sections 4.3, 4.4, and 5.</Paragraph>
    <Section position="1" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
4.1 IRules
</SectionTitle>
      <Paragraph position="0"> An IRule defines, for a particular word or (semantic) class of words, the semantically acceptable English phrases that can occur having that word as head of the phrase, and in addition defines the semantic interpretation of an accepted phrase. Since semantic processing is integrated with syntactic processing in IRUS, the IRules serve to block a semantically anomalous phrase as soon as it is proposed by the parser. Thus, selectional restrictions (or case frame constraints) are continuously applied.</Paragraph>
      <Paragraph position="1"> However, the semantic representation of a phrase is constructed only when the phrase is believed complete. null There are IRules for four kinds of heads: verbs, nouns, adjectives, and prepositions. The left hand side of the. IRule states the selectional restrictions on the modifiers of the head. The right hand side specifies the predicates that should be used in constructing a logical form corresponding to the phrase which fired the IRule.</Paragraph>
      <Paragraph position="2"> When a head word of a phrase is proposed by the parser to the semantic interpreter, all IRules that can apply to the head word for the given phrase type are gathered as follows: for each semantic property that is associated with the word, the IRules associated with the given domain model term are retrieved, along with any inherited IRules. A word can also have IRules fired directly by it, without involving the model. Since the IRules corresponding to the different word senses may give rise to separate interpretations, they are carried along in parallel as the processing continues.</Paragraph>
      <Paragraph position="3"> If no IRules are retrieved, the interpreter rejects the word.</Paragraph>
      <Paragraph position="4"> One use of the domain model is that of IRule inheritance. When an IRule is defined, the user decides whether the new IRule (the base IRule) should inherit from IRules attached to higher domain model terms (the inherited IRules), or possibly inherit from other IRules specified by the user. When a modifier of a head word gets transmitted and no pattern for it exists in a base IRule for the head word, higher IRules are searched for the pattern. If a pattern does exist for the modifier in a given IRule, no higher ones are tried even if it does not pass the semantic test. That is, inheritance does not relax semantic constraints.</Paragraph>
    </Section>
    <Section position="2" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
4.2 An IRACQ session
</SectionTitle>
      <Paragraph position="0"> In this section we step through the definition of a clause IRule for the word &amp;quot;send *, and assume that lexical information about &amp;quot;send ~ has already been entered. The sense of &amp;quot;sending&amp;quot; we will define, when used as the main verb of a clause, specifies an event type whose representation is as follows:</Paragraph>
      <Paragraph position="2"> destination(x, d)\], where the agent a must be a commanding officer, the object o must be a unit and the destination d must be a region.</Paragraph>
      <Paragraph position="3"> From the example clauses presented by the t~ser IRACQ must learn which unary and binary predicate:. are to be used to obtain the representation above Furthermore, IRACQ must acquire the most geP.e'~ semantic class to which the variables a, o, and d ,~,=~ belong.</Paragraph>
      <Paragraph position="4"> Output from the system is shown in bold face input from the user in regular face, and comments at,.. inserted in italics.</Paragraph>
      <Paragraph position="5"> Word that should trigger this IRule: send Domain model term to connect IRule to (select-K to view the network): deployment &lt;A: At this point the user may wish to view the domain mode/network using our graphical displaying and edi~ng facility KREME\[1\] to decide the correct concept that should be associated with this word (KREME may in fact be invoked at any time). The user may even add a new concept, which will be tagged with the user's name and date for later verification by the domain mode/ builder, who has full knowledge of the implications that adding a concept may have on the rest of the system. null Alternatively, the user may omit the answer for now; in that case, IRACQ can proceed as before, and at B will present a menu of the concepts it already knows to be consistent with the example phrases the  user provides. Figure 1 shows a picture of</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="35" end_page="35" type="metho">
    <SectionTitle>
DEPLOYMENT
</SectionTitle>
    <Paragraph position="0"> Enter an example sentence using &amp;quot;send&amp;quot;: An admiral sent Enterprise to the Indian Ocean.</Paragraph>
    <Paragraph position="1"> &lt;IRACQ uses the furl power of the IRUS parser and interpreter to interpret this sentence. A temporary IRule for &amp;quot;send&amp;quot; is used which accepts any modifier (it is assumed that the other words in the sentence can aJready be understood by the system.) IRACQ recognizes that an admiral is of the type COMMANDING.OFFICER, and displays a menu of the ancestors of  frame constraint on the logical subject of &amp;quot;send'. The user picks COMMANDING.OFFICER. IRACQ will perform similar inferences and present a menu for the other cases in the example phrase as well, asking each time whether the modifier is required or optional Assume that the user selects UNIT as the logical object and REGION as the object of the preposition &amp;quot;to&amp;quot;.&gt; &lt;B: If the user did not specify the concept DEPLOYMENT (or some other concept) at point A above as the central concept in this sense of &amp;quot;sending', then IRACQ would compute those unary concepts c such that there are binary predicates relating c to each case's constraint, e.g., to COMMANDING.OFFICER, REGION, and UNIT. The user would be presented with a menu of such concepts c. IRACQ would now proceed in the same way for A or B.&gt; &lt;IRACQ then looks in the NIKL domain model for binary predicates relating the event class (e.g., DEPLOYMENT) to one of the cases' semantic class (e.g. REGION), and presents the user with a menu of those binary predicates (figure 3). Mouse options allow the user to retrieve an explanation of how a predicate was found, or to look at the network around it. The user picks</Paragraph>
  </Section>
  <Section position="8" start_page="35" end_page="35" type="metho">
    <SectionTitle>
DESTINA T/ON.OF.&gt;
</SectionTitle>
    <Paragraph position="0"> Which of the following predicates should relate</Paragraph>
  </Section>
  <Section position="9" start_page="35" end_page="36" type="metho">
    <SectionTitle>
DEPLOYMENT to REGION in the MRL?:
</SectionTitle>
    <Paragraph position="0"> Should this IRule inherit from higher IRules? yes &lt;A popup window allowing the user to enter comments appears. The default comment has the creation date and the user's name.&gt; This is the IRule you just defined:  (predicate '(agent *v&amp;quot; commanding.officer.I)) (class 'DEPLOYMENT))) Do you wish to edit the IRule? no &lt;The person may, for example, want to insert something in the action part of the IRule that was not covered by the IRACQ questions.&gt; This concludes our sample IRACQ session.</Paragraph>
    <Section position="1" start_page="36" end_page="36" type="sub_section">
      <SectionTitle>
4.3 Debugging environment
</SectionTitle>
      <Paragraph position="0"> The facility for creating and extending IRules is integrated with the IRUS NLI itself, so that debugging can commence as soon as an addition is made using IRACQ. The debugging facility allows one to request IRUS to process any input sentence in one of several modes: asking the underlying system to fulfill the user request, generating code for the underlying system, generating the semantic representation only, or parsing without the use of semantics (on the chance that a grammatical or lexical bug prevents the input from being parsed). Intermediate stages of the translation are automatically stored for later inspection, editing, or reuse.</Paragraph>
      <Paragraph position="1"> IRACQ is also integrated with the other acquisition facilities available. As the example session above illustrates, IRACQ is integrated with KREME, a knowledge representation editing environment. Additionally, the IRACQ user can access a dictionary package for acquiring and maintaining both lexical and morphological information.</Paragraph>
      <Paragraph position="2"> Such a thoroughly integrated set of tools has proven not only pleasant but also highly productive.</Paragraph>
    </Section>
    <Section position="2" start_page="36" end_page="36" type="sub_section">
      <SectionTitle>
4.4 Editing an IRule
</SectionTitle>
      <Paragraph position="0"> If the user later wants to make changes to an IRule, he/she may directly edit it. This procedure, however, is error-prone. The syntax rules of the IRule can easily be violated, which may lead to cryptic errors when the IRule is used. More importantly, the user may change the semantic information of the IRule so that it no longer is consistent with the domain model.</Paragraph>
      <Paragraph position="1"> We are currently adding two new capabilities to the IRule editing environment: I.A tool that uses some of the same IRACQ software to let the user expand the coverage of an IRule by entering more example sentences.</Paragraph>
      <Paragraph position="2"> 2. In the case that the user wants to bypass IRACQ and modify an IRule, the user will be placed into a restrictive editor that assures the syntactic integrity of the IRule, and verifies the semantic information with the domain model.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="36" end_page="37" type="metho">
    <SectionTitle>
5 An IRule Paraphraser
</SectionTitle>
    <Paragraph position="0"> An IRule paraphraser is being implemented as a comprehensive means by which an IRACQ user can observe the capabilities introduced by a particular IRule. Since paraphrases are expressed in English, the IRule developer is spared the details of the IRule internal structure and the meaning representation.</Paragraph>
    <Paragraph position="1"> The IRule paraphraser is useful for three main purposes: expressing IRule inheritance so that the user does not redundantly add already inherited information, identifying omissions from the IRule's linguistic pattern, and verifying IRule consistency and completeness. This facility will aid in specifying and maintaining correct IRules, thereby blocking anomalous interpretation of input.</Paragraph>
    <Section position="1" start_page="36" end_page="37" type="sub_section">
      <SectionTitle>
5.1 Major design features
</SectionTitle>
      <Paragraph position="0"> The IRute paraphraser makes central use of the IRUS paraphraser (under development), which paraphrases user input, particularly in order to detect ambiguities. The IRUS paraphraser shares in large part the same knowledge bases used by the understanding process, and is completely driven by the IRUS meaning representation language (MRL) used to represent the meaning of user queries. Given an MRL expression for an input, the IRUS paraphraser first transforms it into a syntactic generation tree in which each MRL constituent is assigned a syntactic role to play in an English paraphrase. The syntactic roles of the MRL predicates are derived from the IRules that could generate the MRL.</Paragraph>
      <Paragraph position="1"> In the second phase of the IRUS paraphraser, the syntactic generation tree is transformed into an English sentence. This process uses an ATN grammar and ATN interpreter that describes how to combine the various syntactic slots in the generation tree into an English sentence. Morphological processing is performed where necessary to inflect verbs and adjectives, pluralize nouns, etc.</Paragraph>
      <Paragraph position="2"> The IRule paraphraser expresses the knowledge in a given IRule by first composing a stereotypical phrase from the IRule linguistic pattern (i.e., the left hand side of the IRule). For the &amp;quot;send&amp;quot; IRule of the previous section, such a phrase is &amp;quot;A commanding officer sent a unit to a region*. For inherited IRules, the IRule paraphraser composes representative phrases that match the combined linguistic patterns of both the local and the inherited IRules. Then, the IRUS parser/interpreter interprets that phrase using the given IRute, thus creating an MRL expression.</Paragraph>
      <Paragraph position="3"> Finally, the IRUS paraphraser expresses that MRL in English.</Paragraph>
      <Paragraph position="4"> Providing an English paraphrase from just the linguistic pattern of an IRule would be simple and uninteresting. The purpose of obtaining MRLs for representative phrases and using the IRUS paraphraser to go back to the English is to force the use of the right hand side of the IRule which specifies the semantic  interpretation. In this way anomalies introduced by, for example, manually changing variable names in the right hand side of the IRule (which point to linguistic constituents of the left hand side), can be detected.</Paragraph>
    </Section>
    <Section position="2" start_page="37" end_page="37" type="sub_section">
      <SectionTitle>
5.2 Role within IRACQ
</SectionTitle>
      <Paragraph position="0"> IRACQ will invoke the IRule Paraphraser at two interaction points: (1) at the start of an IRACQ session when the user has selected a concept to which to attach the new IRule (paraphrasing IRules already associated with that concept shows the user what is already handled--a new IRule might not even be needed), and (2) at the end of an IRACQ session, assisting the user in detecting anomalies.</Paragraph>
      <Paragraph position="1"> The planned use of the IRule Paraphraser is illustrated below with a shortened version of an IRACQ session.</Paragraph>
      <Paragraph position="2"> Word that should trigger this IRule: change Domain model term to connect IRule to: change.in.readiness Paraphrases for existing IRules (inherited phrases are capitalized):  Iru/e needs to be defined to capture sentences like &amp;quot;the readiness of Frederick changed from C1 to C2&amp;quot;.</Paragraph>
      <Paragraph position="3"> * Location information should not be repeated in the new CHANGE.IN.READINESS.2 /rule since it will be inherited.</Paragraph>
      <Paragraph position="4"> The/RACQ session proceeds as described in the previous example session.&gt;</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>