<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2803">
  <Title>A Little Goes a Long Way: Quick Authoring of Semantic Knowledge Sources for Interpretation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Carmel-Tools Interpretation Framework
</SectionTitle>
    <Paragraph position="0"> Sentence: During the fall of the elevator the man and the keys have the same constant downward acceleration that the elevator has.</Paragraph>
    <Paragraph position="2"> (acceleration id5 man down constant non-zero) (acceleration id6 keys down constant non-zero) (acceleration id7 elevator down constant non-zero)) Gloss: The elevator is in a state of freefall at the same time when there is an equivalence between the elevator's acceleration and the constant downward nonzero acceleration of both the man and the keys  tates uncovering complex relationships encoded syntactically within a sentence One of the goals behind the design of Carmel-Tools is to leverage off of the normalization over surface syntactic variation that deep syntactic analysis provides. While our approach is not specific to a particular framework for deep syntactic analysis, we have chosen to build upon the publicly available LCFLEX robust parser (Ros'e et al., 2002), the CARMEL grammar and semantic interpretation framework (Ros'e, 2000), and the COMLEX lexicon (Grishman et al., 1994). This same broad coverage, domain general interpretation framework has already been used in a number of educational applications including (Zinn et al., 2002; VanLehn et al., 2002).</Paragraph>
    <Paragraph position="3"> Syntactic feature structures produced by the CARMEL grammar normalize those aspects of syntax that modify the surface realization of a sentence but do not change its deep functional analysis. These aspects include tense, negation, mood, modality, and syntactic transformations such as passivization and extraction. Thus, a sentence and it's otherwise equivalent passive counterpart would be encoded with the same set of functional relationships, but the passive feature would be negative for the active version of the sentence and positive for the passive version. A verb's direct object is assigned the obj role regardless of where it appears in relation to the verb. Furthermore, constituents that are shared between more than one verb, for example a noun phrase that is the object of a verb as well as the subject of a relative clause modifier, will be assigned both roles, in that way &amp;quot;undoing&amp;quot; the relative clause extraction. In order to do this analysis reliably, the component of the grammar that performs the deep syntactic analysis of verb argument functional relationships was generated automatically from a feature representation for each of 91 of COMLEX's verb subcategorization tags (Ros'e et al., 2002). Altogether there are 519 syntactic configurations of a verb in relation to its arguments covered by the 91 subcategorization tags, all of which are covered by the CARMEL grammar.</Paragraph>
    <Paragraph position="4"> CARMEL provides an interface to allow semantic interpretation to operate in parallel with syntactic interpretation at parse time in a lexicon driven fashion (Ros'e, 2000). Domain specific semantic knowledge is encoded declaratively within a meaning representation specification. Semantic constructor functions are compiled automatically from this specification and then linked into lexical entries. Based on syntactic head/argument relationships assigned at parse time, the constructor functions enforce semantic selectional restrictions and assemble meaning representation structures by composing the meaning representation associated with the constructor function with the meaning representation of each of its arguments. After the parser produces a semantic feature structure representation of the sentence, predicate mapping rules then match against that representation in order to produce a predicate language representation in the style of Davidsonian event based semantics (Davidson, 1967; Hobbs, 1985), as mentioned above. The predicate mapping stage is the key to the great flexibility in representation that Carmel-Tools is able to offer. The mapping rules perform two functions. First, they match a feature structure pattern to a predicate language representation.</Paragraph>
    <Paragraph position="5"> Next, they express where in the feature structure to look for the bindings of the uninstantiated variables that are part of the associated predicate language representation.</Paragraph>
    <Paragraph position="6"> Because the rules match against feature structure patterns and are thus above the word level, and because the predicate language representations associated with them can be arbitrarily complex, the mapping process is decompositional in manner but is not constrained to rigidly follow the structure of the text.</Paragraph>
    <Paragraph position="7"> Figure 2 illustrates the power in the pairing between deep functional analyses and the predicate language representation. The deep syntactic analysis of the sentence makes it possible to uncover the fact that the expression &amp;quot;constant downward acceleration&amp;quot; applies to the acceleration of all three entities mentioned in the sentence. The coordination in the subject of the sentence makes it possible to infer that both the acceleration of the man and of the keys are individually in an equative relationship with the acceleration of the elevator. The identification token of the and predicate allows the whole representation of the matrix clause to be referred to in the rel-time predicate that represents the fact that the equative relationships hold at the same time as the elevator is in a state  of freefall. But individual predicates, each representing a part of the meaning of the whole sentence, can also be referred to individually if desired using their own identification tokens.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Carmel-Tools Authoring Process
</SectionTitle>
    <Paragraph position="0"> The purpose of Carmel-Tools is to insulate the author from the details of the underlying domain specific knowledge sources. If an author were building knowledge sources by hand for this framework, the author would be responsible for building an ontology for the semantic feature structure representation produced by the parser, linking pointers into this hierarchy into entries in the lexicon, and writing predicate mapping rules. With Carmel-Tools, the author never has to deal directly with these knowledge sources. The Carmel-Tools authoring process involves designing a Predicate Language Definition, augmenting the base lexical resources by either loading raw human tutoring corpora or entering example texts by hand, and annotating example texts with their corresponding representation in the defined Predicate Language Definition. From this authored knowledge, CARMEL's semantic knowledge sources can be generated and compiled. The knowledge source inference algorithm ensures that knowledge coded redundantly across multiple examples is represented only once in the compiled knowledge sources. The authoring interface allows the author or authors to test the compiled knowledge sources and then continue the authoring process by updating the Predicate Language Definition, loading additional corpora, annotating additional examples, or modifying already annotated examples.</Paragraph>
    <Paragraph position="1"> The Carmel-Tools authoring process was designed to eliminate the most time-consuming parts of the authoring  such a way as to prevent them from introducing inconsistencies between knowledge sources, which is particularly crucial when multiple authors work together. For example, a GUI interface for entering propositional representations for example texts insures that the entered representation is consistent with the author's Predicate Language Definition. Compiled knowledge sources contain pointers back to the annotated examples that are responsible for their creation. Thus, it is also able to provide troubleshooting facilities to help authors track down potential sources for incorrect analyses generated from compiled knowledge sources. When changes are made to the Predicate Language Definition, Carmel-Tools tests whether each proposed change would cause conflicts with any annotated example texts. An example of such a change would be deleting an argument from a predicate type where some example has as part of its analysis an instantiation of a predicate with that type where that argument is bound. If so, it lists these example texts for the author and requires the author to modify the annotated examples first in such a way that the proposed change will not cause a conflict, in this case that would mean uninstantiating the variable that the author desires to remove. In cases where changes would not cause any conflict, such as adding an argument to a predicate type, renaming a predicate, token, or type, or removing an argument that is not bound in any instantiated proposition, these changes are made throughout the database automatically.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Defining the Predicate Language Definition
</SectionTitle>
      <Paragraph position="0"> The author begins the authoring process by designing the propositional language that will be the output representation from CARMEL using the authored knowledge sources. This is done on the Predicate Language Definition page of the Carmel-Tools interface, displayed in  language that is as simple or complex as is required by the type of reasoning, if any, that will be applied to the output representations by the tutoring system as it formulates its response to the student's natural language input.</Paragraph>
      <Paragraph position="1"> The interface includes facilities for defining a list of predicates and Tokens to be used in constructing propositional analyses. Each predicate is associated with a basic predicate type, which is a associated with a list of arguments. Each basic predicate type argument is itself associated with a type that defines the range of atomic values, which may be tokens or identifier tokens referring to instantiated predicates, that can be bound to it. Thus, tokens also have types. Each token has one or more basic token types. Besides basic predicate types and basic token types, we also allow the definition of abstract types that can subsume other types.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Generating Lexical Resources and Annotating
Example Sentences
</SectionTitle>
      <Paragraph position="0"> When the predicate language definition is defined, the next step is to generate the domain specific lexical resources and annotate example sentences with their corresponding representation within this defined predicate language. The author begins this process on the Example Map Page, displayed in Figure 4.</Paragraph>
      <Paragraph position="1"> Carmel-Tools provides facilities for loading a raw human tutoring corpus file. Carmel-Tools then makes a list of each unique morpheme it finds in the file and then augments both its base lexicon (using entries from COM-LEX), in order to include all morphemes found in the transcript file that were not already included in the base lexicon, and the spelling corrector's word list, so that it includes all morphological forms of the new lexical entries. It also segments the file into a list of student sentence strings, which are then loaded into a Corpus Examples list, which appears on the right hand side of the interface. Searching and sorting facilities are provided to make it easy for authors to find sentences that have certain things in common in order to organize the list of sentences extracted from the raw corpus file in a convenient way. For example, a Sort By Similarity button causes Carmel-Tools to sort the list of sentences according to their respective similarity to a given text string according to an LSA match between the example string and each corpus sentence. The interface also includes the Token List and the Predicate List, with all defined tokens and predicates that are part of the defined predicate language. When the author clicks on a predicate or token, the Examples list beside it will display the list of annotated examples that have been annotated with an analysis containing that token or predicate.</Paragraph>
      <Paragraph position="2"> Figure 5 displays how individual texts are annotated.</Paragraph>
      <Paragraph position="3"> The Analysis box displays the propositional representation of the example text. This analysis is constructed using the Add Token, Delete, Add Predicate, and Modify Predicatebuttons, as well as their subwindows, which are not shown. Once the analysis is entered, the author may indicate the compositional breakdown of the example text by associating spans of text with parts of the analysis by means of the Optional Match and Mandatory Match buttons. For example, the noun phrase &amp;quot;the man&amp;quot; corresponds to the man token, which is bound in two places. Each time a match takes place, the Carmel-Tools internal data structures create one or more templates that show how pieces of syntactic analyses corresponding to spans of text are matched up with their corresponding propositional representation.</Paragraph>
      <Paragraph position="4"> From this match Carmel-Tools infers both that &amp;quot;the man&amp;quot; is a way of expressing the meaning of the man token in text and that the subject of the verbholdcan be bound to the ?body1 argument of the become predicate. By decomposing example texts in this way, Carmel-Tools constructs templates that are general and can be reused in multiple annotated examples. It is these created templates that form the basis for all compiled semantic knowledge sources. Thus, even if mappings are represented redundantly in annotated examples, they will not be represented redundantly in the compiled knowledge sournces.</Paragraph>
      <Paragraph position="5"> The list of templates that indicates the hierarchical breakdown of this example text are displayed in the Templates list on the right hand side of Figure 5. Note that while the author matches spans to text to portions of the meaning representation, the tool stores mappings between feature structures and portions of meaning representation, which is a more general mapping.</Paragraph>
      <Paragraph position="6"> Templates can be generalized by entering paraphrases for portions of template patterns. Internally what this accomplishes is that all paraphrases listed can be interpreted by CARMEL as having the same meaning so that they can be treated as interchangeable in the context of this template. A paraphrase can be entered either as a specific string or as a Defined Type, including any type defined in the Predicate Language Definition. What this means is that the selected span of text can be replaced by any span of text that can be interpreted in such a way that its predicate representation's type is subsumed by the indicated type.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Compiling Knowledge Sources
</SectionTitle>
      <Paragraph position="0"> Each template that is created during the authoring process corresponds to one or more elements of each of the required domain specific knowledge sources, namely the ontology, the lexicon with semantic pointers, and the predicate mapping rules. Using the automatically generated knowledge sources, most of the &amp;quot;work&amp;quot; for mapping a novel text onto its predicate language representation is done either by the deep syntactic analysis, where a lot of surface syntactic variation is factored out, and during the predicate mapping phase, where feature structure patterns are mapped onto their corresponding predicate language representations. The primary purpose of the sentence level ontology that is used to generate a semantic feature structure at parse time is primarily for the purpose of limiting the ambiguity produced by the parser. Very little generalization is obtained by the semantic feature structures created by the automatically generated knowledge sources over that provided by the deep syntactic analysis alone. By default, the automatically generated ontology contains a semantic concept corresponding to each word appearing in at least one annotated example. A semantic pointer to that concept is then inserted into all lexical entries for the associated word that were used in one of the annotated examples. An exception occurs where paraphrases are entered into feature structure representations.</Paragraph>
      <Paragraph position="1"> In this case, a semantic pointer is entered not only into the entry for the word from the sentence, but also the words from the paraphrase list, allowing all of the words in the paraphrase list to be treated as equivalent at parse time.</Paragraph>
      <Paragraph position="2"> The process is a bit more involved in the case of verbs.</Paragraph>
      <Paragraph position="3"> In this case it is necessary to infer based on the parses of the examples where the verb appears which set of sub-categorization tags are consistent, thus limiting the set of verb entries for each verb that will be associated with a semantic pointer, and thus which entries can be used at parse time in semantic interpretation mode. Carmel-Tools makes this choice by considering both which arguments are present with that verb in the complete database of annotated examples as well as how the examples were broken down at the matching stage. All non-extracted arguments are considered mandatory. All extracted arguments are considered optional. Each COMLEX subcat tag is associated with a set of licensed arguments. Thus, subcat tags are considered consistent if the set of licensed arguments contains at least all mandatory arguments and doesn't license any arguments that are not either mandatory or optional. Predicate mapping rules are generated for each template by first converting the corresponding syntactic feature structure into the semantic representation defined by the automatically generated ontology and lexicon with semantic pointers. Predicate mapping rules are then created that map the resulting semantic feature structure into the associated predicate language representation. null</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>