File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/a97-1021_metho.xml
Size: 12,472 bytes
Last Modified: 2025-10-06 14:14:32
<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1021"> <Title>Large-Scale Acquisition of LCS-Based Lexicons for Foreign Language Tutoring</Title> <Section position="4" start_page="140" end_page="141" type="metho"> <SectionTitle> 3 Overview of LCS Acquisition </SectionTitle> <Paragraph position="0"> We use Levin's publicly available online index (Levin, 1993) as a starting point for building LCS-based verb entries.[1] While this index provides a unique and extensive catalog of verb classes, it does not define the underlying meaning components of each class. One of the main contributions of our work is that it relates Levin's classes to meaning components as defined in the LCS representation.</Paragraph> <Paragraph position="1"> Table 2 shows three broad semantic categories and example verbs along with their associated LCS representations. We have hand-constructed a database containing 191 LCS templates, i.e., one for each verb class in (Levin, 1993). In addition, we have generated LCS templates for 26 additional classes that are not included in Levin's system. Several of these correspond to verbs that take sentential complements (e.g., coerce).</Paragraph> <Paragraph position="2"> [1] We focus on building entries for verbs; however, we have approximately 30,000 non-verb entries per language.</Paragraph> <Paragraph position="4"> A full entry in the database includes a semantic class number with a list of possible verbs, a thematic grid, and an LCS template: (3) Class 47.8: adjoin, intersect, meet, touch, ... Thematic Grid: _th_loc LCS Template: (be loc (thing 2) (at loc (thing 2) (thing 11)) (!!-ingly 26)) The semantic class label 47.8 above is taken from Levin's 1993 book (Verbs of Contiguous Location), i.e., the class to which the verb touch has been assigned.[2] A verb, together with its semantic class, uniquely identifies the word sense, or LCS template, to which the verb refers. 
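As a minimal sketch, assuming an in-memory Python dictionary keyed by class number, entry (3) might be stored and queried as follows. The names LcsEntry, DATABASE, and lookup_template are hypothetical illustrations, not part of the system described here. The example only shows how a verb together with its class selects a single template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LcsEntry:
    """One database entry: Levin class, member verbs, thematic grid, LCS template."""
    class_id: str
    verbs: tuple
    grid: str
    template: str


# Hypothetical in-memory database holding only entry (3); the verb list
# contains just the members shown in the text.
DATABASE = {
    "47.8": LcsEntry(
        class_id="47.8",
        verbs=("adjoin", "intersect", "meet", "touch"),
        grid="_th_loc",
        template="(be loc (thing 2) (at loc (thing 2) (thing 11)) (!!-ingly 26))",
    ),
}


def lookup_template(class_id: str, verb: str) -> str:
    """Return the LCS template that a (class, verb) pair identifies."""
    entry = DATABASE[class_id]  # raises KeyError for an unknown class
    if verb not in entry.verbs:
        raise KeyError(f"{verb!r} is not listed under class {class_id}")
    return entry.template


print(lookup_template("47.8", "touch"))
```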
The thematic grid (_th_loc) indicates that the verb has two obligatory arguments, a theme and a location.[3] The !! in the LCS Template acts as a wildcard; it will be filled by a lexeme (i.e., a root form of the verb). The resulting form is called a constant, i.e., the idiosyncratic part of the meaning that distinguishes among members of a verb class (in the spirit of (Grimshaw, 1993; Levin and Rappaport Hovav, To appear; Pinker, 1989; Talmy, 1985)).[4] Three inputs are required for acquisition of verb entries: a semantic class, a thematic grid, and a lexeme, which we will henceforth abbreviate as &quot;class/grid/lexeme.&quot; The output is a Lisp-like expression corresponding to the LCS representation. An example of input/output for our acquisition procedure is shown here: (4) Acquisition of LCS for: touch Input: 47.8: _th_loc; &quot;touch&quot; [2] Verbs not occurring in Levin's book are also assigned to classes using techniques described in (Dorr and Jones, 1996; Dorr, To appear).</Paragraph> <Paragraph position="5"> [3] An underscore (_) designates an obligatory role and a comma (,) designates an optional role.</Paragraph> <Paragraph position="6"> [4] The !! in the Lisp representation corresponds to the angle-bracketed constants in Table 2, e.g., !!-ingly corresponds to &lt;MANNER&gt;.</Paragraph> <Paragraph position="7"> Output:</Paragraph> <Paragraph position="9"> Language-specific annotations such as the *-marker in the LCS Output are added to the templates by processing the components of thematic grid specifications, as we will see in more detail next.</Paragraph> </Section> <Section position="5" start_page="141" end_page="141" type="metho"> <SectionTitle> 4 Language-Specific Annotations </SectionTitle> <Paragraph position="0"> In our ongoing example (4), the thematic grid _th_loc indicates that the theme and the location are both obligatory (in English) and should be annotated as such in the instantiated LCS. 
This is achieved by inserting a *-marker appropriately.</Paragraph> <Paragraph position="1"> Consider the structural divergence between the following English/Spanish equivalents:</Paragraph> </Section> <Section position="6" start_page="141" end_page="142" type="metho"> <SectionTitle> (5) Structural Divergence: </SectionTitle> <Paragraph position="0"> E: John entered the house.</Paragraph> <Paragraph position="1"> S: John entró a la casa.</Paragraph> <Paragraph position="2"> 'John entered into the house.' The English sentence differs structurally from the Spanish in that the noun phrase the house corresponds to a prepositional phrase a la casa. This distinction is characterized by different positionings of the *-marker in the lexical entries produced by [...] 2, 5 and 6) are used in place of the ultimate fillers such as john and house. The structural divergence of (5) is accommodated as follows: the *-marked leaf node, i.e., (thing 6) in the enter definition, is filled directly, whereas the *-marked non-leaf node, i.e., ((toward 5) loc ...) in the entrar definition, is filled in through unification at the internal toward node.</Paragraph> </Section> <Section position="7" start_page="142" end_page="143" type="metho"> <SectionTitle> 5 Construction of Lexical Entries </SectionTitle> <Paragraph position="0"> Consider the construction of a lexical entry for the verb adorn. The LCS for this verb is in the class of [...] This list structure recursively associates logical heads with their arguments and modifiers. The logical head is represented as a primitive/field combination, e.g., GO_Ident is represented as (go ident ...). The arguments for CAUSE are (thing 1) and (go ident ...).</Paragraph> <Paragraph position="1"> The substructure GO itself has two arguments, (thing 2) and (toward ident ...), and a modifier, (with poss ...).[6] The !!-ed constant refers to a resulting state, e.g., adorned for the verb adorn. 
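The nested list structure just described can be read with a small s-expression reader. The sketch below is an illustrative assumption, since the text does not describe how LEXICALL parses its templates. The function names tokenize, parse, and read_lcs are hypothetical, and the input is the decorate template from (9). The reader returns nested Python lists whose first two elements are the primitive and field, e.g., be and ident.

```python
def tokenize(text: str) -> list:
    """Split a Lisp-like LCS string into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list):
    """Consume tokens from the front and return one atom or nested list."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the matching ')'
        return node
    if token == ")":
        raise ValueError("unexpected ')'")
    return token


def read_lcs(text: str) -> list:
    """Parse one complete LCS expression, rejecting trailing material."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after the expression")
    return tree


# Template (9) for "decorate", as given in the text.
tree = read_lcs("(be ident (thing 2) (at ident (thing 2) (!!-ed 9)) (with poss (*head*) (thing 16)))")
primitive, field, *arguments = tree
print(primitive, field)  # be ident
print(arguments[0])      # ['thing', '2']
```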
The LCS produced by our program for this verb is:</Paragraph> <Paragraph position="3"> [...] (with poss (*head*) (thing 16))) The variables in the representation map between LCS positions and their corresponding thematic roles. In the LCS framework, thematic roles provide semantic information about properties of the argument and modifier structures. In (7) and (8) above, the numbers 1, 2, 9, and 16 correspond to the roles agent (ag), theme (th), predicate (pred), and possessional modifier (mod-poss), respectively. These numbers enter into the construction of LCS entries: they correspond to argument positions in the LCS template (extracted using the class/grid/lexeme specification). Information is filled into the LCS template using these numbers, coupled with the thematic grid tag for the particular word being defined.</Paragraph> <Section position="1" start_page="142" end_page="142" type="sub_section"> <SectionTitle> 5.1 Fundamentals </SectionTitle> <Paragraph position="0"> [...] holder that points to the root (cause) of the overall lexical entry.</Paragraph> <Paragraph position="1"> LEXICALL locates the appropriate template in the LCS database using the class/grid pairing as an index, and then determines the language-specific annotations to instantiate for that template. The default position of the *-marker is the left-most occurrence of the LCS node corresponding to a particular thematic role. However, if a preposition occurs in the grid, the *-marker may be placed differently. In such a case, a primitive representation (e.g., (to loc (at loc))) is extracted from a set of predefined mappings. 
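The default placement rule (mark the left-most node for each role when no preposition is involved) can be sketched with a regular expression over the template string. This is an assumption for illustration, not LEXICALL's implementation. The function names are hypothetical, role numbers are passed in directly rather than read from the grid, and only the default case is handled. Applied to the decorate template in (9) with th = 2 and mod-poss = 16, the sketch reproduces output (10).

```python
import re


def mark_leftmost(template: str, number: int) -> str:
    """Put a '*' marker on the left-most node indexed by `number`.

    A node such as '(thing 2)' becomes '(* thing 2)'. The closing
    parenthesis in the pattern keeps 2 from matching inside '(thing 26)'.
    """
    pattern = re.compile(r"\((\w+) " + str(number) + r"\)")
    return pattern.sub(lambda m: f"(* {m.group(1)} {number})", template, count=1)


def instantiate(template: str, numbers: list, constant: str) -> str:
    """Mark each role position by default, then fill the !! wildcard."""
    for number in numbers:
        template = mark_leftmost(template, number)
    # Replace the wildcard together with its suffix (e.g. '!!-ed').
    return re.sub(r"!!-\w+", constant, template)


template = "(be ident (thing 2) (at ident (thing 2) (!!-ed 9)) (with poss (*head*) (thing 16)))"
# Roles th and mod-poss correspond to code numbers 2 and 16.
print(instantiate(template, [2, 16], "decorated"))
```

Matching on the first textual occurrence works because a left-to-right scan of the string visits nodes in the same order as a pre-order walk of the tree.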
If the extracted primitive representation corresponds to a subcomponent of the LCS template, the program recognizes this as a match against the grid, and the *-marker is placed in the template at the level where this match occurs (as in the entry for entrar given in (6) above).</Paragraph> <Paragraph position="2"> If a preposition occurs in the grid but there is no matching primitive representation, the preposition is considered to be a collocation, and it is placed in a special slot, :collocations, which indicates that the LCS already covers the semantics of the verb and that the preposition is an idiosyncratic variation (as in learn about, know of, etc.).</Paragraph> <Paragraph position="3"> If a preposition is required but not specified (i.e., empty parentheses ()), then the *-marker is positioned at the level dominating the node that corresponds to that role, which indicates that several different prepositions might apply (as in put on, put under, put through, etc.).</Paragraph> </Section> <Section position="2" start_page="142" end_page="143" type="sub_section"> <SectionTitle> 5.2 Examples </SectionTitle> <Paragraph position="0"> The input to LEXICALL is a class/grid/lexeme specification, where each piece of information is separated by a hash sign (#): &lt;class&gt;#&lt;grid&gt;#&lt;lexeme&gt;#&lt;other semantic information&gt; For example, the input specification for the verb replant (a word not classified by Levin) is: 9.7#_ag_th,mod-poss(with)#replant# !!-ed = planted (manner = again) This input indicates that the class assigned to replant is 9.7 (Levin's Spray/Load verbs) and that its grid has an obligatory agent (ag) and theme (th), and an optional possessional modifier with the preposition with (mod-poss(with)). The information following the final # is optional; this information was previously hand-added to the assigned thematic grids. 
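A sketch of how such a specification might be split and its grid decoded, using the grid notation described earlier: an underscore marks an obligatory role, a comma an optional one, and a preposition may follow in parentheses. The regular expression and the names Role, parse_grid, and parse_spec are assumptions for illustration. An empty pair of parentheses, as in goal(), is kept distinct from a role with no preposition, because Section 5.1 treats the two cases differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One grid role: '_' (obligatory) or ',' (optional), a role name, and an
# optional parenthesized preposition, which may be empty as in goal().
ROLE = re.compile(r"([_,])([a-z-]+)(?:\(([^)]*)\))?")


@dataclass
class Role:
    name: str
    obligatory: bool
    # None: no preposition; "": required but unspecified; else the preposition.
    preposition: Optional[str]


def parse_grid(grid: str) -> list:
    """Turn a grid such as '_ag_th,mod-poss(with)' into Role objects."""
    grid = grid.replace(" ", "")  # tolerate spacing such as 'goal ( )'
    matches = list(ROLE.finditer(grid))
    if "".join(m.group(0) for m in matches) != grid:
        raise ValueError(f"malformed grid: {grid!r}")
    return [Role(m.group(2), m.group(1) == "_", m.group(3)) for m in matches]


def parse_spec(spec: str) -> dict:
    """Split a 'class#grid#lexeme#extra' specification into its fields."""
    class_id, grid, lexeme, extra = (part.strip() for part in spec.split("#", 3))
    return {"class": class_id, "roles": parse_grid(grid), "lexeme": lexeme, "extra": extra}


spec = parse_spec("9.7#_ag_th,mod-poss(with)#replant# !!-ed = planted (manner = again)")
print(spec["class"], spec["lexeme"], spec["extra"])
for role in spec["roles"]:
    print(role)
```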
In the current example, the !!-ed designates the form of the constant planted, which, in this case, is a morphological variant of the lexeme replant.[7] Also, the manner again is specified as an additional semantic component.</Paragraph> <Paragraph position="1"> [7] The constant takes one of several forms, including: !!-ingly for a manner, !!-er for an instrument, and !!-ed for resulting states. If this information has not been hand-added to the class/grid/lexeme specification (as is the case with most of the verbs), a default morphological process produces the appropriate form from the lexeme.</Paragraph> <Paragraph position="2"> For presentational purposes, the remainder of this section uses English examples. However, as we saw in Section 4, the representations used here carry over to other languages as well. In fact, we have used the same acquisition program, without modification, for building our Spanish and Arabic LCS-based lexicons, each of size comparable to our English LCS-based lexicon.</Paragraph> <Paragraph position="3"> I. Thematic Roles without Prepositions (9) Example: The flower decorated the room.</Paragraph> <Paragraph position="4"> Input: 9.8#_mod-poss_th#decorate# Template: (be ident (thing 2) (at ident (thing 2) (!!-ed 9)) (with poss (*head*) (thing 16))) Two thematic roles, th and mod-poss, are specified for the above sense of the English verb decorate. The thematic code numbers (2 and 16, respectively) are *-marked, and the constant decorated replaces the wildcard: (10) Output: (be ident (* thing 2) (at ident (thing 2) (decorated 9)) (with poss (*head*) (* thing 16))) II. Thematic Roles with Unspecified Prepositions (11) Example: We parked the car near the store. We parked the car in the garage.</Paragraph> <Paragraph position="5"> Input: 9.1#_ag_th_goal()#park# Template:</Paragraph> <Paragraph position="7"> The input for this example indicates that the goal is headed by an unspecified preposition. 
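The three preposition cases of Section 5.1 can be summarized as a small decision function. This is a hypothetical sketch. The table PREPOSITION_PRIMITIVES holds only the with/(with poss) mapping mentioned in the text. contains_primitive is a simplified string test that recognizes one-level primitives such as (with poss), but not nested ones such as (to loc (at loc)).

```python
import re
from typing import Optional

# Hypothetical, partial table mapping prepositions to primitive
# representations (primitive, field); only 'with' comes from the text.
PREPOSITION_PRIMITIVES = {"with": ("with", "poss")}


def contains_primitive(template: str, primitive: str, field: str) -> bool:
    """True if some node is headed by primitive/field, e.g. (with poss ...)
    or ((with 15) poss ...). Simplification: one-level primitives only."""
    pattern = rf"\(\(?{re.escape(primitive)}(?: \d+\))? {re.escape(field)}\b"
    return re.search(pattern, template) is not None


def marker_strategy(preposition: Optional[str], template: str) -> str:
    """Choose how to place the *-marker for one grid role.

    preposition is None when the role takes no preposition, "" when a
    preposition is required but unspecified (as in goal()), and otherwise
    the preposition named in the grid.
    """
    if preposition is None:
        return "default"       # left-most node for the role
    if preposition == "":
        return "dominating"    # node dominating the role's node
    mapped = PREPOSITION_PRIMITIVES.get(preposition)
    if mapped is not None and contains_primitive(template, *mapped):
        return "match"         # at the level where the primitive matches
    return "collocation"       # stored in the :collocations slot


template = "(be ident (thing 2) (at ident (thing 2) (!!-ed 9)) (with poss (*head*) (thing 16)))"
for prep in (None, "", "with", "about"):
    print(repr(prep), marker_strategy(prep, template))
```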
The thematic roles ag, th, and goal() correspond to code numbers 1, 2, and 6, respectively. The variable positions for ag and th are *-marked just as in the previous case, whereas goal() requires a different treatment.</Paragraph> <Paragraph position="8"> When a required preposition is left unspecified, the *-marker is associated with an LCS node dominating [...] Here, the mod-poss role requires the preposition with in the modifier position: (14) Output: (cause (* thing 1) (go ident (* thing 2) (toward ident (thing 2) (at ident (thing 2) (decorated 9)))) ((* with 15) poss (*head*) (thing 16))) In order to determine the position of the *-marker for a thematic role with a required preposition, LEXICALL consults a set of predefined mappings between prepositions (or postpositions, in a language like Korean) and their corresponding primitive representations.[8] In the current case, the preposition with is mapped to the following primitive representation: (with poss). Since this matches a subcomponent of the LCS template, the program recognizes this as a match against the grid, and the *-marker is placed in the template at the level of with.</Paragraph> </Section> </Section> </Paper>