<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0110">
<Title>Acquiring Predicate-Argument Mapping Information from Multilingual Texts</Title>
<Section position="4" start_page="108" end_page="110" type="metho">
<SectionTitle>
CAUSED-PROCESS: AGENT, THEME
PROCESS-OR-STATE: THEME
AGENTIVE-ACTION: AGENT
INVERSE-STATE: GOAL, THEME
</SectionTitle>
<Paragraph position="0"> [...] marker (cf. Kuno [12]). So we add such information to the INVERSE-STATE mapping rule for Japanese. The generalization expressed in situation types has saved us from defining semantic mapping rules for each verb sense in each language, and has also made it possible to acquire them from large corpora automatically.</Paragraph>
<Paragraph position="1"> This classification system has been partially derived from Vendler and Dowty's aspectual classifications [19, 9] and Talmy's lexicalization patterns [18]. For example, all AGENTIVE-ACTION verbs are so-called activity verbs, and so-called stative verbs fall under either INVERSE-STATE (if transitive) or PROCESS-OR-STATE (if intransitive).</Paragraph>
<Paragraph position="2"> However, the situation types are not intended to specify the semantics of aspect, which is actually a property of the whole sentence rather than of the verb itself (cf. Krifka [11], Dorr [8], Moens and Steedman [15]). For instance, as shown below, the same verb can be classified into two different aspectual classes (i.e. activity and accomplishment) depending on the type of object NP or the presence of certain PPs.</Paragraph>
<Paragraph position="3"> (1) a. Sue drank wine for/*in an hour.</Paragraph>
<Paragraph position="4"> b. Sue drank a bottle of wine *for/in an hour.</Paragraph>
<Paragraph position="5"> (2) a. Harry climbed for/*in an hour.</Paragraph>
<Paragraph position="6"> b. Harry climbed to the top *for/in an hour.</Paragraph>
<Paragraph position="7"> Situation types are intended to address the issue of cross-linguistic predicate-argument mapping generalization rather than the semantics of aspect.</Paragraph>
<Section position="1" start_page="108" end_page="109" type="sub_section">
<SectionTitle> 2.2 Idiosyncrasies </SectionTitle>
<Paragraph position="0"> Idiosyncrasy slots in the lexicon specify word-sense-specific idiosyncratic phenomena which cannot be captured by semantic concepts or situation types. In particular, subcategorized pre/postpositions of verbs are specified here. For example, the fact that &quot;look&quot; marks its THEME argument with the preposition &quot;at&quot; is captured by specifying idiosyncrasies. Examples of lexical entries with idiosyncrasies in English, Spanish and Japanese are shown in Figure 1. As discussed in the next section, we derive this kind of word-specific information automatically from corpora.</Paragraph>
<Paragraph position="1"> [...] phrase. However, as Kuno [12] points out, since this is an idiosyncratic phenomenon, such information does not go into the default mapping rule.</Paragraph>
</Section>
<Section position="2" start_page="109" end_page="109" type="sub_section">
<SectionTitle> 2.3 Semantic Concepts </SectionTitle>
<Paragraph position="0"> Each lexical meaning of a verb is represented by a semantic concept (or frame) in our language-independent knowledge base, which is similar to the one described in Onyshkevych and Nirenburg [17]. Each verb frame has thematic role slots, which have two facets, TYPE and MAPPING. A TYPE facet value of a given slot provides a constraint on the type of objects which can be the value of the slot.</Paragraph>
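To make this representation concrete, here is a minimal Python sketch of how a verb frame with TYPE and MAPPING facets might be encoded. It is an illustration only: the frame name, parent concept, slot inventory, and type symbols below are assumptions for the example, not entries from the actual knowledge base.

    # Hypothetical sketch of a verb frame with thematic role slots.
    # Each slot carries a TYPE facet (a selectional constraint) and an
    # optional MAPPING facet (a default grammatical function).  The
    # symbols used here (#BREAK#, #PHYSICAL-OBJECT#, ...) are assumed
    # for the example, not the knowledge base's actual concept names.

    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class Slot:
        type: str                      # TYPE facet: constraint on slot fillers
        mapping: Optional[str] = None  # MAPPING facet: e.g. "SUBJECT", "SENT-COMP"

    @dataclass
    class Frame:
        name: str
        parent: Optional[str] = None
        slots: Dict[str, Slot] = field(default_factory=dict)

    # An assumed frame for one sense of "break".
    BREAK = Frame(
        name="#BREAK#",
        parent="#PHYSICAL-EVENT#",     # assumed parent concept
        slots={
            "AGENT":      Slot(type="#VOLITIONAL-AGENT#"),
            "THEME":      Slot(type="#PHYSICAL-OBJECT#"),
            "INSTRUMENT": Slot(type="#PHYSICAL-OBJECT#"),
        },
    )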
<Paragraph position="1"> In the MAPPING facets, we have encoded some cross-linguistically general predicate-argument mapping information. For example, we have defined that all the subclasses of #COMMUNICATION-EVENTS (e.g. #REPORT#, #CONFIRM#, etc.) map their sentential complements (SENT-COMP) to THEME, as shown below.</Paragraph>
</Section>
<Section position="3" start_page="109" end_page="110" type="sub_section">
<SectionTitle> 2.4 Merging Predicate-Argument Mapping Information </SectionTitle>
<Paragraph position="0"> For each verb, the information stored in the three levels discussed above is merged to form a complete set of mapping rules. During this merging process, the idiosyncrasies take precedence over the situation types and the semantic concepts, and the situation types take precedence over the semantic concepts. For example, the two derived mapping rules for &quot;break&quot; (i.e. one for &quot;break&quot; as in &quot;John broke the window&quot; and the other for &quot;break&quot; as in &quot;The window broke&quot;) are shown in Figure 2. Notice that the semantic TYPE restriction and the INSTRUMENT role stored in the knowledge base are also inherited at this time.</Paragraph>
<Paragraph position="1"> Figure 2: [...] form two predicate-argument mappings for the verb &quot;break.&quot;</Paragraph>
</Section>
</Section>
<Section position="5" start_page="110" end_page="113" type="metho">
<SectionTitle> 3 Automatic Acquisition from Corpora </SectionTitle>
<Paragraph position="0"> In order to expand our lexicon to the size needed for broad coverage and to be able to tune the system to specific domains quickly, we have implemented algorithms to automatically build multilingual lexicons from corpora. In this section, we discuss how the situation types and lexical idiosyncrasies are determined for verbs.</Paragraph>
<Paragraph position="1"> Our overall approach is to use simple, robust parsing techniques that depend on a few language-dependent syntactic heuristics (e.g. in English and Spanish, a verb's object usually directly follows the verb) and a dictionary for part-of-speech information. We have used these techniques to acquire information from English, Spanish, and Japanese corpora varying in length from about 25,000 words to 2.7 million words.</Paragraph>
<Section position="1" start_page="110" end_page="113" type="sub_section">
<SectionTitle> 3.1 Acquiring Situation Type Information </SectionTitle>
<Paragraph position="0"> We use two surface features to restrict the possible situation types of a verb: the verb's transitivity rating and its subject animacy.</Paragraph>
<Paragraph position="1"> The transitivity rating of a verb is defined to be the number of transitive occurrences in the corpus divided by the total occurrences of the verb. In English, a verb appears in the transitive when either [...]. For verbs that are unambiguously transitive, the transitivity rating is above 0.6. The verb &quot;spend&quot; has a transitivity rating of 0.38 because most of its direct objects are numeric dollar amounts; phrases which begin with a number are not recognized as direct objects, since most numeric amounts following verbs are adjuncts, as in &quot;John ran 3 miles.&quot;</Paragraph>
<Paragraph position="2"> We define a verb's subject animacy to be the number of times the verb appears with an animate subject divided by the total occurrences of the verb for which we identified the subject. Any noun or pronoun directly preceding a verb is considered to be its subject.</Paragraph>
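To make these two statistics concrete, here is a rough Python sketch of how they could be computed from a part-of-speech-tagged corpus under the positional heuristics described above. The input format, tag names, and the animacy test (a small pronoun list) are illustrative assumptions, not the actual implementation.

    # Rough sketch: transitivity rating and subject animacy per verb,
    # using the simple positional heuristics described in the text.
    # Input format, tag names, and the animacy test are assumptions.

    from collections import defaultdict

    ANIMATE_PRONOUNS = {"i", "you", "he", "she", "we", "they", "who"}  # assumed animacy proxy

    def is_animate(token, animate_nouns=frozenset()):
        """Very crude animacy test: animate pronouns or a lexicon of animate nouns."""
        return token.lower() in ANIMATE_PRONOUNS or token.lower() in animate_nouns

    def verb_statistics(tagged_sentences):
        """tagged_sentences: list of [(token, tag), ...] with tags like 'VB', 'NN', 'PRP', 'CD'."""
        counts = defaultdict(lambda: {"total": 0, "transitive": 0,
                                      "subj_seen": 0, "subj_animate": 0})
        for sent in tagged_sentences:
            for i, (tok, tag) in enumerate(sent):
                if not tag.startswith("VB"):
                    continue
                c = counts[tok.lower()]
                c["total"] += 1
                # Transitive if an NP directly follows the verb; NPs that
                # begin with a number (CD) are treated as adjuncts, not objects.
                if i + 1 < len(sent):
                    _, nxt_tag = sent[i + 1]
                    if nxt_tag != "CD" and nxt_tag.startswith(("NN", "PRP", "DT", "JJ")):
                        c["transitive"] += 1
                # Subject = noun or pronoun directly preceding the verb.
                if i > 0:
                    prv_tok, prv_tag = sent[i - 1]
                    if prv_tag.startswith(("NN", "PRP")):
                        c["subj_seen"] += 1
                        if is_animate(prv_tok):
                            c["subj_animate"] += 1
        ratings = {}
        for verb, c in counts.items():
            ratings[verb] = {
                "transitivity": c["transitive"] / c["total"] if c["total"] else 0.0,
                "subject_animacy": c["subj_animate"] / c["subj_seen"] if c["subj_seen"] else 0.0,
            }
        return ratings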
<Paragraph position="3"> This heuristic fails in cases where the subject NP is modified by a PP or relative clause, as in [...]. A low subject animacy suggests that the verb does not assign an AGENT role. A high subject animacy, however, does not correlate with any particular situation type, since several stative verbs take only animate subjects (e.g. perception verbs).</Paragraph>
<Paragraph position="4"> The predicted situation types shown in Figure 3 were calculated with the following algorithm:</Paragraph>
<Paragraph position="5"> 1. Assume that the verb can occur with every situation type.</Paragraph>
<Paragraph position="6"> 2. If the transitivity rating is greater than 0.6, then discard the AGENTIVE-ACTION and PROCESS-OR-STATE possibilities.</Paragraph>
<Paragraph position="7"> 3. If the transitivity rating is below 0.1, then discard the CAUSED-PROCESS and INVERSE-STATE possibilities.</Paragraph>
<Paragraph position="8"> 4. If the subject animacy is below 0.6, then discard the CAUSED-PROCESS and AGENTIVE-ACTION possibilities.</Paragraph>
<Paragraph position="9"> We are planning several improvements to our situation type determination algorithms.</Paragraph>
<Paragraph position="10"> First, because some stative verbs can take animate subjects (e.g. perception verbs like &quot;see&quot;, &quot;know&quot;, etc.), we sometimes cannot distinguish between INVERSE-STATE or PROCESS-OR-STATE verbs and CAUSED-PROCESS or AGENTIVE-ACTION verbs. This problem, however, can be solved by using the algorithms of Brent [3] or Dorr [8] for identifying stative verbs.</Paragraph>
<Paragraph position="11"> Second, verbs ambiguous between CAUSED-PROCESS and PROCESS-OR-STATE (e.g. &quot;break&quot;, &quot;vary&quot;) often get inconclusive results because they appear transitively about 50% of the time. When these verbs are transitive, their subjects are almost always animate, and when they are intransitive, their subjects are nearly always inanimate. We plan to recognize these situations by calculating animacy separately for the transitive and intransitive cases.</Paragraph>
<Section position="2" start_page="113" end_page="113" type="sub_section">
<SectionTitle> 3.2 Acquiring Idiosyncratic Information </SectionTitle>
<Paragraph position="0"> We automatically identify likely pre/postpositional argument structures for a given verb by looking for pre/postpositions in places where they are likely to attach to the verb (i.e. within a few words to the right of the verb for Spanish and English, and to the left for Japanese). When a particular pre/postposition appears in this position much more often than chance (based on either mutual information or a chi-squared test [5, 4]), we assume that it is a likely argument.</Paragraph>
<Paragraph position="1"> A very similar strategy works well at identifying verbs that take sentential complements, by looking for complementizers (e.g. &quot;that&quot;, &quot;to&quot;) in positions of likely attachment. Some English examples are shown in Tables 4 and 5, and Spanish examples are shown in Tables 6 and 7. The details of the exact algorithms used for English are contained in McKee and Maloney [13]. Areas for improvement include distinguishing between cases where a verb takes a prepositional argument, a prepositional particle, or a common adjunct.</Paragraph>
</Section>
</Section>
</Paper>
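As an illustration of the co-occurrence test described in Section 3.2, the sketch below scores verb/preposition pairs by pointwise mutual information over a fixed window to the right of the verb. The paper uses either mutual information or a chi-squared test; the preposition list, window size, count threshold, and PMI threshold here are assumptions for illustration, not the published algorithm.

    # Rough sketch: flag prepositions that co-occur with a verb much more
    # often than chance, using pointwise mutual information (PMI).  The
    # preposition list, window size, and thresholds are assumed values.

    import math
    from collections import Counter

    PREPOSITIONS = {"at", "on", "in", "to", "for", "with", "from", "about"}  # small assumed list

    def likely_prep_arguments(tagged_sentences, window=3, min_pairs=5, pmi_threshold=2.0):
        """tagged_sentences: list of [(token, tag), ...]; returns {verb: [preps]}."""
        verb_counts = Counter()
        prep_counts = Counter()
        pair_counts = Counter()
        n_tokens = 0
        for sent in tagged_sentences:
            n_tokens += len(sent)
            for i, (tok, tag) in enumerate(sent):
                tok = tok.lower()
                if tok in PREPOSITIONS:
                    prep_counts[tok] += 1
                if tag.startswith("VB"):
                    verb_counts[tok] += 1
                    # Look a few words to the right of the verb (English/Spanish order;
                    # for Japanese the window would be to the left).
                    for j in range(i + 1, min(i + 1 + window, len(sent))):
                        nbr = sent[j][0].lower()
                        if nbr in PREPOSITIONS:
                            pair_counts[(tok, nbr)] += 1
        results = {}
        for (verb, prep), n_pair in pair_counts.items():
            if n_pair < min_pairs:
                continue
            # PMI = log2( P(verb, prep) / (P(verb) * P(prep)) )
            p_pair = n_pair / n_tokens
            p_verb = verb_counts[verb] / n_tokens
            p_prep = prep_counts[prep] / n_tokens
            pmi = math.log2(p_pair / (p_verb * p_prep))
            if pmi > pmi_threshold:
                results.setdefault(verb, []).append(prep)
        return results

The same loop can be reused for sentential complements by substituting a set of complementizers (e.g. "that", "to") for PREPOSITIONS.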