<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1033"> <Title>AN EMPIRICAL STUDY ON THEMATIC KNOWLEDGE ACQUISITION BASED ON SYNTACTIC CLUES AND HEURISTICS</Title> <Section position="7" start_page="245" end_page="73548" type="evalu"> <SectionTitle> 4. EXPERIMENT </SectionTitle>
<Paragraph position="0"> As described above, the proposed acquisition method requires syntactic information about arguments as input (recall Table 1). We believe that syntactic information is one of the most commonly available resources; it may be collected from a syntactic processor or a syntactically processed corpus. To test the method with a public corpus, as in Grishman92a, the Penn TreeBank was used as a syntactically processed corpus for learning. Argument packets (including VP packets and NP packets) were extracted from the ATIS corpus (including the JUN90, SRI_TB, and TI_TB tree files), the MARI corpus (including the AMBIC and WBUR tree files), the MUC1 corpus, and the MUC2 corpus of the treebank. VP packets and NP packets recorded the syntactic properties of the arguments of verbs and nouns respectively.</Paragraph>
<Paragraph position="1"> Since not all constructions involving movement were tagged with trace information in the corpus, to derive the arguments the procedure needs to consider the constructions of passivization, interjection, and unbounded dependency (e.g. in relative clauses and wh-questions). That is, it needs to determine whether a constituent is an argument of a verb (or noun), whether an argument is moved, and if so, which constituent is the moved argument. Basically, Case Theory, Theta Theory (Chomsky81), and the Foot Feature Principle (Gazdar85) were employed to locate the arguments (Liu92a, Liu92b).</Paragraph>
<Paragraph position="2"> Table 3 summarizes the results of the argument extraction. About 96% of the trees were extracted.</Paragraph>
<Paragraph position="3"> Parse trees with too many words (60) or nodes (i.e. 50 subgoals of parsing) were discarded. All VP packets in the parse trees were derived, but only the NP packets having PPs as modifiers were extracted. These PPs could help the system to hypothesize argument structures of nouns. The extracted packets were assimilated into an acquisition system (called EBNLA, Liu92a) as syntactic subcategorization frames. Different morphological forms of a word were not counted as different verbs and nouns.</Paragraph>
<Paragraph position="4"> As an example of the extracted argument packets, consider the following sentence from MUC1: &quot;..., at la linea ..., where a FARC front ambushed an 11th brigade army patrol&quot;.</Paragraph>
<Paragraph position="5"> The extraction procedure derived the following VP packet for &quot;ambushed&quot;: ambushed (NP: a FARC front) (WHADVP: where) (NP: an 11th brigade army patrol) The first NP was the external argument of the verb.</Paragraph>
<Paragraph position="6"> The other constituents were internal arguments of the verb. The procedure could not determine whether an argument was optional or not.</Paragraph>
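As a concrete illustration of the packet representation, the sketch below shows one way the &quot;ambushed&quot; VP packet above could be encoded. It is a minimal sketch, not the extraction procedure used in the experiment; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VPPacket:
    """Syntactic properties of a verb's arguments, as read off a parse tree."""
    verb: str
    external: Tuple[str, str]      # (category, surface string) of the external argument
    internals: List[Tuple[str, str]] = field(default_factory=list)  # internal arguments

# The VP packet derived for "ambushed" in the MUC1 sentence above.
# Optionality of an argument is not recorded, mirroring the limitation
# mentioned in the text.
ambushed = VPPacket(
    verb="ambushed",
    external=("NP", "a FARC front"),
    internals=[("WHADVP", "where"),
               ("NP", "an 11th brigade army patrol")],
)
```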
<Paragraph position="7"> In the corpora, most packets were for a small number of verbs (e.g. 296 packets for &quot;show&quot; were found in ATIS). Only 1 to 2 packets could be found for most verbs. Therefore, although the parse trees could provide argument packets of good quality, the information was too sparse to resolve thematic role ambiguities. This is a weakness embedded in most corpus-based acquisition methods, since the learner might ultimately fail to collect sufficient information after spending much effort to process the corpus. In that case, the ambiguities need to be temporarily suspended. To speed up learning and focus on the usage of the proposed method, a trainer was asked to check the thematic validity (yes/no) of the sentences generated by the learner.</Paragraph>
<Paragraph position="8"> Excluding packets of some special verbs to be discussed later and erroneous packets (due to a small number of inconsistencies and gaps in the corpus and the extraction procedure), the packets were fed into the acquisition system (one packet per verb). The average accuracy rate of the acquired argument structures was 0.86. An argument structure was counted as correct if it was unambiguous and confirmed by the trainer. On average, for resolving ambiguities, 113 queries were generated for every 100 successfully acquired argument structures. The packets from ATIS caused fewer ambiguities, since this corpus contained many imperative sentences to which Imperative Heuristic could be applied. Volition Heuristic, Thematic Hierarchy Heuristic, and Preposition Heuristic had almost equal frequencies of application in the experiment.</Paragraph>
<Paragraph position="9"> As an example of how the clues and heuristics could successfully derive argument structures of verbs, consider the sentence from ATIS: &quot;The flight going to San Francisco ...&quot;.</Paragraph>
<Paragraph position="10"> Without issuing any queries, the learner concluded that an argument structure of &quot;go&quot; is &quot;{Th}, {Go}&quot;. This was because, according to the clues, &quot;San Francisco&quot; could only be Goal, while according to One-Theme Heuristic, &quot;the flight&quot; was recognized as Theme. Most argument structures were acquired using one or two queries.</Paragraph>
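To make the interaction between the clues and the heuristics concrete, the sketch below shows how the &quot;go&quot; example above could be resolved without a query. It is a simplified, hypothetical paraphrase of the clue step and One-Theme Heuristic, not the system's actual implementation; all function and variable names are illustrative.

```python
from typing import Dict, List, Set

def assign_roles(external: str, internals: List[str],
                 clue_roles: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign thematic roles when the clues plus One-Theme leave no ambiguity.

    clue_roles maps an argument to the set of roles that the syntactic and
    lexical clues allow for it (e.g. a PP headed by "to" -> {"Goal"}).
    """
    roles: Dict[str, str] = {}

    # Clues: any argument whose candidate set is a singleton is decided outright.
    for arg, candidates in clue_roles.items():
        if len(candidates) == 1:
            roles[arg] = next(iter(candidates))

    # One-Theme Heuristic (simplified): if exactly one argument remains open
    # and no Theme has been assigned yet, label that argument Theme.
    open_args = [a for a in [external, *internals] if a not in roles]
    if len(open_args) == 1 and "Theme" not in roles.values():
        roles[open_args[0]] = "Theme"
    return roles

# "The flight going to San Francisco ...": the preposition clue fixes the PP
# as Goal, and One-Theme then labels the external argument as Theme,
# yielding the argument structure {Th}, {Go} without any query.
print(assign_roles(external="the flight",
                   internals=["to San Francisco"],
                   clue_roles={"to San Francisco": {"Goal"}}))
# -> {'to San Francisco': 'Goal', 'the flight': 'Theme'}
```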
<Paragraph position="11"> The result showed that, after (manually or automatically) acquiring an argument packet (i.e. a syntactic subcategorization frame plus the syntactic constituent of the external argument) of a verb, the acquisition method could be invoked to upgrade the syntactic knowledge to thematic knowledge by issuing only 113 queries for every 100 argument packets. Since checking the validity of the generated sentences is not a heavy burden for the trainer (answering 'yes' or 'no' only), the method may be attached to various systems to promote incremental extensibility of thematic knowledge.</Paragraph>
<Paragraph position="12"> The way of counting the accuracy rate of the acquired argument structures deserves notice. Failed cases were mainly due to clues and heuristics that were too strong or overly committed. For example, the thematic role of &quot;the man&quot; in (4.1), from MARI, could not be acquired using the clues and heuristics.</Paragraph>
<Paragraph position="13"> (4.1) Laura ran away with the man.</Paragraph>
<Paragraph position="14"> In the terminology of Gruber76, this is an expression of accompaniment, which is not considered by the clues and heuristics. As another example, consider (4.2), also from MARI.</Paragraph>
<Paragraph position="15"> (4.2) The greater Boston area ranked eight among major cities for incidence of AIDS.</Paragraph>
<Paragraph position="16"> The clues and heuristics could not draw any conclusion on the possible thematic roles of &quot;eight&quot;. On the other hand, the cases counted as &quot;failed&quot; did not always lead to &quot;erroneous&quot; argument structures. For example, &quot;Mary&quot; in (2.9) &quot;John promised Mary to marry her&quot; was treated as Theme rather than Goal, because &quot;Mary&quot; is the only possible Theme. Although &quot;Mary&quot; may be Theme in this case as well, treating &quot;Mary&quot; as Goal is more fine-grained. The clues and heuristics may often lead to acceptable argument structures, even if the argument structures are inherently ambiguous. For example, an NP might function as more than one thematic role within a sentence (Jackendoff87). In (4.3), &quot;John&quot; may be Agent or Source.</Paragraph>
<Paragraph position="17"> (4.3) John sold Mary a coat.</Paragraph>
<Paragraph position="18"> Since Thematic Hierarchy Heuristic assumes that subjects and objects cannot reside at the same level, &quot;John&quot; must not be assigned as Source. Therefore, &quot;John&quot; and &quot;Mary&quot; are assigned as Agent and Goal respectively, and the ambiguity is resolved.</Paragraph>
<Paragraph position="19"> In addition, some thematic roles may cause ambiguities if only syntactic evidence is available. Experiencer, such as &quot;John&quot; in (4.4), and Maleficiary, such as &quot;Mary&quot; in (4.5), are two examples. (4.4) Mary surprised John.</Paragraph>
<Paragraph position="20"> (4.5) Mary suffers a headache.</Paragraph>
<Paragraph position="21"> There are difficulties in distinguishing Experiencer, Agent, Maleficiary, and Theme. Fortunately, the verbs that take Experiencer and Maleficiary may be enumerated before learning. Therefore, the argument structures of these verbs are manually constructed rather than learned by the proposed method.</Paragraph> </Section></Paper>