<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2236"> <Title>Automatic Construction of Frame Representations for Spontaneous Speech in Unrestricted Domains</Title> <Section position="4" start_page="1448" end_page="1448" type="metho"> <SectionTitle> 3 Resources and System Components </SectionTitle> <Paragraph position="0"> We use the following resources to build our system: * the SWITCHBOARD (SWBD) corpus (Godfrey et al., 1992) for speech data, transcripts, and annotations at various levels (e.g., for segment boundaries or parts of speech) * the JANUS speech recognizer (Waibel et al., 1996) to provide us with input hypotheses * a part of speech (POS) tagger, derived from (Brill, 1994), adapted to and retrained for the SWITCHBOARD corpus * a preprocessing pipe which cleans up speech dysfluencies (e.g., repetitions, hesitations) and contains a segmentation module to split the speech recognizer turns into short clauses * a chart parser (Ward, 1991) with a POS-based grammar to generate the chunks3 (phrasal constituents) * WordNet 1.5 (Miller et al., 1993) for the extraction of subcategorization (subcat) frames for all senses of a verb (including semantic features, such as &quot;animacy&quot;) * a mapper which tries to find the &quot;best match&quot; between the chunks found within a short clause and the subcat frames for the main verb in that clause The major blocks of the system architecture are depicted in Figure 1.</Paragraph> <Paragraph position="1"> We want to stress here that except for the development of the small POS grammar and the frame-mapper, the other components and resources were already present or quite simple to implement. There has also been significant work on (semi-)automatic induction of subcategorization frames (Manning, 1993), so even without the important knowledge source from WordNet, a similar system could be built for other languages as well. 
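As an editorial illustration (not from the paper's implementation), the dysfluency cleanup performed by the preprocessing pipe can be sketched as follows; the function name, hesitation list, and repetition heuristic are all assumptions:

```python
# Minimal sketch of dysfluency cleanup (illustrative; not the authors' code).

HESITATIONS = {"uh", "um", "uh-huh"}  # assumed filler inventory

def preprocess(tokens):
    """Drop hesitation fillers and immediate word repetitions."""
    out = []
    for tok in tokens:
        if tok.lower() in HESITATIONS:
            continue
        if out and tok.lower() == out[-1].lower():  # "I I think" -> "I think"
            continue
        out.append(tok)
    return out

print(preprocess(["uh", "I", "I", "think", "so"]))  # -> ['I', 'think', 'so']
```

A real module would also handle restarts and longer repeated spans, but the principle is the same: normalize the token stream before segmentation and tagging.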
Also, the Euro-WordNet project (Vossen et al., 1997) is currently underway, building WordNet resources for other European languages.</Paragraph> </Section> <Section position="5" start_page="1448" end_page="1449" type="metho"> <SectionTitle> 4 Preliminary Experiments </SectionTitle> <Paragraph position="0"> We performed some initial experiments using the SWBD transcripts as input to the system. These were POS tagged, preprocessed, segmented into short clauses, parsed into chunks using a POS-based grammar, and finally, for each short clause, the frame-mapper matched all potential arguments of the verb against all possible subcategorization frames listed in the lemmata file we had precomputed from WordNet (see section 2).</Paragraph> <Paragraph position="1"> In total we had over 600,000 short clauses, containing approximately 1.7 million chunks. Only 18 different chunk patterns accounted for about half of these short clauses. Table 2 shows these chunk patterns and their frequencies.4 Most of these contain main verbs and hence can be sensibly used in a mapping procedure, but some of them (e.g., aff, conj, advp) do not. These are typically backchannellings, adverbial comments, and colloquial forms (e.g., &quot;yeah&quot;, &quot;and...&quot;, &quot;oh really&quot;). They can easily be dealt with by a preprocessing module that assigns them to one of these categories and does not send them to the mapper.</Paragraph> <Paragraph position="2"> Another interesting observation we make here is that within these most common chunk patterns, there is only one pattern (np vb np pp) which could lead to a potential PP-attachment ambiguity. 
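The pattern statistics and the routing of verbless clauses past the mapper can be sketched as follows; the clause data is a toy stand-in for the ~600,000 short clauses, though the pattern abbreviations follow the paper's:

```python
from collections import Counter

# Toy chunk sequences (illustrative stand-in for the real corpus output).
clauses = [
    ["np", "vb", "np"],
    ["aff"],
    ["np", "vb", "pp"],
    ["aff"],
    ["np", "vb", "np"],
]

# Count chunk patterns, as in Table 2.
patterns = Counter(" ".join(c) for c in clauses)
print(patterns.most_common(2))  # [('np vb np', 2), ('aff', 2)]

# Clauses without a verbal chunk (backchannellings, adverbial comments,
# colloquial forms) are handled by the preprocessor, not the mapper.
mappable = [c for c in clauses if any(ch.startswith("vb") for ch in c)]
print(len(mappable))  # 3
```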
We conjecture that this is most probably due to the nature of conversational speech which, unlike written (and more formal) language, does not make frequent use of complex noun phrases with one or more prepositional phrases attached to them.</Paragraph> <Paragraph position="3"> We randomly selected 98 short clauses from the output to perform a first error analysis.</Paragraph> <Paragraph position="4"> The results are summarized in Table 3. In over 21% of the clauses, the mapper finds at least one mapping that is correct. Another 23.5% of the clauses do not contain any chunks worth mapping in the first place (noises, hesitations),</Paragraph> </Section> <Section position="6" start_page="1449" end_page="1449" type="metho"> <SectionTitle> 4 Chunk abbreviations: conj=conjunction, aff=affirmative, </SectionTitle> <Paragraph position="0"> np=noun phrase, vb=verbal chunk, vbneg=negated verbal chunk, adjp=adjectival phrase, advp=adverbial phrase, pp=prepositional phrase.</Paragraph> <Paragraph position="1"> so these could be filtered out and dealt with entirely before the mapping process takes place, as we mentioned earlier. 28.6% of the clauses are in some sense incomplete; mostly they lack a main verb, which is the crucial element to get the mapping procedure started. We regard these as &quot;hard&quot; residues, including well-known linguistic problems such as ellipsis, in addition to some spoken-language ungrammaticalities. 
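Assuming the rounded percentages in Table 3 correspond to whole clause counts out of the 98 sampled (an assumption; the paper does not list raw counts), the figures are mutually consistent, as a quick check shows:

```python
# Cross-check of the error-analysis percentages (reconstructed counts).
total = 98
correct = round(0.214 * total)        # 21 clauses with a correct mapping
non_mappable = round(0.235 * total)   # 23 clauses: noises, hesitations
incomplete = round(0.286 * total)     # 28 clauses: e.g. missing main verb

# Remove non-mappable and ungrammatical/incomplete clauses.
remainder = total - non_mappable - incomplete
print(remainder)                           # 47
print(round(100 * correct / remainder, 1)) # 44.7 (%)
```

This reproduces the remainder of 47 clauses and the 44.7% success rate reported for the filtered test set.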
The last two categories (26.6% combined) in the table are due to the incompleteness and inaccuracies of the system components themselves.</Paragraph> <Paragraph position="2"> To illustrate the process of mapping, we shall present an example here, starting from the POS-tagged utterance up to the semantic frame representation:5 * short clause, annotated with POS:</Paragraph> <Paragraph position="4"> Since chunks like advp or conj are not part of the WordNet frames, we remove these from the parsed chunk sequence before a mapping attempt is made.7 In our example, WordNet yields 14 frames for 6 senses of the main verb talk. The mapper already finds a &quot;perfect match&quot;8 for the first, i.e., the most frequent sense9 of the verb (mapping 4 can be estimated to be more accurate than mapping 3 since the preposition also matches the input string).</Paragraph> <Paragraph position="5"> This will also be the default sense to choose, unless there is a word sense disambiguating module available that strongly favors a less frequent sense. Since WordNet 1.5 does not provide detailed semantic frame information but only general subcategorization with extensions such as &quot;animate/inanimate&quot;, we plan to extend this information by processing machine-readable dictionaries which provide a richer set of semantic role information for verbal heads.10 It is interesting to see that even at this early stage of our project the results of this shallow analysis are quite encouraging. If we remove those clauses from the test set which either should not or cannot be mapped in the first place (because they either do not contain any structure (&quot;non-mappable&quot;) or are ungrammatical), the remainder of 47 clauses already has a success rate of 44.7%. 
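The &quot;best match&quot; selection described above, including the preference for a frame whose preposition matches the input and the default to the most frequent sense, can be sketched as follows; the frame notation (e.g., "pp-to"), scores, and data are invented for illustration and are not the paper's actual mapper:

```python
# Hedged sketch of a frame-mapper scorer (illustrative, not the authors' code).

def score(chunks, frame):
    """Score a chunk sequence against a subcat frame like ('np', 'vb', 'pp-to')."""
    if [c.split("-")[0] for c in chunks] != [f.split("-")[0] for f in frame]:
        return 0                      # slot sequence does not match
    s = 2                             # "perfect match" on the slot sequence
    for c, f in zip(chunks, frame):
        if "-" in f and c == f:       # the preposition also matches
            s += 1
    return s

def best_mapping(chunks, frames_by_sense):
    """frames_by_sense: list of frame lists, most frequent sense first."""
    best = None
    for rank, frames in enumerate(frames_by_sense):
        for frame in frames:
            s = score(chunks, frame)
            # strict '>' keeps ties with the more frequent (earlier) sense
            if s > 0 and (best is None or s > best[0]):
                best = (s, rank, frame)
    return best

# Invented frames for "talk" (WordNet 1.5 actually lists 14 frames / 6 senses).
talk_frames = [
    [("np", "vb", "pp-to"), ("np", "vb")],   # sense 1 (most frequent)
    [("np", "vb", "pp-about")],              # sense 2
]
print(best_mapping(["np", "vb", "pp-to"], talk_frames))
# -> (3, 0, ('np', 'vb', 'pp-to'))
```

The strict inequality in `best_mapping` implements the default-sense policy: a less frequent sense only wins with a strictly better score, e.g., when its preposition matches and the first sense's does not.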
Improvements of the system components before the mapping stage, as well as to the mapper itself, will further increase the mapping performance.</Paragraph> <Paragraph position="6"> 7These chunks can easily be added to the mapper's output again, as shown in the example.</Paragraph> <Paragraph position="7"> 8Partial matches, such as mappings 1 and 2 in this example, are allowed but disfavored relative to perfect matches. 9In WordNet 1.5, the first sense is also supposed to be the most frequent one.</Paragraph> <Paragraph position="8"> 10The &quot;agent&quot; and &quot;theme&quot; assignments are currently just defaults for these types of subcat frames.</Paragraph> </Section> <Section position="7" start_page="1449" end_page="1449" type="metho"> <SectionTitle> 5 Future Work </SectionTitle> <Paragraph position="0"> It is obvious from our evaluation that most core components, specifically the mapper, need to be improved and refined. As for the mapper, there are issues of constituent coordination, split verbs, and infinitival complements that need to be addressed and properly handled. Also, the &quot;linkage&quot; between main and relative clauses has to be performed such that this information is maintained and not lost due to the segmentation into short clauses.</Paragraph> <Paragraph position="1"> Experiments with speech recognizer output instead of transcripts will show to what extent we still get reasonable frame representations when faced with erroneous input. Specifically, since the mapper relies on the identification of the &quot;head verb&quot;, it will be crucial that at least those words are correctly recognized and tagged most of the time.</Paragraph> <Paragraph position="2"> To further enhance our representation, we could use speech act tags, generated by an automatic speech act classifier (Finke et al., 1998), and attach these to the short clauses.11</Paragraph> </Section> </Paper>