<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0109"> <Title>The Automatic Acquisition of Frequencies of Verb Subcategorization Frames from Tagged Corpora</Title> <Section position="3" start_page="95" end_page="96" type="intro"> <SectionTitle> 2 Method </SectionTitle> <Paragraph position="0"> The procedure for finding verb-subcat frequencies automatically is as follows.</Paragraph> <Paragraph position="1"> (1) Make a list of verbs from the tagged corpus. (2) For each verb on the list (the &quot;target verb&quot;): (2.1) Tokenize each sentence containing the target verb as follows: a noun phrase parser tokenizes every noun phrase except pronouns as &quot;n&quot;, and all remaining words are tokenized following the schema in Table 1. For example, the sentence &quot;The corresponding mental-state verbs do not follow [target verb] these rules in a straightforward way&quot; is transformed into the token sequence &quot;bnvaknpne&quot;. Table 1 (token schema): b: sentence-initial marker; k: target verb; i: pronoun; n: noun phrase; v: finite verb; u: participial verb; d: base form verb; p: preposition; e: sentence-final marker; t: &quot;to&quot;; m: modal; w: relative pronoun; a: adverb; x: punctuation; c: complementizer &quot;that&quot;; s: the rest.</Paragraph> <Paragraph position="2"> (2.2) Apply a set of subcat extraction rules to the tokenized sentences. These rules are written as regular expressions and were obtained by examining occurrences of a small sample of verbs in a training text. Note that in the actual implementation of the procedure, all redundant operations are eliminated. Our NP parser also uses a finite-state grammar. It is designed especially to support identification of verb-subcat frames. One of its special features is that it detects time adjuncts such as &quot;yesterday&quot;, &quot;two months ago&quot;, or &quot;the following day&quot;, and eliminates them in the tokenization process. 
For example, the sentence &quot;He told the reporters the following day that...&quot; is tokenized as &quot;bivnc...&quot; instead of &quot;bivnnc...&quot;.</Paragraph> </Section> </Paper>
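<!--
Steps (2.1) and (2.2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the chunk labels (NP, PRON, VFIN, TIME, ...), the helper names tokenize and extract_frame, and the three-entry rule list are all hypothetical. Only the single-letter token schema comes from Table 1. The input is assumed to be already tagged and chunked, so the NP parser itself is not modelled.

```python
import re

# Token letters from Table 1, keyed by hypothetical chunk labels (assumed names).
TOKEN_FOR_LABEL = {
    "NP": "n", "PRON": "i", "VFIN": "v", "VPART": "u", "VBASE": "d",
    "PREP": "p", "TO": "t", "MODAL": "m", "RELPRON": "w", "ADV": "a",
    "PUNCT": "x", "THAT": "c",
}


def tokenize(chunks, target_index=None):
    """Turn (text, label) chunks into a Table 1 token string.

    Chunks labelled TIME (time adjuncts) are dropped, the chunk at
    target_index becomes "k", and unknown labels fall back to "s".
    """
    tokens = ["b"]  # sentence-initial marker
    for index, (_text, label) in enumerate(chunks):
        if label == "TIME":
            continue  # time adjuncts are eliminated during tokenization
        if index == target_index:
            tokens.append("k")  # the target verb
        else:
            tokens.append(TOKEN_FOR_LABEL.get(label, "s"))
    tokens.append("e")  # sentence-final marker
    return "".join(tokens)


# Hypothetical extraction rules, tried in order; NOT the paper's rule set.
SUBCAT_RULES = [
    ("NP+that-clause", re.compile(r"k[ni]c")),
    ("NP", re.compile(r"k[ni]")),
    ("intransitive", re.compile(r"k")),
]


def extract_frame(token_string):
    """Return the name of the first rule that matches, or None."""
    for frame, pattern in SUBCAT_RULES:
        if pattern.search(token_string):
            return frame
    return None


example = [
    ("The corresponding mental-state verbs", "NP"),
    ("do", "VFIN"),
    ("not", "ADV"),
    ("follow", "VFIN"),
    ("these rules", "NP"),
    ("in", "PREP"),
    ("a straightforward way", "NP"),
]
token_string = tokenize(example, target_index=3)
print(token_string)                 # bnvaknpne
print(extract_frame(token_string))  # NP
```

Ordering the rules from most to least specific lets a plain first-match loop stand in for whatever conflict resolution the real rule set uses. Dropping TIME chunks before matching is what keeps a time adjunct such as "the following day" from being read as a second noun phrase object.
-->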