<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0109">
  <Title>The Automatic Acquisition of Frequencies of Verb Subcategorization Frames from Tagged Corpora</Title>
  <Section position="3" start_page="95" end_page="96" type="intro">
    <SectionTitle>
2 Method
</SectionTitle>
    <Paragraph position="0"> The procedure for automatically finding verb-subcat frequencies is as follows.</Paragraph>
    <Paragraph position="1">  (1) Make a list of verbs out of the tagged corpus. (2) For each verb on the list (the &quot;target verb&quot;), (2.1) Tokenize each sentence containing the target verb in the following way: all the noun phrases except pronouns are tokenized as &quot;n&quot; by a noun phrase parser, and all the remaining words are tokenized following the schema in Table 1. For example, the sentence &quot;The corresponding mental-state verbs do not follow [target verb] these rules in a straightforward way&quot; is transformed to the token sequence &quot;bnvaknpne&quot;.</Paragraph>
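    <Paragraph> Step (2.1) can be sketched in Python as follows, assuming the sentence has already been tagged and split into noun-phrase chunks. Only the one-letter codes come from Table 1; the category names, the TOKEN_CODES mapping, and the tokenize function are hypothetical and stand in for the tagger and NP parser, which are not specified here.

```python
# One-letter codes from Table 1. The category names on the left are
# illustrative assumptions, not the tagset used in the paper.
TOKEN_CODES = {
    "NP": "n",
    "PRONOUN": "i",
    "FINITE_VERB": "v",
    "PARTICIPLE": "u",
    "BASE_VERB": "d",
    "PREPOSITION": "p",
    "TO": "t",
    "MODAL": "m",
    "REL_PRONOUN": "w",
    "ADVERB": "a",
    "PUNCT": "x",
    "THAT_COMP": "c",
}


def tokenize(chunks, target_index):
    """Map a chunked sentence to a token string.

    chunks is a list of (text, category) pairs. target_index is the
    position of the target verb, which is always tokenized as "k".
    Categories missing from TOKEN_CODES fall back to "s" (the rest).
    """
    codes = ["b"]  # sentence-initial marker
    for position, (_text, category) in enumerate(chunks):
        if position == target_index:
            codes.append("k")
        else:
            codes.append(TOKEN_CODES.get(category, "s"))
    codes.append("e")  # sentence-final marker
    return "".join(codes)


example = [
    ("The corresponding mental-state verbs", "NP"),
    ("do", "FINITE_VERB"),
    ("not", "ADVERB"),
    ("follow", "BASE_VERB"),
    ("these rules", "NP"),
    ("in", "PREPOSITION"),
    ("a straightforward way", "NP"),
]
print(tokenize(example, target_index=3))  # bnvaknpne
```

 Representing each sentence as a plain string of one-letter codes is what lets the later extraction rules be written as ordinary regular expressions.</Paragraph>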
    <Paragraph position="2"> (2.2) Apply a set of subcat extraction rules to the tokenized sentences. These rules are written as regular expressions, and they were obtained by examining occurrences of a small sample of verbs in a training text. Note that in the actual implementation of the procedure, all of the redundant operations are eliminated. Our NP parser also uses a finite-state grammar. It is designed especially to support identification of verb-subcat frames. One of its special features is that it detects time adjuncts such as &quot;yesterday&quot;, &quot;two months ago&quot;, or &quot;the following day&quot;, and eliminates them in the tokenization process. For example, the sentence &quot;He told the reporters the following day that...&quot; is tokenized to &quot;bivnc...&quot; instead of &quot;bivnnc...&quot;.
 Table 1:
 b: sentence-initial marker
 k: target verb
 i: pronoun
 n: noun phrase
 v: finite verb
 u: participial verb
 d: base form verb
 p: preposition
 e: sentence-final marker
 t: &quot;to&quot;
 m: modal
 w: relative pronoun
 a: adverb
 x: punctuation
 c: complementizer &quot;that&quot;
 s: the rest
    </Paragraph>
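    <Paragraph> The rule application in step (2.2) might look like the sketch below, which matches regular expressions against token strings and counts the extracted frames for one target verb. The five rules, their frame labels, and the function names are invented for illustration. The actual rules were derived from a training text and are not reproduced in this section.

```python
import re
from collections import Counter

# Hypothetical extraction rules over Table 1 token strings, tried in order.
# Each pattern starts at the target verb "k". The lookahead after the last
# complement stops a rule from firing when more complements follow.
RULES = [
    ("NP+that-clause", re.compile(r"k[ni]c")),
    ("that-clause", re.compile(r"kc")),
    ("NP+NP", re.compile(r"k[ni]n(?=[xe])")),
    ("NP", re.compile(r"k[ni](?=[pxe])")),
    ("intransitive", re.compile(r"k(?=[pxe])")),
]


def extract_frame(token_string):
    """Return the label of the first rule that matches, or None."""
    for label, pattern in RULES:
        if pattern.search(token_string):
            return label
    return None


def frame_frequencies(token_strings):
    """Count how often each frame is extracted for one target verb."""
    counts = Counter()
    for token_string in token_strings:
        frame = extract_frame(token_string)
        if frame is not None:
            counts[frame] += 1
    return counts


sentences = ["bnvaknpne", "biknce", "bnkne", "bnkcnvne"]
print(frame_frequencies(sentences))
```

 Ordering the rules from most to least specific is a design assumption of this sketch: the first match wins, so a sentence with an NP followed by a that-clause is not also counted under the plain NP frame.</Paragraph>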
  </Section>
</Paper>