<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0507">
  <Title>I ! I I</Title>
  <Section position="4" start_page="58" end_page="59" type="metho">
    <SectionTitle>
2 A dependency formalism
</SectionTitle>
    <Paragraph position="0"> The basic idea of dependency is that the syntactic structure of a sentence is described in terms of binary relations (dependency relations) on pairs of words, a head (or parent) and a dependent (or daughter); these relations form a tree, the dependency tree. In this section we introduce a formal dependency system, which expresses syntactic knowledge through dependency rules. The grammar and the lexicon coincide, since the rules are lexicalized: the head of each rule is a word of a certain category, namely the lexical anchor. The formalism is a simplified version of (Lombardo and Lesmo, 1998); we have left out the treatment of long-distance dependencies to focus on subcategorization knowledge, which is to be represented in a hierarchy.</Paragraph>
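    <Paragraph position="1"> A dependency grammar is a five-tuple $\langle W, C, S, D, H \rangle$, where $W$ is a finite set of words of a natural language; $C$ is a finite set of syntactic categories; $S$ is a non-empty set of categories ($S \subseteq C$) that can act as head of a sentence; $D$ is the set of dependency relations, for instance SUBJ, OBJ, XCOMP, P-OBJ, PRED; $H$ is a set of dependency rules of the form $x{:}X\,(\langle r_1 Y_1 \rangle \ldots \langle r_{i-1} Y_{i-1} \rangle \; \# \; \langle r_{i+1} Y_{i+1} \rangle \ldots \langle r_m Y_m \rangle)$, where 1) $x \in W$ is the head of the rule; 2) $X \in C$ is its category;</Paragraph>
    <Paragraph position="3"> 3) an element $\langle r_j Y_j \rangle$ is a d-pair (which describes a dependent); the sequence of d-pairs, including the special symbol # (representing the linear position of the head), is called the d-pair sequence. We have that $r_j \in D$ and $Y_j \in C$ for every d-pair $\langle r_j Y_j \rangle$.</Paragraph>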
    <Paragraph position="5"> Intuitively, a dependency rule constrains one node (head) and its dependents in a dependency tree: the d-pair sequence states the order of elements, both the head (# position) and the dependents (d-pairs).</Paragraph>
    <Paragraph position="6"> The grammar is lexicalized, because each dependency rule has a lexical anchor in its head ($x{:}X$).</Paragraph>
    <Paragraph position="7"> A d-pair $\langle r_i Y_i \rangle$ identifies a dependent of category $Y_i$, connected with the head via a dependency relation $r_i$.</Paragraph>
    <Paragraph position="8"> As an example, consider the grammar $G$ (footnote 2), whose set of words is $W = \{\text{gli}, \text{un}, \text{amici}, \text{eroe}, \text{lo}, \text{credevano}\}$. (Footnote 2: We use Italian terms to label grammatical relations; see Table 1. Since subcategorization frames are language-dependent, we prefer to avoid confusion due to different terminology across languages. For example, the relation Termine (see the caption of figure 4) actually corresponds to the indirect object in English. However, I-Obj undergoes the double accusative transformation into Obj, while Termine does not.)</Paragraph>
    <Paragraph position="9"> Figure 1: Dependency tree of Gli amici lo credevano un eroe, "The friends considered him a hero", given the grammar G. The word order is indicated by the numbers 1, 2, ... associated with the nodes: amici, "friends", is a left dependent of the head, as it precedes the head in the linear order of the input string; eroe, "hero", is a right dependent.</Paragraph>
    <Paragraph position="11"> where H includes the following dependency rules: 1. gli: DETERM (#); 2. un: DETERM (#); 3. amici: NOUN (&lt;SPEC DETERM&gt; #); 4. eroe: NOUN (&lt;SPEC DETERM&gt; #); 5. lo: PRON (#); 6. credevano: VERB (&lt;SOGG NOUN&gt; &lt;OGG</Paragraph>
    <Paragraph position="13"> By applying the rules of the grammar, we obtain the dependency tree in figure 1 for the sentence Gli amici lo credevano un eroe, "The friends considered him a hero".</Paragraph>
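    <Paragraph> To make the rule format concrete, here is a minimal Python sketch under an encoding of our own choosing (not the authors'). A rule is a word, a category and a d-pair sequence in which the string "#" marks the head. A word licenses its dependents when their (relation, category) pairs, in linear order and with "#" at the head position, reproduce the sequence of some rule anchored in that word. The names Rule and licensed are hypothetical. Only rules 1-5 of G are encoded, since rule 6 is truncated above.

```python
from typing import NamedTuple

HEAD = "#"  # special symbol marking the linear position of the head


class Rule(NamedTuple):
    word: str        # lexical anchor x
    category: str    # category X of the anchor
    sequence: tuple  # d-pair sequence: HEAD or (relation, category) pairs


# Rules 1-5 of the example grammar G (rule 6 is truncated in the text).
RULES = [
    Rule("gli", "DETERM", (HEAD,)),
    Rule("un", "DETERM", (HEAD,)),
    Rule("amici", "NOUN", (("SPEC", "DETERM"), HEAD)),
    Rule("eroe", "NOUN", (("SPEC", "DETERM"), HEAD)),
    Rule("lo", "PRON", (HEAD,)),
]


def licensed(word, left, right, rules=RULES):
    """Return True if some rule anchored in `word` licenses the given left
    and right dependents, each a list of (relation, category) pairs in
    linear order."""
    observed = tuple(left) + (HEAD,) + tuple(right)
    return any(r.word == word and r.sequence == observed for r in rules)


# "gli amici": amici heads a left SPEC dependent of category DETERM.
print(licensed("amici", [("SPEC", "DETERM")], []))  # True
print(licensed("amici", [], [("SPEC", "DETERM")]))  # False: wrong side
```

Because the grammar is lexicalized, the lookup is keyed on the anchor word. A practical implementation would index rules by anchor rather than scan the whole list.</Paragraph>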
  </Section>
  <Section position="5" start_page="59" end_page="61" type="metho">
    <SectionTitle>
3 A hierarchy of subcategories
</SectionTitle>
    <Paragraph position="0"> The formalization of dependency grammar illustrated above, like every lexicalized formalism, suffers from redundancy in the representation of syntactic knowledge.</Paragraph>
    <Paragraph position="1"> In fact, for each $w \in W$, the lexicon must include a different rule for each configuration of dependents for which $w$ can act as a head. A tool is therefore required to represent lexical information in a compact and perspicuous way. We propose to remedy this redundancy by using a hierarchy of subcategorization frames.</Paragraph>
    <Section position="1" start_page="59" end_page="60" type="sub_section">
      <SectionTitle>
3.1 A basic hierarchy
</SectionTitle>
      <Paragraph position="0"> The description of the dependency rules is given on the basis of a hierarchy of subcategories, each of which has an associated subcategorization frame (footnote 3). Each subcategorization frame is, in turn, a compact representation of a set of dependency rules. The formal definition of the hierarchy is the following. (Footnote 3: In this paper we focus our attention on verbal subcategorization frames.)</Paragraph>
      <Paragraph position="1"> A subcategorization hierarchy is a 6-tuple $\langle T, L, D, Q, F, \le_T \rangle$, where: $T$ is a finite set of subcategories; $L$ is a mapping between $W$ (the words, defined in the grammar) and sets of subcategories, $L: W \to 2^T \setminus \{\emptyset\}$, that is, each word can "belong" to one or more subcategories; $D$ is a set of dependency relations (as in section 2); $Q$ is a set of subcategorization frames, each of which is a total mapping $q: D \to R \times 2^T$, where $R$ is the set of pairs of natural numbers $\langle n_1, n_2 \rangle$ such that $n_1 \ge 0$, $n_2 \ge 0$ and $n_1 \le n_2$; $F$ is a bijection between subcategories and subcategorization frames, $F: T \to Q$; $\le_T$ is an ordering relation among subcategories.</Paragraph>
      <Paragraph position="2"> In order to define $\le_T$, we need some notation: $N_q(d)$, where $q \in Q$ and $d \in D$, is the first element of $q(d)$, i.e. the number restrictions associated with the relation $d$ in the subcategorization frame $q$.</Paragraph>
      <Paragraph position="3"> $V_q(d)$, where $q \in Q$ and $d \in D$, is the second element of $q(d)$, i.e. the value restrictions associated with the relation $d$ in the subcategorization frame $q$. Intuitively, $N_q(d)$ is the range of the number of times the dependency relation $d$ can be instantiated according to the subcategorization frame $q$; $V_q(d)$ is the set of subcategories that can be in relation $d$ with a subcategory having $q$ as its subcategorization frame.</Paragraph>
      <Paragraph position="4"> Let $\le_{RN}$ be an order relation on number restrictions; given two pairs of natural numbers $R_1 = \langle n_1, n_2 \rangle$ and $R_2 = \langle m_1, m_2 \rangle$, $R_1 \le_{RN} R_2$ iff $m_1 \le n_1$ and $n_2 \le m_2$,</Paragraph>
      <Paragraph position="6"> namely, the range $R_1$ is inside the range $R_2$.</Paragraph>
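      <Paragraph position="7"> Let $\le_{RV}$ be an order relation on value restrictions; given two sets of subcategories $V_1$ and $V_2$, $V_1 \le_{RV} V_2$ iff $V_1 \subseteq V_2$. Now, we can say that, for each $t_1, t_2 \in T$: $t_1 \le_T t_2$ iff, for every $d \in D$, $N_{F(t_1)}(d) \le_{RN} N_{F(t_2)}(d)$ and $V_{F(t_1)}(d) \le_{RV} V_{F(t_2)}(d)$.</Paragraph>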
      <Paragraph position="9"> The relation $\le_T$ is a partial order on $T$. If we assume the existence of a most general element TOP, it can act as the root of a hierarchy defined on $\le_T$.</Paragraph>
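      <Paragraph> The three order relations can be checked mechanically. The sketch below assumes a Python encoding of our own: a frame maps each relation to a pair (range, value set). Only the relations sogg, ogg and compl of the running example are listed. The frame of $t_{137}$ is the frame $q$ of the example in section 3.1. The frame of $t_{150}$ applies the local restrictions described in section 3.3. The function names le_rn, le_rv and le_t are hypothetical.

```python
def le_rn(r1, r2):
    """R1 <=_RN R2: the range r1 = (n1, n2) lies inside r2 = (m1, m2)."""
    (n1, n2), (m1, m2) = r1, r2
    return n1 >= m1 and m2 >= n2


def le_rv(v1, v2):
    """V1 <=_RV V2: set inclusion between value restrictions."""
    return v1.issubset(v2)


def le_t(f1, f2, relations):
    """t1 <=_T t2 for frames f1 = F(t1) and f2 = F(t2), each a dict mapping
    a relation to ((n_min, n_max), frozenset of subcategories)."""
    return all(
        le_rn(f1[d][0], f2[d][0]) and le_rv(f1[d][1], f2[d][1])
        for d in relations
    )


D = ("sogg", "ogg", "compl")
t137 = {"sogg": ((1, 1), frozenset({"N"})),
        "ogg": ((0, 1), frozenset({"N", "C"})),
        "compl": ((0, 2), frozenset({"P"}))}
t150 = {"sogg": ((1, 1), frozenset({"N"})),
        "ogg": ((1, 1), frozenset({"N"})),
        "compl": ((0, 0), frozenset({"P"}))}

print(le_t(t150, t137, D))  # True: t150 is more specific than t137
print(le_t(t137, t150, D))  # False
```

The first check succeeds and the second fails, so $t_{150}$ lies strictly below $t_{137}$ in the hierarchy.</Paragraph>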
      <Paragraph position="10"> In the definitions above, each subcategory in the hierarchy defined by $\le_T$ is associated, through $F$, with a subcategorization frame. So, through $L$ and $F$, each word in the lexicon is associated with one or more subcategorization frames. Lexical ambiguity is due entirely to $L$, since $F$ is a bijection.</Paragraph>
      <Paragraph position="11"> In the rest of this section we show that each subcategorization frame $q$ defines a set of dependency rules, in the sense used in section 2 for the formal definition of the grammar. In this way, the hierarchy specifies a correspondence between words and rules. Moreover, we show that the hierarchy acts as a taxonomy: given that $rules(t_i) \subseteq H$ is the set of dependency rules whose head belongs to the subcategory $t_i$, we have that $t_1 \le_T t_2 \Rightarrow rules(t_1) \subseteq rules(t_2)$.</Paragraph>
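      <Paragraph position="13"> In order to specify the correspondence between subcategorization frames and dependency rules, we first define, for a frame $q$ and a relation $d$ with $N_q(d) = \langle n_1, n_2 \rangle$: $Dep_q(d) = \{\, M : M \text{ is a multiset of pairs } \langle d, t \rangle,\ t \in V_q(d),\ n_1 \le |M| \le n_2 \,\}$.</Paragraph>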
      <Paragraph position="15"> Given a subcategorization frame $q$ and a relation $d$, $Dep_q(d)$ is the set of all multisets of pairs $\langle d, t \rangle$, where $t$ is a subcategory in $V_q(d)$. Multisets are needed because the same relation can be instantiated several times (as many as the range $N_q(d)$ allows).</Paragraph>
      <Paragraph position="16"> In order to compute the sets of dependency relations that the subcategorization frame includes, we form the Cartesian product of the various $Dep_q(d)$, for $D = \{d_1, \ldots, d_k\}$: $Cart_q = Dep_q(d_1) \times \cdots \times Dep_q(d_k)$</Paragraph>
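      <Paragraph position="18"> and we take the union of the components of each member of $Cart_q$; each union is extended by including the special symbol #: $DepSet_q = \{\, M_1 \uplus \cdots \uplus M_k \uplus \{\#\} : \langle M_1, \ldots, M_k \rangle \in Cart_q \,\}$,</Paragraph>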
      <Paragraph position="20"> where the union is a multiset union, which preserves duplicates. Finally, by taking all the permutations of each member of $DepSet_q$, we get the set of rules (also called subcategorization patterns): $Rules_q = \{\, r : r \in Permute(m) \wedge m \in DepSet_q \,\}$. An example should make clear how the above definitions work. Let's assume that</Paragraph>
      <Paragraph position="22"> q = {&lt;sogg, &lt;&lt;1, 1&gt;, {N}&gt;&gt;, &lt;ogg, &lt;&lt;0, 1&gt;, {N, C}&gt;&gt;, &lt;compl, &lt;&lt;0, 2&gt;, {P}&gt;&gt;} (where C is short for CHESUB, subordinating conjunction, and P for PREP).</Paragraph>
      <Paragraph position="23">  Then we have:</Paragraph>
      <Paragraph position="25"> $Cart_q$ = { &lt;{&lt;sogg, N&gt;}, {}, {}&gt;, &lt;{&lt;sogg, N&gt;}, {}, {&lt;compl, P&gt;}&gt;, &lt;{&lt;sogg, N&gt;}, {}, {&lt;compl, P&gt;, &lt;compl, P&gt;}&gt;, &lt;{&lt;sogg, N&gt;}, {&lt;ogg, N&gt;}, {}&gt;, &lt;{&lt;sogg, N&gt;}, {&lt;ogg, N&gt;}, {&lt;compl, P&gt;}&gt;, &lt;{&lt;sogg, N&gt;}, {&lt;ogg, N&gt;}, {&lt;compl, P&gt;, &lt;compl, P&gt;}&gt;, &lt;{&lt;sogg, N&gt;}, {&lt;ogg, C&gt;}, {}&gt;, &lt;{&lt;sogg, N&gt;}, {&lt;ogg, C&gt;}, {&lt;compl, P&gt;}&gt;, &lt;{&lt;sogg, N&gt;}, {&lt;ogg, C&gt;}, {&lt;compl, P&gt;, &lt;compl, P&gt;}&gt; }</Paragraph>
      <Paragraph position="26"> $DepSet_q$ = { {&lt;sogg, N&gt;, #}, {&lt;sogg, N&gt;, &lt;compl, P&gt;, #}, {&lt;sogg, N&gt;, &lt;compl, P&gt;, &lt;compl, P&gt;, #}, {&lt;sogg, N&gt;, &lt;ogg, N&gt;, #}, {&lt;sogg, N&gt;, &lt;ogg, N&gt;, &lt;compl, P&gt;, #}, {&lt;sogg, N&gt;, &lt;ogg, N&gt;, &lt;compl, P&gt;, &lt;compl, P&gt;, #}, {&lt;sogg, N&gt;, &lt;ogg, C&gt;, #}, {&lt;sogg, N&gt;, &lt;ogg, C&gt;, &lt;compl, P&gt;, #}, {&lt;sogg, N&gt;, &lt;ogg, C&gt;, &lt;compl, P&gt;, &lt;compl, P&gt;, #} }. If we take all the permutations of the various multisets, we finally obtain the rules.</Paragraph>
      <Paragraph position="28"> So, if to sprong is associated with the subcategory $t_{137}$ (whose subcategorization frame is $q$), we obtain dependency rules of the form introduced in the previous section, for instance: to sprong: t137 (&lt;sogg, N&gt; #); to sprong: t137 (# &lt;sogg, N&gt;); to sprong: t137 (&lt;sogg, N&gt; &lt;compl, PREP&gt; #); to sprong: t137 (&lt;sogg, N&gt; # &lt;compl, PREP&gt;). This procedure has the goal of mapping the subcategorization frames onto the dependency rules. In actual practice, the frames are not multiplied out before processing (for instance, exactly 200 rules would be generated for our very simple example). Processing issues are sketched in section 4.</Paragraph>
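      <Paragraph> The construction of $Rules_q$ can be reproduced for the example frame. This minimal sketch assumes that multisets are represented as tuples and that relations are processed in a fixed order. It works in three steps:
- it enumerates $Dep_q(d)$ with itertools.combinations_with_replacement;
- it forms $Cart_q$ with itertools.product;
- it collects the distinct permutations of each member of $DepSet_q$.
The helper names dep, dep_set and rules are hypothetical. The sketch also checks the taxonomy property on $t_{150}$, whose frame is described in section 3.3.

```python
from itertools import combinations_with_replacement, permutations, product

HEAD = "#"  # special symbol for the position of the head

# Example frame q = F(t137): relation -> ((min, max), value restrictions).
q = {
    "sogg": ((1, 1), ("N",)),
    "ogg": ((0, 1), ("N", "C")),
    "compl": ((0, 2), ("P",)),
}
# Frame of t150: OGG restricted to [1,1] and {N}, COMPL to [0,0].
t150 = {
    "sogg": ((1, 1), ("N",)),
    "ogg": ((1, 1), ("N",)),
    "compl": ((0, 0), ("P",)),
}


def dep(frame, d):
    """Dep_q(d): every multiset of pairs (d, t), t in V_q(d), whose size
    lies in the range N_q(d); each multiset is stored as a tuple."""
    (low, high), values = frame[d]
    return [
        tuple((d, t) for t in choice)
        for k in range(low, high + 1)
        for choice in combinations_with_replacement(sorted(values), k)
    ]


def dep_set(frame):
    """DepSet_q: multiset union of the components of each member of
    Cart_q, extended with the head symbol."""
    cart = product(*(dep(frame, d) for d in frame))  # Cart_q
    return [sum(member, ()) + (HEAD,) for member in cart]


def rules(frame):
    """Rules_q: the distinct permutations of every member of DepSet_q."""
    return {p for m in dep_set(frame) for p in permutations(m)}


print(len(dep_set(q)))  # 9
print(len(rules(q)))    # 200
print(len(rules(t150)), rules(t150).issubset(rules(q)))  # 6 True
```

The counts agree with the text: nine members of $DepSet_q$ and 200 rules. Enumerating every permutation is exactly the blow-up that the parser avoids in practice.</Paragraph>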
    </Section>
    <Section position="2" start_page="60" end_page="61" type="sub_section">
      <SectionTitle>
3.2 Ordering among dependents
</SectionTitle>
      <Paragraph position="0"> The hierarchy, and in particular the subcategorization frames, do not enforce a specific ordering among dependents of the same head. We propose an extension of the formalism that prevents some permutations from being generated as rules. The definition of subcategorization frame is modified as follows: Q is a set of ordered subcategorization frames, each of which is a pair consisting of a subcategorization frame and a set of ordering constraints.</Paragraph>
      <Paragraph position="1"> $\forall q \in Q\ [\, q \in (D \to R \times 2^T) \times 2^O \,]$, where $R$ is as before and $O$ is a set of pairs $\langle d_1, d_2 \rangle$ with $d_1, d_2 \in D \cup \{\#\}$.</Paragraph>
      <Paragraph position="2"> The pairs in $O$ define a partial order on the relative positions of the dependency relations and the head. If both $d_1$ and $d_2$ are members of $D$, the constraint specifies that the dependent whose grammatical relation is $d_1$ (if any) must linearly precede the dependent whose grammatical relation is $d_2$ (if any). If the first (second) member of the constraint is #, the constraint specifies that the dependent whose grammatical relation is $d_2$ ($d_1$, respectively), if any, must follow (precede) the head.</Paragraph>
      <Paragraph position="4"> The "if any" clauses mean that whenever one of the two elements is optional (the minimum of its range is 0), the constraint is considered respected when the number of actual instantiations is 0.</Paragraph>
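      <Paragraph position="5"> The ordering relation is transitive, namely: if $\langle e_1, e_2 \rangle \in O_q$ and $\langle e_2, e_3 \rangle \in O_q$, then $\langle e_1, e_3 \rangle \in O_q$.</Paragraph>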
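      <Paragraph position="7"> We require that the set of ordering constraints $O_q$ associated with any subcategorization frame be consistent: a) for all $e \in D \cup \{\#\}$, $\langle e, e \rangle \notin O_q$; b) for all $e_1, e_2 \in D \cup \{\#\}$, if $\langle e_1, e_2 \rangle \in O_q$ then $\langle e_2, e_1 \rangle \notin O_q$. Finally, we modify the $\le_T$ relation (which defines the hierarchy): $t_1 \le_T t_2$ iff, for every $d \in D$, $N_{F(t_1)}(d) \le_{RN} N_{F(t_2)}(d)$ and $V_{F(t_1)}(d) \le_{RV} V_{F(t_2)}(d)$, and $O_{F(t_2)} \subseteq O_{F(t_1)}$.</Paragraph>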
      <Paragraph position="9"> This corresponds to the requirement that a subcategory $t_1$ which is more specific than $t_2$ does not have looser constraints on linear order than $t_2$ has.</Paragraph>
      <Paragraph position="10"> If we refer to our previous example, a possible $O_q$ is {&lt;sogg, #&gt;, &lt;#, ogg&gt;}, specifying that the subject must precede the verbal head, which, in turn, must precede the direct object. If each permutation in $Rules_q$ is checked against these constraints, only 40 of the 200 rules are left, corresponding to the possible (free) positions of the (0 to 2) complements.</Paragraph>
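      <Paragraph> The effect of ordering constraints can be checked in the same way. This sketch, again under our own encoding, does three things:
- it closes $O_q$ under transitivity;
- it tests the two consistency conditions a) and b);
- it keeps a permutation only if every constraint holds, treating a constraint as satisfied when one of its elements has no instantiation.
The helper names closure, consistent and satisfies are hypothetical. $Rules_q$ is regenerated compactly so that the block runs on its own.

```python
from itertools import combinations_with_replacement, permutations, product

HEAD = "#"
q = {
    "sogg": ((1, 1), ("N",)),
    "ogg": ((0, 1), ("N", "C")),
    "compl": ((0, 2), ("P",)),
}
O_q = {("sogg", HEAD), (HEAD, "ogg")}  # subject, then head, then object


def rules(frame):
    """Rules_q as in section 3.1: distinct permutations of DepSet_q."""
    per_relation = [
        [tuple((d, t) for t in c)
         for k in range(low, high + 1)
         for c in combinations_with_replacement(sorted(vals), k)]
        for d, ((low, high), vals) in frame.items()
    ]
    dep_set = [sum(m, ()) + (HEAD,) for m in product(*per_relation)]
    return {p for m in dep_set for p in permutations(m)}


def closure(order):
    """Transitive closure of a set of ordering pairs."""
    closed = set(order)
    while True:
        extra = {(a, d) for (a, b) in closed for (c, d) in closed if b == c}
        if extra.issubset(closed):
            return closed
        closed |= extra


def consistent(order):
    """Conditions a) and b): no (e, e) pair and no pair with its reverse."""
    closed = closure(order)
    return all(a != b and (b, a) not in closed for (a, b) in closed)


def satisfies(rule, order):
    """Every (e1, e2) in the closure holds: all e1 items precede all e2
    items; vacuously true if either is not instantiated in the rule."""
    labels = [x if x == HEAD else x[0] for x in rule]
    for e1, e2 in closure(order):
        pos1 = [i for i, x in enumerate(labels) if x == e1]
        pos2 = [i for i, x in enumerate(labels) if x == e2]
        if pos1 and pos2 and max(pos1) >= min(pos2):
            return False
    return True


all_rules = rules(q)
kept = [r for r in all_rules if satisfies(r, O_q)]
print(consistent(O_q), len(all_rules), len(kept))  # True 200 40
```

With $O_q = \{\langle sogg, \# \rangle, \langle \#, ogg \rangle\}$, 40 of the 200 permutations survive, as stated in the text.</Paragraph>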
    </Section>
    <Section position="3" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
3.3 Inheritance
</SectionTitle>
      <Paragraph position="0"> We briefly mention here a notational convention that simplifies the description of the subcategorization frames; this convention is widespread in almost all taxonomic hierarchies. For details about inheritance we refer the reader to the extensive literature on semantic networks, frames and description logics (Nebel, 1990).</Paragraph>
      <Paragraph position="1"> We define: $t_1 &lt;_T t_2$ iff $t_1 \le_T t_2 \wedge \neg(t_2 \le_T t_1)$. If we define $&lt;_{RN}$ and $&lt;_{RV}$ in the same way, it is easy to verify that:</Paragraph>
      <Paragraph position="3"> subcategory, there must be a differentia keeping them apart. This enables us to represent $t_1$ as</Paragraph>
      <Paragraph position="5"> identify $t_2$ from $t_1$, and $Diff(t_1, t_2)$ is a notation for specifying the difference between the constraints associated with $t_1$ and the ones associated with $t_2$.</Paragraph>
      <Paragraph position="6"> So, we can say that the constraints associated with $t_1$ are determined as the composition of the ones inherited from $t_2$ and the ones specified locally (the differentia) for $t_1$.</Paragraph>
      <Paragraph position="7"> Graphically, an arc from $t_2$ to $t_1$ represents the subsumption relation (P~ef($t_2$) in previous terms), parsimoniously represented by the immediate ancestor.</Paragraph>
      <Paragraph position="8"> We show in figure 2 an example of subsumption between two subcategories: $t_{137}$, corresponding to the subcategorization frame $q$ shown in the example of section 3.1, and $t_{150}$.</Paragraph>
      <Paragraph position="9"> For the sake of clarity, we show the subcategorization frame associated with $t_{137}$ as a graph. In $t_{150}$ (subsumed by $t_{137}$), we specify the local restrictions: the number restrictions of OGG become [1,1], and those of COMPL become [0,0]. Moreover, the value restrictions of OGG become {N} (CHESUB is ruled out). By inheriting the constraints of $t_{137}$ and restricting them locally, we obtain that $t_{150}$ requires an obligatory nominal subject and an obligatory nominal object, and cannot have any complement. The order constraints (not shown in the figure) are also inherited in the obvious way.</Paragraph>
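      <Paragraph> The inheritance convention can be sketched as an operation that copies the parent's constraints and overrides those named in the differentia. The sketch assumes our own encoding: a differentia maps a relation to a pair (new range or None, new value set or None), where None means "inherited unchanged". The function names inherit and restricts are hypothetical. It rebuilds $t_{150}$ from $t_{137}$ and checks that the differentia only restricts, as the definition of $\le_T$ requires. Order constraints are left out for brevity.

```python
def inherit(parent, diff):
    """Constraints of t1: those inherited from t2, overridden by the
    locally specified differentia Diff(t1, t2)."""
    frame = dict(parent)
    for relation, (number, values) in diff.items():
        old_number, old_values = frame[relation]
        frame[relation] = (
            old_number if number is None else number,
            old_values if values is None else values,
        )
    return frame


def restricts(child, parent):
    """True if every range of child lies inside the parent's range and
    every value set of child is a subset of the parent's."""
    return all(
        child[d][0][0] >= parent[d][0][0]
        and parent[d][0][1] >= child[d][0][1]
        and child[d][1].issubset(parent[d][1])
        for d in parent
    )


t137 = {
    "sogg": ((1, 1), frozenset({"N"})),
    "ogg": ((0, 1), frozenset({"N", "C"})),
    "compl": ((0, 2), frozenset({"P"})),
}
# Diff(t150, t137): OGG becomes [1,1] and {N}; COMPL becomes [0,0].
diff_150 = {"ogg": ((1, 1), frozenset({"N"})), "compl": ((0, 0), None)}

t150 = inherit(t137, diff_150)
for d, (number, values) in t150.items():
    print(d, number, sorted(values))
print(restricts(t150, t137))  # True
```

Storing only the differentia keeps each node small. The full constraints of a class are recovered by applying the differentiae along the path from TOP.</Paragraph>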
      <Paragraph position="10"> A more significant example is given in figure 4, which we describe in section 5.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="61" end_page="61" type="metho">
    <SectionTitle>
4 Parsing issues
</SectionTitle>
    <Paragraph position="0"> Computational desiderata point towards a processing model that is input-driven, predictive, and able to prune the parsing space as early as possible.</Paragraph>
    <Paragraph position="1"> In this section, we propose an Earley-type parsing model with left-corner filtering (footnote 4). The parser goes left-to-right and builds a structure that is always connected, by hypothesizing templates for the lexical items which are predicted but not yet encountered in the input. It uses the information in the hierarchy, descending from the top class towards more specific classes. (Footnote 4: The basis of our work is (Lombardo and Lesmo, 1996), where the authors present an Earley-type recognizer for dependency grammar and propose the compilation of dependency rules into parse tables.)</Paragraph>
    <Paragraph position="2"> The descent is motivated by the fact that lower subcategories provide stronger constraints. It is possible to specify a procedure, described in (Barbero, 1998), that consults the hierarchy just once, in a compilation phase (consulting it during parsing would be very time-consuming), and builds a parse table that guides the parser moves. In the following we give an intuitive description of the algorithm, assuming the dependency tree as data structure instead of the sets of items that characterize Earley's parsing style.</Paragraph>
    <Paragraph position="3"> Initially, the parser guesses the presence of a node of a root category in the dependency tree. Then, given a node n associated with the subcategory t and a word w, the parser can perform three types of action: PREDICTION, SCANNING and COMPLETION.</Paragraph>
    <Paragraph position="4"> 1. Prediction: the parser guesses the presence of the dependents of n (by using left-corner information), given the constraints of the subcategory t of n. When the parser analyses a dependent which is distinctive for a possible specialization from the subcategory t to one of its children t1 in the hierarchy, t1 replaces t as the subcategory of n (for instance, if a direct object is hypothesized, we can directly descend from VERB to VERB-TRANS).</Paragraph>
    <Paragraph position="5"> 2. Scanning: the parser scans the head word of n (the word w in the input). The subcategory of w must be in the subtree rooted by t (including t itself). The left dependents of n that have been hypothesized in the prediction phase must fulfill the specific requirements imposed by the subcategory of the head (otherwise, the path is abandoned).</Paragraph>
    <Paragraph position="6"> 3. Completion: when the node n is &amp;quot;complete&amp;quot;, namely all the dependents required by the sub-category t have been found, the next elements  of the string can be analysed as dependents of the father node of n. If n has no father, i.e. it is the root of the dependency tree, and the end of the input string has been reached, the analysis ends successfully.</Paragraph>
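    <Paragraph> Two hierarchy operations are used during parsing: specialization at prediction time and the subtree check at scanning time. They can be sketched as follows. The hierarchy fragment, the placement of V-TR under VERB-TRANS and the table of distinctive relations are hypothetical illustrations modelled on the example analysis in this section. In the actual system this information is compiled into a parse table (Barbero, 1998).

```python
# Hypothetical fragment of the verb hierarchy: child -> parent.
PARENT = {
    "V1": "VERB",
    "VERB-INTR": "VERB",
    "VERB-TRANS": "VERB",
    "V-TR": "VERB-TRANS",
}
# Hypothetical table: (subcategory, hypothesized relation) -> child class
# for which that dependent is distinctive.
DISTINCTIVE = {("VERB", "ogg"): "VERB-TRANS"}


def predict(current, relation):
    """Prediction: descend to a child class if the hypothesized dependent
    is distinctive for it; otherwise keep the current subcategory."""
    return DISTINCTIVE.get((current, relation), current)


def in_subtree(sub, root):
    """Scanning: the word's subcategory must be root or a descendant."""
    while sub is not None:
        if sub == root:
            return True
        sub = PARENT.get(sub)
    return False


t = "VERB"              # initialization: verbal root template
t = predict(t, "sogg")  # subject "gli amici": no specialization
t = predict(t, "ogg")   # object "lo": descend to VERB-TRANS
print(t)                           # VERB-TRANS
print(in_subtree("V-TR", t))       # True: credevano can be scanned
print(in_subtree("VERB-INTR", t))  # False: this path is abandoned
```

Walking up parent links keeps the scanning check linear in the depth of the hierarchy, which here has six levels.</Paragraph>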
    <Paragraph position="7"> For example, the analysis of the sentence Gli amici lo credevano un eroe, "The friends considered him a hero", begins with the creation of a verbal root template (figure 3, "Initialization"). The first word in the input string is a determiner (Gli, "the"). A determiner can be the left corner of a nominal group, so a prediction phase on the root node hypothesizes a left dependent of category NOUN labelled as subject (SOGG) (footnote 5). The control goes to this node, from which a left dependent of category Determ is hypothesized.</Paragraph>
    <Paragraph position="8">  here only one (non-deterministic) analysis path.</Paragraph>
    <Paragraph position="9"> This last one is associated with the input word Gli, "the". The control returns to the node of category NOUN, which is associated with the next word amici, "friends". The node of category NOUN can be considered "complete" (no other dependent is required), and the control goes back to the root node.</Paragraph>
    <Paragraph position="10"> At this point, the pronoun lo, "him", is read in input. A direct object is hypothesized and associated with it. A specialization from the top of the hierarchy to the subcategory of transitive verbs is possible: we know, in fact, that the root verb must be transitive, because a direct object has been hypothesized. The word credevano ("considered") is then read in input, and it is associated with the root node (scanning phase). Suppose that the verb credere, "consider", belongs to a class V-TR that requires a nominal subject (so the hypothesis on the left dependent amici turns out to be correct), an object and a predicative complement.</Paragraph>
    <Paragraph position="11"> The next input word, un, "a", is a determiner.</Paragraph>
    <Paragraph position="12"> Again, a nominal group is hypothesized, composed of a noun, playing the role of predicative complement, and a dependent of the noun, which is of category Determ and is associated with the word un.</Paragraph>
    <Paragraph position="13"> The next input word, eroe, "hero", is associated with the node playing the role of predicative complement.</Paragraph>
    <Paragraph position="14"> The completion phase successfully ends the analysis of the sentence, as all the dependents required by the verb credevano (subject, object and predicative complement) have been found in the input sentence.</Paragraph>
  </Section>
  <Section position="7" start_page="61" end_page="64" type="metho">
    <SectionTitle>
5 The classification of 101 Italian verbs
</SectionTitle>
    <Paragraph position="0"> In investigating the empirical properties of a hierarchical grammar two issues must be addressed: the linguistic adequacy of the classification, and the parsimony of the hierarchy. We present some quantitative analyses of a corpus, showing that the proposed hierarchy considerably reduces the redundancy of a grammar for naturally occurring texts, while at the same time being sufficiently fine-grained to represent even very idiosyncratic items.</Paragraph>
    <Paragraph position="1"> The hierarchy we propose encodes 101 Italian verbs taken from the grammar of Italian (Renzi, 1988) as the most representative of the main structures of Italian.</Paragraph>
    <Section position="1" start_page="61" end_page="63" type="sub_section">
      <SectionTitle>
5.1 Materials and Method
</SectionTitle>
      <Paragraph position="0"> The main sources of information used to carry out the classification are: (Renzi, 1988)'s Italian grammar, (Palazzi and Folena, 1992)'s Italian dictionary, and an Italian corpus of about 500 000 words. The corpus includes daily newspaper articles (367578 words), scientific dissertations (40013), compositions by young students (27531), Verga's novels (12905), short news reports (6757), and stories and various texts (5012). It is a varied corpus, representative of several literary genres of written Italian.</Paragraph>
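      <Paragraph> As a quick arithmetic check on the corpus description, the sketch below sums the word counts listed for each genre and prints each genre's share. The dictionary is simply a transcription of the figures above. The listed counts sum to 459796 words, which the text approximates as about 500 000.

```python
# Word counts per genre, as listed in the corpus description.
corpus = {
    "daily newspaper articles": 367578,
    "scientific dissertations": 40013,
    "compositions by young students": 27531,
    "Verga's novels": 12905,
    "short news reports": 6757,
    "stories and various texts": 5012,
}

total = sum(corpus.values())
print(total)  # 459796
for genre, words in corpus.items():
    # Share of each genre in the whole corpus, one decimal place.
    print(f"{genre}: {100 * words / total:.1f}%")
```

Newspaper text accounts for roughly four fifths of the corpus, which is worth keeping in mind when reading the token-based figures in section 5.3.</Paragraph>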
      <Paragraph position="1"> The information required by our formalism (the grammatical relations associated with the dependents, their number ($N_q(d)$) and the set of categories ($V_q(d)$) that can realize them) was partly obtained by consulting Italian dictionaries, partly based on native speakers' intuitions, and mostly from the analysis of the corpus.</Paragraph>
      <Paragraph position="2"> All the sentences containing the verbs under analysis were automatically extracted from the corpus, and the subcategorization patterns (rules) exhibited by the verbs in those sentences were manually collected.</Paragraph>
      <Paragraph position="3"> We represented the set of subcategorization patterns (rules) as subcategorization frames, by associating with each grammatical relation, according to the formalism, the related number ($N_q(d)$) and value ($V_q(d)$) restrictions computed on the corpus. In this test, we have kept the order between the dependents of a verb free, so there are no ordering constraints. Each class $t_1$ is connected to its superclass $t_2$.</Paragraph>
      <Paragraph position="4"> $Diff(t_1, t_2)$, the difference between the constraints associated with $t_1$ and the ones associated with $t_2$, is expressed by specifying, for each relation that is restricted from $t_2$ to $t_1$, the relation itself with the new number and value restrictions.</Paragraph>
    </Section>
    <Section position="2" start_page="63" end_page="63" type="sub_section">
      <SectionTitle>
5.2 Hierarchy
</SectionTitle>
      <Paragraph position="0"> Figure 4 illustrates a small portion of the resulting hierarchy. This hierarchy is based on the dependency relations for a generic Italian verb summarized in Table 1 (footnote 6).</Paragraph>
      <Paragraph position="1"> Footnote 6: Usually the adjuncts are not indicated as part of the subcategorization frames of the verbs, since they are not obligatorily required by the verbs themselves. We have specified them anyway, as the hierarchy represents the grammar, which includes all the information about the dependents, adjuncts included. Moreover, by specifying the information about the adjuncts at the top level, we maintain the clarity of the representation and the mapping onto the formal grammar.</Paragraph>
      <Paragraph position="2"> The whole hierarchy has 6 levels: the top level (class VERB) represents the general constraints for Italian verbs; the top+1 level distinguishes the constraints for impersonal (V1), intransitive (VERB-INTR) and transitive (VERB-TRANS) verbs; the top+2, top+3, etc. levels represent specific classes of verbs (from V2 to V50).</Paragraph>
    </Section>
    <Section position="3" start_page="63" end_page="64" type="sub_section">
      <SectionTitle>
5.3 Results
</SectionTitle>
      <Paragraph position="0"> The graph in figure 5 shows the distribution of verbs by type, namely how the number of verbs covered by the classes grows in relation to the number of classes.</Paragraph>
      <Paragraph position="1"> We can see that the first (most common) class covers 15 verbs, the first and second most common classes together cover 26 verbs, and so on. With the first 9 classes we cover 63 verbs, giving rise to a reduction of 85.7% compared to having a distinct subcategorization frame for each verb. With the first 18 classes we cover 81 verbs (a reduction of 77.7%). The whole set of verbs, however, requires 50 classes (a reduction of 50.5%): in fact, we have found many verbs with very idiosyncratic behaviour.</Paragraph>
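      <Paragraph> The reduction percentages above follow from one minus the ratio of classes to verbs covered. The minimal sketch below recomputes them; the helper name reduction is hypothetical.

```python
def reduction(classes: int, verbs: int) -> float:
    """Percentage of frames saved by sharing `classes` frames among `verbs`
    verbs instead of giving each verb its own frame."""
    return 100 * (1 - classes / verbs)


for classes, verbs in [(9, 63), (18, 81), (50, 101)]:
    print(f"{classes} classes, {verbs} verbs: {reduction(classes, verbs):.2f}%")
```

The values agree with the reported figures up to rounding.</Paragraph>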
      <Paragraph position="2"> Table 2 shows the distribution of verbs by token (the sum of the occurrences, in the corpus, of all the verbs belonging to each class), level by level. The occurrence of some rare classes is interesting when compared with the percentage of reduction in the representation: there is a compression of 55.7%, while still accounting for very low-frequency patterns, where compression is almost 0%.</Paragraph>
      <Paragraph position="3"> In Table 3, we show, for each level, the number of subcategorization patterns represented by all the classes of that level, namely the sum of the patterns of each class at that level. The number of patterns decreases rapidly as we descend the hierarchy.</Paragraph>
      <Paragraph position="4"> The representation of the syntactic knowledge concerning adjuncts is currently a research goal. Most authors tend to leave adjuncts out of the representation of subcategorization frames: see (Hudson, 1990) and the "adjoining" operation in LTAG (Joshi and Schabes, 1996).</Paragraph>
      <Paragraph position="5"> For the patterns found in the texts, we observe a decrease similar to, but less marked than, that of the grammatical patterns. Even the more specific classes describe a good portion of the patterns in the texts, thus confirming the usefulness of very specific information in the analysis.</Paragraph>
      <Paragraph position="6"> Table 2 shows this point more clearly. The lower, more specific levels, while having fewer classes, still cover many occurrences of verbs in the text.</Paragraph>
    </Section>
  </Section>
</Paper>