<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1059"> <Title>Learning the Countability of English Nouns from Corpus Data</Title> <Section position="4" start_page="0" end_page="3" type="metho"> <SectionTitle> 3 Resources </SectionTitle> <Paragraph position="0"> Information about noun countability was obtained from two sources. One was COMLEX 3.0 (Grishman et al., 1998), which has around 22,000 noun entries. Of these, 12,922 are marked as being countable (COUNTABLE) and 4,976 as being uncountable (NCOLLECTIVE or :PLURAL *NONE*). The remainder are unmarked for countability.</Paragraph> <Paragraph position="1"> The other was the common noun part of ALT-J/E's Japanese-to-English semantic transfer dictionary (Bond, 2001). It contains 71,833 linked Japanese-English pairs, each of which has a value for the noun countability preference of the English noun. Considering only unique English entries with different countability and ignoring all other information gave 56,245 entries. Nouns in the ALT-J/E dictionary are marked with one of the five major countability preference classes described in Section 2. In addition to countability, default values for number and classifier (e.g. blade for grass: blade of grass) are also part of the lexicon.</Paragraph> <Paragraph position="2"> We classify words into four possible classes, with some words belonging to multiple classes. The first class is countable: COMLEX's COUNTABLE and ALT-J/E's fully, strongly and weakly countable. The second class is uncountable: COMLEX's NCOLLECTIVE or :PLURAL *NONE* and ALT-J/E's strongly and weakly countable and uncountable.</Paragraph> <Paragraph position="3"> The third class is bipartite nouns. These can only be plural when they head a noun phrase (trousers), but singular when used as a modifier (trouser leg).</Paragraph> <Paragraph position="4"> When they are denumerated they use pair: a pair of scissors. COMLEX does not have a feature to mark bipartite nouns; trouser, for example, is listed as countable. Nouns in ALT-J/E marked plural only with a default classifier of pair are classified as bipartite. The last class is plural only nouns: those that only have a plural form, such as goods. They can neither be denumerated nor modified by much. Many of these nouns, such as clothes, use the plural form even as modifiers (a clothes horse). The word clothes cannot be denumerated at all. Nouns marked :SINGULAR *NONE* in COMLEX and nouns in ALT-J/E marked plural only without the default classifier pair are classified as plural only. There was some noise in the ALT-J/E data, so this class was handchecked, giving a total of 104 entries; 84 of these were attested in the training data.</Paragraph> <Paragraph position="5"> Our classification of countability is a subset of ALT-J/E's, in that we use only the three basic ALT-J/E classes of countable, uncountable and plural only, (although we treat bipartite as a separate class, not a subclass). As we derive our countability classifications from corpus evidence, it is possible to reconstruct countability preferences (i.e. fully, strongly, or weakly countable) from the relative token occurrence of the different countabilities for that noun.</Paragraph> <Paragraph position="6"> In order to get an idea of the intrinsic difficulty of the countability learning task, we tested the agreement between the two resources in the form of classification accuracy. 
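To make this agreement measure concrete, here is a minimal sketch of the per-noun calculation in Python, assuming toy stand-ins for the two lexicons (the entries and data structures are illustrative, not the actual COMLEX or ALT-J/E contents):

```python
# Per-noun agreement between two countability lexicons, as described above.
CLASSES = ["countable", "uncountable", "plural only", "bipartite"]

# Toy lexicons: noun -> set of positive countability classes (illustrative).
comlex = {"tomato": {"countable"}}
altje = {"tomato": {"countable", "uncountable"}}

def agreement(noun: str) -> float:
    """Proportion of the four classes on which the two resources agree,
    counting both positive and (implicit) negative memberships."""
    a, b = comlex[noun], altje[noun]
    return sum((c in a) == (c in b) for c in CLASSES) / len(CLASSES)

shared = comlex.keys() & altje.keys()
mean = sum(agreement(n) for n in shared) / len(shared)
print(agreement("tomato"))  # 0.75, i.e. the 3/4 of the tomato example
print(mean)                 # 93.8% when averaged over the full lexicons
```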
</Section> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 Learning Countability </SectionTitle> <Paragraph position="0"> The basic methodology employed in this research is to identify lexical and/or constructional features associated with the countability classes, and determine the relative corpus occurrence of those features for each noun. We then feed the noun feature vectors into a classifier and make a judgement on the membership of the given noun in each countability class.</Paragraph> <Paragraph position="1"> In order to extract the feature values from corpus data, we need the basic phrase structure, and particularly the noun phrase structure, of the source text. We use three different sources for this phrase structure: part-of-speech tagged data, chunked data and fully-parsed data, as detailed below.</Paragraph> <Paragraph position="2"> The corpus of choice throughout this paper is the written component of the British National Corpus (BNC version 2, Burnard (2000)), totalling around 90m w-units (POS-tagged items). We chose it for its good coverage of different usages of English, and thus of different countabilities. The only component of the original annotation we make use of is the sentence tokenisation.</Paragraph> <Paragraph position="3"> Below, we outline the features used in this research and our methods of describing feature interaction, along with the pre-processing tools, extraction techniques and classifier architecture.</Paragraph> <Paragraph position="4"> The full range of classifier architectures tested as part of this research, and the experiments used to choose between them, are described in Baldwin and Bond (2003).</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.1 Feature space </SectionTitle> <Paragraph position="0"> For each target noun, we compute a fixed-length feature vector based on a variety of features intended to capture linguistic constraints and/or preferences associated with particular countability classes. The feature space is partitioned into feature clusters, each of which is conditioned on the occurrence of the target noun in a given construction.</Paragraph> <Paragraph position="1"> Feature clusters take the form of one- or two-dimensional feature matrices, with each dimension describing a lexical or syntactic property of the construction in question. In the case of a one-dimensional feature cluster (e.g. the noun occurring in singular or plural form), each component feature feat_s in the cluster is translated into a 3-tuple of frequency-based values; in the case of a two-dimensional feature cluster (e.g. subject-position noun number vs. verb number agreement), each component feature feat_{s,t} is analogously translated into a 5-tuple.</Paragraph>
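The precise tuple definitions are given in Baldwin and Bond (2003); one plausible form, assuming (this is an assumption, not verbatim from the paper) that each feature frequency is recorded raw, relative to its cluster totals (row, column and overall totals in the 2D case), and relative to the target noun's overall corpus frequency, is:

```latex
% Assumed reconstruction of the feature tuples (not verbatim from the paper):
% f() is corpus frequency; word is the target noun.
\[
\mathit{feat}_s \rightarrow
\Big\langle\,
  f(\mathit{feat}_s),\;
  \tfrac{f(\mathit{feat}_s)}{\sum_i f(\mathit{feat}_i)},\;
  \tfrac{f(\mathit{feat}_s)}{f(\mathit{word})}
\,\Big\rangle
\]
\[
\mathit{feat}_{s,t} \rightarrow
\Big\langle\,
  f(\mathit{feat}_{s,t}),\;
  \tfrac{f(\mathit{feat}_{s,t})}{\sum_i f(\mathit{feat}_{i,t})},\;
  \tfrac{f(\mathit{feat}_{s,t})}{\sum_j f(\mathit{feat}_{s,j})},\;
  \tfrac{f(\mathit{feat}_{s,t})}{\sum_{i,j} f(\mathit{feat}_{i,j})},\;
  \tfrac{f(\mathit{feat}_{s,t})}{f(\mathit{word})}
\,\Big\rangle
\]
```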
See Baldwin and Bond (2003) for further details.</Paragraph> <Paragraph position="6">The following is a brief description of each feature cluster and its dimensionality (1D or 2D). A summary of the number of base features, and the predicted positive feature correlations with the countability classes, is presented in Table 1.</Paragraph> <Paragraph position="7">Head noun number (1D): the number of the target noun when it heads an NP (e.g. a shaggy dog = SINGULAR).</Paragraph> <Paragraph position="8">Modifier noun number (1D): the number of the target noun when it is a modifier in an NP (e.g. dog food = SINGULAR).</Paragraph> <Paragraph position="9">Subject-verb agreement (2D): the number of the target noun in subject position vs. number agreement on the governing verb (e.g. the dog barks = ⟨SINGULAR, SINGULAR⟩).</Paragraph> <Paragraph position="10">Coordinate noun number (2D): the number of the target noun vs. the number of the head nouns of its conjuncts (e.g. dogs and mud = ⟨PLURAL, SINGULAR⟩).</Paragraph> <Paragraph position="11">N of N constructions (2D): the number of the target noun (N2) vs. the type of N1 in an N1 of N2 construction (e.g. the type of dog = ⟨TYPE, SINGULAR⟩). We have identified a total of 11 N1 types for use in this feature cluster (e.g. COLLECTIVE, LACK, TEMPORAL).</Paragraph> <Paragraph position="12">Occurrence in PPs (2D): the presence or absence of a determiner (±DET) when the target noun occurs in singular form in a PP (e.g. per dog = ⟨per, −DET⟩). This feature cluster exploits the fact that countable nouns occur determinerless in singular form with only very particular prepositions (e.g. by bus, *on bus, *with bus), whereas uncountable nouns place far fewer restrictions on the prepositions they occur with (e.g. on furniture, with furniture, ?by furniture).</Paragraph> <Paragraph position="13">Pronoun co-occurrence (2D): which personal and possessive pronouns occur in the same sentence as singular and plural instances of the target noun (e.g. The dog ate its dinner = ⟨its, SINGULAR⟩). This is a proxy for pronoun binding effects, and is determined over a total of 12 third-person pronoun forms (normalised for case, e.g. he, their, itself).</Paragraph> <Paragraph position="14">Singular determiners (1D): which singular-selecting determiners occur in NPs headed by the target noun in singular form (e.g. a dog = a). All singular-selecting determiners considered are compatible with only countable nouns (e.g. another, each) or only uncountable nouns (e.g. much, little); determiners compatible with either are excluded from the feature cluster (cf. this dog, this information). Note that the term "determiner" is used loosely here and below to denote an amalgam of simplex determiners (e.g. a), the null determiner, complex determiners (e.g. all the), numeric expressions (e.g. one) and adjectives (e.g. numerous), as relevant to the particular feature cluster.</Paragraph> <Paragraph position="15">Plural determiners (1D): which plural-selecting determiners occur in NPs headed by the target noun in plural form (e.g. few dogs = few). As with singular determiners, we focus on those plural-selecting determiners that are compatible with only a proper subset of the countable, plural only and bipartite classes.</Paragraph> <Paragraph position="16">Non-bounded determiners (2D): which non-bounded determiners occur in NPs headed by the target noun, and the number of the target noun in each case (e.g. more dogs = ⟨more, PLURAL⟩). Here again, we restrict our focus to non-bounded determiners that select for singular-form uncountable nouns (e.g. sufficient furniture) and plural-form countable, plural only and bipartite nouns (e.g. sufficient dogs).</Paragraph> <Paragraph position="17">The above feature clusters produce a combined total of 1,284 individual feature values.</Paragraph> </Section>
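As an illustration of how such clusters yield the fixed-length vector, the following sketch flattens one one-dimensional cluster into per-feature values, using the normalisation assumed above (the cluster, the counts and the normalisation scheme are illustrative assumptions):

```python
# Flatten a 1D feature cluster (feature -> corpus count) for one target noun
# into per-feature values: raw count, cluster-relative, and word-relative.
def cluster_features(counts: dict[str, int], word_freq: int) -> list[float]:
    total = sum(counts.values()) or 1  # guard against an unattested cluster
    vec: list[float] = []
    for feat in sorted(counts):        # fixed feature order across nouns
        f = counts[feat]
        vec.extend([f, f / total, f / word_freq])
    return vec

# e.g. head-noun-number counts for "dog" (illustrative numbers)
head_number = {"SINGULAR": 912, "PLURAL": 344}
print(cluster_features(head_number, word_freq=1300))
```

Concatenating such blocks over all of the clusters above (with 2D clusters flattened analogously) gives the 1,284-dimensional input vector for each target noun.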
<Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.2 Feature extraction </SectionTitle> <Paragraph position="0"> In order to extract the features described above, we need some mechanism for detecting NP and PP boundaries, determining subject-verb agreement, and deconstructing NPs in order to recover conjuncts and noun-modifier data. We adopt three approaches. First, we use part-of-speech (POS) tagged data and POS-based templates to extract the necessary information. Second, we use chunk data to determine NP and PP boundaries, and medium-recall chunk adjacency templates to recover inter-phrasal dependencies. Third, we fully parse the data and simply read off all the necessary data from the dependency output.</Paragraph> <Paragraph position="1"> With the POS extraction method, we first Penn-tagged the BNC using an fnTBL-based tagger (Ngai and Florian, 2001), training over the Brown and WSJ corpora with some spelling, number and hyphenation normalisation. We then lemmatised this data using a version of morph (Minnen et al., 2001) customised to the Penn POS tagset. Finally, we implemented a range of high-precision, low-recall POS-based templates to extract the features from the processed data. For example, NPs are in many cases recoverable with the following Perl-style regular expression over Penn POS tags: (PDT)* DT (RB|JJ[RS]?|NNS?)* NNS? [^N].</Paragraph> <Paragraph position="2"> For the chunker, we ran fnTBL over the lemmatised tagged data, training over CoNLL 2000-style (Tjong Kim Sang and Buchholz, 2000) chunk-converted versions of the full Brown and WSJ corpora. For the NP-internal features (e.g. determiners, head number), we used the noun chunks directly, or applied POS-based templates locally within noun chunks. For inter-chunk features (e.g. subject-verb agreement), we looked only at adjacent chunk pairs so as to maintain a high level of precision.</Paragraph> <Paragraph position="3"> As the full parser, we used RASP (Briscoe and Carroll, 2002), a robust tag sequence grammar-based parser. RASP's grammatical relation output function provides the phrase structure in the form of lemmatised dependency tuples, from which it is possible to read off the feature information. RASP has the advantage that recall is high, although precision is potentially lower than with chunking or tagging, as the parser is forced to resolve phrase attachment ambiguities and commit to a single phrase structure analysis.</Paragraph> <Paragraph position="4"> Although all three systems map onto an identical feature space, the feature vectors generated for a given target noun diverge in content due to the different feature extraction methodologies. In addition, we only consider nouns that occur at least 10 times as head of an NP, causing slight disparities in the target noun type space for the three systems. Sufficient instances were found by all three systems for 20,530 common nouns (out of 33,050 for which at least one system found sufficient instances).</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.3 Classifier architecture </SectionTitle> <Paragraph position="0"> The classifier design employed in this research is four parallel supervised classifiers, one for each countability class. This allows us to classify a single noun into multiple countability classes: demand, for example, is both countable and uncountable. Thus, rather than classifying a given target noun according to the single most plausible countability class, we attempt to capture its full range of countabilities.</Paragraph> <Paragraph position="1"> Note that the proposed classifier design is the one found by Baldwin and Bond (2003) to be optimal for the task, out of a wide range of classifier architectures.</Paragraph> <Paragraph position="2"> In order to discourage the classifiers from over-training on negative evidence, we constructed the gold-standard training data from unambiguously negative exemplars and potentially ambiguous positive exemplars. That is, we would like the classifiers to judge a target noun as not belonging to a given countability class only in the absence of positive evidence for that class. In the case of countable nouns, for instance, this was achieved by extracting all countable nouns from each of the ALT-J/E and COMLEX lexicons. As positive training exemplars, we took the intersection of the nouns listed as countable in both lexicons (irrespective of membership in other countability classes); negative training exemplars, on the other hand, were those contained in both lexicons but not classified as countable in either. The gold-standard data for uncountable nouns was constructed in a similar fashion. We used the ALT-J/E lexicon as our source of plural only and bipartite nouns, taking all the listed instances as our positive exemplars. The set of negative exemplars was constructed in each case by taking the intersection of the nouns not contained in the given countability class in ALT-J/E with all annotated nouns in COMLEX with non-identical singular and plural forms.</Paragraph> <Paragraph position="3"> Having extracted the positive and negative exemplar noun lists for each countability class, we filtered out all noun lemmata not occurring in the BNC.</Paragraph>
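A minimal sketch of this exemplar construction for the countable class, assuming the lexicons are available as simple noun-to-classes mappings (toy data; the real lists come from ALT-J/E and COMLEX):

```python
# Gold-standard construction for the countable class.
# Toy lexicons: noun -> set of countability classes (illustrative only).
altje = {
    "dog":       {"countable"},
    "furniture": {"uncountable"},
    "demand":    {"countable", "uncountable"},
    "equipment": {"uncountable"},
}
comlex = {
    "dog":       {"countable"},
    "furniture": {"uncountable"},
    "demand":    {"countable"},
    "equipment": {"uncountable"},
}

shared = altje.keys() & comlex.keys()

# Positive exemplars: countable in BOTH lexicons, regardless of any
# other classes the noun may also belong to.
positive = {n for n in shared
            if "countable" in altje[n] and "countable" in comlex[n]}

# Negative exemplars: present in both lexicons but countable in NEITHER,
# i.e. unambiguously negative evidence.
negative = {n for n in shared
            if "countable" not in altje[n] and "countable" not in comlex[n]}

print(positive)  # {'dog', 'demand'}
print(negative)  # {'furniture', 'equipment'}
```

Each of the four classes gets such a positive/negative split, which (after filtering against the BNC) feeds a separate memory-based classifier (TiMBL with k = 9 in the paper).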
<Paragraph position="4"> The final make-up of the gold-standard data for each of the countability classes is listed in Table 2, along with a baseline classification accuracy for each class ("Baseline"), based on the relative frequency of the majority class (positive or negative). That is, for bipartite nouns, we achieve a 99.4% classification accuracy by arbitrarily classifying every training instance as negative.</Paragraph> <Paragraph position="5"> The supervised classifiers were built using TiMBL version 4.2 (Daelemans et al., 2002), a memory-based classification system based on the k-nearest neighbour algorithm. As a result of extensive parameter optimisation, we settled on the default configuration for TiMBL with k set to 9.</Paragraph> </Section> </Section> </Paper>