<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1059">
  <Title>Learning the Countability of English Nouns from Corpus Data</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> Grammatical countability is motivated by the semantic distinction between object and substance reference (also known as bounded/non-bounded or individuated/non-individuated). It is a subject of contention among linguists as to how far grammatical countability is semantically motivated and how much it is arbitrary (Wierzbicka, 1988).</Paragraph>
    <Paragraph position="1"> The prevailing position in the natural language processing community is effectively to treat countability as though it were arbitrary and encode it as a lexical property of nouns. The study of countability is complicated by the fact that most nouns can have their countability changed: either converted by a lexical rule or embedded in another noun phrase.</Paragraph>
    <Paragraph position="2"> An example of conversion is the so-called universal packager, a rule which takes an uncountable noun with an interpretation as a substance, and returns a countable noun interpreted as a portion of the substance: I would like two beers. An example of embedding is the use of a classifier: uncountable nouns can be embedded in countable noun phrases as complements of classifiers, as in one piece of equipment. Bond et al. (1994) suggested a division of countability into five major types, based on Allan's (1980) noun countability preferences (NCPs). Nouns which rarely undergo conversion are marked as either fully countable, uncountable or plural only. Fully countable nouns have both singular and plural forms, and cannot be used with determiners such as much, little, a little, less and overmuch. Uncountable nouns, such as furniture, have no plural form, and can be used with much. Plural only nouns never head a singular noun phrase: goods, scissors.</Paragraph>
    <Paragraph position="3"> Nouns that are readily converted are marked as either strongly countable (for countable nouns that can be converted to uncountable, such as cake) or weakly countable (for uncountable nouns that are readily convertible to countable, such as beer).</Paragraph>
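The five-way division described above can be written down as a small data structure. The following is only an illustrative sketch: the names Countability, READILY_CONVERTED, EXAMPLE_LEXICON and readily_converts are hypothetical, and the toy lexicon contains only the example nouns given in the text. It is not the lexicon format used by Bond et al. (1994).

```python
from enum import Enum


class Countability(Enum):
    """The five major countability types named in the text."""
    FULLY_COUNTABLE = "fully countable"
    STRONGLY_COUNTABLE = "strongly countable"
    WEAKLY_COUNTABLE = "weakly countable"
    UNCOUNTABLE = "uncountable"
    PLURAL_ONLY = "plural only"


# Classes whose nouns are readily converted to the other countability.
READILY_CONVERTED = frozenset(
    {Countability.STRONGLY_COUNTABLE, Countability.WEAKLY_COUNTABLE}
)

# Toy lexicon (an assumption for illustration); the nouns and their
# classes are the examples given in the surrounding paragraphs.
EXAMPLE_LEXICON = {
    "furniture": Countability.UNCOUNTABLE,
    "goods": Countability.PLURAL_ONLY,
    "scissors": Countability.PLURAL_ONLY,
    "cake": Countability.STRONGLY_COUNTABLE,
    "beer": Countability.WEAKLY_COUNTABLE,
}


def readily_converts(noun: str) -> bool:
    """Return True if the noun's class readily undergoes conversion."""
    return EXAMPLE_LEXICON[noun] in READILY_CONVERTED


if __name__ == "__main__":
    for noun, cls in EXAMPLE_LEXICON.items():
        print(noun, cls.value, readily_converts(noun))
```

Encoding the class as a lexical property of each noun mirrors the position described earlier, in which countability is listed in the lexicon rather than derived from the referent.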
    <Paragraph position="4"> NLP systems must list countability for at least some nouns, because full knowledge of the referent of a noun phrase is not enough to predict countability. There is also a language-specific knowledge requirement. This can be shown most simply by comparing languages: different languages encode the countability of the same referent in different ways. There is nothing about the concept denoted by lightning, for example, that rules out *a lightning being interpreted as a flash of lightning. Indeed, the German and French translation equivalents are fully countable (ein Blitz and un éclair respectively). Even within the same language, the same referent can be encoded countably or uncountably: clothes/clothing, things/stuff, jobs/work.</Paragraph>
    <Paragraph position="5"> Therefore, we must learn countability classes from usage examples in corpora. There are several impediments to this approach. The first is that words are frequently converted to different countabilities, sometimes in such a way that other native speakers will dispute the validity of the new usage. We do not necessarily wish to learn such rare examples, and may not need to learn more common conversions either, as they can be handled by regular lexical rules (Copestake and Briscoe, 1995). The second problem is that some constructions affect the apparent countability of their head: for example, nouns denoting a role, which are typically countable, can appear without an article in some constructions (e.g. We elected him treasurer).</Paragraph>
    <Paragraph position="6"> The third is that different senses of a word may have different countabilities: interest &quot;a sense of concern with and curiosity&quot; is normally countable, whereas interest &quot;fixed charge for borrowing money&quot; is uncountable.</Paragraph>
    <Paragraph position="7"> There have been several earlier approaches to the automatic determination of countability. Bond and Vatikiotis-Bateson (2002) determine a noun's countability preferences from its semantic class, and show that semantics predicts (5-way) countability 78% of the time with their ontology.</Paragraph>
    <Paragraph position="8"> O'Hara et al. (2003) get better results (89.5%) using the much larger Cyc ontology, although they only distinguish between countable and uncountable.</Paragraph>
    <Paragraph position="9"> Schwartz (2002) created an automatic countability tagger (ACT) to learn noun countabilities from the British National Corpus. ACT looks at determiner co-occurrence in singular noun chunks, and classifies the noun if and only if it occurs with a determiner which can modify only countable or uncountable nouns. The method has a coverage of around 50%, and agrees with COMLEX for 68% of the nouns marked countable and with the ALT-J/E lexicon for 88%. Agreement was worse for uncountable nouns (6% and 44% respectively).</Paragraph>
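The determiner-based idea behind ACT can be illustrated with a minimal sketch. This is not Schwartz's implementation: the determiner lists, the function names classify_nouns and coverage, and the toy input are all assumptions made for illustration. In particular, the text does not say how ACT treats a noun seen with both kinds of determiner; this sketch simply records both labels.

```python
from collections import defaultdict

# Assumed, illustrative determiner lists; NOT the lists used by ACT.
COUNTABLE_ONLY = {"a", "an", "each", "every", "another"}
UNCOUNTABLE_ONLY = {"much", "overmuch"}


def classify_nouns(chunks):
    """Map each noun to the set of labels supported by its determiners.

    `chunks` is a list of (determiner, noun) pairs taken from singular
    noun chunks. Nouns seen only with ambiguous determiners (e.g. "the")
    are left out of the result, i.e. they stay unclassified.
    """
    evidence = defaultdict(set)
    for determiner, noun in chunks:
        det = determiner.lower()
        if det in COUNTABLE_ONLY:
            evidence[noun.lower()].add("countable")
        elif det in UNCOUNTABLE_ONLY:
            evidence[noun.lower()].add("uncountable")
    return dict(evidence)


def coverage(chunks, labels):
    """Fraction of distinct nouns that received at least one label."""
    nouns = {noun.lower() for _, noun in chunks}
    return len(labels) / len(nouns) if nouns else 0.0


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_chunks = [
        ("a", "dog"),
        ("the", "dog"),
        ("much", "furniture"),
        ("the", "equipment"),
        ("this", "lightning"),
    ]
    labels = classify_nouns(toy_chunks)
    print(labels)
    print(coverage(toy_chunks, labels))
```

Because a noun is labelled only when it co-occurs with an unambiguous determiner, nouns that appear solely with determiners such as the or this remain unclassified, which is one way a method of this kind ends up with partial coverage.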
  </Section>
</Paper>