Some remarks on the Annotation of Quantifying Noun Groups in
Treebanks
Kristina Spranger
Institute for Natural Language Processing (IMS), University of Stuttgart
Azenbergstraße 12
70174 Stuttgart, Germany
Kristina.Spranger@ims.uni-stuttgart.de
Abstract
This article is devoted to the problem
of quantifying noun groups in German.
After a thorough description of the phe-
nom ena, the results of corpus-based in-
vestigations are described. Moreover,
some examples are given that under-
line the necessity of integrating some
kind of information other than gram-
mar sensu stricto into the treebank.
We argue that a more sophisticated
and fine-grained annotation in the tree-
bank would have very positve effects on
stochastic parsers trained on the tree-
bank and on grammars induced from
the treebank, and it would make the
treebank more valuable as a source of
data for theoretical linguistic investiga-
tions. The information gained from cor-
pus research and the analyses that are
proposed are realized in the framework
of SILVA, a parsing and extraction tool
for German text corpora.
1 Introduction
There is an increasing number of linguists in-
terested in large syntactically annotated corpora,
i.e. so-called treebanks. Treebanks consist of lan-
guage samples annotated with structural infor-
mation - centrally, the grammatical structure of
the samples, though some resources include cat-
egories of information other than grammar sensu
stricto. Data contained in treebanks are useful for
diverse theoretical and practical purposes: Com-
bining raw language data with linguistic infor-
mation offers a promising basis for the develop-
ment of new, efficient and robust natural language
processing methods. Real-world texts annotated
with different strata of linguistic information can
serve as training and test material for stochastic
approaches to natural language processing. At the
same time, treebanks are a valuable source of data
for theoretical linguistic investigations about lan-
guage use. The data-drivenness of this approach
presents a clear advantage over the traditional,
idealised notion of competence grammar.
According to Skut et al. (1997) treebanks have to
meet the following requirements:
1. descriptivity:
a0 grammatical phenomena are to be de-
scribed rather than explained;
2. theory-independence:
a0 annotations should not be influenced
by theory-specific considerations; nev-
ertheless, different theory-specific rep-
resentations should be recoverable from
the annotation;
3. multi-stratal representations:
a0 clear separation of different description
levels; and
4. data-drivenness:
a0 the annotation scheme must provide
representational means for all phenom-
ena occurring in texts.
81
The most important treebank for English nowa-
days is the Penn Treebank (cf. (Marcus et al.,
1994)). Many statistical taggers and parsers
have been trained on it, e.g. Ramshaw and Mar-
cus (1995), Srinivas (1997) and Alshawi and
Carter (1994). Furthermore, context-free and
unification-based grammars have been derived
from the Penn Treebank (cf. (Charniak, 1996)
and (van Genabith et al., 1999)). These parsers,
trained or created by means of the treebank, can
be applied for enlarging the treebank.
For German, the first initiative in the field of
treebanks was the NEGRA Corpus (cf. (Skut et
al., 1998)) which contains approximately 20.000
sentences of syntactically interpreted newspaper
text. Furthermore, there is the Verbmobil Cor-
pus (cf. (Wahlster, 2000)) which covers the area
of spoken language.
TIGER (cf. (Brants et al., 2002)) is the largest and
most exhaustively annotated treebank for Ger-
man. It consists of more than 35.000 syntactically
annotated sentences. The annotation format and
scheme are based on the NEGRA corpus. The
linguistic annotation of each sentence in TIGER
is represented on a number of different levels:
a0 part-of-speech (pos) information is encoded
in terminal nodes;
a0 non-terminal nodes are labelled with phrase
categories;
a0 the edges of a tree represent syntactic func-
tions; and
a0 secondary edges, i.e. labelled directed arcs
between arbitrary nodes, are used to encode
coordination information.
Syntactic structures are rather flat and simple in
order to reduce the potential for attachment ambi-
guities. The distinction between adjuncts and ar-
guments is not expressed in the constituent struc-
ture, but is instead encoded by means of syntactic
functions.
In this article, we attend to the question of how to
analyze and annotate quantifying noun groups in
German in a way that is most informative and in-
tuitive for treebank users, i.e., linguists as well as
NLP applications. In the following section, we
first give an overview of the different kinds of
quantitative specification in German. In the fol-
lowing section, the general structure of quantify-
ing noun groups is depicted and subclasses are
introduced. We concentrate on the description
of measure constructions and count constructions.
In the third section, we show how these construc-
tions are annotated in TIGER and NEGRA so
far and why the existing annotation is not suffi-
cient for linguistically more demanding tasks. We
make several refinement proposals for the anno-
tation scheme. A grammar induction experiment
shows the benefit that can be gained from the pro-
posed refinements. In section 5, some concluding
remarks follow.
The information gained by corpus-based research
is used in the framework of SILVA, a finite-state
based symbolic system for parsing and for the ex-
traction of deep linguistic information from unre-
stricted German text. Since, this paper presents
research that is part of the agenda for developing
SILVA, our analyses are tributary to this overall
goal. And, since our parsed output should be a
reasonable basis for linguistically more demand-
ing extraction tasks, a highly informative annota-
tion is one of our main goals.
2 Indications of Quantity - Variants of
the Quantitative Specification
In order to apprehend a non-discrete, amorphous
substance, such as Lehm (loam), as a discrete ob-
ject and therefore to make it countable, and in
order to integrate discrete entities, such as Blu-
men (flowers), in one more complex unit, there
are mainly two possibilities in German:
1. building a noun group:
(1) ein
a
Klumpen
clump
Lehm
loam
‘a clump of loam’
(2) ein
a
Strauß
bouquet
Blumen
flowers
‘a bouquet of flowers’
or
2. building a compound:
(3) ein
a
Lehmklumpen
loam clump
‘a clump of loam’
82
(4) ein
a
Blumenstrauß
flower bouquet
‘a bouquet of flowers’
.
In the context of parsing and structure annotation,
the latter possibility does not pose any problems,
whereas the first one calls for further investigation
with a view to a workable strategy concerning the
analysis and annotation of such constructions.
2.1 The Structure of the German
Quantifying Noun Group (quanNG)
The German quantifying noun group (quanNG)
is composed of a numeral (Num), a nominal con-
stituent used for quantification (quanN) and a
nominal constituent denoting the quantified ele-
ment or substance (ElemN) (cf. figure 1). Thus,
quanNG consists of a quantifying constituent
(e.g. , ein Klumpen in example 1) and a quanti-
fied constituent (e.g. , Lehm in example 1).
With respect to the nouns that can function as
the head of the quantifying constituent quanN,
we distinguish four classes of quantifying noun
groups:
1. numeral noun constructions;
2. quantum noun constructions;
3. count constructions; and
4. measure constructions.
These four main classes are subdivided into sev-
eral syntactically and semantically motivated sub-
classes (see table 1). A more detailed description
of the respective classes is presented in the fol-
lowing sections.
numeral
Num +
constituent
nominal
quantifying
quanN +
constituent
nominal
quantified
ElemN =
groupnoun
quantifying
quanNG
Figure 1: Structure of a Quantifying Noun Group
2.2 Four Classes of Quantifying Noun
Groups
The quantifying aspect to the meaning of the
complex noun group quanNG is contributed by
a quantifying noun quanN. Starting from the spe-
cific nature of these quanN, we differentiate be-
tween four classes of quanNG. Depending on
whether the head noun of the quanN can only
yield a contribution to the quantitative aspect of
the complex noun phrase, as it is the case with
Kilogramm (kilogram) or Million (million), or if
the contribution to the meaning of the complex
noun phrase is of quantitative nature only in cer-
tain contexts, as it is the case with Tasse (cup), or
Schritt (step), there arise cases of ambiguity we
are faced with in the context of structure analyz-
ing and annotation.
2.2.1 Numeral Noun Constructions and
Quantum Noun Constructions
In German, in addition to the numerals, there
are some nouns such as Dutzend (dozen) or Mil-
lion (million), that bring out specific indications
of number and that do not go beyond this quanti-
tative aspect (in contrast to Paar (pair)). quanNG
with such a noun as head of quanN are called nu-
meral noun constructions.
Quantum noun constructions can be characterized
by two observations:
1. they cannot be freely combined with numer-
als; and
2. they express indefinite quantities.
Numeral noun constructions and quantum noun
constructions are mentioned here only for the
sake of completeness1.
In the context of this article, we concentrate on
measure constructions and count constructions.
2.2.2 Measure Constructions and Count
Constructions
Measure Constructions For the description of
measure constructions the concept of measure-
ment, as known from the theory of measurement
(cf. (Suppes, 1963)), is used. By virtue of mea-
sure constructions, a real number n is assigned to
1Interested readers are referred to Wiese (1997) for a de-
tailed description of these construction types.
83
CATEGORY EXAMPLES
(1) numeral noun construction Dutzend (dozen), Million (million)
(2) quantum noun construction Menge (number), Unmenge (vast number), Un-
summe (amount), Vielzahl (multitude)
(3) count construction:
(a) count noun construction:
(i) numeral classifier construction Stück (piece)
(ii) shape noun construction Tropfen (drop), Laib (loaf), Scheibe (slice)
(iii) container noun construction Glas (glass), Tasse (cup), Kiste (crate)
(iv) singulative construction Halm (blade), Korn (grain)
(b) sort noun construction Sorte (kind), Art (type)
(c) collective noun construction:
(i) configuration construction Stapel (stack), Haufen (heap)
(ii) group collective construction Herde (herd), Gruppe (group), Paar (pair)
(4) measure construction:
(a) measuring unit construction (muc):
(i) abstract muc Meter (meter), Grad (degree), Euro (euro)
(ii) concrete muc:
- container noun construction Glas (glass), Tasse (cup), Kiste (crate)
- action noun construction Schluck (gulp/mouthful), Schritt (step)
(iii) relative muc Prozent (percent)
Table 1: Categorization of quanNG with respect to quanN
a measure object u that determines the value of
a property P (such as weight or temperature). In
the context of measurement, u is correlated to a
set m of measure objects (such as weights), whose
quantity directly or indirectly indicates the value
of the measured property P. The correlation to m
is made up by virtue of a measure function M, that
maps P(u) onto m.
The possible properties P are called dimensions
and the measure function is called measuring unit.
m is an indication of quantity such as 30 kilo-
grams. There exist two specification relations be-
tween m and u:
(5) vier
four
Kilogramm
kilogramm
Gewicht
weight
‘four kilograms of weight’
(6) vier
four
Kilogramm
kilogramm
Eisen
iron
‘four kilograms of iron’
In example 5 the dimension is explicitly given, in
example 6 the dimension is not explicitly given,
but can be inferred from the measure function M
(Kilogramm) and the measure object u (Eisen).
m is restricted concerning the compatibility with
dimension denoting nouns, verbs, and adjectives.
Thus, indications in metres can only be combined
with spatial dimensions such as height or length,
but they cannot be combined with temporal re-
lations. These restrictions can be modelled by
assuming that there are scales and degrees func-
tioning as abstract entities that are correlated by
dimensions and objects (cf. (Cresswell, 1977)).
Scales and degrees are to be considered as means
for the formal analysis of quantity-related phe-
nomena. A reasonable possibility to comprehend
degrees as abstractions over objects seems to be
the assumption that a degree is a class of objects
that cannot be differentiated wrt. the considered
dimension. A scale is a totally ordered set of de-
grees. Indications of measurement are predicates
over degrees or degree distances. Degrees of dif-
ferent scales cannot be compared with each other.
However, entities can be measured with the help
of different measuring units wrt. to one and the
same dimension. That means, there are different
measure functions (such as euros and dollars) that
are linked to each other depending on the respec-
84
Scale Measuring Unit Dimension Adjective Verb
length Meter (meter) length lang (long)
height hoch (high)
width breit (wide)
thickness dick (thick), stark
(strong)
distance weit (far)
depth tief (deep)
area Hektar (hectare) area groß (large)
volume & mass Liter (litre) capacity fassen
Gramm
(gramme)
weight schwer (heavy) wiegen (weigh)
time Jahr (year) duration of situa-
tion
lang(e) (long) dauern (last)
period of validity gültig (valid) laufen (run), gel-
ten (hold)
age alt (old)
value Euro (euro) value wert (worth)
price teuer (expensive) kosten (cost)
page Seite (page) length lang (long), dick
(thick), stark
(strong)
umfassen (com-
prise)
Mann (man) (team) strength stark (strong) umfassen (com-
prise)
temperature Grad Celsius (de-
gree celsius)
temperature warm (warm),
heiß (hot), kalt
(cold)
proportion Prozent (percent) relative size
Table 2: Scales, Measuring Units and Dimensions
tive dimension (such as price or hire).
By means of corpus investigations, we designed
eight scales. For each of these scales, we col-
lected the belonging measuring units (i.e. the
measure functions) together with the dimension
denoting adjectives and verbs referring to the
same scale. Whereas for the dimension weight
there exist a dimension denoting noun as well
as a dimension denoting adjective and a dimen-
sion denoting verb, there are other dimensions for
which there are lexical gaps. Table 2 lists our
scales together with an exemplarily chosen mea-
sure unit and the belonging dimension(s) and lex-
emes2. The scales presented in table 2 are con-
ceived with a view to syntactic analysis. Since
mass and volume look alike wrt. their surface re-
2Several derived dimensions such as “frequency” or
“density” are not considered.
alizations, i.e. the sets of dimension denoting ad-
jectives and verbs referring to the scales of mass
and volume overlap to a considerable degree, we
do not differentiate between these two scales in
the context of analysis. In our parsing system the
incompatibility of different scales is reflected in
lexicalized grammar rules: adjectives referring to
the weight scale can only take as measure argu-
ment measurement indications containing a mea-
suring unit referring to the same scale. This is
especially important in order to be able to dis-
tinguish an indication-of-quantity-reading (cf. ex-
ample 7) from a reading as a measure argument of
an adjective (cf. example 8).
85
(7) <NP> <NP> drei
three
Hektar
hectare
</NP> <NP>
weites
wide
Land
land
</NP> </NP>
‘three hectares of wide land’
(8) <NP> <AP> drei
three
Hektar
hectare
große
large
</AP>
Felder
fields
</NP>
‘three hectare large fields’
The measure constructions can be divided wrt.
the nature of the used measuring units into ab-
stract, concrete and relative measuring unit con-
structions.
Abstract Measuring Unit Constructions
Abstract measuring unit constructions contain an
abstract measuring unit as quanN. These abstract
measuring units are defined in the frame of physi-
cal theories and they are always restricted to a cer-
tain scale. By virtue of different scales, measuring
units can be categorized. The units of one and the
same scale can be converted into each other.
Concrete Measuring Unit Constructions
Apart from the abstract measuring units there are
a number of concrete measuring units. The con-
crete measuring unit constructions can be subdi-
vided into
1. container noun constructions; and
2. action noun constructions.
Container nouns can be used as measuring units
that are definable by virtue of the capacity that can
be assigned to the concrete object. In the case of
action noun constructions, the measurement act-
ing as the basis for the definition of the measuring
unit is the capacity restriction of the respective ac-
tion.
From a linguistical point of view, the distinction
of abstract measuring unit construcitons and con-
crete measuring unit constructions is important
insofar as:
1. dimension denoting nouns are preferably
combined with abstract or at least heavily
standardized measuring units; and
2. if mass nouns are directly combined with nu-
merals, such as zwei Kaffee (two coffees),
this construction cannot be understood as an
abbreviation of an abstract measuring unit
construction, such as zwei Liter Kaffee (two
liters of coffee), but, instead, it must be un-
derstood as an abbreviation of a concrete
measuring unit construction, such as zwei
Tassen Kaffee (two cups of coffee).
Relative Measuring Unit Constructions
Relative measuring units such as Prozent (per-
cent) serve to specify relative sizes. Relative
measuring units are not restricted to a certain
scale.
Count Constructions Count constructions
contrast to mass constructions. Count construc-
tions do not serve for the measurement of certain
substances, but, for the numerical quantification
of discrete entities. That means, the number
assignment does not identify values of a certain
property P, but, it refers to the number of discrete
entities. Restrictions concerning the compat-
ibility are dependent on the properties of the
denotates of the nouns they refer to.
Depending on quanN, the count constructions
can be subdivided into three construction types:
1. count noun constructions;
2. sort noun constructions; and
3. collective noun constructions.
Count Noun Constructions The count noun
constructions are again split up wrt. the nature of
the respective quanN into:
1. numeral classifier constructions;
2. shape noun constructions;
3. container noun constructions; and
4. singulative constructions.
Numeral classifiersonly play a marginal
role in German since also the direct combination
of a count noun with a numeral functions as an
indication of counting. In other languages, nu-
meral classifier constructions occur far more of-
ten (cf. (Bond, 1996)).
86
In German, shape nouns such as Scheibe
(slice) or Laib (loaf) specify spatial (shape) prop-
erties of the object. Even if it is possible to cut a
loaf of bread into 15 slices of bread, the adequacy
of the usage of a loaf of bread or 15 slices of bread
depends on the actual state of the object referred
to. Thus, shape nouns only reflect object-inherent
properties. This distinguishes shape noun con-
structions from abstract measuring unit construc-
tions: It is irrelevant whether we describe a roast
as 1 kilo of meat or as two pounds of meat. In
the latter case, it is not necessary that we have
two separate portions of meat. Count construc-
tions with shape nouns as count units have a com-
plex semantic structure: The object of the numer-
ical quantification is an entity of a certain sub-
stance with a certain shape. Thus, the construc-
tion denotes objects that are identified by virtue
of two conceptual components, namely “shape”
and “substance”. The referees of the shape noun
and the substance noun form one complex con-
cept, whose instances are numerically quanti-
fied. These combinations can typically also be
expressed with the help of a meaning-conserving
compound.
Container nouns have a special status
within the quanN, since they belong to the ab-
solute nouns and only their usage in quantify-
ing constructions transforms them into relational
nouns that need the completion by Num and El-
emN. The container nouns can be used outside
the quanNG to refer to the corresponding concrete
entities. We distinguish between three readings of
container noun constructions:
(9) Er
He
nahm
took
ein
a
Glas
glass
aus
out
dem
the
Schrank.
cupboard.
‘He took a glass out of the cupboard.’
(10) Auf
On
dem
the
Tisch
table
stehen
stand
drei
three
Glas/Gläser
glasses/glass
Wasser.
water.
‘Three glasses of water are on the table.’
(11) In
In
der
the
Karaffe
carafe
sind
are
drei
three
Gläser/Glas
glasses/glass
Wasser.
water.
‘In the carafe are three glasses of water.’
In example 9 Glas is an absolute noun denoting
the physical object. In examples 11 and 10 it
refers to indications of quantity. But, whereas
in example 10 it deals with the concrete quan-
tity of water which is in the given containers,
i.e. the glasses, in example 11, it deals with a con-
ventionalized quantity of water corresponding to
the conventional standard size of the container of
the given kind, i.e. the glass. The difference be-
tween the latter examples reflects the difference
between container noun constructions as count
constructions (cf. example 10) and container noun
constructions as measure constructions (cf. exam-
ples 11).
Often, count constructions with container nouns
as count unit and measure constructions with con-
tainer nouns as measuring unit are not distin-
guishable from each other, since the quantifica-
tion of a set of equally large (filled) containers al-
ways also identifies the volume of their content.
Thus, it is not always possible to decide, if, with
a given construction, containers are numerically
quantified or if the volume of their contents is
measured. Concrete measuring units, in contrast
to container nouns as count units, often occur in
a number-unmarked form. But, first, this is not
always the case, and, second, a missing number
marking is only a hint for a measure construction;
the reverse inference cannot be drawn: a plural
noun can function both as count unit and as mea-
suring unit (cf. example 12, where a plural con-
tainer noun undoubtedly functions as measuring
unit).
(12) Eine
One
Kugel
ball
Vanilleeis,
vanilla ice cream,
ein
one
Glas
glass
Ananassaft
pineapple juice
und
and
zwei
two
Gläser
glasses
Milch
milk
im
in the
Mixer
mixer
mischen.
mix.
‘Mix one ball of vanilla ice cream, one glass
of pineapple juice and two glasses of milk
in the mixer.’
Sort Noun Constructions Sort noun con-
structions allow for indications of quantity based
on sortal distinctions.
87
Collective Noun Constructions Among the
collective noun constructions we distinguish be-
tween
1. configuration constructions; and
2. collective noun constructions.
The configuration constructions are
similar to shape noun constructions insofar as the
configuration noun as well as the shape noun car-
ries annotation of a certain shape. But, in contrast
to the shape nouns, collective noun constructions
do not denote an individuating operation, but a
collectivizing operation.
In addition to the collectivizing effect, the
quanN in group collective construc-
tions carries certain social and functional as-
pects.
3 The Quantifying Noun Group in
TIGER and NEGRA
In accordance with the TIGER annotation scheme
(cf. (Albert et al., 2003)), in TIGER (and also in
NEGRA), quantifying noun groups are annotated
as sequences of nouns, i.e. entirely flat. More-
over, there is no distinction between quantifying
noun groups and other kinds of noun groups com-
posed of several nominal constituents, i.e., ein
Liter Wasser (one liter of water) is annotated the
same way as eine Art Seife (a kind of soap) and
die Bahn AG (the Bahn AG).
3.1 Proposals for a refinement of the
annotation scheme
Starting from the observations and problems de-
scribed in the preceding sections, we propose
certain partly syntactically, partly semantically
motivated refinements of the TIGER annotation
scheme.
First of all, the quantifying noun group should
not be annotated as a sequence of nouns, but El-
emN should be a separately built up noun phrase,
attached to quanN, which in turn constitutes a
complex noun phrase together with Num. With
this annotation, we reflect the fact, that quanNG
can be considered as a loose appositive syntagma
(cf. (Krifka, 1989)). quanNG as a complex noun
phrase should get a label signalling that it is an
indication of quantity.
Second, the determination of the head of quanNG
has to be rethought: from a syntactic point of
view, quanN functions as head, from a seman-
tic point of view ElemN functions as head. We
plead for the annotation of a complex head con-
sisting of both, the head of quanN and the head
of ElemN. The decision for always assigning a
two-place head to quanNG is quite important if
we deal with elliptic constructions. In sentences,
such as example 13, we do not want to infer that
the boy eats a plate, and in sentences, such as ex-
ample 14, we want to infer that the coffee is in a
certain container.
(13) Der
The
Junge
boy
ißt
eats
einen
a
großen
big
Teller.
plate.
‘The boy eats a big plate (of x).’
(14) Sie
They
bestellten
ordered
zwei
two
Kaffee.
coffees.
‘They ordered two (cups of) coffee.’
Moreover, in the context of quanNG it should
be annotated if the involved ElemN functions as
count noun or as mass noun. In measure con-
structions, singular count nouns always get a mass
noun reading. This information is quite useful
wrt. theoretical linguistic investigations.
A similar question is the differentiation between
container nouns as count units and as measuring
units. Again, this distinction would allow for fine-
grained linguistic research using a treebank as a
valuable resource.
Because of the minor lexical content of count
and measuring units, indications of quantity can
only be combined with few adjectives. Therefore,
there are cases, in which the adjective does not
refer to quanN, but directly to ElemN (cf. exam-
ple 15).
(15) einige
some
schäumende
foamy
Gläser
glasses
Bier
beer
‘some glasses of foamy beer’
In these cases, we should have a link signalling
that schäumend does not refer to the glass, but to
the beer. That means, that despite surface order
and agreement phenomena (schäumende does not
congrue with Bier), the information that the ad-
jective modifies ElemN and not quanN should be
88
contained in the treebank.
Concerning the refinement proposals so far, in
most of the cases the annotation requires manual
work. Another ambiguity that has to be resolved
in a treebank concerns all nouns that do not only
have a quanN-reading, but that can also be used
outside the quanN and then refer to the respec-
tive concrete object. If we find a sequence of two
nouns and the first one could be a quanN, we have
to decide whether it is a quanNG or not. Some-
times, this problem could be solved by means
of subcategorization information. But, consider-
ing two sentences, such as sentences 16 and 17,
we cannot distinguish the two readings on purely
syntactic grounds.
(16) Er
He
stiftete
donated
drei
three
Tafeln
blackboards
Schulen.
schools.
‘He donated three blackboards to schools.’
(17) Er
He
stiftete
donated
drei
three
Tafeln
bars
Schokolade.
chocolade.
‘He donated three bars of chocolade.’
For sentence 16, a concrete-object-reading should
be annotated (as depicted in figure 2), whereas
for sentence 17 an indication-of-quantity-reading
should be annotated (as depicted in figure 3).
In the parsing system of SILVA, we use lexical-
ized rules in order to exclude an indication-of-
quantity-reading for sentence 16. Lists containing
information about quanN and their potential El-
emN were gained by corpus investigation. Even
if they are, of course, not exhaustive, they could
serve as a starting point for a (semi)-automatic
annotation of indications of quantity containing
a noun as head of a quanN that can refer to a con-
crete obect.
4 Using information about indications of
quantity for grammar induction
In a first experiment, we enriched NEGRA with
information about
1. our scales together with their respective ad-
jectives (cf. table 2); and
2. nouns that could yield a concrete-object-
reading as well as an indication-of-quantity-
reading together with typical ElemN follow-
ing them.
Figure 2: Concrete-Object-Reading
Figure 3: Indication-of-Quantity-Reading
Starting from the enriched TIGER we used BitPar
(cf. (Schmid, 2004) and (Schiehlen, 2004)), an
efficient parser for treebank grammars in order to
induce a grammar and parse the treebank.
Without the additional information BitPar
reached an F-Value of 76.17%. After adding the
information the F-Value increased to 76.27%.
Obviously, this is not a tremendous improvement
of performance, but looking at the absolute
numbers, we get a more differentiated picture:
there are 11 constructions containing an indi-
cation of measurement functioning as measure
complement of an adjective. Before adding the
information about the scales and the respective
adjectives, only 3 constructions were rightly
annotated; after having added the information, 6
constructions were rightly annotated.
That indicates that the annotation can help
increase the performance, and it does not lower
the performance, which is not self-evident.
5 Conclusions
In this article, we described the German quan-
tifying noun group in some detail. We pro-
posed a fine-grained, semantically motivated clas-
89
sification of German quantifying noun groups.
We argued that a more sophisticated treatment
of these construction types could help improv-
ing the treebank considerably. We illustrated the
problem of ambiguity between concrete readings
and indication-of-quantity-readings as well as the
problem of measure arguments or indications of
quantity. Moreover, we showed how the latter
problem is solved in a corpus-based parsing ap-
proach. We outlined some ideas for the refine-
ment of the TIGER annotation scheme in order
that it meets requirements in the context of lin-
guistically demanding extraction tasks. We con-
cluded with the presentation of a first experiment
using the more fine-grained information for gram-
mar induction.
Acknowledgements
Thanks to Michael Schiehlen for performing the
grammar induction experiment.
The presented research is undertaken in the con-
text of the DFG-funded project Disambiguierung
einer Lexikalisch-Funktionalen Grammatik für
das Deutsche.
References
Stefanie Albert et al. 2003. TIGER-
Annotationsschema. Technical Report. University
of Potsdam, Saarland University, University of
Stuttgart.
Hiyan Alshawi and David Carter. 1994. Training and
Scaling Preference Functions for Disambiguation.
Computational Linguistics, 20(4):635-648.
Francis Bond, Kentaro Ogura and Satoru Ikehara.
1996. Classifiers in Japanese-to-English Machine
Translation. Proceedings of the 16th International
Conference on Computational Linguistics, 125–
130.
Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolf-
gang Lezius, George Smith. 2002. The TIGER
TreeBank. Proceedings of the Workshop on Tree-
banks and Linguistic Theories, Sozopol.
Eugene Charniak. 1996. Tree-bank grammars. Pro-
ceedings of the National Conference on Artificial
Intelligence (AAAI).
Max J. Cresswell. 1977. The semantics of degree.
Montague Grammar, 261–292. Barbara H. Partee
(Ed.). Academic Press, New York.
Joseph van Genabith, Louisa Sadler and Andy Way.
1999. Semi-Automatic Generation of F-Structures
from Treebanks. 4th International Conference on
Lexical Functional Grammar (LFG-99).
Manfred Krifka. 1989. Nominalreferenz und Zeitkon-
stitution. Zur Semantik von Massentermen, Plural-
termen und Aspektklassen. Fink, München.
Mitchell P. Marcus, Beatrice Santorini and Mary A.
Marcinkiewicz. 1994. Building a Large Annotated
Corpus of English: The Penn Treebank. Computa-
tional Linguistics, 19(2):313–330.
Lance Ramshaw and Mitchell P. Marcus. 1995. Text
Chunking Using Transformation-Based Learning.
Proceedings of the Third Workshop on Very Large
Corpora, 82–94.
Michael Schiehlen. 2004. Annotation Strategies
for Probabilistic Parsing in German. Proceedings
of the 20th International Conference on Computa-
tional Linguistics.
Helmut Schmid. 2004. Efficient Parsing of Highly
Ambiguous Context-Free Grammars with Bit Vec-
tors. Proceedings of the 20th International Confer-
ence on Computational Linguistics.
Wojciech Skut, Brigitte Krenn, Thorsten Brants and
Hans Uszkoreit. 1997. An Annotation Scheme
for Free Word Order Languages. Proceedings of
the Fifth Conference on Applied Natural Language
Processing (ANLP-97). Washington, DC, USA.
Wojciech Skut, Thorsten Brants, Brigitte Krenn and
Hans Uszkoreit. 1998. A linguistically interpreted
corpus of German newspaper text. Proceedings of
the Conference on Language Resources and Evalu-
ation LREC, Granada, Spain.
B. Srinivas. 1997. Performance evaluation of su-
pertagging for partial parsing. Proceedings of
Fifth International Workshop on Parsing Technol-
ogy, Boston, USA.
P. Suppes and J. L. Zinnes. 1963. Basic Measurement
Theory. Handbook of Mathematical Psychology, 1.
R. D. Luce (Ed.). Wiley, New York.
Wolfgang Wahlster. 2000. Verbmobil:Foundations of
Speech-to-Speech Translation. Springer.
Heike Wiese. 1997. Zahl und Numerale. Akademie
Verlag, Berlin.
90
