From a Surface Analysis to a Dependency Structure
Luísa Coheur
L2F INESC-ID / GRIL
Lisboa, Portugal
Luisa.Coheur@l2f.inesc-id.pt
Nuno Mamede
L2F INESC-ID / IST
Lisboa, Portugal
Nuno.Mamede@inesc-id.pt
Gabriel G. Bès
GRIL / Univ. Blaise-Pascal
Clermont-Ferrand, France
BesGabriel@yahoo.fr
Abstract
This paper describes how we use the arrow
properties of the 5P Paradigm to generate
a dependency structure from a surface analysis.
Besides the arrow properties, two modules,
Algas and Ogre, are presented. Moreover,
we show how we keep linguistic descriptions
separate from parsing decisions.
1 Introduction
Following the 5P Paradigm (Bès, 1999; Hagège,
2000; Bès and Hagège, 2001) we build a
syntactic-semantic interface which obtains a
graph from the analysis of input text. The
graph expresses a dependency structure, which is
the domain of a function that produces a logical
semantic interpretation as output.
The whole syntactic-semantic interface is
composed of four modules: Susana, in charge
of surface analysis; Algas and Ogre, which define
the graph; and ASdeCopas, which obtains the
logical semantic representation. In this paper
we present the first three modules, focusing
mainly on Algas and Ogre.
5P argues for a careful separation between
linguistic descriptions and algorithms. The former
are expressed by Properties and the latter
by Processes. Furthermore, modelled and formalised
linguistic descriptions (i.e. Properties, P2 of 5P)
are not designed to be the declarative source of
algorithms, but rather a repository of information
(Hagège and Bès, 2002) that one should be able to
re-use (totally or partially) in each task. Following
and completing this, we assume that the parsing
issue can be viewed from at least three different
points of view: (i) modelled and formalised
linguistic observation; (ii) effective computational
procedures; (iii) useful computational constraints.
These three aspects of the same issue
are tackled separately in the proposed
syntactic-semantic interface, but they converge
when results are produced.
There are three different kinds of Properties
(P2) in 5P: existence, linearity and arrow
properties. The first two underlie the Susana module
(Section 3.1). They express the possible
morphological categories of an expression and the
possible order between them. The third kind,
arrow properties, specifies arrow pairs, which
formally are directed arcs of a graph. Arrow
properties underlie the Algas (Section 3.2) and Ogre
(Section 3.3) modules. At the level of Projections
(i.e. P3 of 5P) the balanced parentheses structure
underlying sentences is exploited (Section 2).
Computationally useful constraints improve Algas
performance (Section 4).
2 Arrow properties
The motivation behind an arrow property is to
connect two elements, because the established
relation is needed to reach the desired semantic
representation (Bès, 1999). Notice that this
formalism can be applied to establish dependencies
between words, chunks or phrases. Arrows can be
seen as dependencies but, contrary to the main
dependency theories, an arrow is not labelled and
goes from the dependent to the head (Hagège, 2000).
Let C be the set of category labels available,
M the set of chunk labels, P a set of phrase
labels and I a set of indexes.
Arrow Property: An arrow property is a
tuple $(X, n, Z, Y, m, R^{+}, R^{-})$ noted by:
$X_n \rightarrow_{Z} Y_m$,
$+R^{+}$
$-R^{-}$
where:
• $X, Y \in M \cup C$ (X is said to be the source
and Y the target of the arrow);
• $Z \in M \cup P$ (the segment labelled Z contains
X and Y);
• $R^{+}$, $R^{-}$ are sets of constraints over the
arrows (respectively, the set of constraints
that Z must verify, either positive ones
($R^{+}$) on symbols which must be attested or
negative ones ($R^{-}$) on symbols which must
not occur);
• $n, m \in I$.
Both $R^{+}$ and $R^{-}$ impose simple constraints over
the arrows, such as symbols that should or
should not occur within Z, or linear order
relations that should be satisfied between its
constituents. As an example, the following
arrow property says that within an interrogative
phrase (Pint), an interrogative chunk (IntC)
with an interrogative pronoun inside (pint)
arrows a nominal chunk (NC) on its right ($i < k$),
as long as there is no other nominal chunk
between them ($i < j < k$).
$\mathrm{IntC}_i(\{\mathrm{pint}\}/) \rightarrow_{\mathrm{Pint}} \mathrm{NC}_k$
$-\{\mathrm{NC}_j\}$
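The example property above can be encoded as a small data structure. The following is only an illustrative sketch in Python, not the representation used by Algas: the names `Chunk`, `ArrowProperty` and `licensed`, and the decision to store the order relation as a boolean, are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    label: str                            # chunk label, e.g. "IntC" or "NC"
    categories: frozenset = frozenset()   # categories of the words inside


@dataclass(frozen=True)
class ArrowProperty:
    source: str                                   # X
    target: str                                   # Y
    segment: str                                  # Z
    required_in_source: frozenset = frozenset()   # positive constraint on X
    forbidden_between: frozenset = frozenset()    # negative constraints (R-)
    source_before_target: bool = True             # order relation i < k


def licensed(prop, segment_label, chunks, i, k):
    """Return True if chunk i may arrow chunk k under `prop` (hypothetical check)."""
    src, tgt = chunks[i], chunks[k]
    if segment_label != prop.segment:
        return False
    if src.label != prop.source or tgt.label != prop.target:
        return False
    if prop.source_before_target and not i < k:
        return False
    if not prop.required_in_source <= src.categories:
        return False
    lo, hi = sorted((i, k))
    # No forbidden label may occur strictly between source and target.
    return not any(c.label in prop.forbidden_between for c in chunks[lo + 1:hi])


# The IntC -> NC property of the text: pint inside IntC, no NC in between.
INTC_TO_NC = ArrowProperty(
    source="IntC", target="NC", segment="Pint",
    required_in_source=frozenset({"pint"}),
    forbidden_between=frozenset({"NC"}),
)
```

With a segment made of an IntC containing a pint followed by two NCs, `licensed` accepts the arrow to the first NC and rejects the arrow to the second, because another NC occurs between source and target.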
A more complex type of constraint is the
"stack" constraint (Coheur, 2004). This
constraint is based on the linguistically motivated
work on balanced parentheses of (Bès and
Dahl, 2003; Bès et al., 2003). Briefly, the
idea behind that work is the following: given
a sentence, if we introduce a left parenthesis
every time we find a word such as que (that),
se (if), etc. (the introducers) and a right
parenthesis every time we find an inflected
verbal form¹, then at the end of the sentence the
number of left parentheses is equal to the
number of right ones, and at any point of it
the number of left ones is greater than or
equal to the number of right ones (Bès and Dahl,
2003). Bès and Dahl (2003) use this
natural language evidence in order to identify
the main phrase, relatives, coordinations, etc.
Within our work, we use it to make arrowing
relations more precise. For example, consider the
sentence Quais os hotéis que têm piscina? (Which are
the hotels that have a swimming pool?). The
surface analysis of this sentence results in the
following (where VC stands for verbal chunk):
(Quais)IntC (os hotéis)NC (que)RelC
(têm)VC (piscina)NC
Typically the NC os hotéis arrows the main
VC, but in this situation, as there is no main VC,
we want it to arrow itself. However, there is
an arrow property saying that an NC can arrow
a VC, which applied to this particular situation
would establish a wrong dependency (Figure 1).
¹ See (Bès and Dahl, 2003) for details about how to
deal with coordination.
Figure 1: Wrong dependency
Roughly, we use a stack constraint which says
that an NC arrows a VC if the stack of introducers
and inflected verbs is empty between them²:
$\mathrm{NC}_i \rightarrow_{S} \mathrm{VC}_k$
$+\{\mathrm{stack}_j = [\,]\}$
As a result, if we consider again the example
Quais os hotéis que têm piscina, the NC hotéis
will not arrow the VC têm, because the stack
constraint is not satisfied between them (the
introducer que is between them, so the stack is
not empty).
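The stack test can be illustrated with a short Python sketch. It is a simplification and not the authors' implementation (the actual restriction is said to be more complicated): it works on chunk labels, and it assumes that RelC marks an introducer and that every VC contains an inflected verb. The names `INTRODUCERS`, `INFLECTED`, `stack_between` and `nc_may_arrow_vc` are hypothetical.

```python
INTRODUCERS = {"RelC"}   # assumption: chunks that open a parenthesis
INFLECTED = {"VC"}       # assumption: chunks that close a parenthesis


def stack_between(labels, i, k):
    """Return the unmatched introducers/verbs strictly between positions i and k."""
    lo, hi = sorted((i, k))
    stack = []
    for label in labels[lo + 1:hi]:
        if label in INTRODUCERS:
            stack.append(label)
        elif label in INFLECTED:
            if stack and stack[-1] in INTRODUCERS:
                stack.pop()          # the verb closes the last open introducer
            else:
                stack.append(label)  # unmatched verb: keep it as a violation
    return stack


def nc_may_arrow_vc(labels, i, k):
    """True if the NC at i may arrow the VC at k under the simplified stack test."""
    return labels[i] == "NC" and labels[k] == "VC" and not stack_between(labels, i, k)


# Quais os hotéis que têm piscina?
question = ["IntC", "NC", "RelC", "VC", "NC"]
```

For `question`, the NC at position 1 may not arrow the VC at position 3, since the introducer at position 2 is still on the stack. In a label sequence such as NC RelC VC NC VC, the first NC may arrow the last VC, because the relative clause opened by RelC is closed by the first VC.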
3 Reaching the dependency structure
3.1 Surface analysis
From the existence and linearity properties (P2
of 5P) specifying chunks, it can be deduced
which categories can or must start a chunk,
and which ones can or must end it.
Drawing on this linguistic information, chunks
are detected in a surface analysis made by
Susana (Batista and Mamede, 2002). As an
example, consider the question Qual a maior
praia do Algarve? (Which is the biggest beach
in Algarve?). Susana outputs the following
surface analysis (where PC stands for
prepositional chunk):
(Qual)IntC (a maior praia)NC (do Algarve)PC (?)Ponct
3.2 Algas
Algas is the C++ program responsible for connecting
chunks and the elements inside them,
taking as input a structure that contains
information from arrow properties and also
information that can limit the search space (see
Section 4 for details). Additionally, as inside
the majority of chunks all the elements
arrow the last element (the head), the user can
declare which chunks satisfy this
property. As a result, no computation is needed
to obtain the dependencies inside
these chunks: all their elements arrow the last one.
This possibility is computationally very useful.
² In fact, this restriction is a little more complicated
than this.
Continuing with our example, after Algas
runs we obtain the output shown in Figure 2.
Both the IntC and the PC chunks arrow the
NC and, inside each chunk, all the elements arrow
the head.
Figure 2: Algas’s output.
Algas is able to skip unanalysable parts of a
sentence, but (for the moment) some constraints
are imposed on its output:
(1) There is at most one element arrowing itself
inside each chunk;
(2) Cycles are not allowed;
(3) Arrow crossing is not allowed (projectivity);
(4) An element cannot be the target of an arrow
if it is not the source of any arrow.
Notice that these constraints are enforced inside
the program. In particular, the projectivity
requirement is not imposed by 5P. We
impose it because, for the moment,
we are only dealing with written Portuguese,
which typically respects this property.
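The four output constraints can be checked mechanically. The Python sketch below is only an illustration of the constraints as stated, not the checks implemented in Algas. It assumes that the arrows of one chunk (or of the chunk level of one sentence) are given as (source, target) pairs of positions, and all function names are hypothetical.

```python
from itertools import combinations


def self_arrows(arrows):
    """Elements that arrow themselves (constraint 1 allows at most one)."""
    return [s for s, t in arrows if s == t]


def has_cycle(arrows):
    """Detect a cycle of length >= 2 with a depth-first search (constraint 2)."""
    graph = {}
    for s, t in arrows:
        if s != t:                      # self-arrows are handled by constraint 1
            graph.setdefault(s, []).append(t)
    white, grey, black = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = grey
        for nxt in graph.get(node, []):
            state = colour.get(nxt, white)
            if state == grey or (state == white and visit(nxt)):
                return True
        colour[node] = black
        return False

    return any(colour.get(n, white) == white and visit(n) for n in list(graph))


def crossing_pairs(arrows):
    """Pairs of arrows whose spans cross (constraint 3, projectivity)."""
    spans = [tuple(sorted(a)) for a in arrows if a[0] != a[1]]
    return [(p, q) for p, q in combinations(spans, 2)
            if p[0] < q[0] < p[1] < q[1] or q[0] < p[0] < q[1] < p[1]]


def dangling_targets(arrows):
    """Targets that are not the source of any arrow (constraint 4)."""
    sources = {s for s, _ in arrows}
    return {t for _, t in arrows if t not in sources}


def check_output(arrows):
    """Return the list of violated constraints, empty if the output is valid."""
    problems = []
    if len(self_arrows(arrows)) > 1:
        problems.append("more than one self-arrow")
    if has_cycle(arrows):
        problems.append("cycle")
    if crossing_pairs(arrows):
        problems.append("crossing arrows")
    if dangling_targets(arrows):
        problems.append("target without outgoing arrow")
    return problems
```

As a usage example, take the chunk level of the running example with IntC at position 0, NC at 1 and PC at 2, and assume the NC arrows itself: `check_output([(0, 1), (2, 1), (1, 1)])` reports no violation.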
3.3 Ogre
After Algas, the text is processed by Ogre, a
pipeline of Perl and XSLT scripts that generates
a graph from the arrowed structures
produced by Algas³. This process is based on the
following: if a chunk arrows another chunk, the
head of the first chunk will arrow the head of
the second chunk, and the chunk label can be
omitted.
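The head-to-head rewriting described above can be sketched in a few lines. Ogre itself is a pipeline of Perl and XSLT scripts, so the Python below is only an illustration of the idea. It assumes that every chunk is declared as having its last element as head, and the function name `lift_to_heads` and the (position, word) encoding are hypothetical.

```python
def lift_to_heads(chunks, chunk_arrows):
    """Turn chunk-level arrows into word-level arrows between chunk heads.

    chunks: list of chunks, each a list of (position, word) pairs whose last
            element is assumed to be the head.
    chunk_arrows: list of (source_chunk_index, target_chunk_index) pairs.
    Returns a list of ((position, word), (position, word)) arrows.
    """
    heads = [chunk[-1] for chunk in chunks]
    graph = []
    for chunk in chunks:
        head = chunk[-1]
        # Inside a chunk, every non-head element arrows the head.
        graph.extend((word, head) for word in chunk[:-1])
    # A chunk arrowing another chunk becomes an arrow between their heads.
    graph.extend((heads[s], heads[t]) for s, t in chunk_arrows)
    return graph


# Qual a maior praia do Algarve? (punctuation chunk left out)
chunks = [
    [(0, "Qual")],
    [(1, "a"), (2, "maior"), (3, "praia")],
    [(4, "do"), (5, "Algarve")],
]
chunk_arrows = [(0, 1), (2, 1)]   # IntC -> NC and PC -> NC
graph = lift_to_heads(chunks, chunk_arrows)
```

Word positions are kept in the resulting arrows, so the order of the words can still be recovered from the graph once the chunk labels are dropped.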
Continuing with our example, after Ogre we
have the graph of Figure 3 (a dependency
structure). Basically, the IntC and PC heads
(respectively qual and Algarve) now arrow the NC
head.
Figure 3: Ogre’s output.
³ Arrowed structures produced by Algas can also be
seen as a graph, having nodes containing graphs.
It might seem that we are discarding information
in this step, but the new arrowing relation
between chunk heads preserves the structure that
would otherwise be lost. Besides, as information
about the direction of the arrows is kept, and the
position of each word is also kept in the graph, we
are still able to distinguish behaviours that depend
on word order in the subsequent semantic task.
That is, both semantic relations and word order
are kept within our graph.
Ogre’s motivation is to make different
structures converge into the same graph. For example,
after Ogre’s execution, O Ritz é onde?, É onde
o Ritz? and Onde é o Ritz? (all meaning Where is
the Ritz?) share the same graph (apart from positions).
4 From descriptions to the algorithm input structures
In order to keep descriptions apart from
processing, arrow properties and Algas input
structures are developed in parallel. Then, arrow
properties are formally mapped into Algas
input structures (see (Coheur, 2004) for details).
This decision allowed us to add computational
constraints to Algas input structures, leaving
descriptions untouched.
In fact, in order to reduce the search space,
Algas has the option of letting the user control
the distance between the source and the target
of an arrow. This is particularly useful
to control PP attachments (in this case PC
attachments). Thus, if we want a PC to arrow
an NC that is at most n positions away, we
simply say:
$\mathrm{PC} \rightarrow_{S} \mathrm{NC} \; [\{\mathrm{NC} <_{n} \mathrm{PC}\}/]$
Notice that we could extend the
arrow properties formalism in order to
allow this kind of information. Nevertheless, it
is well known that in natural language there is
no fixed distance between two elements. Adding a
distance constraint to arrow properties would
add procedural information to a repository
resulting from natural language observations.
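The effect of such a distance limit on the search space can be illustrated as follows. This is a minimal Python sketch, not the Algas input format: it assumes that positions are chunk positions, that the NC must precede the PC, and the function name `nc_targets_for_pc` is hypothetical.

```python
def nc_targets_for_pc(labels, pc_index, n):
    """Positions of the NCs that precede the PC at pc_index by at most n positions."""
    start = max(0, pc_index - n)
    return [k for k in range(start, pc_index) if labels[k] == "NC"]


labels = ["NC", "VC", "NC", "PC", "PC"]
near = nc_targets_for_pc(labels, 4, 2)   # only the NC at position 2 is a candidate
far = nc_targets_for_pc(labels, 4, 4)    # both NCs are candidates
```

The limit lives only in the structure given to the algorithm: the arrow property stating that a PC may arrow an NC stays free of any distance information.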
5 Applications
Both Algas and Ogre are part of a
syntactic-semantic interface, where the module
responsible for the generation of logical forms is called
ASdeCopas (Coheur et al., 2003). This interface
has been applied to the semantic disambiguation
of a set of quantifiers and also to question
interpretation.
Notice that, although arrows are not labelled,
the fact that we know their source, target and
direction gives us enough information to find
(or at least guess) a label for them. In fact, we
could add a label to the majority of the arrows.
For example, using the link types from
Link Grammar (Sleator and Temperley,
1993; Sleator, 1998), if an adverb connects to an
adjective, this connection would be labelled EA;
if an adverb connects to another adverb, the
label would be EE. ASdeCopas can be used to
add this information to the graph. Nevertheless,
using unlabelled connections serves languages
such as Portuguese particularly well. In Portuguese,
the subject cannot always be identified with
certainty. For example, we can say "O Tomás come
a sopa.", "Come a sopa o Tomás.", or even "A sopa
come o Tomás.", all having the same (most probable)
interpretation: Thomas eats the soup. That is,
our knowledge of the world prevents a misleading
interpretation: a man can eat a soup,
but a soup cannot eat a man. Thus, arrow
properties simply establish relations, and we leave to
semantic analysis the task of deciding
the nature of these relations.
6 Conclusions
We presented two modules, Algas and Ogre,
that build a dependency graph from a surface
analysis. Algas uses information from a
formalism called arrow properties. Nevertheless,
this formalism is independent of Algas
input structures, which can be enriched with
information that limits the relations to establish.
In the future we want the user to be able to
control the constraints over Algas output. That
is, the user will have the option to choose whether
the output may contain crossing arrows or not.
For the moment the Susana-Algas-Ogre modules
of the syntactic-semantic interface behave
without problems in the domain of question
interpretation. They apply successfully to an
elicited corpus of questions produced by N
Portuguese speakers who were asked to produce
them simulating effective and natural questions.
Our next step is to try to use them incrementally
(Aït-Mokhtar et al., 2002).
Another planned improvement concerns arrow
properties, which we want to organise in a
hierarchy.
7 Acknowledgements
This paper was supported by FCT (Fundação
para a Ciência e Tecnologia) and by Project
POSI/PLP/41319/2001 (FEDER).

References
Salah Aït-Mokhtar, Jean-Pierre Chanod, and
Claude Roux. 2002. Robustness beyond
shallowness: incremental deep parsing. Natural
Language Engineering, pages 121-144.
Fernando Batista and Nuno Mamede. 2002.
SuSAna: Módulo multifuncional da análise
sintáctica de superfície. In Julio Gonzalo,
Anselmo Peñas, and Antonio Ferrández, editors,
Proc. Multilingual Information Access
and Natural Language Processing Workshop
(IBERAMIA 2002), pages 29-37, Sevilla,
Spain, November.
Gabriel G. Bès and Veronica Dahl. 2003.
Balanced parentheses in NL texts: a useful cue
in the syntax/semantics interface. In Nancy
Workshop on Prospects and Advances in the
Syntax/Semantics Interface.
Gabriel G. Bès and Caroline Hagège. 2001.
Properties in 5P. Technical report, GRIL,
Université Blaise-Pascal, Clermont-Ferrand,
France, November.
Gabriel G. Bès, Veronica Dahl, Daniel Guillot,
Lionel Lamadon, Ioana Milutinovici, and
Joana Paulo. 2003. A parsing system for
balanced parentheses in NL texts. In CLIN’2003.
Gabriel G. Bès. 1999. La phrase verbal noyau
en français. In Recherches sur le français
parlé, volume 15, pages 273-358. Université
de Provence, France.
Luísa Coheur, Nuno Mamede, and Gabriel G.
Bès. 2003. ASdeCopas: a syntactic-semantic
interface. In EPIA, Beja, Portugal, December.
Springer-Verlag.
Luísa Coheur. 2004. A interface entre a sintaxe
e a semântica no quadro das línguas
naturais. Ph.D. thesis, Instituto Superior
Técnico, Universidade Técnica de Lisboa,
Portugal, Université Blaise-Pascal, France.
Work in progress.
Caroline Hagège and Gabriel G. Bès. 2002.
Encoding and reusing linguistic information
expressed by linguistic properties. In
Proceedings of COLING’2002, Taipei.
Caroline Hagège. 2000. Analyse Syntaxique
Automatique du Portugais. Ph.D. thesis,
Université Blaise Pascal, Clermont-Ferrand,
France.
Daniel Sleator and Davy Temperley. 1993.
Parsing English with a link grammar. In
Proceedings of the Third International Workshop
on Parsing Technologies.
Daniel Sleator. 1998. Summary of link types.
