<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1511">
  <Title>From a Surface Analysis to a Dependency Structure</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Arrow properties
</SectionTitle>
    <Paragraph position="0"> The motivation behind an arrow property is to connect two elements because the relation established between them is needed to reach the desired semantic representation (Bès, 1999). This formalism can be applied to establish dependencies between words, chunks or phrases. Arrows can be seen as dependencies but, contrary to the main dependency theories, an arrow is not labeled and goes from the dependent to the head (Hagège, 2000).</Paragraph>
    <Paragraph position="1"> Let C be the set of category labels available, M the set of chunk labels, P a set of phrase labels and I a set of indexes.</Paragraph>
    <Paragraph position="2"> Arrow Property: An arrow property is a tuple (X, n, Z, Y, m, R+, R−) denoted by:</Paragraph>
    <Paragraph position="4"> R+ and R− are sets of constraints over the arrows (respectively, the set of constraints that Z must verify, either positive ones (R+) on symbols which must be attested or</Paragraph>
    <Paragraph position="6"> Both R+ and R− impose simple constraints over the arrows, such as symbols that should or should not occur within Z, or linear order relations that should be satisfied between its constituents. As an example, the following arrow property says that within an interrogative phrase (Pint), an interrogative chunk (IntC) with an interrogative pronoun inside (pint) arrows a nominal chunk (NC) on its right (i &lt; k), as long as there is no other nominal chunk between them (i &lt; j &lt; k).</Paragraph>
    <Paragraph position="7"> $\mathrm{IntC}_i(\{pint\}/) \rightarrow_{Pint} \mathrm{NC}_k \; -\{\mathrm{NC}_j\}$
A more complex type of constraint is the "stack" constraint (Coheur, 2004). This constraint is based on the linguistically motivated work on balanced parentheses of (Bès and Dahl, 2003; Bès et al., 2003). Briefly, the idea behind that work is the following: given a sentence, if we introduce a left parenthesis every time we find a word such as que (that), se (if), ... (the introducers) and a right parenthesis every time we find an inflected verbal form¹, then at the end of the sentence the number of left parentheses is equal to the number of right ones, and at any point in the sentence the number of left ones is equal to or greater than the number of right ones (Bès and Dahl, 2003). Bès and Dahl (2003) use this natural language evidence to identify the main phrase, relatives, coordinations, etc. In our work, we use it to make arrowing relations more precise. For example, consider the sentence Quais os hotéis que têm piscina? (Which are the hotels that have a swimming pool?). The surface analysis of this statement results in the following (where VC stands for verbal chunk):</Paragraph>
    <Paragraph position="9"> Typically the NC os hotéis arrows the main VC but, as there is no main VC in this situation, we want it to arrow itself. Nevertheless, there is an arrow property saying that an NC can arrow a VC, which applied to this particular situation  Roughly, we use the stack constraint, which says that an NC arrows a VC if the stack of introducers and inflected verbs is empty between them²:</Paragraph>
    <Paragraph position="11"> As a result, if we consider again the example Quais os hotéis que têm piscina, the NC hotéis will not arrow the VC têm, because the stack constraint is not verified between them: the stack is not empty, since it holds the introducer que.</Paragraph>
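The stack constraint described above can be illustrated with a minimal Python sketch. It is not the Algas implementation (which is written in C++, and whose real restriction is more complicated according to footnote 2). The function name, the token encoding and the two-word introducer set are assumptions made for this illustration only.

```python
# Hypothetical sketch of the stack constraint; not the actual Algas code.
INTRODUCERS = {"que", "se"}  # only the introducers named in the text


def stack_is_empty_between(tokens, start, end, inflected_verbs):
    """Return True if the stack built strictly between two positions ends up empty.

    tokens: list of lower-cased word forms.
    start, end: positions of the two elements to be connected.
    inflected_verbs: set of positions holding an inflected verbal form.
    """
    stack = []
    for pos in range(start + 1, end):
        if tokens[pos] in INTRODUCERS:
            stack.append(tokens[pos])      # an introducer opens a parenthesis
        elif pos in inflected_verbs:
            if not stack:
                return False               # a closing verb with nothing to close
            stack.pop()                    # an inflected verb closes a parenthesis
    return not stack


tokens = ["quais", "os", "hotéis", "que", "têm", "piscina"]
# The NC head "hotéis" (position 2) may not arrow the VC "têm" (position 4):
# the introducer "que" is still on the stack between them.
blocked = stack_is_empty_between(tokens, 2, 4, {4})
```

In this sketch `blocked` is False for the example sentence, which matches the behaviour described in the text: only the introducer que lies between the NC and the VC.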
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Reaching the dependency structure
</SectionTitle>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Surface analysis
</SectionTitle>
      <Paragraph position="0"> From the existence and linearity properties (P2 of 5P) specifying chunks, it can be deduced which categories can or must start a chunk, and which ones can or must end it.</Paragraph>
      <Paragraph position="1"> Drawing on this linguistic information, chunks are detected in a surface analysis made by Susana (Batista and Mamede, 2002). As an example, consider the question Qual a maior praia do Algarve? (Which is the biggest beach in the Algarve?). Susana outputs the following surface analysis (where PC stands for prepositional chunk):</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Algas
</SectionTitle>
      <Paragraph position="0"> Algas is the C++ program responsible for connecting chunks and the elements inside them. It takes as input a structure that contains information from arrow properties and also information that can limit the search space (see section 4 for details). Additionally, since inside the majority of the chunks all the elements arrow the last element (the head), the user can declare which chunks satisfy this property. As a result, no computation is needed to obtain the dependencies inside these chunks: all their elements arrow the last one.</Paragraph>
      <Paragraph position="1"> This option is computationally very useful.</Paragraph>
      <Paragraph position="2"> ² In fact, this restriction is a little more complicated than this.</Paragraph>
      <Paragraph position="3"> Continuing with our example, after running Algas we obtain the output shown in Figure 2.</Paragraph>
      <Paragraph position="4"> Both the IntC and the PC chunks arrow the NC and inside them, all the elements arrow the  are made to its output: (1) At most one element arrows itself inside each chunk; (2) Cycles are not allowed; (3) Arrow crossing is not allowed (projectivity); (4) An element cannot be the target of an arrow if it is not the source of any arrow.</Paragraph>
      <Paragraph position="5">  Notice that these constraints are enforced inside the program. In particular, the projectivity requirement is not imposed by 5P. We impose it because, for the moment, we are only dealing with written Portuguese, which typically respects this property.</Paragraph>
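The four output constraints listed above can be checked with a short Python sketch. This is an illustration under stated assumptions, not the Algas code: the function name `check_arrows`, the representation of arrows as (source, target) position pairs, and the rule that every element is the source of at most one arrow are all hypothetical choices made here.

```python
from collections import Counter
from itertools import combinations


def check_arrows(arrows, chunk_of):
    """Return the numbers of the violated constraints, in the order (1) to (4).

    arrows: list of (source, target) positions; a self-arrow has source == target.
    chunk_of: dict mapping a position to the name of its chunk.
    Assumption: every element is the source of at most one arrow.
    """
    violated = []

    # (1) At most one element arrows itself inside each chunk.
    self_arrows = Counter(chunk_of[s] for s, t in arrows if s == t)
    if any(count > 1 for count in self_arrows.values()):
        violated.append(1)

    # (2) No cycles: following arrows must never return to a visited element.
    target = {s: t for s, t in arrows if s != t}
    for start in target:
        seen, node = {start}, target[start]
        while node in target and node not in seen:
            seen.add(node)
            node = target[node]
        if node in seen:
            violated.append(2)
            break

    # (3) Projectivity: two arrows cross when their spans partially overlap.
    spans = [tuple(sorted(a)) for a in arrows if a[0] != a[1]]
    for (lo1, hi1), (lo2, hi2) in combinations(spans, 2):
        if hi2 > hi1 > lo2 > lo1 or hi1 > hi2 > lo1 > lo2:
            violated.append(3)
            break

    # (4) Every target of an arrow must itself be the source of some arrow.
    sources = {s for s, _ in arrows}
    if any(t not in sources for _, t in arrows):
        violated.append(4)

    return violated
```

For instance, `check_arrows([(0, 2), (1, 3), (2, 2), (3, 3)], {0: "A", 2: "A", 1: "B", 3: "B"})` reports only constraint 3, because the arrows from 0 to 2 and from 1 to 3 cross.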
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Ogre
</SectionTitle>
      <Paragraph position="0"> After Algas, the text is processed by Ogre, a pipeline of Perl and XSLT scripts that generates a graph from the arrowed structures produced by Algas³. This process is based on the following: if a chunk arrows another chunk, the head of the first chunk will arrow the head of the second chunk, and the chunk label can be omitted.</Paragraph>
      <Paragraph position="1"> Continuing with our example, after Ogre we have the graph of Figure 3 (a dependency structure).
³ Arrowed structures produced by Algas can also be seen as a graph whose nodes contain graphs. It might seem that we are discarding information in this step, but the new arrowing relation between chunk heads preserves the structure that would otherwise be lost. Besides, as information about the direction of the arrows is kept, and the position of each word is also kept in the graph, we are still able to distinguish behaviours that depend on word order in the subsequent semantic task.</Paragraph>
      <Paragraph position="2"> That is, both semantic relations and word order are kept within our graph.</Paragraph>
      <Paragraph position="3"> Ogre's motivation is to make different structures converge into the same graph. For example, after Ogre is run on O Ritz é onde?, É onde o Ritz? and Onde é o Ritz?, all three share the same graph (apart from positions).</Paragraph>
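The head-lifting step that Ogre performs can be sketched in a few lines of Python. Ogre itself is a pipeline of Perl and XSLT scripts, so this is only an illustration. The function name, the data layout and the assumption that the head is always the last element of a chunk are hypothetical, and the chunk segmentation used in the example is an assumed reading of Qual a maior praia do Algarve, since the surface analysis figure is not reproduced here.

```python
def lift_chunk_arrows(chunks, chunk_arrows):
    """Turn arrows between chunks into arrows between chunk heads.

    chunks: dict mapping a chunk name to the list of its word positions.
    chunk_arrows: list of (source_chunk, target_chunk) names.
    Assumption: the head of every chunk is its last element.
    Returns the sorted list of word-level arrows (source_pos, target_pos).
    """
    head = {name: words[-1] for name, words in chunks.items()}
    arrows = set()
    # Inside each chunk, every non-head element arrows the head.
    for name, words in chunks.items():
        for position in words[:-1]:
            arrows.add((position, head[name]))
    # A chunk-to-chunk arrow becomes a head-to-head arrow; labels are dropped.
    for source, target in chunk_arrows:
        arrows.add((head[source], head[target]))
    return sorted(arrows)


# Positions: 0 Qual, 1 a, 2 maior, 3 praia, 4 do, 5 Algarve (assumed chunking).
chunks = {"IntC": [0], "NC": [1, 2, 3], "PC": [4, 5]}
graph = lift_chunk_arrows(chunks, [("IntC", "NC"), ("PC", "NC")])
```

Because the direction of each arrow and the position of each word are kept in the result, word order can still be recovered from the graph, as the text notes.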
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 From descriptions to the algorithm input structures
</SectionTitle>
    <Paragraph position="0"> In order to keep descriptions apart from processing, arrow properties and Algas input structures are developed in parallel. Arrow properties are then formally mapped into Algas input structures (see (Coheur, 2004) for details). This decision allowed us to add computational constraints to Algas input structures while leaving the descriptions untouched.</Paragraph>
    <Paragraph position="1"> In fact, in order to reduce the search space, Algas lets the user control the distance between the source and the target of an arrow. This is particularly useful for controlling PP attachments (in this case, PC attachments). Thus, if we want a PC to arrow an NC that is at most n positions away, we simply say:
$\mathrm{PC} \rightarrow_{S} \mathrm{NC}\ [\{\mathrm{NC} &lt;_{n} \mathrm{PC}\}/]$
Notice that we could extend the arrow properties formalism in order to allow this kind of information. Nevertheless, it is well known that in natural language there is no fixed distance between two elements. Adding a distance constraint to arrow properties would add procedural information to a repository resulting from natural language observations.</Paragraph>
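The effect of such a distance bound on the search space can be shown with a small Python sketch. The function name and the reading of the constraint (the NC precedes the PC and lies at most n positions before it) are assumptions made for this illustration. The unit in which Algas counts positions is not stated in the text.

```python
def targets_within(source_pos, candidate_positions, max_distance):
    """Keep the candidate targets that precede the source by at most max_distance.

    source_pos: position of the arrowing element (for example a PC).
    candidate_positions: positions of the possible targets (for example NCs).
    max_distance: the bound n of the distance constraint (assumed inclusive).
    """
    return [
        position
        for position in candidate_positions
        if source_pos > position and max_distance >= source_pos - position
    ]


# A PC at position 5 with NC candidates at 0, 3, 4 and 7, and n = 2.
nearby = targets_within(5, [0, 3, 4, 7], 2)
```

Only the candidates at positions 3 and 4 survive the filter, so the attachment search has two targets to consider instead of four.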
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Applications
</SectionTitle>
    <Paragraph position="0"> Both Algas and Ogre are part of a syntactic-semantic interface, where the module responsible for the generation of logical forms is called AsdeCopas (Coheur et al., 2003). This interface has been applied to the semantic disambiguation of a set of quantifiers and also to question interpretation.</Paragraph>
    <Paragraph position="1"> Notice that, although arrows are not labeled, the fact that we know the source, target and direction of an arrow gives us enough information to find (or at least guess) a label for it. In fact, we could add a label to the majority of the arrows. For example, using the link types from the Link Grammar (Sleator and Temperley, 1993; Sleator, 1998), if an adverb connects to an adjective, this connection would be labeled EA; if an adverb connects to another adverb, the label would be EE. AsdeCopas can be used to add this information to the graph. Nevertheless, using unlabeled connections serves languages such as Portuguese particularly well. In Portuguese, we cannot always identify the subject with certainty. For example, we can say "O Tomás come a sopa.", "Come a sopa o Tomás.", or even "A sopa come o Tomás.", all having the same (most probable) interpretation: Thomas eats the soup. That is, our knowledge of the world rules out any misleading interpretation: a man can eat a soup, but a soup cannot eat a man. Thus, arrow properties simply establish relations, and we leave to semantic analysis the task of deciding the nature of these relations.</Paragraph>
  </Section>
</Paper>