<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-2026">
  <Title>Creating a Finite-State Parser with Application Semantics</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Sample Application: WORDSEYE
</SectionTitle>
    <Paragraph position="0">WORDSEYE (Coyne and Sproat, 2001) is a system for converting English text into three-dimensional graphical scenes that represent that text. WORDSEYE performs syntactic and semantic analysis on the input text, producing a description of the arrangement of objects in a scene. An image is then generated from this scene description. At the core of WORDSEYE is the notion of a "pose", which can be loosely defined as a figure (e.g., a human figure) in a configuration suggestive of a particular action.</Paragraph>
    <Paragraph position="1"> For WORDSEYE, the NLP task is thus to map from an input sentence to a representation that the graphics engine can directly interpret in terms of poses. The graphical component can render a fixed set of situations (as determined by its designer); each situation has several actors in situation-specific poses, and each situation can be described linguistically using a given set of verbs. For example, the graphical component may have a way of depicting a commercial transaction, with two humans in particular poses (the buyer and the seller), the goods being purchased, and the payment amount. In English, we have different verbs that can be used to describe this situation (buy, sell, cost, and so on). These verbs have different mappings of their syntactic arguments to the components in the graphical representation. We assume a mapping from syntax to domain semantics, leaving to lexical semantics the question of how such a mapping is devised and derived. (For many applications, such mappings can be derived by hand, with the semantic representation an ad-hoc notation.) We show a sample of such a mapping in Figure 1. Here, we assume that the graphics engine of WORDSEYE knows how to depict a TRANSACTION when some of the semantic arguments of a transaction (such as CUSTOMER, ITEM, AMOUNT) are specified.</Paragraph>
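    <Paragraph>A mapping of this kind might be sketched as follows (a hypothetical illustration in Python; the frame and role names follow the TRANSACTION example above, but the exact inventory and the representation as a dictionary are invented, not taken from Figure 1):

```python
# Hypothetical sketch of a syntax-to-semantics mapping: each verb maps
# its numbered grammatical functions (GF=0, GF=1, ...) to semantic roles
# of the TRANSACTION frame. "cost" additionally specifies an implicit
# CUSTOMER argument when used transitively.
VERB_FRAMES = {
    # "the customer buys the item"
    "buy":  {"frame": "TRANSACTION", "args": {0: "CUSTOMER", 1: "ITEM"}},
    # "the seller sells the item"
    "sell": {"frame": "TRANSACTION", "args": {0: "SELLER", 1: "ITEM"}},
    # "the item costs the amount"
    "cost": {"frame": "TRANSACTION", "args": {0: "ITEM", 1: "AMOUNT"},
             "implicit": ["CUSTOMER"]},
}

def semantic_roles(verb):
    """Return the frame name and the role of each grammatical function."""
    entry = VERB_FRAMES[verb]
    return entry["frame"], entry["args"]
```

The point of the representation is that the grammatical-function integers, not English phrase structure, carry the association with application semantics.</Paragraph>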
    <Paragraph position="2"> We show some sample transductions in Figure 2. In the output, syntactic constituents are bracketed. Following each argument is information about its grammatical function ("GF=0", for example) and about its semantic role (ITEM, for example). If a lexical item has a semantics of its own, the semantics replaces the lexical item (this is the case for verbs); otherwise, the lexical item remains in place. In the case of the transitive cost, the verbal semantics in Figure 1 specifies an implicit CUSTOMER argument. This is generated when cost is used transitively, as can be seen in Figure 2.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Mapping Tree Adjoining Grammar to Finite State Machines
</SectionTitle>
    <Paragraph position="0">What is crucial for defining a mapping from words to application semantics is a very abstract notion of grammatical function: in devising such a mapping, we are not interested in how English realizes certain syntactic arguments, i.e., in the phrase structure of the verbal projection. Instead, we just want to be able to refer to syntactic functions, such as subject or indirect object. Tree Adjoining Grammar (TAG) represents the entire syntactic projection of a lexeme in a single elementary tree; because of this, each elementary tree can be associated with a lexical item (lexicalization; Joshi and Schabes, 1991). Each lexical item can be associated with one or more trees which represent the lexeme's valency; these trees are referred to as its supertags. In a derivation, substituting or adjoining the tree of one lexeme into that of another creates a direct dependency between them. The syntactic functions are labeled with integers starting with zero (to avoid discussions about names), and are retained across operations such as topicalization, dative shift, and passivization.</Paragraph>
    <Paragraph position="1"> A TAG consists of a set of elementary trees of two types, initial trees and auxiliary trees. These trees are then combined using two operations, substitution and adjunction. In substitution, an initial tree is appended to a specially marked node with the same label as the initial tree's root node. In adjunction, a non-substitution node is rewritten by an auxiliary tree, which has a specially marked frontier node called the footnode.</Paragraph>
    <Paragraph position="2"> The effect is to insert the auxiliary tree into the middle of the other tree.</Paragraph>
    <Paragraph position="3"> We distinguish two types of auxiliary trees.</Paragraph>
    <Paragraph position="4"> Adjunct auxiliary trees are used for adjuncts; they have the property that the footnode is always a daughter node of the root node, and the label on these nodes is not, linguistically speaking, part of the projection of the lexical item of that tree. For example, an adjective will project to AdjP, but the root- and footnode of its tree will be labeled NP, since an adjective adjoins to NP.</Paragraph>
    <Paragraph position="5"> We will refer to the root- and footnode of an adjunct auxiliary tree as its passive valency structure. Note that the tree for an adjective also specifies whether it adjoins from the left (footnode on right) or right (footnode on left). Predicative auxiliary trees are projected from verbs which subcategorize for clauses. Since a verb projects to a clausal category, and has a node labeled with a clausal category on its frontier (for the argument), the resulting tree can be interpreted as an auxiliary tree, which is useful in analyzing long-distance wh-movement (Frank, 2001).</Paragraph>
    <Paragraph position="6"> To derive a finite-state transducer (FST) from a TAG, we do a depth-first traversal of each elementary tree (but excluding the passive valency structure, if present) to obtain a sequence of non-terminal nodes. For predicative auxiliary trees, we stop at the footnode. Each node becomes two states of the FST, one state representing the node on the downward traversal on the left side, the other representing the node on the upward traversal, on the right side. For leaf nodes, the two states are juxtaposed. The states are linearly connected with ε-transitions, with the left node state of the root node the start state, and its right node state the final state (except for predicative auxiliary trees - see above). To each non-leaf state, we add one self-loop transition for each tree in the grammar that can adjoin at that state from the specified direction (i.e., for a state representing a node on the downward traversal, the auxiliary tree must adjoin from the left), labeled with the tree name. For each pair of adjacent states representing a substitution node, we add transitions between them labeled with the names of the trees that can substitute there. We output the number of the grammatical function, and the argument semantics, if any is specified. For the lexical head, we transition on the head, and output the semantics if defined, or simply the lexeme otherwise. There are no other types of leaf nodes, since we do not traverse the passive valency structure of adjunct auxiliary trees. At the beginning of each FST, an ε-transition outputs an open bracket, and at the end, an ε-transition outputs a close bracket. The result of this phase of the conversion is a set of FSTs, one per elementary tree of the grammar. We will refer to them as "elementary FSTs".</Paragraph>
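    <Paragraph>The construction above can be sketched in a much-simplified form (a hypothetical Python illustration; weights, adjunction direction, grammatical-function output, and argument semantics are all omitted, and the tree and arc representations are invented):

```python
from dataclasses import dataclass, field

@dataclass
class N:
    """Minimal elementary-tree node: `subst` marks a substitution node,
    `anchor` holds the lexical head, if any."""
    label: str
    children: list = field(default_factory=list)
    subst: bool = False
    anchor: str = None

def dfs_sides(node):
    """Depth-first traversal yielding each node twice: once on the way
    down (its left side) and once on the way up (its right side)."""
    sides = [("L", node)]
    for c in node.children:
        sides += dfs_sides(c)
    sides.append(("R", node))
    return sides

def tree_to_fst(root, adjoinable=None, substitutable=None):
    """Build the arc list (src, dst, input, output) of an elementary FST.
    `adjoinable` maps node labels to names of trees that may adjoin there
    (added as self-loops); `substitutable` maps substitution-node labels
    to names of trees that may substitute (arcs between the paired states)."""
    adjoinable = adjoinable or {}
    substitutable = substitutable or {}
    sides = dfs_sides(root)
    arcs = [(0, 1, "eps", "(")]                      # opening bracket
    for i in range(len(sides) - 1):
        s, t = i + 1, i + 2
        side, node = sides[i]
        leaf = not node.children
        if leaf and side == "L" and node.subst:      # substitution arcs
            for name in substitutable.get(node.label, []):
                arcs.append((s, t, name, name))
            continue
        if leaf and side == "L" and node.anchor:     # lexical head
            arcs.append((s, t, node.anchor, node.anchor))
            continue
        if not leaf:                                  # adjunction self-loops
            for name in adjoinable.get(node.label, []):
                arcs.append((s, s, name, name))
        arcs.append((s, t, "eps", ""))
    arcs.append((len(sides), len(sides) + 1, "eps", ")"))  # closing bracket
    return arcs
```

Applied to a transitive-verb tree S(NP↓, VP(V[buys], NP↓)), this yields a linear ε-skeleton with a head arc for the anchor, substitution arcs between the paired states of each NP node, and self-loops wherever an auxiliary tree may adjoin.</Paragraph>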
    <Paragraph position="7">(Figure: a down arrow indicates a substitution node for the nominal argument.)</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Constructing the Parser
</SectionTitle>
    <Paragraph position="0"> In our approach, each elementary FST describes the syntactic potential of a set of (syntactically similar) words (as explained in Section 3). There are several ways of associating words with FSTs.</Paragraph>
    <Paragraph position="1"> Since FSTs correspond directly to supertags (i.e., trees in a TAG grammar), the basic way to achieve such a mapping is to list words paired with supertags, along with the desired semantics associated with each argument position (see Figure 1). The parser can also be divided into a lexical machine, which transduces words to classes, and a syntactic machine, which transduces classes to semantics. This approach has the advantage of reducing the size of the overall machine, since the syntax is factored from the lexicon.</Paragraph>
    <Paragraph position="2"> The lexical machine transduces input words to classes. To determine the mapping from word to supertag, we use the lexical probability P(t|w), where w is the word and t the class. These are derived by maximum likelihood estimation from a corpus. Once we have determined for all words which classes we want to pair them with, we create a disjunctive FST for all words associated with a given supertag machine, which transduces the words to the class name. We replace the class's FST (as determined by its associated supertag(s)) with the disjunctive head FST. The weights on the lexical transitions are the negative logarithm of the emit probability P(w|t) (obtained in the same manner as the lexical probabilities). For the syntactic machine, we take each elementary tree machine which corresponds to an initial tree (i.e., a tree which need not be adjoined) and form their union. We then perform a series of iterative replacements; in each iteration, we replace each arc labeled by the name of an elementary tree machine with the lexicalized version of that tree machine. Of course, in each iteration, there are many more replacements than in the previous iteration. We use 5 rounds of iteration; obviously, the number of iterations restricts the syntactic complexity (but not the length) of recognized input. However, because we output brackets in the FSTs, we obtain a parse with full syntactic/lexical semantic (i.e., dependency) structure, not a "shallow parse".</Paragraph>
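    <Paragraph>The maximum-likelihood estimates and negative-log weights can be illustrated on a toy corpus (a minimal Python sketch; the word/class inventory is invented, and real supertag classes would of course come from an annotated corpus):

```python
import math
from collections import Counter

# Toy (word, supertag-class) corpus; class names are hypothetical.
corpus = [("buys", "t_trans"), ("buys", "t_trans"), ("buys", "t_ditrans"),
          ("sells", "t_trans"), ("costs", "t_trans")]

word_counts = Counter(w for w, t in corpus)
class_counts = Counter(t for w, t in corpus)
pair_counts = Counter(corpus)

def p_class_given_word(t, w):
    """Lexical probability P(t | w), by maximum likelihood estimation."""
    return pair_counts[(w, t)] / word_counts[w]

def p_word_given_class(w, t):
    """Emit probability P(w | t), by maximum likelihood estimation."""
    return pair_counts[(w, t)] / class_counts[t]

def arc_weight(w, t):
    """Weight of a lexical transition: -log of the emit probability."""
    return -math.log(p_word_given_class(w, t))
```

On this corpus, "buys" is paired with t_trans with lexical probability 2/3, and the corresponding lexical transition gets weight -log P(buys | t_trans) = -log(2/4).</Paragraph>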
    <Paragraph position="3"> This construction is in many ways similar to constructions proposed for CFGs, in particular that of Nederhof (2000). One difference is that, since we start from TAG, recursion is already factored, and we need not find cycles in the rules of the grammar.</Paragraph>
  </Section>
</Paper>