<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2157"> <Title>Prefix Probabilities from Stochastic Tree Adjoining Grammars*</Title> <Section position="4" start_page="953" end_page="953" type="intro"> <SectionTitle> 2 Notation </SectionTitle> <Paragraph position="0"> A stochastic Tree Adjoining Grammar (STAG) is represented by a tuple (NT, Σ, I, A, φ) where NT is a set of nonterminal symbols, Σ is a set of terminal symbols, I is a set of initial trees and A is a set of auxiliary trees. Trees in I ∪ A are also called elementary trees.</Paragraph> <Paragraph position="1"> We refer to the root of an elementary tree t as R_t. Each auxiliary tree has exactly one distinguished leaf, which is called the foot. We refer to the foot of an auxiliary tree t as F_t. We let V denote the set of all nodes in the elementary trees.</Paragraph> <Paragraph position="2"> For each leaf N in an elementary tree, except when it is a foot, we define label(N) to be the label of the node, which is either a terminal from Σ or the empty string ε. For each other node N, label(N) is an element from NT.</Paragraph> <Paragraph position="3"> At a node N in a tree such that label(N) ∈ NT an operation called adjunction can be applied, which excises the tree at N and inserts an auxiliary tree.</Paragraph> <Paragraph position="4"> Function φ assigns a probability to each adjunction. The probability of adjunction of t ∈ A at node N is denoted by φ(t, N). The probability that no adjunction is applied at N is denoted by φ(nil, N). We assume that each STAG G that we consider is proper. That is, for each node N such that label(N) ∈ NT, the values φ(t, N) summed over all t ∈ A ∪ {nil} equal 1.</Paragraph> <Paragraph position="6"> For each non-leaf node N we construct the string cdn(N) = N̂_1 ... N̂_m from the (ordered) list of children nodes N_1, ..., N_m by defining, for each d such that 1 ≤ d ≤ m, N̂_d = label(N_d) in case label(N_d) ∈ Σ ∪ {ε}, and N̂_d = N_d otherwise. 
In other words, children nodes are replaced by their labels unless the labels are nonterminal symbols.</Paragraph> <Paragraph position="7"> To simplify the exposition, we assume an additional node for each auxiliary tree t, which we denote by ⊥. This is the unique child of the actual foot node F_t. That is, we change the definition of cdn such that cdn(F_t) = ⊥ for each auxiliary tree t. We set V⊥ = {N ∈ V | label(N) ∈ NT} ∪ Σ ∪ {⊥}.</Paragraph> <Paragraph position="8"> We use symbols a, b, c, ... to range over Σ, symbols v, w, x, ... to range over Σ*, symbols N, M, ... to range over V⊥, and symbols α, β, γ, ... to range over (V⊥)*. We use t, t', ... to denote trees in I ∪ A or subtrees thereof.</Paragraph> <Paragraph position="9"> We define the predicate dft on elements from V⊥ as dft(N) if and only if (i) N ∈ V and N dominates ⊥, or (ii) N = ⊥. We extend dft to strings of the form N_1 ... N_m ∈ (V⊥)* by defining dft(N_1 ... N_m) if and only if there is a d (1 ≤ d ≤ m) such that dft(N_d).</Paragraph> <Paragraph position="10"> For some logical expression p, we define δ(p) = 1 iff p is true, δ(p) = 0 otherwise.</Paragraph> </Section> </Paper>