File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/p01-1060_intro.xml
Size: 4,145 bytes
Last Modified: 2025-10-06 14:01:11
<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1060">
<Title>Parse Forest Computation of Expected Governors</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> A labeled headed tree is one in which each non-terminal vertex has a distinguished head child, and in the usual way non-terminal nodes are labeled with non-terminal symbols (syntactic categories such as NP) and terminal vertices are labeled with terminal symbols (words such as reads).1 We work with syntactic trees in which terminals are in addition labeled with uninflected word forms (lemmas) derived from the lexicon.
The governor algorithm was designed and implemented in the Reading Comprehension research group in the 2000 Workshop on Language Engineering at Johns Hopkins University. Thanks to Marc Light, Ellen Riloff, Pranav Anand, Brianne Brown, Eric Breck, Gideon Mann, and Mike Thelen for discussion and assistance. Oral presentations were made at that workshop in August 2000, and at the University of Sussex in January 2001. Thanks to Fred Jelinek, John Carroll, and other members of the audiences for their comments.</Paragraph>
<Paragraph position="1"> By percolating lemmas up the chains of heads, each node in a headed tree may be labeled with a lexical head. Figure 1 is an example, where lexical heads are written as subscripts. We use the notation $L(x)$ for the lexical head of a vertex $x$, and $c(x)$ for the ordinary category or word label of $x$.</Paragraph>
<Paragraph position="2"> The governor label for a terminal vertex $x$ in such a labeled tree is a triple which represents the syntactic and lexical environment at the top of the chain of vertices headed by $x$. Where $y$ is the maximal vertex of which $x$ is a head vertex, and $y'$ is the parent of $y$, the governor label for $x$ is the tuple $\langle c(y), c(y'), L(y') \rangle$.2</Paragraph>
<Paragraph position="3"> Governor labels for the example tree are given in Figure 2.
1 Headed trees may be constructed as tree domains, which are sets of addresses of vertices. 0 is used as the relative address of the head vertex, negative integers are used as relative addresses of child vertices before the head, and positive integers are used as relative addresses of child vertices after the head. A headed tree domain is a set of finite sequences of integers $D$ such that (i) if $\alpha i \in D$, then $\alpha \in D$; (ii) if $\alpha j \in D$ and ...
[Figure 2 caption, truncated: ... tree of Figure 1.]
2 For the head of the sentence, special symbols startc and startw are used as the parent category and parent lexical governor.</Paragraph>
<Paragraph position="4"> As observed in Chomsky (1965), grammatical relations such as subject and object may be reconstructed as ordered pairs of category labels, such as $\langle \mathrm{NP}, \mathrm{S} \rangle$ for subject. So, a governor label encodes a grammatical relation and a governing lexical head.</Paragraph>
<Paragraph position="5"> Given a unique tree structure for a sentence, governor markup may be read off the tree. However, because robust broad-coverage parsers frequently deliver thousands, millions, or thousands of millions of analyses for sentences of free text, basing annotation on a unique tree (such as the most probable tree analysis generated by a probabilistic grammar) appears arbitrary.</Paragraph>
<Paragraph position="6"> Note that different trees may produce the same governor labels for a given terminal position.</Paragraph>
<Paragraph position="7"> Suppose for instance that the yield of the tree in Figure 1 has a different tree analysis in which the PP is a child of the VP, rather than of the NP.
In this case, just as in the original tree, the label for the fourth terminal position (with word label paper) is $\langle \mathrm{NP}, \mathrm{VP}, \mathrm{read} \rangle$. Supposing that there are only two tree analyses, this label can be assigned to the fourth word with certainty, in the face of syntactic ambiguity. The algorithm we will define pools governor labels in this way.</Paragraph>
</Section>
</Paper>