<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1509"> <Title>Lexical and Structural Biases for Function Parsing</Title> <Section position="3" start_page="0" end_page="84" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Natural language processing methods producing shallow semantic output are starting to emerge as the next step towards successful developments in natural language understanding. Incremental, robust parsing systems will be the core enabling technology for interactive, speech-based question answering and dialogue systems. In recent years, corpora annotated with semantic and function labels have become available (Palmer et al., 2005; Baker et al., 1998) and semantic role labelling has taken centre-stage as a challenging new task. State-of-the-art statistical parsers have not yet responded to this challenge.</Paragraph> <Paragraph position="1"> State-of-the-art statistical parsers trained on the Penn Treebank (PTB) produce trees annotated with bare phrase structure labels (Collins, 1999; Charniak, 2000). The trees of the Penn Treebank, however, are also decorated with function labels, that is, labels that indicate the grammatical and semantic relationship of phrases to each other in the sentence. Figure 1 shows the simplified tree representation with function labels for a sample sentence from the PTB corpus (section 00): "The Government's borrowing authority dropped at midnight Tuesday to 2.80 trillion from 2.87 trillion." Unlike phrase structure labels, function labels are context-dependent and encode a shallow level of phrasal and lexical semantics, as first observed in (Blaheta and Charniak, 2000). For example, while the authority in Figure 1 will always be a Noun Phrase, it could be a subject, as in the example, or an object, as in the sentence "They questioned his authority", depending on its position in the sentence. To some extent, function labels overlap with semantic role labels as defined in PropBank (Palmer et al., 2005). 
</Paragraph> <Paragraph position="2"> Table 1 illustrates the complete list of function labels in the Penn Treebank, partitioned into four classes.[1] Current statistical parsers do not use or output this richer information because the performance of the parser usually decreases considerably, since a more complex task is being solved. (Klein and Manning, 2003), for instance, report a reduction in parsing accuracy of an unlexicalised PCFG from 77.8% to 72.9% when function labels are used in training. (Blaheta, 2004) also reports a decrease in performance when attempting to integrate his function labelling system with a full parser. Conversely, researchers interested in producing richer semantic outputs have concentrated on two-stage systems, where the semantic labelling task is performed on the output of a parser, in a pipeline architecture divided into several stages (Gildea and Jurafsky, 2002; Nielsen and Pradhan, 2004; Xue and Palmer, 2004). See also the common task of (CoNLL, 2004; CoNLL, 2005; Senseval, 2004), where parsing has sometimes not been used and has been replaced by chunking.</Paragraph> <Paragraph position="3"> In this paper, we present a parser that incrementally produces richer output using information available in a corpus. Specifically, the parser outputs additional labels indicating the function of a constituent in the tree, such as NP-SBJ or PP-TMP in the tree shown in Figure 1. [1] (Blaheta and Charniak, 2000) talk of function tags. We will instead use the term function label to indicate function identifiers, as they can decorate any node in the tree. We keep the word tag to indicate only those labels that decorate preterminal nodes in a tree (part-of-speech tags), as is standard use.</Paragraph> <Paragraph position="4"> Following (Blaheta and Charniak, 2000), we concentrate on syntactic and semantic function labels.</Paragraph> <Paragraph position="5"> We will ignore the other two classes, for they do not form natural classes. 
As in previous work, constituents that do not bear any function label will receive a NULL label. Strictly speaking, this label corresponds to two NULL labels: SYN-NULL and SEM-NULL. A node bearing the SYN-NULL label is a node that does not bear any other syntactic label.</Paragraph> <Paragraph position="6"> Analogously, the SEM-NULL label completes the set of semantic labels. Note that both the SYN-NULL and the SEM-NULL labels are necessary, since a given constituent can bear both a syntactic and a semantic label. We present work to test the hypothesis that a current statistical parser (Henderson, 2003) can output richer information robustly, that is, without any degradation of the parser's accuracy on the original parsing task, by explicitly modelling function labels as the locus where the lexical semantics of the elements in the sentence and syntactic locality domains interact. Briefly, our method consists in augmenting the parser with features and biases that capture both lexical semantic projections and structural regularities underlying the distribution of sequences of function labels in a sentence. We achieve state-of-the-art results in both parsing and function labelling. This result has several consequences.</Paragraph> <Paragraph position="7"> On the one hand, we show that it is possible to build a single integrated robust system successfully.</Paragraph> <Paragraph position="8"> This is an interesting achievement, as a task combining function labelling and parsing is more complex than simple parsing. While the function of a constituent and its structural position are often correlated, they sometimes diverge. For example, some nominal temporal modifiers occupy an object position without being objects, like Tuesday in the tree above. 
Moreover, given the current limited availability of annotated treebanks, this more complex task will have to be solved with the same overall amount of data, aggravating the difficulty of estimating the model's parameters due to sparse data. Solving this more complex problem successfully, then, indicates that the models used are robust. Our results also provide some new insights into the discussion about the necessity of parsing for function or semantic role labelling (Gildea and Palmer, 2002; Punyakanok et al., 2005), showing that parsing is beneficial.</Paragraph> <Paragraph position="9"> On the other hand, function labelling while parsing opens the way to interactive applications that are not possible in a two-stage architecture. Because the parser produces richer output incrementally at the same time as parsing, it can be integrated into speech-based applications, as well as be used for language models. Conversely, output annotated with more informative labels, such as function or semantic labels, underlies all domain-independent question answering (Jijkoun et al., 2004) or shallow semantic interpretation systems (Collins and Miller, 1998; Ge and Mooney, 2005).</Paragraph> </Section> </Paper>