<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1063"> <Title>Discriminative Syntactic Language Modeling for Speech Recognition</Title> <Section position="5" start_page="509" end_page="511" type="metho"> <SectionTitle> 3 Parse Tree Features </SectionTitle> <Paragraph position="0"> We tagged each candidate transcription with (1) part-of-speech tags, using the tagger documented in Collins (2002); and (2) a full parse tree, using the parser documented in Collins (1999). The models for both of these were trained on the Switchboard treebank, and applied to candidate transcriptions in both the training and test sets. Each transcription received one POS-tag annotation and one parse tree annotation, from which features were extracted.</Paragraph> <Paragraph position="1"> Figure 2 shows a Penn Treebank style parse tree that is of the sort produced by the parser. Given such a structure, there is a tremendous amount of flexibility in selecting features. The first approach that we follow is to map each parse tree to sequences encoding part-of-speech (POS) decisions, and &quot;shallow&quot; parsing decisions. Similar representations have been used by (Rosenfeld et al., 2001; Wang and Harper, 2002). Figure 3 shows the sequential representations that we used. The first simply makes use of the POS tags for each word. The latter representations make use of sequences of non-terminals associated with lexical items. In 3(b), each word in the string is associated with the beginning or continuation of a shallow phrase or &quot;chunk&quot; in the tree. We include any non-terminals above the level of POS tags as potential chunks: a new &quot;chunk&quot; (VP, NP, PP etc.) begins whenever we see the initial word of the phrase dominated by the non-terminal. In 3(c), we show how POS tags can be added to these sequences. The final type of sequence mapping, shown in 3(d), makes a similar use of chunks, but preserves only the head-word seen with each chunk.3 From these sequences of categories, various features can be extracted, to go along with the n-gram features used in the baseline. These include n-tag features, e.g. ti[?]2ti[?]1ti (where ti represents the 3It should be noted that for a very small percentage of hypotheses, the parser failed to return a full parse tree. At the end of every shallow tag or category sequence, a special end of sequence tag/word pair &quot;</parse> </parse>&quot; was emitted. In contrast, when a parse failed, the sequence consisted of solely &quot;<noparse> <noparse>&quot;.</Paragraph> <Paragraph position="2"> (d) Shallow category with lexical head sequence tag in position i); and composite tag/word features, e.g. tiwi (where wi represents the word in position i) or, more complicated configurations, such as ti[?]2ti[?]1wi[?]1tiwi. These features can be extracted from whatever sort of tag/word sequence we provide for feature extraction, e.g. POS-tag sequences or shallow parse tag sequences.</Paragraph> <Paragraph position="3"> One variant that we performed in feature extraction had to do with how speech repairs (identified as EDITED constituents in the Switchboard style parse trees) and filled pauses or interjections (labeled with the INTJ label) were dealt with. In the simplest version, these are simply treated like other constituents in the parse tree. 
<Paragraph position="3"> One variant in feature extraction concerned how speech repairs (identified as EDITED constituents in the Switchboard-style parse trees) and filled pauses or interjections (labeled INTJ) were handled. In the simplest version, these are treated like any other constituent in the parse tree. However, they can disrupt what may be termed the intended sequence of syntactic categories in the utterance, so we also tried skipping these constituents when mapping from the parse tree to shallow parse sequences.</Paragraph>
<Paragraph position="4"> The second set of features we employed was extracted from the full parse tree.</Paragraph>
<Paragraph position="5"> For this paper, we examined several feature templates of this type. First, we considered context-free rule instances, extracted from each local node in the tree. Second, we considered features based on lexical heads within the tree. Let us first distinguish between POS tags and non-POS non-terminal categories by calling the latter constituents NTs. For each constituent NT in the tree, there is an associated lexical head (H_NT) and the POS tag of that lexical head (HP_NT). Two simple features are NT/H_NT and NT/HP_NT for every NT constituent in the tree.</Paragraph>
<Paragraph position="6"> Using the heads as identified in the parser, example features from the tree in Figure 2 would be S/VBD, S/helped, NP/NN, and NP/house.</Paragraph>
<Paragraph position="7"> Beyond these constituent/head features, we can look at the head-to-head dependencies of the sort used by the parser. Consider each local tree, consisting of a parent node (P), a head child (HC_P), and k non-head children (C_1 ... C_k). Each non-head child C_i is either to the left or the right of HC_P, and is either adjacent or non-adjacent to HC_P. We encode these positional relations as a signed integer: positive if the child is to the right of the head, negative if to the left; magnitude 1 if adjacent, 2 if non-adjacent. Table 1 shows four head-to-head features that can be extracted for each non-head child C_i. These features include dependencies between pairs of lexical items, between a single lexical item and the part-of-speech of another item, and between pairs of part-of-speech tags in the parse.</Paragraph>
</Section>
</Paper>
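A sketch of the head-to-head feature templates just described, assuming a simple local-tree representation (the `Constituent` record and all names here are hypothetical, not from the paper):

```python
# Sketch of the four head-to-head dependency features per non-head child,
# under an assumed local-tree representation; names are illustrative.

from dataclasses import dataclass

@dataclass
class Constituent:
    label: str       # non-terminal label, e.g. "NP"
    head_word: str   # lexical head word, e.g. "house"
    head_tag: str    # POS tag of the lexical head, e.g. "NN"

def head_to_head_features(parent_label, children, head_index):
    """For a local tree with parent P, head child HC_P = children[head_index],
    and non-head children C_1..C_k, emit four features per non-head child."""
    head = children[head_index]
    feats = []
    for j, child in enumerate(children):
        if j == head_index:
            continue
        # signed position: positive if right of the head, negative if left;
        # magnitude 1 if adjacent to the head child, 2 otherwise
        sign = 1 if j > head_index else -1
        pos = sign * (1 if abs(j - head_index) == 1 else 2)
        # word/word, word/tag, tag/word, and tag/tag dependencies
        feats.append((parent_label, head.head_word, child.head_word, pos))
        feats.append((parent_label, head.head_word, child.head_tag, pos))
        feats.append((parent_label, head.head_tag, child.head_word, pos))
        feats.append((parent_label, head.head_tag, child.head_tag, pos))
    return feats
```

Applied to each local tree of the parse in Figure 2, this would yield tuples pairing the head child's word or tag with each sibling's head word or tag, together with the signed adjacency value, matching the four template types attributed to Table 1.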