File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/92/p92-1017_evalu.xml
Size: 7,706 bytes
Last Modified: 2025-10-06 14:00:07
<?xml version="1.0" standalone="yes"?> <Paper uid="P92-1017"> <Title>eling for speech recognition. In Sadaoki Furui and M. Mohan Sondhi, editors, Advances in Speech</Title> <Section position="6" start_page="131" end_page="133" type="evalu"> <SectionTitle> 4. EXPERIMENTAL EVALUATION </SectionTitle> <Paragraph position="0"> The following experiments, although preliminary, give some support to our earlier suggested advantages of the inside-outside algorithm for partially bracketed corpora.</Paragraph> <Paragraph position="1"> The first experiment involves an artificial example used by Lari and Young (1990) in a previous evaluation of the inside-outside algorithm. In this case, training on a bracketed corpus can lead to a good solution while no reasonable solution is found training on raw text only.</Paragraph> <Paragraph position="2"> The second experiment uses a naturally occurring corpus and its partially bracketed version provided by the Penn Treebank (Brill et al., 1990). We compare the bracketings assigned by grammars inferred from raw and from bracketed training material with the Penn Treebank bracketings of a separate test set.</Paragraph> <Paragraph position="3"> To evaluate objectively the accuracy of the analyses yielded by a grammar G, we use a Viterbi-style parser to find the most likely analysis of each test sentence according to G, and define the bracketing accuracy of the grammar as the proportion of phrases in those analyses that are compatible in the sense defined in Section 2 with the tree bank bracketings of the test set. This criterion is closely related to the &quot;crossing parentheses&quot; score of Black et al. (1991). 1 In describing the experiments, we use the notation GR for the grammar estimated by the original inside-outside algorithm, and GB for the grammar estimated by the bracketed algorithm.</Paragraph> <Paragraph position="4"> 4.1. Inferring the Palindrome Language null We consider first an artificial language discussed by Lari and Young (1990). Our training corpus consists of 100 sentences in the palindrome language L over two symbols a and b decisions by leaving out brackets (thus making flatter parse trees), as hunmn annotators often do. Therefore, the recall component in Black et aL's figure of merit for parser is not needed.</Paragraph> <Paragraph position="5"> The initial grammar consists of all possible CNF rules over five nonterminals and the terminals a and b (135 rules), with random rule probabilities. As shown in Figure 1, with an unbracketed training set W the cross-entropy estimate H(W, GR) remains almost unchanged after 40 iterations (from 1.57 to 1.44) and no useful solution is found.</Paragraph> <Paragraph position="6"> In contrast, with a fully bracketed version C of the same training set, the cross-entropy estimate /~(W, GB) decreases rapidly (1.57 initially, 0.88 after 21 iterations). Similarly, the cross-entropy estimate H(C, GB) of the bracketed text with respect to the grammar improves rapidly (2.85 initially, 0.89 after 21 iterations).</Paragraph> <Paragraph position="7"> The inferred grammar models correctly the palindrome language. Its high probability rules (p > which is a close to optimal CNF CFG for the palindrome language.</Paragraph> <Paragraph position="8"> The results on this grammar are quite sensitive to the size and statistics of the training corpus and the initial rule probability assignment. 
<Paragraph position="8"> The results on this grammar are quite sensitive to the size and statistics of the training corpus and the initial rule probability assignment. In fact, for a couple of choices of initial grammar and corpus, the original algorithm produces grammars with somewhat better cross-entropy estimates than those yielded by the new one. However, in every case the bracketing accuracy on a separate test set for the result of bracketed training is above 90% (100% in several cases), in contrast to bracketing accuracies ranging between 15% and 69% for unbracketed training.</Paragraph>
<Paragraph position="9"> 4.2. Experiments on the ATIS Corpus For our main experiment, we used part-of-speech sequences of spoken-language transcriptions in the Texas Instruments subset of the Air Travel Information System (ATIS) corpus (Hemphill et al., 1990), and a bracketing of those sequences derived from the parse trees for that subset in the Penn Treebank.</Paragraph>
<Paragraph position="10"> Out of the 770 bracketed sentences (7812 words) in the corpus, we used 700 as a training set C and 70 (901 words) as a test set T. The following is an example training string ( ( ( VB ( DT NNS ( IN ( ( NN ) ( NN CD ) ) ) ) ) ) . ) corresponding to the parsed sentence (((List (the fares (for ((flight) (number 891)))))) .)</Paragraph>
<Paragraph position="11"> The initial grammar consists of all 4095 possible CNF rules over 15 nonterminals (the same number as in the tree bank) and 48 terminal symbols for part-of-speech tags.</Paragraph>
<Paragraph position="12"> A random initial grammar was trained separately on the unbracketed and bracketed versions of the training corpus, yielding grammars GR and GB. The cross-entropy estimate Ĥ(W, GB) initially decreases faster than Ĥ(W, GR), although eventually the two stabilize at very close values: after 75 iterations, Ĥ(W, GB) ≈ 2.97 and Ĥ(W, GR) ≈ 2.95. However, the analyses assigned by the resulting grammars to the test set are drastically different.</Paragraph>
<Paragraph position="13"> With the training and test data described above, the bracketing accuracy of GR after 75 iterations was only 37.35%, in contrast to 90.36% bracketing accuracy for GB. Plotting bracketing accuracy against iterations (Figure 3), we see that unbracketed training does not on the whole improve accuracy. On the other hand, bracketed training steadily improves accuracy, although not monotonically.</Paragraph>
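The bracketing accuracy figures reported here follow directly from the compatibility criterion of Section 2. A minimal sketch, assuming phrases are represented as half-open index pairs (i, j) (our representation, not the paper's):

    def crosses(span, bracket):
        # Spans (i, j) and (k, l) cross if they overlap without nesting.
        i, j = span
        k, l = bracket
        return (i < k < j < l) or (k < i < l < j)

    def compatible(span, bracketing):
        # A phrase is compatible with a partial bracketing if it crosses
        # none of its brackets (the criterion of Section 2).
        return not any(crosses(span, b) for b in bracketing)

    def bracketing_accuracy(viterbi_analyses, treebank_bracketings):
        # Proportion of phrases in the Viterbi analyses that are
        # compatible with the tree bank bracketing of the same sentence.
        total = sum(len(spans) for spans in viterbi_analyses)
        good = sum(1
                   for spans, gold in zip(viterbi_analyses, treebank_bracketings)
                   for span in spans
                   if compatible(span, gold))
        return good / total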
<Paragraph position="14"> It is also interesting to look at some of the differences between GR and GB, as seen from the most likely analyses they assign to certain sentences. Table 2 shows two bracketed test sentences followed by their most likely GR and GB analyses, given for readability in terms of the original words rather than part-of-speech tags.</Paragraph>
<Paragraph position="15"> For test sentence (A), the only GB constituent not compatible with the tree bank bracketing is (Delta flight number), although the constituent (the cheapest) is also linguistically wrong. The appearance of this constituent can be explained by the lack of information in the tree bank about the internal structure of noun phrases, as exemplified by the tree bank bracketing of the same sentence. In contrast, the GR analysis of the same string contains 16 constituents incompatible with the tree bank.</Paragraph>
<Paragraph position="16"> For test sentence (B), the GB analysis is fully compatible with the tree bank. However, the GR analysis has nine incompatible constituents, one of which, for example, places the final punctuation in a lowest-level constituent. Since final punctuation is quite often preceded by a noun, a grammar inferred from raw text will tend to bracket the noun with the punctuation mark.</Paragraph>
<Paragraph position="17"> This experiment illustrates the fact that although SCFGs provide a hierarchical model of the language, that structure is undetermined by raw text and only by chance will the inferred grammar agree with qualitative linguistic judgments of sentence structure. This problem has also been previously observed with linguistic structure inference methods based on mutual information. Magerman and Marcus (1990) addressed the problem by specifying a predetermined list of pairs of parts of speech (such as verb-preposition, pronoun-verb) that can never be embraced by a low-level constituent. However, these constraints are stipulated in advance rather than being automatically derived from the training material, in contrast with what we have shown to be possible with the inside-outside algorithm for partially bracketed corpora.</Paragraph>
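To make the contrast between the two training regimes concrete, the following sketch shows the key modification of the bracketed algorithm as we understand it: spans incompatible with a sentence's partial bracketing receive zero inside probability. It reuses compatible() from the earlier sketch and the grammar format of the palindrome example; it is an illustration under those assumptions, not the authors' implementation.

    def inside_bracketed(grammar, sentence, bracketing):
        # CKY-style inside pass in which only spans compatible with the
        # partial bracketing receive nonzero inside probability.
        n = len(sentence)
        inside = {}  # maps (nonterminal, i, j) to its inside probability
        for i, t in enumerate(sentence):
            for a in grammar:
                inside[(a, i, i + 1)] = grammar[a].get(t, 0.0)
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                if not compatible((i, j), bracketing):
                    continue  # incompatible spans contribute nothing
                for a in grammar:
                    total = 0.0
                    for rhs, p in grammar[a].items():
                        if not isinstance(rhs, tuple):
                            continue  # skip lexical rules A -> t
                        b, c = rhs
                        for k in range(i + 1, j):
                            total += (p * inside.get((b, i, k), 0.0)
                                        * inside.get((c, k, j), 0.0))
                    inside[(a, i, j)] = total
        return inside

With a full bracketing most spans are skipped, which is the source of the bracketed algorithm's speed advantage; with an empty bracketing the constraint is vacuous and the computation reduces to the ordinary inside pass. </Section> </Paper>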