
<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2014">
  <Title>The Effectiveness of Corpus-Induced Dependency Grammars for Post-processing Speech*</Title>
  <Section position="5" start_page="104" end_page="107" type="evalu">
    <SectionTitle>
4 Evaluation Using the Naval Resource Management Domain
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="104" end_page="107" type="sub_section">
      <Paragraph position="0"> An experiment was conducted to determine the plausibility and the benefits of extracting CDG constraints from a domain-specific corpus of sentences.</Paragraph>
      <Paragraph position="1"> For our speech application, the ideal CDG should be general enough to cover sentences similar to those that appear in the corpus while being restrictive enough to eliminate sentences that are implausible given the observed sentences. Hence, we investigate whether a grammar extracted from annotated sentences in a corpus achieves this precision of coverage. We also examine whether a learned grammar has the ability to filter out incorrect sentence hypotheses produced by the HMM component of our system in Figure 1. To investigate these issues, we have performed an experiment using the standard  1990) corpora. These mid-size speech corpora have a vocabulary of 991 words and contain utterances of sentences derived from sentence templates based on interviews with naval personnel familiar with naval resource management tasks. They were chosen for several reasons: they are two existing speech corpora from the same domain; their manageable sizes make them a good platform for the development of techniques that require extensive experimentation; and the sentences have both syntactic variety and reasonably rich semantics. RM contains 5,190 separate utterances (3,990 testing, 1,200 training) of 2,845 distinct sentences (2,245 training, 600 testing). We have extracted several types of CDGs from annotations of the RM sentences and tested their generality using the 7,396 sentences in RM2 (out of the 8,173) that are in the resource management domain but are distinct from the RM sentences. We compare these CDGs to each other and to the conventional CDG described previously.</Paragraph>
      <Paragraph position="2"> The corpus-based CDGs were created by extracting the allowable grammar relationships from the RM sentences that were annotated by language experts using the SENATOR annotation tool, a CGI (Common Gateway Interace) HTML script written in GNU C++ version 2.8.1 (White, 2000). We tested two major CDG variations: those derived directly from the RM sentences (Sentence CDGs) and those derived from simple template-expanded RM sentences (Template CDGs). For example, &amp;quot;List MIDPAC's deployments during (date)&amp;quot; is a sentence containing a date template which allows any date representations. For these experiments, we focused on templates for dates, years, times, numbers, and latitude and longitude coordinates. Each template name identifies a sub-grammar which was produced by annotating the appropriate strings. We then annotated sentences containing the template names as if they were regular sentences. Approximately 25% of the 2,845 RM sentences were expanded with one or more templates.</Paragraph>
      <Paragraph position="3"> Although annotating a corpus of sentences can be a labor intensive task, we used an iterative approach that is based on parsing using grammars with varying degrees of restrictiveness. A grammar can be made less restrictive by ignoring: * lexical information associated with a role value's modifiee in the ARVPs, o feature information of two role values in an ARVP not directly related based on their modifiee relations, null . syntactic information provided by two role values that are not directly related, * specific feature information (e.g., semantics or subcategorization).</Paragraph>
      <Paragraph position="4"> Initially, we bootstrapped the grammar by annotating a 200 sentence subset of the RM corpus and extracting a fairly general grammar from the annotations. Then using increasingly restrictive grammars at each iteration, we used the current grammar to identify sentences that required annotation and verified the parse information for sentences that succeeded. This iterative technique reduced the time required to build a CDG from about one year for the conventional CDG to around two months (White, 2000).</Paragraph>
      <Paragraph position="5"> Several methods of extracting an ARV/ARVP grammar from sentences or template-extended sentences were investigated. The ARVPs are extracted differently for each method; whereas, the ARVs are extracted in the same manner regardless of the method. Recall that ARVs represent the set of observed role value assignments. In our implementation, each ARV includes: the label of the role value, the role to which the role value was assigned, the lexical category and feature values of the word containing the role, the relative position of the word and the role value's modifiee, and the modifiee's lexical category and feature values (modifiee constraints).</Paragraph>
      <Paragraph position="6"> We use modifiee constraints for ARVs regardless of extraction method because their use does not change the coverage of the extracted grammar and not using the information would significantly slow the parser (Harper et al., 1999a). Because the ARVP space is larger than the ARV space, we investigate six variations for extracting the pairs:  1. Full Mod: contains all grammar and feature value information for all pairs of role values from annotated sentences, as well as modifiee constraints. For a role value pair in a sentence to be considered valid during parsing with this grammar, it must match an ARVP extracted from the annotated sentences.</Paragraph>
      <Paragraph position="7"> 2. Full: like Full Mod except it does not impose modifiee constraints on a pair of role values during parsing.</Paragraph>
      <Paragraph position="8"> 3. Feature Mod: contains all grammar relations between all pairs of role values, but it consid null ers feature and modifiee constraints only for pairs that are directly related by a modifiee link. During parsing, if a role value pair is related by a modifiee link, then a corresponding ARVP with full feature and modifiee information must appear in the grammar for it to be allowed. If the pair is not directly related, then an ARVP must be stored for the grammar relations, ignoring feature and modifiee constraint information.</Paragraph>
      <Paragraph position="9">  4. Feature: like Feature Mod except it does not impose modifiee constraints on a pair of role values during parsing.</Paragraph>
      <Paragraph position="10"> 5. Direct Mod: stores only the grammar, feature,  and modifiee information for those pairs of role</Paragraph>
      <Paragraph position="12"> values that are directly related by a modifiee link.</Paragraph>
      <Paragraph position="13"> During parsing, if a role value pair is related by such a link, then a corresponding ARVP must appear in the grammar for it to be allowed. Any pair of role values not related by a modifiee link  is allowed (an open-world assumption).</Paragraph>
      <Paragraph position="14"> 6. Direct: like Direct Mod except it does not im null pose modifiee constraints on a pair of role values during parsing.</Paragraph>
      <Paragraph position="15"> Grammar sizes for these six grammars, extracted either directly from the 2,845 sentences or from the 2,845 sentences expanded with our sub-grammar templates, appear in Table 1. The largest grammars were derived using the Full Mod extraction method, with a fairly dramatic growth resulting from processing template-expanded sentences. The Feature and Direct variations are more manageable in size, even those derived from template-expanded sentences.</Paragraph>
      <Paragraph position="16"> Size is not the only important consideration for a grammar. Other important issues are grammar generality and the impact of the grammar on the accuracy of selecting the correct sentence from the recognition lattice of a spoken utterance. After extracting the CDG grammars from the RM sentences and template-expanded sentences, we tested the generality of the extracted grammars by using each grammar to parse the 7,396 RM2 sentences.</Paragraph>
      <Paragraph position="17"> See the results in Table 2. The grammar with the greatest generality was the conventional CDG for the RM corpus; however, this grammar also has the unfortunate attribute of being quite ambiguous. The most generalizable of extracted grammars uses the Direct method on template-expanded sentences. In all cases, the template-expanded sentence grammars gave better coverage than their corresponding sentence-only grammars.</Paragraph>
      <Paragraph position="18"> We have also used the extracted grammars to post-process word graphs created by the word graph compression algorithm of (Johnson and Harper, 1999) for the test utterances in the RM corpus. As was reported in (Johnson and Harper, 1999), the word-error rate of our HMM recognizer with an embedded word pair language model on the RM test set of 1200 utterances was 5.0%, the 1-best sentence accuracy was 72.1%, and the word graph coverage accuracy was 95.1%. Also, the average uncompressed word graph size was 75.15 nodes, and our compression algorithm resulted in a average word graph size of 28.62 word nodes. When parsing the word graph, the probability associated with a word node can either represent its acoustic score or a combination of its acoustic and stochastic grammar score. We use the acoustic score because (Johnson and Harper, 1999) showed that by using a word node's acoustic score alone when extracting the top sentence candidate after parsing gave a 4% higher sentence accuracy. null For the parsing experiments, we processed the 1,080 word graphs produced for the RM test set that contained 50 or fewer word nodes after compression (out of 1,200 total) in order to efficiently compare the 12 ARV/ARVP CDG grammars and the conventional CDG (the larger word graphs require significant time and space to parse using the conventional CDG). These 1,080 word graphs contain 24.95 word nodes on average with a standard deviation (SD) of 10.80, and result in 1-best sentence accuracy was 75% before parsing. The number of role values prior to binary constraint propagation differ across the grammars with an average (and SD) for the conventional grammar of 504.99 (442.00), for the sentence-only grammars of 133.37 (119.48), and for the template-expanded grammars of 157.87 (145.16). Table 3 shows the word graph parsing speed and the path, node, and role value (RV) ambiguity after parsing; Table 4 shows the sentence accuracy and the accuracy and percent correct for words. Note that percent correct words is calculated using N-D-S and word accuracy using N N-D-S-I where N is the number of words, D is N the number of deletions, S is the number of substitutions, and I is the number of insertions.</Paragraph>
      <Paragraph position="19"> The most selective RM sentence grammar, Full Mod, achieves the highest sentence accuracy, but at a cost of a greater average parsing time than the other RM sentence grammars. Higher accu- null of 50 or fewer word nodes produced for the RM test set using the 13 CDGs. racy appears to be correlated with the ability of the constraints to eliminate word nodes from the word graph during parsing. The least restrictive sentence grammar, Direct, is less accurate than the other sentence grammars and offers an intermediate speed of parsing, most likely due to the increased ambiguity in the parsing space. The fastest grammar was the Feature-Mod grammar, which also offers an intermediate level of accuracy. Its size (even with templates), restrictiveness, and speed make it very attractive. The template versions of each grammar showed a slight increase in average parse times (from processing a larger number of role values) and a slight decrease in parsing accuracy. The conventional grammar was the least competitive of the grammars both in speed and in accuracy.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML