<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4032">
  <Title>Parsing Conversational Speech Using Enhanced Segmentation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Structured Language Model
</SectionTitle>
      <Paragraph position="0"> The SLM assigns a probability P(W, T) to every sentence W and to every possible binary parse T of that sentence. The terminals of T are the words of W with POS tags, and the nodes of T are annotated with phrase headwords and non-terminal labels. Let W be a sentence of length n words with added sentence boundary markers w_0 = &lt;s&gt; and w_{n+1} = &lt;/s&gt;. Let W_k = w_0 ... w_k be the word k-prefix of the sentence (the words from the beginning of the sentence up to the current position k) and W_k T_k the word-parse k-prefix. Figure 1 shows a word-parse k-prefix; h_0 .. h_{-m} are the exposed heads, each head being a pair (headword, non-terminal label), or (word, POS tag) in the case of a root-only tree.</Paragraph>
      <Paragraph position="1"> The exposed heads at a given position k in the input sentence are a function of the word-parse k-prefix.</Paragraph>
      <Paragraph position="3"> The joint probability P(W, T) of a word sequence W and a complete parse T can be broken into:</Paragraph>
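      <Paragraph> For orientation, the factorization is usually written as shown in the block that follows in presentations of the structured language model. It is a reconstruction under stated assumptions, not the typeset equation of this paper: W is the word sequence, T the parse, w_k and t_k the word and tag at position k, W_{k-1}T_{k-1} the word-parse (k-1)-prefix, N_k the number of CONSTRUCTOR operations at position k, and p_i^k the i-th such operation.

```latex
P(W, T) = \prod_{k=1}^{n+1} \Big[
    P(w_k \mid W_{k-1}T_{k-1}) \cdot
    P(t_k \mid W_{k-1}T_{k-1}, w_k) \cdot
    \prod_{i=1}^{N_k} P(p_i^k \mid W_{k-1}T_{k-1}, w_k, t_k, p_1^k \ldots p_{i-1}^k)
\Big]
```

      The three factors correspond, in order, to the WORD-PREDICTOR, the TAGGER, and the CONSTRUCTOR.</Paragraph>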
      <Paragraph position="5"> w_k is the word predicted by the WORD-PREDICTOR; t_k is the tag assigned to w_k by the TAGGER; N_k - 1 is the number of operations the CONSTRUCTOR executes at sentence position k before passing control to the WORD-PREDICTOR (the</Paragraph>
      <Paragraph position="7"> out at position k in the word string; the operations performed by the CONSTRUCTOR are illustrated in Figures 2-3, and they ensure that all possible binary branching parses, with all possible headword and non-terminal label assignments for the w_1 ... w_k word sequence, can ...............</Paragraph>
      <Paragraph position="9"/>
      <Paragraph position="11"> The SLM is based on three probabilities, each estimated using deleted interpolation and parameterized (approximated) as follows:</Paragraph>
      <Paragraph position="13"> Since the number of parses for a given word prefix W_k grows exponentially with k, |{T_k}| ~ O(2^k), the state space of our model is huge even for relatively short sentences, so the search strategy uses pruning.</Paragraph>
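      <Paragraph> To make the role of pruning concrete, here is a minimal sketch of a generic beam-style pruning step. It is not necessarily the search strategy used by the SLM: the function name prune, the parameters max_depth and log_beam, and the (description, log-probability) hypothesis format are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical hypothesis format: (parse description, log-probability).
Hypothesis = Tuple[str, float]


def prune(hyps: List[Hypothesis], max_depth: int, log_beam: float) -> List[Hypothesis]:
    """Keep at most max_depth hypotheses whose score is within log_beam of the best."""
    if not hyps:
        return []
    # Rank hypotheses from most to least probable.
    ranked = sorted(hyps, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    # Drop anything that falls too far below the best score.
    kept = [h for h in ranked if h[1] >= best - log_beam]
    # Cap the number of survivors.
    return kept[:max_depth]


if __name__ == "__main__":
    candidates = [("a", math.log(0.5)), ("b", math.log(0.3)), ("c", math.log(0.001))]
    print(prune(candidates, max_depth=2, log_beam=3.0))
```

      Applying such a step after every word position keeps the number of word-parse prefixes bounded, at the cost of possibly discarding the parse that would have scored best at the end of the sentence.</Paragraph>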
      <Paragraph position="14"> Each model component is initialized from a set of parsed sentences after undergoing headword percolation and binarization. The position of the headword within a constituent is specified using a rule-based approach. Assuming the index of the headword on the right-hand side of the rule is k, we binarize the constituent by following one of the two binarization schemes in Figure 4. Intermediate nodes created receive the label Z'</Paragraph>
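      <Paragraph> Head-driven binarization of a flat constituent can be sketched as follows. Figure 4 is not reproduced in this text, so the two attachment orders here (right siblings first, or left siblings first) are assumptions and may differ from the paper's schemes. The function name binarize, the tuple tree format, and the convention of writing an intermediate label as the parent label followed by an apostrophe are likewise illustrative choices.

```python
from typing import List, Tuple, Union

# Hypothetical tree format: a leaf is a string, an internal node is
# (label, left_child, right_child).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def binarize(label: str, children: List[Tree], head: int, right_first: bool = True) -> Tree:
    """Binarize a flat constituent around the child at index head.

    With right_first=True the siblings to the right of the head are attached
    first (innermost), then those to the left; right_first=False reverses this.
    Intermediate nodes get the primed label and the top node keeps the original
    label. A constituent with a single child is returned unchanged.
    """
    intermediate = label + "'"
    # Siblings on each side, ordered from nearest to the head outwards.
    right = [(child, "R") for child in children[head + 1:]]
    left = [(child, "L") for child in reversed(children[:head])]
    order = right + left if right_first else left + right

    node: Tree = children[head]
    for i, (sibling, side) in enumerate(order):
        is_top = i == len(order) - 1
        node_label = label if is_top else intermediate
        if side == "R":
            node = (node_label, node, sibling)
        else:
            node = (node_label, sibling, node)
    return node


if __name__ == "__main__":
    print(binarize("Z", ["a", "b", "h", "c"], head=2, right_first=True))
    print(binarize("Z", ["a", "b", "h", "c"], head=2, right_first=False))
```

      In both orders the head child stays at the bottom of the binarized spine, so the headword percolates unchanged to every intermediate node and to the top of the constituent.</Paragraph>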
    </Section>
  </Section>
</Paper>