<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1309">
  <Title>Error-driven HMM-based Chunk Tagger with Context-dependent Lexicon</Title>
  <Section position="2" start_page="72" end_page="73" type="metho">
    <SectionTitle>
NULL [NP He/PRP] [VP reckons/VBZ] [NP
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> the corresponding structural relations between two adjacent input tokens are:</Paragraph>
    <Paragraph position="4"> Compared with the B-Chunk and I-Chunk used in Ramshaw and Marcus(1995), structural relations 99 and 90 correspond to B-Chunk which represents the first word of the chunk, and structural relations 00 and 09 correspond to I-Chunk which represnts each other in the chunk while 90 also means the beginning of the sentence and 09 means the end of the sentence.</Paragraph>
    <Paragraph position="5"> 2)Phrase category. This is used to identify the phrase categories of input tokens.</Paragraph>
    <Paragraph position="6"> 3)Part-of-speech. Because of the limited number of structural relations and phrase categories, the part-of-speech is added into the structural tag to represent more accurate models. For the above chunk tagged sentence, the structural tags for all the corresponding input</Paragraph>
    <Paragraph position="8"> current part-of-speech is used as a lexical entry to determine the current structural chunk tag.</Paragraph>
    <Paragraph position="9"> Here, we define: * * is the list of lexical entries in the chunking lexicon,  * \[ @ \[ is the number of lexical entries(the size of the chunking lexicon) * C is the training data.</Paragraph>
    <Paragraph position="10"> For the baseline system, we have : * @={pi,p~3C}, where Pi is a part-of- null speech existing in the tra\]Lning data C * \]@ \[=48 (the number of part-of-speech tags in the training data).</Paragraph>
    <Paragraph position="11"> Table 1 gives an overview of the results of the chunking experiments. For convenience, precision, recall and F#_ 1 values are given seperately for the chunk types NP, VP, ADJP,</Paragraph>
  </Section>
  <Section position="3" start_page="73" end_page="75" type="metho">
    <SectionTitle>
3 Context-dependent Lexicons
</SectionTitle>
    <Paragraph position="0"> In the last section, we only use current part-of-speech as a lexical entry. In this section, we will attempt to add more contextual information to approximate P(t i/G~). This can be done by adding lexical entries with more contextual information into the lexicon ~. In the following, we will discuss five context-dependent lexicons which consider different contextual information.</Paragraph>
    <Section position="1" start_page="73" end_page="73" type="sub_section">
      <SectionTitle>
3.1 Context of current part-of-speech and current word
</SectionTitle>
      <Paragraph position="0"> Here, we assume:</Paragraph>
      <Paragraph position="2"> part-of-speech and word pair existing in the training data C.</Paragraph>
      <Paragraph position="3"> In this case, the current part-of-speech and word pair is also used as a lexical entry to determine the current structural chunk tag and we have a total of about 49563 lexical entries(\[ * \]=49563). Actually, the lexicon used here can be regarded as context-independent.</Paragraph>
      <Paragraph position="4"> The reason we discuss it in this section is to distinguish it from the context-independent lexicon used in the baseline system. Table 2 give an overview of the results of the chunking experiments on the test data.</Paragraph>
      <Paragraph position="5">  Table 2 shows that incorporation of current word information improves the overall F~=~ value by 2.9%(especially for the ADJP, ADVP and PP chunks), compared with Table 1 of the baseline system which only uses current part-of-speech information. This result suggests that current word information plays a very important role in determining the current chunk tag.</Paragraph>
    </Section>
    <Section position="2" start_page="73" end_page="74" type="sub_section">
      <SectionTitle>
3.2 Context of previous part-of-speech and current part-of-speech
</SectionTitle>
      <Paragraph position="0"> Here, we assume:</Paragraph>
      <Paragraph position="2"> is a pair of previous part-of-speech and current part-of-speech existing in the training data C.</Paragraph>
      <Paragraph position="3"> In this case, the previous part-of-speech and current part-of-speech pair is also used as a lexical entry to determine the current structural chunk tag and we have a total of about 1411 lexical entries(l~\]=1411). Table 3 give an overview of the results of the chunking experiments.</Paragraph>
      <Paragraph position="4">  system, Table 3 shows that additional contextual information of previous part-of-speech improves the overall F/~_~ value by 0.5%. Especially, F/3_ ~ value for VP improves by 1.25%, which indicates that previous part-of-speech information has a important role in determining the chunk type VP. Table 3 also shows that the recall rate for chunk type ADJP decrease by 3.7%. It indicates that additional previous part-of-speech information makes ADJP chunks easier to merge with neibghbouring chunks.</Paragraph>
    </Section>
    <Section position="3" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
3.3 Context of previous part-of-speech, previous word and current part-of-speech
</SectionTitle>
      <Paragraph position="0"> Here, we assume:</Paragraph>
      <Paragraph position="2"> where pi_lwi_lp~ is a triple pattern existing in the training corpus.</Paragraph>
      <Paragraph position="3"> In this case, the previous part-of-speech, previous word and current part-of-speech triple is also used as a lexical entry to determine the current structural chunk tag and } * 1=136164. Table 4 gives the results of the chunking experiments. Compared with Table 1 of the baseline system, Table 4 shows that additional 136116 new lexical entries of format Pi-lw~-lPi improves the overall F#= l value by 3.3%. Compared with Table 3 of the extended system 2.2 which uses previous part-of-speech and current part-of-speech as a lexical entry, Table 4 shows that additional contextual information of previous word improves the</Paragraph>
    </Section>
    <Section position="4" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
3.4 Context of previous part-of-speech, current part-of-speech and current word
</SectionTitle>
      <Paragraph position="0"> Here, we assume:</Paragraph>
      <Paragraph position="2"> where pi_lpiw~ is a triple pattern existing in the training and \] * \[=131416.</Paragraph>
      <Paragraph position="3"> Table 5 gives the results of the chunking experiments.</Paragraph>
      <Paragraph position="4">  Compared with Table 2 of the extended system which uses current part-of-speech and current word as a lexical entry, Table 5 shows that additional contextual information of previous part-of-speech improves the overall Fa= 1 value by 1.8%.</Paragraph>
    </Section>
    <Section position="5" start_page="74" end_page="75" type="sub_section">
      <SectionTitle>
3.5 Context of previous part-of-speech, previous word, current part-of-speech and current word
</SectionTitle>
      <Paragraph position="0"> Here, the context of previous part-of-speech, previous word, current part-of-speech and current word is used as a lexical entry to determine the current structural chunk tag, and Φ = {p_{i-1}w_{i-1}p_iw_i : p_{i-1}w_{i-1}p_iw_i ∈ C} ∪ {p_i : p_i ∈ C}, where p_{i-1}w_{i-1}p_iw_i is a pattern existing in the training corpus. Due to memory limitation, only lexical entries which occur more than once are kept. Out of 364365 possible lexical entries existing in the training data, 98489 are kept.</Paragraph>
      <Paragraph position="2"> Compared with Table 2 of the extended system which uses current part-of-speech and current word as a lexical entry, Table 6 shows that additional contextual information of previous part-of-speech improves the overall Ft3=l value by 1.8%.</Paragraph>
    </Section>
    <Section position="6" start_page="75" end_page="75" type="sub_section">
      <SectionTitle>
3.6 Conclusion
</SectionTitle>
      <Paragraph position="0"> Above experiments shows that adding more contextual information into lexicon significantly improves the chunking accuracy. However, this improvement is gained at the expense of a very large lexicon and we fred it difficult to merge all the above context-dependent lexicons in a single lexicon to further improve the chunking accurracy because of memory limitation. In order to reduce the size of lexicon effectively, an error-driven learning approach is adopted to examine the effectiveness of lexical entries and make it possible to further improve the chunking accuracy by merging all the above context-dependent lexicons in a single lexicon.</Paragraph>
      <Paragraph position="1"> This will be discussed in the next section.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>