<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0115">
  <Title>Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons</Title>
  <Section position="4" start_page="184" end_page="185" type="intro">
    <SectionTitle>
I
</SectionTitle>
    <Paragraph position="0"> The evaluation method uses a simple objective criterion rather than relying on subjective human judges. It allows many experiments to be run without concern about the cost, availability and reliability of human evaluators.</Paragraph>
    <Paragraph position="1"> The filter-based approach is designed to identify likely (source word, target word) 1 pairs, using a statistical decision procedure. Candidate word pairs are drawn from a corpus of aligned sentences: (S, T) is a candidate iff T appears in the translation of a sentence containing S. In the simplest case, the decision procedure considers all candidates for inclusion in the lexicon; but the new framework allows a cascade of non-statistical filters to remove inappropriate pairs from consideration.</Paragraph>
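    <Paragraph> The candidate definition and the cascade described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in the paper: the names candidate_pairs, run_cascade and drop_mixed_punctuation are hypothetical, the toy filter is not one of the knowledge sources investigated here, and raw co-occurrence counts merely stand in for the statistical decision procedure, which this section does not specify.
<![CDATA[
```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]                      # (source word, target word)
SentencePair = Tuple[List[str], List[str]]  # one aligned (source, target) sentence pair
# Assumed uniform filter signature: sentence pair plus candidates in, surviving candidates out.
Filter = Callable[[List[str], List[str], Set[Pair]], Set[Pair]]


def candidate_pairs(source: List[str], target: List[str]) -> Set[Pair]:
    """(S, T) is a candidate iff T occurs in the translation of a sentence containing S."""
    return {(s, t) for s in source for t in target}


def run_cascade(corpus: Iterable[SentencePair], filters: List[Filter]) -> Counter:
    """Count, per word pair, the aligned sentences in which it survives every filter."""
    counts: Counter = Counter()
    for source, target in corpus:
        candidates = candidate_pairs(source, target)
        for apply_filter in filters:  # a filter is expected to only remove pairs
            candidates = apply_filter(source, target, candidates)
        counts.update(candidates)
    return counts


def drop_mixed_punctuation(source, target, candidates):
    """Hypothetical toy filter: drop pairs that match a word with a non-word token."""
    return {(s, t) for (s, t) in candidates if s.isalpha() == t.isalpha()}


corpus = [
    (["le", "premier", "ministre", "."], ["the", "prime", "minister", "."]),
    (["le", "premier", "jour"], ["the", "first", "day"]),
]
unfiltered = run_cascade(corpus, [])
filtered = run_cascade(corpus, [drop_mixed_punctuation])
print(unfiltered[("le", "the")], unfiltered[(".", "the")], filtered[(".", "the")])
```
]]>
    Because every filter in the sketch shares one signature, a filter can be added to or removed from the list without touching the others, which is the property the cascade relies on.</Paragraph>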
    <Paragraph position="2"> Each filter is based on a particular knowledge source, and can be placed into the cascade independently of the others. The knowledge sources investigated here are:  sources have not previously been used for the task of inducing translation lexicons.</Paragraph>
    <Paragraph position="3"> The filter-based framework, together with the fully automatic evaluation method, allows easy investigation of the relative efficacy of cascades of each of the subsets of these four filters. As will be shown below, some filter cascades sift candidate word pairs so well that training corpora small enough to be hand-built can be used to induce more accurate translation lexicons than those induced from a much larger training corpus without such filters. In one evaluation, a training corpus of 500 sentence pairs processed with these knowledge sources achieved a precision of 0.54, while a training corpus of 100,000 training pairs alone achieved a precision of only 0.45. Such improvements could not previously be obtained, because  * These knowledge sources have not been used together for this task before.</Paragraph>
    <Paragraph position="4"> * There was no way to uniformly combine the different kinds of filters.</Paragraph>
    <Paragraph position="5"> * There was no way to objectively judge lexicon precision.</Paragraph>
    <Paragraph position="6">  Table 1 provides a qualitative demonstration of how a lexicon entry gradually improves as more filters are applied. The table contains actual entries for the French source word &quot;premier,&quot; from 7-best lexicons that were induced from 5000 pairs of training sentences, using different filter cascades. The baseline lexicon, induced with no filters, contains correct translations only in the first and sixth positions. The Cognate Filter disallows all candidate translations of French &quot;premier&quot; whenever the English cognate &quot;premier&quot; appears in the target English sentence. This causes English &quot;premier&quot; to move up to second position. The Part-of-Speech Filter realizes that &quot;premier&quot; can only be an adjective in French, whereas in the English Hansards it is mostly used as a noun. So, it throws out that pairing, along with several other English noun candidates, allowing &quot;first&quot; to move up to third position. The POS and Cognate filters reduce noise better together than separately. More of the incorrect translations are filtered out in the &quot;POS &amp; COG&quot; column, making room for &quot;foremost.&quot; Finally, the MRBD Filter narrows the list down to just the three translations of French &quot;premier&quot; that are appropriate in the Hansard sublanguage.</Paragraph>
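    <Paragraph> The behaviour of the Cognate Filter and the notion of an N-best entry can be illustrated with a small Python sketch. Several points here are assumptions rather than the paper's method: the filter is read as removing all candidates other than the cognate itself, the cognate test is reduced to identical spelling, entries are ranked by raw count, and the names is_cognate, cognate_filter and n_best, as well as the example counts, are hypothetical.
<![CDATA[
```python
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (source word, target word)


def is_cognate(s: str, t: str) -> bool:
    """Assumed stand-in for a cognate test: same alphabetic string, ignoring case."""
    return s.isalpha() and s.lower() == t.lower()


def cognate_filter(source: List[str], target: List[str], candidates: Set[Pair]) -> Set[Pair]:
    """For a source word with a cognate in the target sentence, keep only the cognate pairing."""
    has_cognate = {s for s in source if any(is_cognate(s, t) for t in target)}
    return {(s, t) for (s, t) in candidates
            if s not in has_cognate or is_cognate(s, t)}


def n_best(counts: Counter, source_word: str, n: int) -> List[str]:
    """Return the n highest-count target words for source_word (ties broken alphabetically)."""
    ranked = sorted(
        ((t, c) for (s, t), c in counts.items() if s == source_word),
        key=lambda item: (-item[1], item[0]),
    )
    return [t for t, _ in ranked[:n]]


source = ["le", "premier", "ministre"]
target = ["the", "premier", "said"]
candidates = {(s, t) for s in source for t in target}
kept = cognate_filter(source, target, candidates)

# Made-up counts, used only to show how an N-best entry is read off.
example_counts = Counter({
    ("premier", "the"): 5,
    ("premier", "first"): 3,
    ("premier", "premier"): 2,
    ("le", "the"): 4,
})
top_two = n_best(example_counts, "premier", 2)
print(sorted(kept))
print(top_two)
```
]]>
    In this sketch the pairing of French &quot;premier&quot; with English &quot;the&quot; is discarded for the example sentence, so it no longer adds to that pair's count; this is one way noise candidates could lose rank in an N-best entry.</Paragraph>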
    <Paragraph position="7"> 1 Punctuation, numbers, etc. also count as words.</Paragraph>
  </Section>
</Paper>