<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1060">
  <Title>Experiments in Parallel-Text Based Grammar Induction</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> This paper presented a novel approach that uses parallel corpora as the only resource in the creation of monolingual analysis tools. We believe that in order to induce high-quality tools based on statistical word alignment, the training approach for the target language tool has to be able to exploit islands of reliable information in a stream of potentially rather noisy data. We experimented with an initial idea to address this task, which is conceptually simple and can be implemented building on existing technology: using the notion of word blocks projected by word alignment as an indication of (mainly) impossible string spans. Applying this information in order to impose weighting factors on the EM algorithm for PCFG induction gives us a first, simple instance of the &quot;island-exploiting&quot; system we think is needed. More sophisticated models may make use of some of the experience gathered in these experiments.</Paragraph>
    <Paragraph position="1"> The conservative way in which cross-linguistic relations between phrase structures are exploited has the advantage that we don't have to make unwarranted assumptions about direct correspondences among the majority of constituent spans, or even direct correspondences of phrasal categories. The technique is particularly well-suited for the exploitation of parallel corpora involving multiple languages, like the Europarl corpus. Note that nothing in our methodology made any language-specific assumptions; future research has to show whether there are language pairs that are particularly effective, but in general the technique should be applicable to whatever parallel corpus is at hand.</Paragraph>
    <Paragraph position="2"> A number of studies are related to the work we presented, most specifically work on parallel-text based &quot;information projection&quot; for parsing (Hwa et al., 2002), but also grammar induction work based on constituent/distituent information (Klein and Manning, 2002) and (language-internal) alignment-based learning (van Zaanen, 2000). However, to our knowledge, the specific way of bringing these aspects together is new.</Paragraph>
  </Section>
</Paper>