<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1062">
  <Title>A DOM Tree Alignment Model for Mining Parallel Data from the Web (Sydney, July 2006. ©2006 Association for Computational Linguistics)</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents a new web mining scheme for parallel data acquisition.</Paragraph>
    <Paragraph position="1"> Based on the Document Object Model (DOM), a web page is represented as a DOM tree. A DOM tree alignment model is then proposed to identify the translationally equivalent texts and hyperlinks between two parallel DOM trees. By tracing the identified parallel hyperlinks, parallel web documents are mined recursively. Benchmarks show that, compared with previous mining schemes, the new scheme improves mining coverage, reduces mining bandwidth, and enhances the quality of the mined parallel sentences.</Paragraph>
  </Section>
</Paper>