<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3019">
  <Title>System and Application</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 TANGO
</SectionTitle>
    <Paragraph position="0"> TANGO is a concordancer capable of answering users' queries on collocation use. Currently, TANGO supports two text collections: a monolingual corpus (BNC) and a bilingual corpus (SPC). The system consists of four main parts:</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Chunk and Clause Information Integrated
</SectionTitle>
      <Paragraph position="0"> In the CoNLL-2000 shared task, chunking is defined as the process of dividing a sentence into syntactically correlated groups of words. Using the CoNLL training data, we built a chunker that turns sentences into smaller syntactic structures of non-recursive basic phrases to facilitate precise collocation extraction. Looking at adjacent chunks makes it easier to identify argument-predicate relationships, and doing so takes less time than n-gram statistics or full parsing. Take a text in CoNLL-2000 for example: The words correlated with the same chunk tag can be further grouped together (see Table 1). For instance, with chunk information, we can extract</Paragraph>
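      <Paragraph position="1"> The grouping of words by chunk tag can be illustrated with a minimal sketch. It assumes tokens carry CoNLL-2000 style B-/I-/O chunk tags; the names group_chunks, verb_object_pairs, and SAMPLE are hypothetical and introduced only for illustration, the sample tokens are illustrative rather than the text used in this paper, and pairing a VP chunk with the NP chunk that follows it is an assumed simplification, not necessarily how TANGO extracts collocations:

```python
from typing import List, Tuple

Token = Tuple[str, str]         # (word, chunk tag such as "B-NP", "I-NP", "O")
Chunk = Tuple[str, List[str]]   # (chunk type, words in the chunk)

# Illustrative tokens in CoNLL-2000 style (assumption: not the paper's example).
SAMPLE: List[Token] = [
    ("He", "B-NP"), ("reckons", "B-VP"),
    ("the", "B-NP"), ("current", "I-NP"), ("account", "I-NP"), ("deficit", "I-NP"),
    ("will", "B-VP"), ("narrow", "I-VP"),
]


def group_chunks(tokens: List[Token]) -> List[Chunk]:
    """Group words into non-recursive chunks using their B-/I-/O tags."""
    chunks: List[Chunk] = []
    for word, tag in tokens:
        if tag == "O":
            # A word outside any chunk becomes its own "O" span.
            chunks.append(("O", [word]))
            continue
        prefix, _, chunk_type = tag.partition("-")
        # An I- tag extends the previous chunk only if the types match.
        continues = prefix == "I" and bool(chunks) and chunks[-1][0] == chunk_type
        if continues:
            chunks[-1][1].append(word)
        else:
            chunks.append((chunk_type, [word]))
    return chunks


def verb_object_pairs(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Pair each VP chunk with an NP chunk that immediately follows it."""
    pairs: List[Tuple[str, str]] = []
    for left, right in zip(chunks, chunks[1:]):
        if left[0] == "VP" and right[0] == "NP":
            pairs.append((" ".join(left[1]), " ".join(right[1])))
    return pairs


if __name__ == "__main__":
    grouped = group_chunks(SAMPLE)
    print(grouped)
    print(verb_object_pairs(grouped))
```

Because only neighbouring chunks are inspected, candidate verb-object pairs come from a single pass over the chunk list, with no full parse tree required.</Paragraph>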
    </Section>
  </Section>
</Paper>