<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3019">
  <Title>System and Application</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>2 TANGO</SectionTitle>
    <Paragraph position="0">TANGO is a concordancer that answers users' queries on collocation use. TANGO currently supports two text collections: a monolingual corpus (BNC) and a bilingual corpus (SPC). The system consists of four main parts:</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>2.1 Chunk and Clause Information Integrated</SectionTitle>
      <Paragraph position="0">In the CoNLL-2000 shared task, chunking is defined as dividing a sentence into syntactically correlated parts of words. Using the CoNLL-2000 training data, we built a chunker that turns sentences into smaller syntactic structures, namely non-recursive base phrases, to support precise collocation extraction. With this chunk information, the predicate-argument relationship becomes easier to identify by looking at adjacent chunks, which saves time compared with n-gram statistics or full parsing. Take a text from CoNLL-2000 as an example: The words with the same chunk tag can then be grouped together (see Table 1). For instance, with chunk information, we can extract</Paragraph>
    </Section>
  </Section>
</Paper>
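<!--
The grouping of words by chunk tag and the use of adjacent chunks described in Section 2.1 can be sketched as follows. This is a minimal illustration, not TANGO's implementation. It assumes tokens in the CoNLL-2000 column style (word, part-of-speech tag, chunk tag) with B-/I-/O prefixes; the function names group_chunks and verb_noun_pairs are hypothetical, the tags in the sample sentence are written by hand, and taking the last word of each chunk as its head is a simplifying heuristic assumed here.

```python
from typing import List, Tuple

# A token in CoNLL-2000 column style: (word, POS tag, chunk tag such as "B-NP").
Token = Tuple[str, str, str]
# A chunk: (chunk type such as "NP" or "VP", words in the chunk).
Chunk = Tuple[str, List[str]]


def group_chunks(tokens: List[Token]) -> List[Chunk]:
    """Hypothetical helper: group tokens that share a chunk tag into base phrases."""
    chunks: List[Chunk] = []
    for word, _pos, tag in tokens:
        if tag == "O":
            # Tokens outside any chunk (e.g. punctuation) become their own "O" unit.
            chunks.append(("O", [word]))
            continue
        prefix, ctype = tag.split("-", 1)
        # An I- tag continues the current chunk if its type matches;
        # a B- tag (or a mismatched I- tag) starts a new chunk.
        if prefix == "I" and chunks and chunks[-1][0] == ctype:
            chunks[-1][1].append(word)
        else:
            chunks.append((ctype, [word]))
    return chunks


def verb_noun_pairs(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each VP chunk with an immediately following NP chunk.

    Assumption: the last word of each chunk is used as its head (main verb / head noun).
    """
    pairs: List[Tuple[str, str]] = []
    for (t1, w1), (t2, w2) in zip(chunks, chunks[1:]):
        if t1 == "VP" and t2 == "NP":
            pairs.append((w1[-1], w2[-1]))
    return pairs


sentence: List[Token] = [
    ("Confidence", "NN", "B-NP"),
    ("in", "IN", "B-PP"),
    ("the", "DT", "B-NP"),
    ("pound", "NN", "I-NP"),
    ("is", "VBZ", "B-VP"),
    ("widely", "RB", "I-VP"),
    ("expected", "VBN", "I-VP"),
    ("to", "TO", "I-VP"),
    ("take", "VB", "I-VP"),
    ("another", "DT", "B-NP"),
    ("sharp", "JJ", "I-NP"),
    ("dive", "NN", "I-NP"),
    (".", ".", "O"),
]

chunks = group_chunks(sentence)
# chunks: [("NP", ["Confidence"]), ("PP", ["in"]), ("NP", ["the", "pound"]),
#          ("VP", ["is", "widely", "expected", "to", "take"]),
#          ("NP", ["another", "sharp", "dive"]), ("O", ["."])]
print(verb_noun_pairs(chunks))  # [('take', 'dive')]
```

Working on chunks rather than raw tokens means the verb "take" and the noun "dive" are found as neighbours even though three other words sit between them in the VP and NP, which is the kind of adjacency the paragraph relies on instead of n-gram windows or a full parse.
-->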