<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1806">
  <Title>Multiword Unit Hybrid Extraction</Title>
  <Section position="9" start_page="11" end_page="11" type="concl">
    <SectionTitle>
7 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper describes an original hybrid system that extracts multiword unit (MWU) candidates by endogenously identifying relevant syntactic patterns in the corpus and by combining word statistics with the acquired linguistic information. Because it avoids human intervention in the definition of syntactic patterns, HELAS (1) offers full flexibility of use, being independent of the target language, and (2) identifies various kinds of MWUs, such as compound nouns, compound determiners, verbal locutions, adverbial locutions, prepositional locutions and adjectival locutions, without defining any threshold or using lists of stop words. The system has been tested on the Brown Corpus, with encouraging results: a precision score of 62% for the best configuration. The system will soon be available at http://helas.di.ubi.pt.</Paragraph>
  </Section>
</Paper>