<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1206"> <Title>Automated Multiword Expression Prediction for Grammar Engineering</Title> <Section position="7" start_page="42" end_page="42" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> One of the important challenges for robust natural language processing systems is dealing with the systematic parse failures caused in great part by Multiword Expressions (MWEs) and related constructions. In this paper we have therefore proposed an approach for the semi-automatic extension of grammars that uses an error mining technique to detect MWE candidates in texts and to predict possible lexico-syntactic types for them. The approach is based on that of van Noord (2004) and proposes a set of MWE candidates. Using the World Wide Web as a large corpus, frequencies are gathered for each candidate in this set. These frequencies, in conjunction with some statistical measures, are employed to rule out noisy cases like spelling mistakes (from of government) and frequent non-MWE sequences like "input is complete".</Paragraph> <Paragraph position="1"> 6 The POS tags are produced with the TnT tagger.</Paragraph> <Paragraph position="2"> With this information, the remaining sequences are analysed by a statistical type predictor that assigns the most likely lexical type to each candidate in a given context. Adding these to the grammar as new lexical entries yielded a considerable increase in coverage of 14.4%.</Paragraph> <Paragraph position="3"> The approach proposed employs simple and self-contained techniques that are language-independent and can help to semi-automatically extend the coverage of a grammar without relying on external resources, like electronic dictionaries and ontologies, that are expensive to obtain and not available for all languages. 
Therefore, it provides an inexpensive and reusable way of supporting and speeding up the grammar engineering process, relieving the grammar developer of some of the burden of extending the coverage of the grammar.</Paragraph> <Paragraph position="4"> As future work we intend to investigate further statistical measures that can be applied robustly to different types of MWEs, in order to refine the list of candidates further and to distinguish false positives, like "of alcohol and", from MWEs, like "put forward by". The high frequency with which the former occur in corpora and the more acute problem of data sparseness that affects the latter make this a difficult task.</Paragraph> </Section> </Paper>