<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1809"> <Title>A Statistical Approach to the Semantics of Verb-Particles</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The semantic representation of multiword expressions (MWEs) has recently become the target of renewed attention, notably in the area of hand-written grammar development (Sag et al., 2002; Villavicencio and Copestake, 2002). Such items cause considerable problems for any semantically-grounded NLP application (including applications where semantic information is implicit, such as information retrieval) because their meaning is often not simply a function of the meaning of the constituent parts. However, corpus-based or empirical NLP has shown limited interest in the problem. While there has been some work on statistical approaches to the semantics of compositional compound nominals (e.g. Lauer (1995), Barker and Szpakowicz (1998), Rosario and Hearst (2001)), the more idiosyncratic items have been largely ignored beyond attempts at identification (Melamed, 1997; Lin, 1999; Schone and Jurafsky, 2001). And yet the identification of non-compositional phrases, while valuable in itself, would by no means be the end of the matter. The unique challenge posed by MWEs for empirical NLP is precisely that they do not fall cleanly into the binary classes of compositional and non-compositional expressions, but populate a continuum between the two extremes.</Paragraph> <Paragraph position="1"> Part of the reason for the lack of interest by computational linguists in the semantics of MWEs is that there is no established gold standard data from which to construct or evaluate models. Evaluation to date has tended to be fairly ad hoc. Another key problem is the lack of any firm empirical foundations for the notion of compositionality. Given this background, this paper has two aims. 
The first is to put the treatment of non-compositionality in corpus-based NLP on a firm empirical footing. To that end, it describes the development of a resource for implementing and evaluating statistical models of MWE meaning, based on non-expert human judgements. The second is to demonstrate the usefulness of such models by implementing and evaluating a handful of them.</Paragraph> <Paragraph position="2"> The remainder of this paper is structured as follows.</Paragraph> <Paragraph position="3"> We outline the linguistic foundations of this research in Section 2 before describing the process of resource building in Section 3. Section 4 summarises previous work on the subject and Section 5 details our proposed models of compositionality. Section 6 lays out the evaluation of those models over the gold standard data, and we conclude the paper in Section 7.</Paragraph> </Section> </Paper>