<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1043"> <Title>Automatically Constructing a Lexicon of Verb Phrase Idiomatic Combinations</Title> <Section position="6" start_page="342" end_page="343" type="concl"> <SectionTitle> 6 Discussion and Conclusions </SectionTitle> <Paragraph position="0"> The significance of the role idioms play in language has long been recognized. However, due to their peculiar behaviour, idioms have been mostly overlooked by the NLP community. Recently, there has been growing awareness of the importance of identifying non-compositional multiword expressions (MWEs). Nonetheless, most research on the topic has focused on compound nouns and verb particle constructions. Earlier work on idioms has only touched the surface of the problem, failing to propose explicit mechanisms for appropriately handling them. Here, we provide effective mechanisms for the treatment of a broadly documented and crosslinguistically frequent class of idioms, i.e., VNICs.</Paragraph> <Paragraph position="1"> Earlier research on the lexical encoding of idioms mainly relied on the existence of human annotations, especially for detecting which syntactic variations (e.g., passivization) an idiom can undergo (Villavicencio et al., 2004). We propose techniques for the automatic acquisition and encoding of knowledge about the lexicosyntactic behaviour of idiomatic combinations. We put forward a means for automatically discovering the set of syntactic variations that are tolerated by a VNIC and that should be included in its lexical representation. Moreover, we incorporate such information into statistical measures that effectively predict the idiomaticity level of a given expression. 
In this regard, our work relates to previous studies on determining the compositionality (inverse of idiomaticity) of MWEs other than idioms.</Paragraph> <Paragraph position="2"> Most previous work on compositionality of MWEs either treats them as collocations (Smadja, 1993), or examines the distributional similarity between the expression and its constituents (McCarthy et al., 2003; Baldwin et al., 2003; Bannard et al., 2003). Lin (1999) and Wermter and Hahn (2005) go one step further and look into a linguistic property of non-compositional compounds, namely their lexical fixedness, to identify them. Venkatapathy and Joshi (2005) combine aspects of the above-mentioned work by incorporating lexical fixedness, collocation-based, and distributional similarity measures into a set of features which are used to rank verb+noun combinations according to their compositionality.</Paragraph> <Paragraph position="3"> Our work differs from such studies in that it carefully examines several linguistic properties of VNICs that distinguish them from literal (compositional) combinations. Moreover, we suggest novel techniques for translating such characteristics into measures that predict the idiomaticity level of verb+noun combinations. More specifically, we propose statistical measures that quantify the degree of lexical, syntactic, and overall fixedness of such combinations. We demonstrate that these measures can be successfully applied to the task of automatically distinguishing idiomatic combinations from non-idiomatic ones. We also show that our syntactic and overall fixedness measures substantially outperform a widely used measure of collocation, PMI, even when the latter takes syntactic relations into account.</Paragraph> <Paragraph position="4"> Others have also drawn on the notion of syntactic fixedness for idiom detection, though specific to a highly constrained type of idiom (Widdows and Dorow, 2005). 
Our syntactic fixedness measure looks into a broader set of patterns associated with a large class of idiomatic expressions. Moreover, our approach is general and can be easily extended to other idiomatic combinations.</Paragraph> <Paragraph position="5"> Each measure we use to identify VNICs captures a different aspect of idiomaticity: PMI reflects the statistical idiosyncrasy of VNICs, while the fixedness measures draw on their lexicosyntactic peculiarities. Our ongoing work focuses on combining these measures to distinguish VNICs from other idiosyncratic verb+noun combinations that are neither purely idiomatic nor completely literal, so that we can identify linguistically plausible classes of verb+noun combinations on this continuum (Fazly and Stevenson, 2005).</Paragraph> </Section></Paper>