<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2602"> <Title>Towards Full Automation of Lexicon Construction</Title> <Section position="5" start_page="0" end_page="0" type="concl"> <SectionTitle> 3.5 Conclusion </SectionTitle> <Paragraph position="0"> It seems clear that practical constraints will necessitate the development of powerful corpus-driven methods for meaning representation, particularly when dealing with diverse languages, subject matter, and writing styles. Although it remains to be fully developed and tested, the evidence assembled thus far seems sufficient to conclude that our lexical optimization approach offers this prospect.</Paragraph> <Paragraph position="1"> The approach follows a simple information-theoretic principle: a lexicon can be judged by the amount of information it captures about a suitably chosen grounding space. The process results in a distributional lexicon suitable for semantic comparison of sense-disambiguated terms, multi-word units, and, most likely, larger units of text such as short phrases.</Paragraph> <Paragraph position="2"> One can initialize the lexical optimization process by applying a distributional clustering algorithm such as co-clustering to obtain term classes that have the properties of syntactic tags, even though many of the terms in a typical cluster will, in many contexts, fail to exhibit the syntactic class that the cluster implicitly represents. This starting point is sufficient to support incremental refinements, including sense disambiguation, multi-word-unit detection, and the incorporation of novel terms into the lexicon. The preliminary evidence also suggests that this approach can be extended to capture shallow parsing information. Although we have yet to conduct such experiments, it also seems clear that, given a set of refinements based on one co-clustering run, it becomes possible to re-analyze the corpus in terms of the improved lexicon and generate an improved co-clustering, and so on.
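The information-theoretic principle above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the co-occurrence-pair input format, and the toy data are all hypothetical. It simply estimates the mutual information (in bits) between cluster labels and grounding-space contexts from observed co-occurrence counts, the quantity a lexicon would be judged by under the stated criterion.

```python
from collections import Counter
from math import log2

def lexicon_information(pairs):
    """Estimate I(C; G): mutual information, in bits, between cluster
    labels C and grounding-space contexts G, from (cluster, context)
    co-occurrence pairs. Under the principle stated above, a lexicon
    whose classes capture more information about the grounding space
    scores higher."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (c, g) pairs
    c_marg = Counter(c for c, _ in pairs)  # marginal counts of clusters
    g_marg = Counter(g for _, g in pairs)  # marginal counts of contexts
    mi = 0.0
    for (c, g), k in joint.items():
        p_cg = k / n
        mi += p_cg * log2(p_cg / ((c_marg[c] / n) * (g_marg[g] / n)))
    return mi

# Toy data: two term clusters perfectly aligned with two grounding
# contexts yield one full bit of information.
pairs = [("noun", "ctx1")] * 5 + [("verb", "ctx2")] * 5
print(round(lexicon_information(pairs), 3))  # → 1.0
```

In this toy setting, a lexicon whose clusters are statistically independent of the grounding space would score 0 bits, so the measure orders candidate lexicons as the principle requires.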
It remains to be seen how far such an approach can be productively pursued.</Paragraph> </Section> </Paper>