File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/97/w97-0907_concl.xml
Size: 2,236 bytes
Last Modified: 2025-10-06 13:57:57
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0907"> <Title>A Language Identification Application Built on the Java Client/Server Platform</Title> <Section position="7" start_page="45" end_page="45" type="concl"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> In this paper, we have described a Java implementation of a character ngram language labeling algorithm. This NLP module was successfully reused in a client-side Java application and in an offline document management system, and it was embedded within an HTTP proxy server. With the rapid deployment of the globally available Java infrastructure, a tremendous opportunity exists for reusable NLP components. The distributed nature of our particular application led us to explore possible tradeoffs between the accuracy needed for client-side language labeling and the size of the language models. By selecting smaller ngram window sizes and by discarding infrequently observed ngrams from our language profiles, we can reduce the size of the models by an order of magnitude with an insignificant loss of precision for our target application.</Paragraph> <Paragraph position="1"> The tradeoffs we have explored in the context of automatic language identification are more generally relevant to natural language processing in the distributed setting made possible by the Java infrastructure. At a minimum, our observations with respect to character-based language models are likely to be applicable to the word-based language mod-</Paragraph> <Paragraph position="3"> Sun, Java, Java Developers Kit, Hot Java, and Ultral are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.</Paragraph> <Paragraph position="4"> tions. Beyond that, similar client/server tradeoffs are likely to be important even in strictly knowledge-based systems. 
Part-of-speech tagging and phrase identification, foreign word translation, and topic labeling are among the operations that promise to enhance intelligent search and browsing on the Web, and the present paper represents a first step toward deciding where to locate the computation and data for these operations.</Paragraph> </Section> </Paper>
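The model-size tradeoff discussed above (smaller ngram windows, discarding infrequently observed ngrams) can be illustrated with a small Java sketch. It is not the authors' implementation: the class and method names (`NgramProfile`, `count`, `prune`), the space padding, and the parameter values are hypothetical choices made for illustration. The sketch only builds and prunes a profile; it does not reproduce the paper's labeling or scoring step, which this section does not describe. Lowering `maxN` removes the longer ngrams, and raising `minCount` drops rarely seen ones; both shrink the profile that a client would have to download.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a character ngram profile whose size is controlled
 * by two knobs: the maximum ngram window size (maxN) and a minimum count
 * below which ngrams are discarded (minCount). Illustrative only.
 */
public class NgramProfile {

    /** Counts every character ngram of length 1..maxN in the text. */
    public static Map<String, Integer> count(String text, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        // Pad with spaces so word-initial and word-final ngrams are captured
        // (an assumed convention, not taken from the paper).
        String padded = " " + text.toLowerCase() + " ";
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only ngrams observed at least minCount times. */
    public static Map<String, Integer> prune(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String sample = "the quick brown fox jumps over the lazy dog and the cat";
        // Large model: windows up to 5 characters, nothing discarded.
        Map<String, Integer> full = count(sample, 5);
        // Small model: windows up to 3 characters, singletons discarded.
        Map<String, Integer> small = prune(count(sample, 3), 2);
        System.out.println("maxN=5, no pruning: " + full.size() + " ngrams");
        System.out.println("maxN=3, minCount=2: " + small.size() + " ngrams");
    }
}
```

On a toy sentence the absolute numbers mean little; how much a real profile shrinks, and what that costs in labeling precision, would have to be measured on actual training text for each language.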