<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1225"> <Title>Natural Language Concept Analysis</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In most mainstream approaches to natural language modelling and parsing, some form of hierarchical structure plays a central role. The most obvious case in point is phrase structure. However, while the latter notion has shown its theoretical relevance in many ways, practical applications based on phrase structure description are not without problems. The main reason for this is the high flexibility of natural language. In performance data (i.e. actual language use), many disruptions of, and variations on, standard phrase structure patterns occur. As a result, the application of phrase structure-based parsers in natural language processing shows only limited success.</Paragraph> <Paragraph position="1"> This has inspired a search for alternative methods, such as statistically based or lexicon-driven parsing.</Paragraph> <Paragraph position="2"> In our search for a solution to the problems mentioned, we decided to take one step back and examine the underlying nature of hierarchical structure in general, and phrase structure in particular. Our aim in doing so was to find a more principled solution to the problems of linguistic analysis and parsing.</Paragraph> <Paragraph position="3"> We looked for ways to derive structural information from input, and to incorporate this in a mathematically well-founded theory of knowledge representation. As a result, we found a level of abstractness that, in principle, allows language-independent modelling and analysis.</Paragraph> <Paragraph position="4"> In our approach we capitalise on the property that the information carriers, the lexical items, are 'willing' to combine. These combinatorial properties are determined by inherent characteristics of lexical items. 
Hierarchical structure follows naturally from the interaction of these properties, while leaving room for variation and flexibility in structural patterning.</Paragraph> <Paragraph position="5"> In our model of natural language (NL) the input is represented as a binary relation. This is due to the dichotomy of language, meaning that a classification of lexical items as objects and attributes can be made (we use the term &quot;dichotomy&quot; in a restricted sense: a division into two mutually exclusive parts).</Paragraph> <Paragraph position="6"> The two classes are interrelated, and their relation can be determined merely on the basis of lexical information and some general principles, like word order (e.g. SVO). The relation between the classes is due to the principle of relatedness. This principle entails that any non-empty set of objects implies the existence of a non-empty set of attributes (properties) it is related to, and vice versa. Minimally, an observable entity (object) has the property of existence (attribute). This principle gives rise to a relation representing the semantics of the 'thought' described by the sentence in terms of a set of related items, called observations. An observation captures a set of objects and properties that are mutually characteristic of each other. Such a notion corresponds to a formal concept in lattice theory. We will show that the above relation is supported by linguistic considerations.</Paragraph> <Paragraph position="7"> Our approach to language, Natural Language Concept Analysis (NLCA), constitutes a linguistically and mathematically based theory. This is reflected by the different readings of the acronym, as follows. NLC(A): the analysis of concepts that play a role in natural language; (NL)CA: the lattice-theoretical model of formal concept analysis applied to natural language; N(LCA): a natural transformation on language (concretely, on functor-argument relations). (V. Kamphuis and J.J. Sarbo (1998) Natural Language Concept Analysis. In D.M.W. Powers (ed.), NeMLaP3/CoNLL98: New Methods in Language Processing and Computational Natural Language Learning, ACL, pp. 205-214.)</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Related research </SectionTitle> <Paragraph position="0"> Our theory is in line with a movement among modern formalisms in computational linguistics that can be characterised by a shift of emphasis from a large, detailed syntax and a simple lexicon to a compact syntax and a rich lexicon. Amongst other works, one can cite HPSG (Pollard and Sag, 1994) and, most recently, a proposal by Berwick and Epstein (Berwick and Epstein, 1995).</Paragraph> <Paragraph position="1"> Berwick and Epstein outline a model that, in accordance with Minimalist principles, does not posit &quot;any syntactic entities at all beyond what [is] absolutely necessary for linguistic description and explanation.&quot; The necessary machinery, as they point out, is one based on categorial grammar (Lambek, 1988). Their argument follows from the fundamental idea that natural languages are limited to rules specifying how constituents can be concatenated to form larger constituents. Berwick and Epstein introduce a single syntactic operation, Hierarchical Composition (HC), for the realization of such syntactic constraints.</Paragraph> <Paragraph position="2"> With respect to the above-mentioned movement in natural language processing, we note that the endeavour to move (almost) all information to the lexicon can be theoretically justified. Intuitively, practical NL formalisms like HPSG can be seen as variants of two-level grammars, e.g. attribute grammars. Theoretically, for such a grammar, a weakly equivalent grammar using only a single nonterminal symbol exists (Franzen, 1983). 
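The notion of a formal concept invoked in the introduction can be made concrete. The following is a minimal, hypothetical sketch, not taken from the paper: given a binary relation between objects and attributes, a pair (A, B) is a formal concept when A is exactly the set of objects sharing every attribute in B, and B is exactly the set of attributes common to every object in A. The toy relation and all names below are illustrative only.

```python
# Hypothetical sketch of formal concept analysis (FCA), the lattice-theoretic
# notion referred to in the text.  The toy relation is illustrative only.
from itertools import combinations

# Binary relation: each object is mapped to the set of attributes it has.
relation = {
    "dog":   {"animate", "noun"},
    "cat":   {"animate", "noun"},
    "stone": {"noun"},
}

def common_attributes(objects):
    """A' : the attributes shared by every object in the given set."""
    sets = [relation[o] for o in objects]
    if not sets:  # by convention, the empty object set shares all attributes
        return set(a for s in relation.values() for a in s)
    return set.intersection(*sets)

def common_objects(attributes):
    """B' : the objects possessing every attribute in the given set."""
    return {o for o, attrs in relation.items() if attributes.issubset(attrs)}

def formal_concepts():
    """Enumerate all (extent, intent) pairs that are closed, i.e. A'' equals A."""
    concepts = set()
    objs = list(relation)
    for r in range(len(objs) + 1):
        for combo in combinations(objs, r):
            intent = common_attributes(set(combo))   # close the object set
            extent = common_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

On this toy relation, the mutually characteristic pairs are ({dog, cat}, {animate, noun}) and ({dog, cat, stone}, {noun}); each corresponds to an "observation" in the paper's terminology, i.e. a set of objects and properties that determine each other.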
In such a grammar, all structural information is specified by attribute functions. These functions can be defined by the lexicon.</Paragraph> </Section> </Section> </Paper>