File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/p98-1010_intro.xml
Size: 3,772 bytes
Last Modified: 2025-10-06 14:06:33
<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1010">
<Title>A Memory-Based Approach to Learning Shallow Natural Language Patterns</Title>
<Section position="2" start_page="0" end_page="67" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Identifying local patterns of syntactic sequences and relationships is a fundamental task in natural language processing (NLP). Such patterns may correspond to syntactic phrases, like noun phrases, or to pairs of words that participate in a syntactic relationship, like the heads of a verb-object relation. </Paragraph>
<Paragraph position="1"> Such patterns have been found useful in various application areas, including information extraction, text summarization, and bilingual alignment. Syntactic patterns are also useful for many basic computational linguistic tasks, such as statistical word similarity and various disambiguation problems. </Paragraph>
<Paragraph position="2"> One approach to detecting syntactic patterns is to obtain a full parse of a sentence and then extract the required patterns. However, obtaining a complete parse tree for a sentence is difficult in many cases, and may not be necessary at all for identifying most instances of local syntactic patterns. </Paragraph>
<Paragraph position="3"> An alternative approach is to avoid the complexity of full parsing and instead rely only on local information. A variety of methods have been developed within this framework, known as shallow parsing, chunking, or local parsing (e.g., Abney, 1991; Grefenstette, 1993). These works have shown that it is possible to identify most instances of local syntactic patterns with rules that examine only the pattern itself and its nearby context. Often, the rules are applied to sentences that have been tagged with part-of-speech (POS) tags, and are expressed as some form of regular expressions or finite-state automata. </Paragraph>
<Paragraph position="4"> Manually writing local syntactic rules has become common practice for many applications. However, writing rules is often tedious and time-consuming. </Paragraph>
<Paragraph position="5"> Furthermore, extending the rules to different languages or sub-language domains can require substantial resources and expertise that are often not available. As in many areas of NLP, a learning approach is appealing. Surprisingly, though, rather little work has been devoted to learning local syntactic patterns, mostly targeting noun phrases (Ramshaw and Marcus, 1995; Vilain and Day, 1996). </Paragraph>
<Paragraph position="6"> This paper presents a novel general learning approach for recognizing local sequential patterns, which may be perceived as falling within the memory-based learning paradigm. The method utilizes a part-of-speech tagged training corpus in which all instances of the target pattern are marked (bracketed). </Paragraph>
<Paragraph position="7"> The training data are stored as-is in suffix-tree data structures, which enable linear-time search for subsequences in the corpus. </Paragraph>
<Paragraph position="8"> The memory-based nature of the presented algorithm stems from its deduction strategy: a new instance of the target pattern is recognized by examining the raw training corpus, searching for positive and negative evidence with respect to the given test sequence. No model is created for the training corpus, and the raw examples are not converted to any other representation. </Paragraph>
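As a rough illustration of the two ideas above, the Python sketch below stores a bracket-annotated POS-tag sequence in a token-level suffix trie (a simplified stand-in for the paper's suffix trees, not the authors' implementation) and counts occurrences of a candidate subsequence as positive or negative evidence. The tag sequence, the bracket symbols, and the counter-pattern used as negative evidence are illustrative assumptions.

    # A minimal sketch, assuming a token-level suffix trie in place of the
    # paper's suffix trees; tags, brackets, and queries are illustrative.

    class SuffixTrie:
        """Stores every suffix of a token sequence; count(q) returns how many
        times the contiguous subsequence q occurs, in time linear in len(q)."""

        def __init__(self, tokens):
            self.root = {"count": 0, "children": {}}
            for start in range(len(tokens)):        # naive O(n^2) construction
                node = self.root
                for tok in tokens[start:]:
                    node = node["children"].setdefault(
                        tok, {"count": 0, "children": {}})
                    node["count"] += 1

        def count(self, query):
            node = self.root
            for tok in query:
                node = node["children"].get(tok)
                if node is None:
                    return 0
            return node["count"]

    # POS-tagged training text; "[" / "]" bracket the target (noun-phrase) instances.
    training = "[ DT ADJ NN ] VB [ DT NN ] PP [ NN ]".split()
    trie = SuffixTrie(training)

    candidate = "[ DT NN ]".split()              # candidate bracketing to evaluate
    positive = trie.count(candidate)             # evidence this bracketing occurred
    negative = trie.count("[ DT NN NN".split())  # evidence against closing the bracket here

    print(positive, negative)                    # 1 0 -> accept the candidate

The paper's actual evidence gathering is more elaborate; the point of the sketch is only that lookups against the raw, bracketed corpus take time proportional to the query length, so no model needs to be built from the training data.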
<Paragraph position="9"> Consider the following example.1 Suppose we want to decide whether the candidate sequence</Paragraph>
1 We use here the POS tags: DT = determiner, ADJ = adjective, ADV = adverb, CONJ = conjunction, VB = verb, PP = preposition, NN = singular noun, and NNP = plural noun.
</Section>
</Paper>