<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0807"> <Title>Current Issues in Software Engineering for Natural Language Processing</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 http://www.ai.sri.com/ oaa/ </SectionTitle> <Paragraph position="0"> (Menzel, 2002) for a further discussion of architectural issues in NLP.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.7 Design reuse with design patterns </SectionTitle> <Paragraph position="0"> Design patterns (Gamma et al., 1994; Harrison et al., 2000) are reusable units of software architecture design that have emerged from object-oriented software development research, where certain collaborative object configurations were found to recur in different contexts.</Paragraph> <Paragraph position="1"> Finite-State Automata (FSAs) were historically the first devices to receive a software engineering treatment (Watson, 1995), as they are pervasive from compiler technology to software engineering itself. (Yacoub and Ammar, 2000) describe how a FiniteStateMachine design pattern that separates out certain facets can facilitate interoperability between Mealy, Moore and hybrid FSAs.</Paragraph> <Paragraph position="2"> (Manolescu, 2000) identifies the FeatureExtraction pattern as a useful abstraction for information retrieval and natural language processing: a FeatureExtractorManager is a Factory of FeatureExtractor objects, each of which knows a MappingStrategy, a FilteringStrategy and a Database. Numerical techniques often used in machine learning to overcome the &quot;curse of dimensionality&quot; (see data sparseness above), such as Singular Value Decomposition, Latent Semantic Indexing, or Principal Component Analysis (PCA), are also instances of this pattern. It is worth noting that some of these patterns are domain-specific, i.e. the software engineering aspects interact with the type of linguistic processing. 
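The FeatureExtraction pattern described above can be illustrated with a minimal Python sketch. Only the class names FeatureExtractorManager, FeatureExtractor, MappingStrategy, FilteringStrategy and Database come from the pattern description; every method name, the concrete strategies (lower-cased token counts, a minimum-count filter) and the in-memory database are assumptions made here for illustration and are not taken from (Manolescu, 2000).

```python
from collections import Counter


class Database:
    """Hypothetical in-memory store mapping document ids to feature dicts."""

    def __init__(self):
        self.rows = {}

    def store(self, doc_id, features):
        self.rows[doc_id] = features


class MappingStrategy:
    """Assumed mapping: raw text to counts of lower-cased whitespace tokens."""

    def map(self, text):
        return Counter(text.lower().split())


class FilteringStrategy:
    """Assumed filter: keep only features seen at least min_count times."""

    def __init__(self, min_count=2):
        self.min_count = min_count

    def filter(self, features):
        return {f: n for f, n in features.items() if n >= self.min_count}


class FeatureExtractor:
    """Knows a MappingStrategy, a FilteringStrategy and a Database."""

    def __init__(self, mapping, filtering, database):
        self.mapping = mapping
        self.filtering = filtering
        self.database = database

    def extract(self, doc_id, text):
        # Map the text to features, filter them, then persist the result.
        features = self.filtering.filter(self.mapping.map(text))
        self.database.store(doc_id, features)
        return features


class FeatureExtractorManager:
    """Factory that hands out FeatureExtractor objects sharing one Database."""

    def __init__(self, database):
        self.database = database

    def create_extractor(self, min_count=2):
        return FeatureExtractor(
            MappingStrategy(), FilteringStrategy(min_count), self.database
        )


db = Database()
manager = FeatureExtractorManager(db)
extractor = manager.create_extractor(min_count=2)
result = extractor.extract("d1", "the cat saw the other cat")
print(result)  # {'the': 2, 'cat': 2}
```

In this sketch, replacing MappingStrategy with a different mapping (for example, one that projects count vectors onto a lower-dimensional space) would leave FeatureExtractor and FeatureExtractorManager unchanged, which is one way to read the claim that dimensionality-reduction techniques are instances of the same pattern.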
(Basili et al., 1999) generalize over typical NLP components, combining Data Flow Diagrams for a Linguistic Processing Module (LM), a Lexical Acquisition Module (LAM) and an Application Module (AM) into a generic model of an NLP application. The result of the LAM is what (Cunningham et al., 1997) would call a Data Resource (as opposed to a Processing Resource, which corresponds to an LM). (Basili et al., 1999) also present a UML model of a class for linguistically annotated text, LinguisticInformation, that is interoperable with application-dependent classes.</Paragraph> <Paragraph position="3"> 2.8 Productivity gain with composition languages? Recently, work in software engineering has focused on composition languages (Nierstrasz and Meijler, 1994), which allow systems to be constructed on a meta-level by specifying composition transformations in a separate glue notation, without editing component source code (Assmann, 2003). Such an approach would support a view held by (Daelemans et al., 1998), who argue that &quot;all NLP tasks can be seen as either light NLP tasks involving disambiguation or segmentation locally at one language level or between two closely-related language levels; or as compositions of light NLP tasks, when the task surpasses the complexity of single light NLP tasks.&quot; The fact that NLP often involves generic pre-processing (such as POS-tagging) can be taken as evidence for the need for dedicated linguistic composition languages.5 Whereas toolkits and frameworks for NLP have already been developed, no dedicated NLP composition language exists to date. In such a language, both linguistic structures (such as typed AVMs) and processing resources (such as taggers or tag-set mappers) would have first-order status. 
Composition languages are a logical next step in the ongoing development of new abstraction layers for computing.6</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Experiment or System? </SectionTitle> <Paragraph position="0"> Figure 3 depicts the trade-off researchers face when deciding between carrying out an experiment, building a prototype program, implementing a more fleshed-out self-contained system, building a complete, generic, redistributable toolkit, or investing long-term in providing the community with a new framework.7 On the one hand, experiments ensure high short-term productivity, but with hardly any reuse or cross-fertilization to other projects. Frameworks, on the other hand, which are only possible in larger groups and with long-range funding, pay back relatively late, but offer many synergies due to their all-embracing nature if they can overcome developers' reluctance to adopt a new framework.</Paragraph> <Paragraph position="1"> 5 The visual application builder part of GATE 2 can be seen as a visual composition language. 6 See (Abelson and Sussman, 1996) for a view that programming is indeed constant development and application of a growing collection of abstraction mechanisms.</Paragraph> <Paragraph position="2"> 7 There may be a difference of several orders of magnitude in</Paragraph> </Section> </Paper>