File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/w97-1105_intro.xml
Size: 5,440 bytes
Last Modified: 2025-10-06 14:06:27
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1105"> <Title>A LEXICAL DATABASE TOOL FOR QUANTITATIVE PHONOLOGICAL RESEARCH</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> A lexical database tool tailored for phonological research is described. Database fields include transcriptions, glosses and hyperlinks to speech files. Database queries are expressed using HTML forms, and these permit regular expression search on any combination of fields. Regular expressions are passed directly to a Perl CGI program, enabling the full flexibility of Perl extended regular expressions. The regular expression notation is extended to better support phonological searches, such as searches for minimal pairs. Search results are presented in the form of HTML or LaTeX tables, where each cell is either a number (representing frequency) or a designated subset of the fields. Tables have up to four dimensions, with an elegant system for specifying which fragments of which fields should be used for the row/column labels. The tool</Paragraph> <Paragraph position="2"> phonological research; (ii) it gives universal access to the same set of informants; (iii) it enables other researchers to hear the original speech data without having to rely on published transcriptions; (iv) it makes the full power of regular expression search available, and search results are full multimedia documents; and (v) it enables the early refutation of false hypotheses, shortening the analysis-hypothesis-test loop. A lifesize application to an African tone language (Dschang) is used for exemplification throughout the paper. The database contains 2200 records, each with approximately 15 fields. 
Running on a PC laptop with a stand-alone web server, the 'Dschang HyperLexicon' has already been used extensively in phonological fieldwork and analysis in Cameroon.</Paragraph> <Paragraph position="3"> Initial stages of phonological analysis typically focus on words in isolation, as the phonemic inventory and syllable canon are established. Data is stored as a lexicon, where each word is entered as a transcription accompanied by at least a gloss (so the word can be elicited again) and the major syntactic category. In managing a lexicon, the working phonologist has a variety of computational needs: storage and retrieval; searching and sorting; tabular reports on distributions and contrasts; updates to the database and to reports as distinctions are discovered or discarded. In the past the analyst had to do all this computation by hand, using index cards kept in shoeboxes. Now many of these tasks are automated by software such as the SIL programs Shoebox (Buseman et al., 1996) and Findphone (Bevan, 1995), 1 or by commercial database packages.</Paragraph> <Paragraph position="4"> Of course, many tasks other than those listed above have already benefitted from (partial) automation. 2 Additionally, it has been shown how a computational inheritance model can be used for structuring lexical information relevant for phonology (Reinhard & Gibbon, 1991). And there is a body of work on the use of finite state devices - closely related to regular expressions - for modelling phonological phenomena (Kaplan & Kay, 1994) and for speech processing (cf. Kornai's work with HMMs (Kornai, 1995)). 1 Unlike regular database management systems, these include international and phonetic character sets and user-defined keystrokes for entering them, and a utility to dump a database into an RTF file in a user-defined lexicon format for use in desktop publishing.</Paragraph> <Paragraph position="5"> 
However, computational phonology is yet to provide tools for manipulating lexical and speech data using the full expressive power of the regular expression notation in a way that supports pure phonological research.</Paragraph> <Paragraph position="6"> This paper describes a lexical database system tailored to the needs of phonological research and exemplified for Dschang, a language of Cameroon. An online lexicon (originally published as Bird & Tadadjeu, 1997) contains records with the format in Figure 1. Only the most important fields are shown.</Paragraph> <Paragraph position="7"> The user interface is provided by a Web browser. A suite of Perl programs (Wall & Schwartz, 1991) generates the search form in HTML and processes the query. Regular expressions in the query are passed directly to Perl, enabling the full flexibility of Perl extended regular expressions. A further extension to the notation allows searches for minimal sets, groups of words which are minimally different according to some criterion. Hits are structured into a tabular display and returned as an HTML or LaTeX document.</Paragraph> <Paragraph position="8"> In the next section, a sequence of example queries is given to illustrate the format of queries and results, and to demonstrate how a user might interact with the system. A range of more powerful queries is then demonstrated, along with an explanation of the notations for minimal pairs and projections. Next, some implementation details are given, and the component modules are described in detail. The last two sections describe planned future work and present the conclusions. [Figure: fields display, root, loanwords, suffixed, phrases, time-limit, vars]</Paragraph> </Section> </Paper>