<?xml version="1.0" standalone="yes"?> <Paper uid="C02-2004"> <Title>A Linguistic Discovery Program that Verbalizes its Discoveries</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Previous work in machine scientific discovery has mostly focused on historical reconstruction (work culminating in the book by Langley et al. 1987), but more recent efforts are directed towards designing programs that discover new scientific knowledge. Such systems operate in disciplines as diverse as mathematics, chemistry, astronomy, medicine, and linguistics. The field is currently very active (for recent developments, cf. e.g. the special issues on discovery of the journals Artificial Intelligence, April 1997 or Foundations of Science 1999; the ECAI-98 Workshop on In this paper, we present UNIVAUTO (UNIVersals AUthoring TOol), a system whose domain of application is linguistics, and in particular the study of language universals, a classic trend in contemporary linguistics. This trend was initiated by the pioneering paper of Joseph Greenberg (1966), which investigated word order in a database of 30 languages of wide genetic and areal coverage, described in terms of 15 ordering features. Greenberg discovered a number of universals relating diverse ordering properties of languages, and his example was followed by attempts at similar generalizations at other linguistic levels or across levels (for a review of the state of the art, cf. e.g. Croft 1990).</Paragraph> <Paragraph position="1"> UNIVAUTO was run on various data sets (word order, phonology, morpho-syntax), with very promising linguistic results. 
The published outcomes of UNIVAUTO so far include: two whole journal articles, Pericliev (1999, 2000), based on data from Greenberg (1966) (with no post-editing, and the first with no disclosure of the article's &quot;machine origin&quot;); around 50 statistically significant phonological universals based on Maddieson's UPSID-451 database, published without post-editing in the Universals Archive at the University of Konstanz; and the substance discovery (rather than verbalization) part of Pericliev (2002). To the best of our knowledge, this is the first computer program to generate a whole scientific article.</Paragraph> </Section> </Paper>