<?xml version="1.0" standalone="yes"?> <Paper uid="I05-6004"> <Title>Integration of a Lexical Type Database with a Linguistically Interpreted Corpus</Title> <Section position="4" start_page="37" end_page="38" type="relat"> <SectionTitle> 5 Related Work </SectionTitle> <Paragraph position="0"> Tsuchiya et al. (2005) have been constructing a database that summarizes multiword functional expressions in Japanese. It describes each expression's linguistic behavior, usage and examples in depth. Notable differences between their database and ours are that their database is mostly constructed manually while ours is constructed semi-automatically, and that they target only functional expressions while we deal with all kinds of lexical types.</Paragraph> <Paragraph position="1"> Hypertextual Grammar development (Dini and Mazzini, 1997) attempted a similar task, but focused on documenting the grammar, not on linking it to a dynamic treebank. They suggested creating the documentation in the same file along with the grammar, in the style of literate programming. This is an attractive approach, especially for grammars that change constantly. However, we prefer the flexibility of combining different knowledge sources (the grammar, treebank and linguistic description, in addition to external resources). The Montage project (Bender et al., 2004) aims to develop a suite of software whose primary audience is field linguists working on underdocumented languages. One of their tasks is to facilitate traditional grammatical description from annotated texts by means of one of their products, the Grammar export tool.
Although the paper gives little explicit detail about what the &quot;traditional grammatical description&quot; is, they seem to share a similar goal with us: in the case of Montage, making explicit the grammatical knowledge assumed in underdocumented languages, while in our case making the lexical types assumed in the treebank and the computational grammar understandable to humans. Also, some of the tools they use are used in our project as well. Consequently, their process of grammatical description and documentation looks quite similar to ours. The difference is that their target is underdocumented languages whose grammatical knowledge has so far not been made clear enough, while we target a familiar language, Japanese, that is well understood but whose computational implementation is so large and complex as to be difficult to fully comprehend. Another notable related work is the COMLEX syntax project (Macleod et al., 1994). Their goal is to create a moderately-broad-coverage lexicon recording the syntactic features of English words for purposes of computational language analysis.</Paragraph> <Paragraph position="2"> They employed elves (&quot;elf&quot; = enterer of lexical features) to create such a lexicon by hand. Naturally, the manual input task is error-prone. Thus they needed to prepare a document describing word usages, with which they intended to reduce the elves' errors. It is evident that the document plays a role similar to our lexical type database, but there are important divergences between the two. First, while their document seems to be constructed manually (words chosen as examples of lexical types in the documentation are not always in the lexicon!), the construction process of our database is semi-automated. Second, somewhat relatedly, our database is electronically accessible and well-structured. Thus it allows more flexible queries than a simple document.
Third, unlike in COMLEX, all the lexical types in our database are actually derived from the working Japanese grammar with which we are building the treebank. That is, all the lexical types are defined formally. Fourth, examples in our database are all real ones in that they actually appear in the treebank, while most of the COMLEX examples were created specifically for the project. Finally, we are dealing with all kinds of lexical types that appear in the treebank, but the COMLEX project targets only nouns, adjectives, and verbs.</Paragraph> </Section> </Paper>