<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1011"> <Title>Towards a linguistically motivated computational grammar for Hebrew</Title> <Section position="3" start_page="0" end_page="82" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Modern Hebrew (MH) poses some interesting problems for the grammar designer. The Hebrew script is highly ambiguous, a fact that results in many part-of-speech tags for almost every word (Ornan, 1994). Short prepositions, articles and conjunctions are usually attached to the words that immediately succeed them. In addition, Hebrew morphology is very rich: a noun base might have over fifteen different derivations, and a verb over thirty. In spite of the difficulties, disambiguation of the script, as well as morphological analysis, were covered by a variety of works (Bentur et al., 1992; Choueka and Ne'eman, 1995; Ornan and Katz, 1995). From a practical point of view, Hebrew morphology is well accounted for.</Paragraph> <Paragraph position="1"> The syntax of the language, however, remains an open problem. The first syntactic analyzer for Hebrew is described in (Cohen, 1984), but its grammar is implicit in a software system. Nirenburg and Ben-Asher (1984) describe a small-scale ATN for Hebrew, capable of recognizing very limited structures. Unification-based formalisms were used for developing Hebrew grammars only recently. A limited experiment using PATR-II is described in (Wintner, 1992); it is extended (Wintner and Ornan, 1996) to a reasonable subset of the language, on a different platform: Tomita's LR Parser/Compiler, which is based on LFG. The grammar recognizes sentences of wide variety and complexity, but the analyses it provides are not conveyed in the framework of any particular linguistic theory. 
A different work along the same lines is (Yizhar, 1993): using the same framework, it concentrates on the syntax of noun phrases, employing ideas from different linguistic theories.</Paragraph> <Paragraph position="2"> Works related to the syntax of Hebrew, and in particular to noun phrases, are abundant in the theoretical linguistics literature (Borer, 1984; Ritter, 1991; Siloni, 1994). All of them are carried out in Chomskian frameworks; none can be directly implemented computationally, and their predictions cannot be verified on the basis of existing on-line corpora. The practical contribution of these works is thus limited.</Paragraph> <Paragraph position="3"> This paper describes the first stages of an attempt to bridge the gap between linguistically theoretic analyses and computational implementations. Using HPSG (Pollard and Sag, 1994) as the linguistic theory in which analyses are conveyed, grammars can be directly implemented and their predictions verified. HPSG is used for formally describing the structure of a variety of languages, but this is the first time the theory is applied to any Semitic language. While some ideas of existing Hebrew grammars, in particular (Wintner and Ornan, 1996) and (Yizhar, 1993), are incorporated into the work described here, the starting point is new: we present an account of several aspects of the Hebrew noun phrase, aligned with the general principles of HPSG. All the analyses described in the paper were computationally implemented using AMALIA (Wintner, 1997a) as the development framework. The phenomena we address include the status of the definite article, the application of the DP hypothesis to Hebrew, definiteness agreement in noun phrases as well as definiteness inheritance in constructs. This is a work in progress, and the results described here are preliminary. The grammar is not intended to have a broad coverage, but rather to provide explanatory structures to linguistically interesting phenomena. 
However, we hope to extend the coverage of the grammar in the future, maintaining its linguistic rigor.</Paragraph> </Section> </Paper>