<?xml version="1.0" standalone="yes"?> <Paper uid="J95-3004"> <Title>Alon Itai (Technion), Uzzi Ornan (Technion)</Title> <!-- Running header: Computational Linguistics Volume 21, Number 3 --> <Section position="12" start_page="400" end_page="402" type="concl"> <SectionTitle> 10. Conclusions </SectionTitle> <Paragraph position="0"> A method to acquire morpho-lexical probabilities from an untagged corpus has been described. The main idea was to use the rich morphology of the language to learn the frequency of a certain analysis from the frequency of other word forms of the same lexical entry.</Paragraph> <Paragraph position="1"> The results of the experiment confirm the conjecture we made about the nature of the morphological ambiguity problem in Hebrew. It can be argued, therefore, that the computer, with its complete morphological knowledge, faces a much more complex problem than a human reader of a Hebrew text, who may be ignorant of some rare analyses. This observation is also supported by the fact that humans are very often surprised to see the number of possible analyses of a given ambiguous word. It may even be significant from a psycholinguistic point of view, suggesting that probabilities of this kind are also used by a human reader of Hebrew. However, this conjecture should be tested empirically.</Paragraph> <Paragraph position="2"> An experiment to test the usefulness of the morpho-lexical probabilities for morphological disambiguation in Hebrew yielded the following results: a recall of 70% for full disambiguation and a recall of 90% for analysis assignment.</Paragraph> <Paragraph position="3"> However, the morpho-lexical probabilities cannot serve as the only source of information for morphological disambiguation, since they are imperfect by definition: they always choose the same analysis as the right one, regardless of the context in which the ambiguous word appears. 
Thus, as already mentioned, we have incorporated these probabilities into an existing system for morphological disambiguation. The combined system tackles the disambiguation problem by drawing on two kinds of linguistic information sources: Morpho-Lexical Probabilities and Syntactic Constraints (a full description of this system can be found in Levinger [1992]).</Paragraph> <Section position="1" start_page="401" end_page="402" type="sub_section"> <SectionTitle> Appendix A </SectionTitle> <Paragraph position="0"> Given below is the Latin-Hebrew transliteration used throughout the paper. Note that accepted transcriptions for Hebrew (Academy of the Hebrew Language 1957; Ornan 1994) indicate the vowels that are missing in the modern Hebrew writing system. For this reason, these transcriptions are not suitable for demonstrating the morphological ambiguity problem in the language. Instead, we use the following transliteration, which is based on the phonemic script (Ornan 1994); see Table 8.</Paragraph> <Paragraph position="1"> Appendix B: The following set of rules is used to automatically generate the SW (similar words) set for every morphological analysis in Hebrew. Note that if an analysis includes a particular attached particle, this particle is also attached to each of its similar words. 
A definite form of a noun--the indefinite form of the same noun.</Paragraph> <Paragraph position="2"> An indefinite form of a noun--the definite form of the same noun.</Paragraph> <Paragraph position="3"> A noun with a possessive pronoun--the same noun with all the other possessive pronouns that have the same person attribute.</Paragraph> <Paragraph position="4"> An adjective--the other forms of the same adjective (changing the gender and number attributes).</Paragraph> <Paragraph position="5"> A verb without an object pronoun--the same verb in the same tense and person (changing the gender and number attributes only).</Paragraph> <Paragraph position="6"> A verb with an object pronoun--the same verb form with all the other object pronoun forms (preserving the person attribute while changing the gender and number attributes).</Paragraph> <Paragraph position="7"> A nominal personal pronoun--the other nominal personal pronouns of the same person.</Paragraph> <Paragraph position="8"> A masculine form of a number--the feminine form of the same number. A feminine form of a number--the masculine form of the same number. A proper noun or a particle (preposition, connective, etc.)--the empty SW set.</Paragraph> </Section> </Section> </Paper>
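<!--
Editorial illustration of the Appendix B rules: a minimal sketch, assuming each morphological analysis is represented as a bundle of features rather than as a surface word form. The names Analysis, Pronoun, and sw_set, and all feature labels, are hypothetical and do not come from the paper. The sketch ignores language-specific gaps (for example, persons that do not distinguish gender), so it may overgenerate, and it does not produce actual Hebrew word forms.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import List, Optional, Tuple

GENDERS = ("masc", "fem")
NUMBERS = ("sing", "plur")


@dataclass(frozen=True)
class Pronoun:
    """Hypothetical feature bundle for an attached pronoun."""
    person: int
    gender: str
    number: str


@dataclass(frozen=True)
class Analysis:
    """Hypothetical feature bundle for one morphological analysis."""
    lemma: str
    category: str  # "noun", "adjective", "verb", "pronoun", "number", ...
    definite: bool = False
    gender: Optional[str] = None
    number: Optional[str] = None
    person: Optional[int] = None
    tense: Optional[str] = None
    possessive: Optional[Pronoun] = None
    object_pronoun: Optional[Pronoun] = None
    particles: Tuple[str, ...] = ()  # attached particles, kept on every similar word


def _others(gender, number):
    """All gender/number combinations except the given one."""
    return [(g, n) for g, n in product(GENDERS, NUMBERS) if (g, n) != (gender, number)]


def _other_pronouns(p: Pronoun) -> List[Pronoun]:
    """Pronouns with the same person but a different gender/number combination."""
    return [Pronoun(p.person, g, n) for g, n in _others(p.gender, p.number)]


def sw_set(a: Analysis) -> List[Analysis]:
    """Return the similar-word analyses for `a`, following the rules listed above.

    dataclasses.replace copies every field that is not overridden, so the
    attached particles of `a` carry over to each similar word.
    """
    if a.category == "noun":
        # Assumption: the possessive rule takes precedence over definiteness.
        if a.possessive is not None:
            return [replace(a, possessive=p) for p in _other_pronouns(a.possessive)]
        return [replace(a, definite=not a.definite)]
    if a.category == "adjective":
        return [replace(a, gender=g, number=n) for g, n in _others(a.gender, a.number)]
    if a.category == "verb":
        if a.object_pronoun is not None:
            return [replace(a, object_pronoun=p) for p in _other_pronouns(a.object_pronoun)]
        # Tense and person are preserved; only gender and number change.
        return [replace(a, gender=g, number=n) for g, n in _others(a.gender, a.number)]
    if a.category == "pronoun":
        return [replace(a, gender=g, number=n) for g, n in _others(a.gender, a.number)]
    if a.category == "number":
        return [replace(a, gender="fem" if a.gender == "masc" else "masc")]
    # Proper nouns and particles (and any unknown category) get the empty SW set.
    return []
```

For example, calling sw_set on a definite noun analysis with one attached particle returns a single indefinite analysis of the same noun with that particle still attached.
-->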