<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3091">
  <Title>A MATRIX REPRESENTATION OF THE INFLECTIONAL FORMS OF ARABIC WORDS: A STUDY OF CO-OCCURRENCE PATTERNS</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. THE MATRIX REPRESENTATION
</SectionTitle>
    <Paragraph position="0"> Sample &quot;MATRIX PARADIGMS&quot; are shown in Fig(2) for verbs and Fig(3) for nouns. Table(1) gives the English keys to the columns of the Matrix Paradigms. The inflected form of a verb for a given Person/Number/Gender/Mode combination (obtained from the relevant &quot;row&quot; of the Matrix Paradigm) is constructed by concatenating the prefix, core, and both subject and object pronoun column entries. The inflected forms of nouns are constructed similarly for a particular Number/Gender/Case combination.</Paragraph>
    <Paragraph position="1"> The &quot;cells&quot; of the object pronoun columns indicate whether a particular entry is valid. Valid entries are marked by &quot;١&quot;, an Arabic numeral one; invalid entries are marked by &quot;٠&quot;, an Arabic zero. This matrix of ones and zeros gave the representation its name, the &quot;Matrix Paradigm&quot;.</Paragraph>
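    <Paragraph> The construction described above can be illustrated with a minimal Python sketch. The row layout, the field names (prefix, core, subject, object_mask), the column labels and the Latin placeholder morphemes are all assumptions made for illustration; none of them is taken from Fig(2) or Fig(3). Returning the form without an object pronoun as the first element is likewise an assumption.

```python
from dataclasses import dataclass

# Hypothetical object-pronoun column labels; the real columns are keyed in Table(1).
OBJECT_PRONOUNS = ["-obj1", "-obj2", "-obj3"]


@dataclass
class MatrixRow:
    """One row of a Matrix Paradigm (assumed layout, for illustration only)."""
    prefix: str
    core: str
    subject: str
    object_mask: list  # one 0/1 cell per object-pronoun column


def inflected_forms(row):
    """Return the form without an object pronoun, then one form per valid cell."""
    base = row.prefix + row.core + row.subject
    forms = [base]
    for pronoun, cell in zip(OBJECT_PRONOUNS, row.object_mask):
        if cell == 1:  # a zero marks an invalid co-occurrence and is skipped
            forms.append(base + pronoun)
    return forms


# Placeholder morphemes, not real Arabic data.
row = MatrixRow(prefix="pre-", core="CORE", subject="-subj", object_mask=[1, 0, 1])
print(inflected_forms(row))
```

    Storing the ones and zeros as a per-row mask keeps the co-occurrence information separate from the morphemes themselves, which mirrors the way the two kinds of variation are treated separately in the taxonomy.</Paragraph>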
  </Section>
  <Section position="4" start_page="0" end_page="419" type="metho">
    <SectionTitle>
3. TAXONOMY OF ARABIC WORDS
</SectionTitle>
    <Paragraph position="0"> Fig(1) shows a tree diagram representing the taxonomical classification of Arabic verbs and nouns. The different &quot;levels&quot; in the tree correspond to different types of variation of the inflected form from one class to another. The first type of variation coincides more or less with the traditional classification and is represented at levels 2 and 3 for verbs and at level 2 for nouns.</Paragraph>
    <Paragraph position="1"> Each Matrix Paradigm also reflects two further types of variation, which can be considered separately from one another. The first is the variation in the core with the different rows; this dimension corresponds, for example, to the traditional study of verb conjugations (see &lt;2&gt;).</Paragraph>
    <Paragraph position="2">  The other type of variation is that in the distribution of the Matrix of ones and zeros, which is essentially a variation in the co-occurrence of object pronouns (for transitive verbs) and possessive pronouns (for nouns). This variation is reflected at level 4 of the taxonomy. In the following sections 3.1 and 3.2, we will discuss the study of these co-occurrence patterns in more detail for verbs and nouns separately.</Paragraph>
    <Section position="1" start_page="419" end_page="419" type="sub_section">
      <SectionTitle>
3.1 CO-OCCURRENCE PATTERNS FOR VERBS
</SectionTitle>
      <Paragraph position="0"> On examination of the Landau &lt;1&gt; high frequency wordlist, the following features seemed to distinguish classes of verbs:  1- Whether the subject is human or non-human (for both transitive and intransitive verbs). 2- Whether the object is human or non-human (for transitive verbs only).</Paragraph>
      <Paragraph position="1"> 3- The number of the subject (for intransitive verbs only).</Paragraph>
      <Paragraph position="2"> In Arabic, there is a set of object pronouns which refers to a non-human object: (L,~,t*,~) and this will be denoted by -H. This is a subset of the complete set of pronouns +H, which denotes human and non-human. Below, we will discuss the features for transitive and intransitive verbs separately: (a) Transitive Verbs: As shown in the table below, there can only be 4 combinations of the features +H and -H. Each of the feature sets in the table has been designated a class code. Only verbs with features corresponding to the feature sets B, C and D have been found in  It was found that the subject number is an additional distinguishing feature for intransitive verbs. Moreover, the subject number is significant only in the case of human subjects. For non-human subjects, this feature is not significant.</Paragraph>
      <Paragraph position="3"> Based upon the above observations, we will define the distinguishing features for intransitive verbs to be +H(s), +H(dp) and -H, where s denotes singular and dp denotes dual/plural. +H(s) and +H(dp) denote the sets of singular and dual/plural subjects, respectively. By definition +H(s) U +H(dp) -H, where U denotes the union of the two feature sets. The table below shows the possible combinations of these features; only features designated by A, E and F were found for</Paragraph>
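      <Paragraph> The four transitive-verb feature sets discussed in this subsection can be enumerated with a short Python sketch. It assumes that each feature set pairs one subject feature with one object feature, and that a -H object admits only the non-human pronoun subset while +H admits the complete set. No class codes are assigned, since the mapping from feature sets to codes is given only in the paper's table. The function name accepts_object_pronoun is hypothetical.

```python
from itertools import product

FEATURES = ("+H", "-H")

# Assumption: a transitive feature set is a (subject feature, object feature) pair.
transitive_feature_sets = list(product(FEATURES, repeat=2))


def accepts_object_pronoun(object_feature, pronoun_in_minus_h):
    """+H admits the complete pronoun set; -H admits only its non-human subset."""
    return object_feature == "+H" or pronoun_in_minus_h


for subject, obj in transitive_feature_sets:
    print(f"subject {subject}, object {obj}")
```

      Because -H is a subset of +H, the check needs only one flag per pronoun: whether it belongs to the non-human set.</Paragraph>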
    </Section>
    <Section position="2" start_page="419" end_page="419" type="sub_section">
      <SectionTitle>
3.2 CO-OCCURRENCE PATTERNS FOR NOUNS
</SectionTitle>
      <Paragraph position="0"> The set of object pronouns used for verbs also serves as the set of possessive pronouns for nouns, with the exception of a slight difference in the form of the first person singular. The -H set is exactly the same.</Paragraph>
      <Paragraph position="1"> Three distinct classes of Matrix patterns (see level 3 of Fig(1)) have been observed for nouns:  (A) No possessive pronouns can be attached. (B) All possessive pronouns can be attached. (C) Only possessive pronouns related to the inanimate (set -H) can be attached.</Paragraph>
      <Paragraph position="2"> An additional study was made to determine which Number/Gender (NG) combinations are valid for a particular noun stem. These have been found to be an important feature of Arabic nouns, as not all NG combinations are valid for a stem. Each stem needs to be examined separately, and this information is put into the lexicon of stems. The NG combinations are represented at level 3 of the taxonomy for nouns (see Fig(1)).</Paragraph>
      <Paragraph position="3"> Although there is no systematic, theoretical method for determining all the different NG combinations needed for comprehensive coverage of nouns, some form of convergence occurred as more and more nouns from Landau's &lt;1&gt; wordlist were examined. For the 2,500 stem shortlist, there were only 17 NG combinations.</Paragraph>
      <Paragraph position="4"> This curious feature of Arabic nouns can be mainly attributed to the presence of words of foreign origin and to the pragmatics of the noun in question.</Paragraph>
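      <Paragraph> A lexicon entry that records both the possessive-pronoun class and the valid NG combinations of a noun stem might look like the following Python sketch. The class codes A, B and C follow the list in this subsection; the record layout, the pronoun labels, the membership of the -H set and the sample entry are hypothetical placeholders, not data from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical pronoun labels; MINUS_H stands in for the non-human set -H.
ALL_POSSESSIVE = {"p1", "p2", "p3", "p4", "p5"}
MINUS_H = {"p4", "p5"}


@dataclass
class NounStem:
    """Assumed lexicon record for one noun stem."""
    stem: str
    possessive_class: str  # "A", "B" or "C"
    valid_ng: set = field(default_factory=set)  # allowed (number, gender) pairs


def attachable_pronouns(entry):
    """Map the possessive class to the pronouns that may be attached."""
    if entry.possessive_class == "A":
        return set()
    if entry.possessive_class == "B":
        return set(ALL_POSSESSIVE)
    if entry.possessive_class == "C":
        return set(MINUS_H)
    raise ValueError(f"unknown class: {entry.possessive_class}")


def ng_is_valid(entry, number, gender):
    """Check a Number/Gender combination against the stem's lexicon entry."""
    return (number, gender) in entry.valid_ng


# Placeholder entry, not a real Arabic stem.
entry = NounStem("STEM", "C", {("singular", "masculine"), ("plural", "masculine")})
print(sorted(attachable_pronouns(entry)))
print(ng_is_valid(entry, "dual", "masculine"))
```

      Keeping the NG combinations per stem reflects the observation that each stem has to be examined separately.</Paragraph>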
    </Section>
  </Section>
  <Section position="5" start_page="419" end_page="419" type="metho">
    <SectionTitle>
4. APPLICATIONS DEVELOPED
</SectionTitle>
    <Paragraph position="0"> As a first application, an Arabic stem-based morphological analyser has been developed on an IBM PS/2 microcomputer. Morphological features of the word analysed are computed.</Paragraph>
    <Paragraph position="1"> As a by-product of the analyser, an Arabic spelling verifier has been developed, by including unification of the morphological and co-occurrence features of the morphemes.</Paragraph>
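    <Paragraph> The paper does not describe the unification formalism it uses. As a rough illustration only, the Python sketch below uses flat attribute-value unification, a common simplification: two feature sets unify when no shared feature carries conflicting values, and a word is accepted when the features of all its morphemes unify. The function names, feature names and values are hypothetical.

```python
def unify(a, b):
    """Merge two feature dicts; return None if any shared feature conflicts."""
    merged = dict(a)
    for name, value in b.items():
        if name in merged and merged[name] != value:
            return None
        merged[name] = value
    return merged


def is_well_formed(morpheme_features):
    """A word passes if the features of all its morphemes unify."""
    result = {}
    for features in morpheme_features:
        result = unify(result, features)
        if result is None:
            return False
    return True


# Hypothetical feature names and values.
stem = {"category": "noun", "gender": "feminine"}
good_suffix = {"gender": "feminine", "number": "plural"}
bad_suffix = {"gender": "masculine", "number": "plural"}

print(is_well_formed([stem, good_suffix]))
print(is_well_formed([stem, bad_suffix]))
```

    A spelling verifier built this way would reject a word whose segmentation yields morphemes with clashing features, even when each morpheme is valid on its own.</Paragraph>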
    <Paragraph position="2"> The system is currently being developed for use in the interaction with an Arabic syntactic parser.</Paragraph>
  </Section>
</Paper>