<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0503">
  <Title>Acquisition System for Arabic Noun Morphology</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Noun Classification
</SectionTitle>
    <Paragraph position="0"> In this paper we focus on the following nouns: genus nouns, agent nouns, instrument nouns, adjectives, proper adjectives (adjectives derived from proper nouns), proper nouns, and adverbs.</Paragraph>
    <Paragraph position="1"> Some of these nouns are derived from verbs and some are not. All of them form the dual in the same way, whether masculine or feminine, but there are many ways to form the plural. Some nouns have both masculine and feminine forms, some have only feminine forms, and some have only masculine forms. A few nouns use the same form for both the plural and the dual (e.g.</Paragraph>
    <Paragraph position="2"> r teachers used for both dual and plural). For most nouns, a final letter (@) indicates the feminine form; sometimes it does not, and instead changes the meaning of the noun completely (e.g.</Paragraph>
    <Paragraph position="3"> office, library). Sometimes the same consonant string with different vowels has different meanings (e.g. r school, r teacher). Unlike verbs, nouns in Arabic have no clear rule that defines their morphological information and generates their morphological paradigms. Instead, each group of nouns follows its own pattern.</Paragraph>
    <Paragraph position="4"> We have classified the nouns into 84 groups according to their patterns for singular, plural, masculine and feminine. We developed a method for each group that finds the morphological information and forms the paradigm. Very few of these groups have a unique pattern for plural and singular; most share a pattern with other groups. Table 1 shows some examples of these groups and their patterns. The digit 9 stands for the letter ayn [`], stands for hamzh [] and @ stands for ta [], since there are no corresponding letters in English.</Paragraph>
    <Paragraph position="6"/>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Acquisition System
</SectionTitle>
    <Paragraph position="0"> The system reads the next noun in the text, isolates and analyzes its suffixes, and generates its pattern. It then uses the Classified Noun Table, the Suffix/Pattern Analysis, or the User-Feedback Module to find the group to which the noun belongs. Once the group is known, the system applies the rules of that group to generate all morphological paradigms with respect to number and gender, and updates the database.</Paragraph>
    <Paragraph position="1"> The system consists of several modules as shown in Figure 1.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Interface Module
</SectionTitle>
      <Paragraph position="0"> This graphical user interface allows the user to interact with the system and handles the input/output. This module displays a main menu with two main options: collect nouns from documents and find morphological information.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Type-Finder Module
</SectionTitle>
      <Paragraph position="0"> The main function of this module is to read the document and find the part of speech of each word (noun, verb, adjective, particle or proper noun) by running several tests: database lookup, particle check, check on adjectives derived from proper nouns, parse of noun phrases and verb phrases, affix check, and pattern check. This module was built by Abuleil and Evens (1998, 2001). We use this module in our new</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Database
</SectionTitle>
      <Paragraph position="0"> The database includes a Classified Noun Table that contains each root noun (singular: masculine or feminine) and the number of the group to which the noun belongs. Each time the system identifies a new noun it adds its root to the Classified Noun Table.</Paragraph>
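      <Paragraph position="1"> As a minimal sketch, and not the system's actual storage, the Classified Noun Table can be pictured as a mapping from a root noun to its group number. The names classified_nouns, add_root and group_of, and the transliterated entry mdrb@, are hypothetical and chosen for illustration only.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the Classified Noun Table:
# root noun (singular) -> number of the group it belongs to.
classified_nouns: dict[str, int] = {}


def add_root(root: str, group: int) -> None:
    """Record the root of a newly identified noun with its group number."""
    classified_nouns[root] = group


def group_of(root: str) -> Optional[int]:
    """Return the group number of a classified root, or None if it is unknown."""
    return classified_nouns.get(root)


# Illustrative entry only; the transliteration is an assumption.
add_root("mdrb@", 38)
```

A lookup that returns None signals that the noun has not been classified yet, so the other modules must decide its group.</Paragraph>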
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.4 Noun Morphology Analyzer Module
</SectionTitle>
      <Paragraph position="0"> This module is the core of the system. It calls different modules and performs different tasks to identify the noun and find its paradigm. First, it passes the noun to the Suffix Analyzer Module to drop the suffix. Second, it passes the noun to the Pattern Generator Module to find the pattern. Third, it analyzes the pattern to see whether it belongs to more than one group. It checks the Classified Noun Table and then the suffix/pattern to identify the group to which the noun belongs. If the system cannot identify the group, it calls the User-Feedback Module, which produces questions for the user to answer in order to reduce the number of alternatives to one. Finally, depending on the group to which the noun belongs, it generates the morphological paradigms for number and gender and updates the database.</Paragraph>
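      <Paragraph position="1"> The order of these steps can be sketched as a single driver function. This is an illustration under our own assumptions, not the system's code: analyze_noun and all of its parameters are hypothetical names, the other modules are passed in as plain callables, and paradigm generation and the database update are left out.

```python
from typing import Callable, Optional


def analyze_noun(
    noun: str,
    strip_suffix: Callable[[str], tuple[str, dict]],
    find_patterns: Callable[[str], list[str]],
    groups_with: Callable[[str], list[int]],
    lookup: Callable[[str, list[int]], Optional[int]],
    ask_user: Callable[[str, list[int]], int],
) -> tuple[str, dict, Optional[int]]:
    """Run the analyzer steps in order and return (stem, info, group)."""
    stem, info = strip_suffix(noun)        # first: drop the suffix
    patterns = find_patterns(stem)         # second: find the pattern(s)
    candidates: list[int] = []             # third: groups that share a pattern
    for pattern in patterns:
        for group in groups_with(pattern):
            if group not in candidates:
                candidates.append(group)
    if not candidates:
        return stem, info, None            # no known group uses this pattern
    if len(candidates) == 1:
        return stem, info, candidates[0]   # the pattern alone decides
    group = lookup(stem, candidates)       # try the Classified Noun Table
    if group is None:
        group = ask_user(stem, candidates) # fall back to user feedback
    return stem, info, group
```

Passing the modules in as callables keeps the sketch independent of how each module is implemented.</Paragraph>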
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.5 Suffix Analyzer Module
</SectionTitle>
      <Paragraph position="0"> This module identifies the suffix, analyzes it and produces some lexical information about the noun like number and gender. First, it checks if any pronoun is concatenated with the noun.</Paragraph>
      <Paragraph position="1"> Second, it checks for a suffix indicating number.</Paragraph>
      <Paragraph position="2"> Third, it checks for a suffix indicating gender.</Paragraph>
      <Paragraph position="3"> When the letter (y) comes at the end of the noun there are two cases: it may be part of the noun, in which case it should not be dropped, or it may be an extra letter, as in relative nouns or when a pronoun is attached to the noun, in which case it should be dropped. When the noun ends with the letters (), it usually marks a dual noun, but sometimes it marks both plural and dual nouns, as in the following patterns: mfa9l, fa9l, mf9ull.</Paragraph>
      <Paragraph position="4"> Sometimes the pattern must also be checked to help analyze the suffix. We handle these problems as special cases.</Paragraph>
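      <Paragraph position="5"> The three checks can be sketched as follows. This is a simplified illustration that ignores the special cases just described. The suffix inventories, the function name analyze_suffix and the transliterated input are our assumptions; only the order of the checks and the replacement of (t) by (@) after a pronoun is dropped come from the text.

```python
# Hypothetical suffix inventories in Latin transliteration (assumptions).
PRONOUNS = ("hm", "ha", "h")                       # attached pronouns, longest first
NUMBER = {"an": "dual", "wn": "plural", "at": "plural"}
FEMININE = "@"                                     # feminine marker used in the patterns


def analyze_suffix(noun: str) -> tuple[str, dict[str, str]]:
    """Strip pronoun and number suffixes; return the stem and what was found."""
    info: dict[str, str] = {}
    # First check: a pronoun concatenated with the noun.
    for pronoun in PRONOUNS:
        if noun.endswith(pronoun) and len(noun) > len(pronoun) + 2:
            noun = noun[: -len(pronoun)]
            info["pronoun"] = pronoun
            # Naive restoration: a final t before the pronoun is taken to be @.
            if noun.endswith("t"):
                noun = noun[:-1] + FEMININE
            break
    # Second check: a suffix indicating number.
    for suffix, number in NUMBER.items():
        if noun.endswith(suffix) and len(noun) > len(suffix) + 2:
            noun = noun[: -len(suffix)]
            info["number"] = number
            break
    # Third check: a suffix indicating gender (recorded, not dropped,
    # because the patterns themselves include the final @).
    if noun.endswith(FEMININE):
        info["gender"] = "feminine"
    return noun, info
```

The length guard keeps at least three letters in the stem, which is a crude stand-in for the case where the final letters belong to the noun itself.</Paragraph>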
    </Section>
    <Section position="6" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.6 Pattern Generator Module
</SectionTitle>
      <Paragraph position="0"> We have collected 62 different patterns used for masculine and feminine, singular and plural nouns after the suffix has been dropped (see Appendix A). We used these patterns to generate a set of rules and build a finite-state diagram that finds the pattern for any noun. The input to this module is a noun whose suffix has been dropped in the previous step; the output is one or more patterns. If more than one pattern is found, we validate the string by checking the pattern table.</Paragraph>
      <Paragraph position="1"> The letter (m) and the letter () at the beginning of the noun are sometimes the first characters of the noun, but sometimes they are separate words. We collected the nouns that begin with the letter (m) and the letter () and saved them in a file to help us to distinguish between these two cases.</Paragraph>
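      <Paragraph position="2"> To make the idea of a pattern concrete, the sketch below matches a noun against pattern templates position by position. It is a plain template match, not the finite-state diagram described above, and it assumes that the symbols f, 9 and l in a pattern mark the root-consonant slots while every other symbol is a literal letter. The three patterns are taken from the text; the transliterated nouns in the usage note are hypothetical.

```python
SLOTS = {"f", "9", "l"}                  # assumed root-consonant positions
PATTERNS = ["mf9l@", "mfa9l", "fa9l"]    # a few patterns mentioned in the text


def matches(noun: str, pattern: str) -> bool:
    """True if the noun fits the pattern: slots accept any letter, literals must agree."""
    if len(noun) != len(pattern):
        return False
    return all(p in SLOTS or p == c for c, p in zip(noun, pattern))


def find_patterns(noun: str, patterns: list[str] = PATTERNS) -> list[str]:
    """Return every pattern the suffix-free noun fits, in table order."""
    return [p for p in patterns if matches(noun, p)]
```

Under these assumptions, find_patterns applied to the hypothetical string mdrb@ yields only mf9l@, and applied to mla9b yields only mfa9l. With a larger pattern table a noun may fit several patterns, which is the case where the result must be validated.</Paragraph>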
    </Section>
    <Section position="7" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.7 Database Checker Module
</SectionTitle>
      <Paragraph position="0"> This module identifies any noun that is already classified, or any noun derived from one. It gets the noun and its pattern from the Noun Morphology Analyzer, finds all groups that contain the pattern, finds the singular noun (masculine or feminine) in each group and uses it to check the Classified Noun Table. If the noun exists, the module gets the number of the group to which it belongs and passes it to the Noun Morphology Analyzer to generate the results. For example, the noun ( playground) has the pattern (mfa9l). This pattern appears in three different groups (see Table 2).</Paragraph>
      <Paragraph position="2"> The nouns formed from these patterns have the paradigms shown in Table 3.</Paragraph>
      <Paragraph position="4"> If the noun itself or any other noun derived from it has been previously classified, its root (singular noun) will be in the Classified Noun Table. The module finds the root (singular masculine) in the table, gets its group number 2, and passes it to the Noun Morphology Analyzer to find the noun paradigms.</Paragraph>
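      <Paragraph position="5"> The lookup can be sketched as follows, under stated assumptions. The groups A and B, their patterns, and the transliterated nouns are hypothetical and do not reproduce Table 2 or Table 3. The sketch also assumes that f, 9 and l mark the root-consonant slots of a pattern, so that a singular form can be rebuilt from the noun at hand.

```python
from typing import Optional

SLOTS = ("f", "9", "l")   # assumed root-consonant positions in a pattern

# Hypothetical groups: each lists the pattern of its singular and plural form.
GROUPS = {
    "A": {"singular": "mf9l", "plural": "mfa9l"},
    "B": {"singular": "mf9l@", "plural": "mfa9l"},
}


def extract_root(noun: str, pattern: str) -> dict[str, str]:
    """Map each slot symbol of the pattern to the noun letter in that position."""
    return {p: c for c, p in zip(noun, pattern) if p in SLOTS}


def fill(pattern: str, root: dict[str, str]) -> str:
    """Build a word by putting the root letters into the slots of a pattern."""
    return "".join(root.get(p, p) for p in pattern)


def check_database(noun: str, pattern: str, table: dict[str, str]) -> Optional[str]:
    """Return the group of an already classified singular, or None."""
    root = extract_root(noun, pattern)
    for forms in GROUPS.values():
        if pattern not in forms.values():
            continue                      # this group does not use the pattern
        singular = fill(forms["singular"], root)
        if singular in table:
            return table[singular]        # the singular was classified before
    return None
```

If the table already holds the hypothetical singular ml9b under group A, the plural-shaped input mla9b with pattern mfa9l resolves to group A without any question to the user.</Paragraph>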
    </Section>
    <Section position="8" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.8 User-Feedback Module
</SectionTitle>
      <Paragraph position="0"> This module gets all alternatives (groups) from the noun morphology analyzer module. It analyzes them and generates some questions to be answered by the user. It gets the answers, analyzes them and finds the group that the noun belongs to. The module asks questions like: Is the noun a singular? Is the noun a plural? Does the noun have a masculine-singular format? Does the noun have a feminine-singular format?</Paragraph>
      <Paragraph position="2"> Step #3: Add the ones in each column and subtract the sum from the number of groups. Add the (1s) in each column and subtract the sum from the number of groups. Add the (0s) in each column.</Paragraph>
      <Paragraph position="4"> From the table above we know that the probability that the noun is singular masculine is 33.3% and the probability that it is plural feminine is 66.6%.</Paragraph>
      <Paragraph position="5"> Step #4: Pick the smallest value greater than 0 from the A1 row and the B1 row, going from left to right and from top to bottom. Use the column name to form questions. For the A1 value use the following question: is the noun a [column name]? For the B1 value use the following question: does the noun have the [column name] format? Get the answer and drop the invalid group(s).</Paragraph>
      <Paragraph position="7"> Step #5: Repeat Step #3 and Step #4 until only one group is left or all the values in both row A1 and row B1 are either zero or the number of groups left.</Paragraph>
      <Paragraph position="8"> Step #6: If more than one group is left after Step #5, find the largest value in row C, going from left to right, and ask the following question: which of the following [list all the options in that column] is the [column name] of the noun?</Paragraph>
      <Paragraph position="10"> The questions the module generated from the previous example are: Q1: Is the noun plural feminine? Answer: Yes (the system drops group #3). Q2: Does the noun have singular masculine format? Answer: No (the system drops group #1). Result: Group #2. The noun ( playground) is plural feminine. The singular masculine format is ( ); the singular feminine format and plural masculine format are not available for this noun.</Paragraph>
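      <Paragraph position="11"> Steps #3 to #5 can be sketched for the A1 row as follows. The table layout, the function names and the ask callback are our assumptions. We also assume that a value equal to the number of groups is skipped, since no remaining group has a 1 in that column and the question could not separate the groups. The B1 row would be handled the same way with a second table and the other question template.

```python
from typing import Callable, Optional

COLUMNS = ["singular masculine", "singular feminine",
           "plural masculine", "plural feminine"]
Table = dict[int, dict[str, int]]   # group number -> column name -> 0 or 1


def a1_row(table: Table) -> dict[str, int]:
    """Step #3: number of groups minus the count of 1s in each column."""
    n = len(table)
    return {c: n - sum(row[c] for row in table.values()) for c in COLUMNS}


def column_shares(table: Table) -> dict[str, float]:
    """Share of the candidate groups (table must be non-empty) with a 1 per column."""
    n = len(table)
    return {c: sum(row[c] for row in table.values()) / n for c in COLUMNS}


def next_column(table: Table) -> Optional[str]:
    """Step #4: leftmost column with the smallest value that is neither 0 nor n."""
    n = len(table)
    useful = [(v, c) for c, v in a1_row(table).items() if v not in (0, n)]
    if not useful:
        return None
    return min(useful, key=lambda vc: vc[0])[1]   # min keeps the leftmost tie


def narrow(table: Table, ask: Callable[[str], bool]) -> Table:
    """Step #5: ask questions until one group is left or no question helps."""
    table = dict(table)
    while len(table) > 1:
        column = next_column(table)
        if column is None:
            break
        answer = ask(f"Is the noun a {column}?")
        table = {g: row for g, row in table.items() if bool(row[column]) == answer}
    return table
```

With three candidate groups, one of which reads the noun as singular masculine and two as plural feminine, column_shares gives one third and two thirds, and the sketch asks about plural feminine first, as in Q1 of the example. A yes answer leaves two groups that the A1 row cannot separate, which is where the B1 questions take over.</Paragraph>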
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Examples
</SectionTitle>
    <Paragraph position="0"> The following example shows how the system works. Assume that the input is the noun ( r their trainer). First, the system calls the suffix analyzer module to drop the extra letter (pronoun: their) at the end ( h + r), replace the letter (t) with the letter (@), and generate the noun (r trainer) and some lexical information about the noun.</Paragraph>
    <Paragraph position="1"> Second, it passes the noun (r trainer) to the pattern generator module to generate the pattern (mf9l@). Third, it checks the group table looking for this pattern (mf9l@). Fourth, if more than one group is found, it uses the Database Checker Module to check the Classified Noun Table. Fifth, if the noun does not exist in the table, it calls the User-Feedback Module to analyze the groups (all alternatives) and ask the user some questions to assist in identifying the group (see Table 4 and Table 5). The question that the module generated is: Does the noun have a masculine-singular format?</Paragraph>
    <Paragraph position="3"/>
    <Paragraph position="5"> Finally, it generates the result, group #38, and updates the database. Table 6 shows the system output for some inputs.</Paragraph>
  </Section>
</Paper>