File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/c04-1118_metho.xml

Size: 16,419 bytes

Last Modified: 2025-10-06 14:08:47

<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1118">
  <Title>Controlling Gender Equality with Shallow NLP Techniques</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Gender Inequality in German
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Texts
</SectionTitle>
      <Paragraph position="0"> The most prominent task in achieving gender equality at the linguistic level in German texts is to find solutions and alternatives for the so-called generic masculine: the masculine form is taken as the generic form designating all persons of any sex. The major problem is to figure out whether or not a given person denotation refers to a particular person. For instance, in example (1a) "Beamter" (officer) is most likely used in its generic reading and refers to both female officers (Beamtinnen) and male officers (Beamten). To achieve gender equality, an appropriate reformulation is required, as shown in example (1b).</Paragraph>
      <Paragraph position="1"> (1a) Der Beamte muss den Anforderungen Genüge leisten.</Paragraph>
      <Paragraph position="2"> (1b) Alle Beamten und Beamtinnen müssen den Anforderungen Genüge leisten.</Paragraph>
      <Paragraph position="3"> Since we tackle texts from administrative and legal domains, we assume in principle that references are unspecified. That is, a masculine (or feminine!) noun does not denote a concrete person but rather refers to all persons, irrespective of their sex.</Paragraph>
      <Paragraph position="4"> A second class of errors consists of masculine relative, possessive and personal pronouns that refer to a generic masculine or to an indefinite masculine pronoun.</Paragraph>
      <Paragraph position="5"> (2a) Der Beamte muss seine Wohnung in der Nähe des Arbeitsplatzes suchen.</Paragraph>
      <Paragraph position="6"> (2b) Jeder muss seinen Beitrag dazu leisten.</Paragraph>
      <Paragraph position="7"> (2c) Wer Rechte hat, der hat auch Pflichten.</Paragraph>
      <Paragraph position="8"> The possessive pronoun "seine" (his) in example (2a) refers to the preceding "Beamte" (officer). Both the generic masculine use of "Beamte" and the referring pronoun will be marked. The same holds for sentence (2b), where the possessive pronoun refers to the indefinite pronoun "jeder" (everyone, masc.). The indefinite pronouns "jemand" (someone) and "wer" (who) count as acceptable; however, masculine pronouns referring to them will be marked. In example (2c), the masculine relative pronoun "der" can be omitted. A third class of gender inequality is lack of agreement between the subject and the predicative noun. Example (3a) shows a case where the masculine subject "Ansprechpartner" (contact person, masc.) occurs with the female predicative noun "Frau Müller" (Mrs. Müller).</Paragraph>
      <Paragraph position="9"> (3a) Ihr Ansprechpartner ist Frau Müller.</Paragraph>
      <Paragraph position="10"> (3b) Ihre Ansprechpartnerin ist Frau Müller.</Paragraph>
      <Paragraph position="11"> A solution for this class of errors is shown in example (3b) where the subject (Ansprechpartnerin) is adapted to the female gender of the predicate.</Paragraph>
      <Paragraph position="12"> Suggestions for reformulating gender imbalances such as those in examples (1) and (2) fall into two main categories: 1. Whenever possible, use gender-neutral formulations. These include collective nouns (e.g. Lehrkörper (teaching staff) or Arbeitnehmerschaft (collective of employees)) as well as nominalized participles (Studierende (students)) and nominalized adjectives (Berechtigte (entitled persons)).</Paragraph>
      <Paragraph position="13"> 2. Use both forms if no gender-neutral formulation can be found. That is, the feminine and the masculine form are to be coordinated with "und", "oder" or "bzw.". Coordination with a slash "/" will also be suggested, but it should only be used in forms, ordinances and regulations.</Paragraph>
      <Paragraph position="14"> Amendments should conform to general German writing rules. The so-called "Binnen-I", an upper-case "I" as in "StudentInnen", will not be suggested, and giving the female suffix in parentheses should also be avoided. The same holds for the indefinite pronoun "frau" (woman), which has occasionally been suggested as a complement to the pronoun "man" (one).</Paragraph>
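The two reformulation categories and the conventions above can be condensed into a small decision procedure. The Python sketch below is only an illustration: the mini-lexicon, the function name suggest and its parameters are hypothetical, and a real system would take neutral and feminine forms from a morphological lexicon rather than from a hand-written table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mini-lexicon: masculine plural -> (gender-neutral alternative or None, feminine plural).
LEXICON: Dict[str, Tuple[Optional[str], str]] = {
    "Studenten": ("Studierende", "Studentinnen"),
    "Beamten": (None, "Beamtinnen"),
}

# Conjunctions recommended for pairing feminine and masculine forms.
ALLOWED_CONJ = {"und", "oder", "bzw."}


def suggest(masc: str, conj: str = "und", in_form: bool = False) -> str:
    """Return a gender-fair alternative for a generic masculine plural."""
    neutral, fem = LEXICON[masc]
    # Category 1: prefer a gender-neutral formulation whenever one exists.
    if neutral is not None:
        return neutral
    # Category 2: otherwise spell out both forms (never Binnen-I, never parentheses).
    if conj == "/":
        # Slash coordination is only acceptable in forms, ordinances and regulations.
        if not in_form:
            raise ValueError("slash coordination is reserved for forms and regulations")
        return f"{fem}/{masc}"
    if conj not in ALLOWED_CONJ:
        raise ValueError(f"unsupported conjunction: {conj!r}")
    return f"{fem} {conj} {masc}"


print(suggest("Studenten"))        # Studierende
print(suggest("Beamten", "oder"))  # Beamtinnen oder Beamten
```

The order of the two coordinated forms is an arbitrary choice here; the guidelines above only require that both forms appear in full.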
      <Paragraph position="15"> 3 The Gendercheck Editor. Controlled-Language Authoring Technology (CLAT) has been developed to meet the need of some companies to automatically check their technical texts for general and company-specific language conventions. Within CLAT, texts are checked with respect to: orthographic correctness; company-specific terminology and abbreviations; general and company-specific grammatical correctness; and stylistic correctness according to general and company-specific requirements. The orthographic control examines texts for orthographic errors and proposes alternative spellings. The terminology component matches the text against a terminology and abbreviation database, where term variants are also detected (Carl et al., 2004). Grammar control checks the text for grammatical correctness and disambiguates multiple readings. Stylistic control detects stylistic inconsistencies.</Paragraph>
      <Paragraph position="16"> The components build on each other's output. Besides the control mechanisms described, CLAT also has a graphical front-end which makes it possible to mark segments in the text with different colors. Individual error codes can be switched on or off, and segments of text can be edited or ignored according to the author's needs. CLAT also allows batch processing, in which XML-annotated text output is generated. Figure 1 shows the graphical interface of the editor. The lower part of the editor shows an input sentence; the highlighted SGML codes are manually annotated gender "mistakes". The upper part shows the automatically annotated sentence, with gender mistakes underlined.</Paragraph>
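Batch mode and switchable error codes can be pictured together: annotations are computed once, and only the codes the author has enabled end up in the XML output. The sketch below uses Python's standard xml.etree.ElementTree; the element names s and w, the err attribute and the error code names are hypothetical and do not reproduce CLAT's actual output format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def to_xml(tokens: List[str], errors: Dict[int, str], enabled: Set[str]) -> str:
    """Serialize one sentence, attaching only error codes the author has switched on."""
    sent = ET.Element("s")
    for i, form in enumerate(tokens):
        w = ET.SubElement(sent, "w")
        w.text = form
        code = errors.get(i)
        # Switched-off codes are still computed but are simply not emitted.
        if code is not None and code in enabled:
            w.set("err", code)
    return ET.tostring(sent, encoding="unicode")


tokens = ["Der", "Beamte", "muss", "seine", "Wohnung", "suchen", "."]
errors = {1: "AGENT", 3: "PRONOUN"}
print(to_xml(tokens, errors, enabled={"AGENT"}))
```

Keeping the filtering of codes at serialization time means that toggling a code never requires re-running the analysis components.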
      <Paragraph position="17"> As we shall discuss in section 5, gender imbalances are manually annotated to facilitate automatic evaluation. In this example, the highlighted words "Deutscher" (German) and "EG-Bürger" (EU citizen) are identical in the manually annotated text and in the automatically annotated text. The user can click on one of the highlighted words in the upper window to display the explanatory message in the middle part of the screen. Further information and correction or reformulation hints can be obtained in an additional window, as shown on the right side of the figure. The messages are designed according to the main classes of gender-discriminatory formulations discussed previously.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Gender Checking Strategy
</SectionTitle>
    <Paragraph position="0"> Gendercheck uses a marking and filtering strategy: first, all possible occurrences of words in an error class are marked. In a second step, "gendered" formulations are filtered out. The remaining marked words are assigned an error code, which is displayed in the Gendercheck editor.</Paragraph>
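The three steps can be sketched as a generic driver in Python. This is a minimal illustration under stated assumptions, not the kurd implementation: tokens are plain dictionaries whose keys mirror the feature names used in this paper (s for the semantic value, style for the mark), filters are ordinary functions that erase marks in place, and the error code string AGENT is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Token = Dict[str, object]
Filter = Callable[[List[Token]], None]


def mark(tokens: List[Token]) -> None:
    """Step 1: mark every candidate of the error class (here: human agents)."""
    for tok in tokens:
        if tok.get("s") == "agent":
            tok["style"] = "agent"


def gendercheck(tokens: List[Token], filters: List[Filter]) -> List[Tuple[int, str, str]]:
    mark(tokens)
    # Step 2: each filter may remove marks from gendered formulations.
    for f in filters:
        f(tokens)
    # Step 3: every surviving mark is reported with an error code.
    return [(i, str(tok["form"]), "AGENT")
            for i, tok in enumerate(tokens) if tok.get("style") == "agent"]


sentence = [{"form": "Der"}, {"form": "Beamte", "s": "agent"}, {"form": "muss"}]
print(gendercheck(sentence, filters=[]))  # [(1, 'Beamte', 'AGENT')]
```

Because filters only ever delete marks, their order does not affect which words can be reported, only which filter is responsible for clearing a given mark.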
    <Paragraph position="1"> Following the classification in section 2, this section examines the marking and filtering strategy for generic masculine agents (section 4.1), pronouns which refer to generic masculine agents (section 4.2) and errors in agreement of predicative nouns (section 4.3).</Paragraph>
    <Paragraph position="2"> Marking and filtering is realized with kurd, a pattern-matching formalism described in (Carl and Schmidt-Wigger, 1998; Ins, 2004). The input to kurd consists of morphologically analyzed and semantically tagged texts.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Class 1: Agents
</SectionTitle>
      <Paragraph position="0"> Two mechanisms are used to mark denotations of persons: a) The morphological analysis of mpro (Maas, 1996) generates not only derivational and inflectional information for German words, but also assigns a small set of semantic values. Male and female human agents such as "Soldat" (soldier), "Bürgermeister" (mayor, masc.), "Beamte" (officer, masc.), "Krankenschwester" (nurse, fem.) etc. are assigned the semantic feature s=agent. Words that carry this feature are marked style=agent.</Paragraph>
      <Paragraph position="1"> b) Problems occur for nouns whose base word is a nominalized verb. For instance, "Gewichtheber" (weightlifter) and "Busfahrer" (bus driver) are not assigned the feature s=agent by mpro, since a "lifter" and a "driver" can be either a thing or a human. Gender inequalities, however, only apply to humans. Given that the tool is used in a restricted domain, a special list of lexemes can be used to assign these words the style feature style=agent. The kurd rule Include shows some of the lexemes from this list. The list contains lexemes chosen to cover a maximum number of words. For instance, the lexeme absolvieren (graduate) will match  Lines 3 to 8 enumerate a list of lexemes separated by semicolons. The colon in line 3 following the attribute name ls tells kurd to interpret the values as regular expressions. Since the dollar sign $ matches the end of the value in the input object, each lexeme in the list can also be the head of a compound word.</Paragraph>
      <Paragraph position="2"> Thus, the test ls:fahren$ matches all lexemes that have fahren as their head word, such as "Fahrer" (driver), "Busfahrer" (bus driver), etc. The action Ag{style=agent} marks the matched words as agents.</Paragraph>
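The effect of the $ anchor can be reproduced with Python's re module. In this sketch the token keys (form, pos, ls) and the lexeme strings are assumptions: we pretend that the morphological analysis supplies a lower-cased base lexeme such as busfahren for Busfahrer. Only fahren and absolvieren are taken from the text; heben (for Gewichtheber) is an added illustration, and the pos check stands in for whatever category test the real Include rule uses.

```python
import re
from typing import Dict, List

# Hypothetical excerpt of the include list.
INCLUDE_LEXEMES = ["fahren", "heben", "absolvieren"]

# "$" anchors every alternative at the end of the lexeme, so a listed lexeme
# also matches when it is the head of a compound ("busfahren" ends in "fahren").
INCLUDE_RE = re.compile("(?:" + "|".join(map(re.escape, INCLUDE_LEXEMES)) + ")$")


def apply_include(tokens: List[Dict[str, str]]) -> None:
    """Mark nouns whose lexeme ends in a listed lexeme as agents."""
    for tok in tokens:
        if tok.get("pos") == "noun" and INCLUDE_RE.search(tok.get("ls", "")):
            tok["style"] = "agent"


tokens = [
    {"form": "Busfahrer", "pos": "noun", "ls": "busfahren"},
    {"form": "Fahrplan", "pos": "noun", "ls": "fahrplan"},
    {"form": "fährt", "pos": "verb", "ls": "fahren"},
]
apply_include(tokens)
print([t["form"] for t in tokens if t.get("style") == "agent"])  # ['Busfahrer']
```

A plain endswith check would behave the same for literal lexemes; the regular expression is simply closer to how the kurd test ls:fahren$ is written.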
      <Paragraph position="3"> The text then passes through several filters that delete the marks on words if they appear within gendered formulations.</Paragraph>
      <Paragraph position="4"> a) Marked agents that precede a family name are excluded. The marking of "Beamte" in example (4) will be erased since it is followed by the family name "Meier": "Beamte Meier" is likely to have a specific reference.</Paragraph>
      <Paragraph position="5"> (4) Der Beamte Meier hat gegen die Vorschrift verstoßen.</Paragraph>
      <Paragraph position="6"> In terms of kurd this can be achieved with the rule AgentMitFname: if a family name (s=fname) follows a sequence of marked agents (style=agent) the marks in the agent nodes are removed (r{style=nil}).</Paragraph>
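A Python rendering of AgentMitFname could walk backwards from each family name and erase the marks of the contiguous run of agents in front of it. The tokens are again hypothetical dictionaries carrying the features s and style, and setting style to None plays the role of r{style=nil}.

```python
from typing import Dict, List, Optional

Token = Dict[str, Optional[str]]


def agent_mit_fname(tokens: List[Token]) -> None:
    """Remove agent marks directly preceding a family name (s=fname)."""
    for i, tok in enumerate(tokens):
        if tok.get("s") != "fname":
            continue
        j = i - 1
        # Clear the whole sequence of marked agents in front of the name.
        while j >= 0 and tokens[j].get("style") == "agent":
            tokens[j]["style"] = None
            j -= 1


sentence = [
    {"form": "Der"},
    {"form": "Beamte", "style": "agent"},
    {"form": "Meier", "s": "fname"},
    {"form": "hat"},
]
agent_mit_fname(sentence)
print(sentence[1]["style"])  # None
```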
      <Paragraph position="7"> b) Marks are also removed from nominalized adjectives and participles, since these are well suited for gender-neutral formulations. In example (5), the nominalized plural adjective "Sachverständige" (experts) is ambiguous with respect to gender; the mark will thus be removed. (5) Sind bereits Sachverständige bestellt? c) Marks on words in already gendered formulations are also erased. Pairing female and male forms by conjunction is a recommended way to produce gender equality. In example (6), the subject "Die Beamtin oder der Beamte" (the officer, fem., or the officer, masc.) as well as the pronouns referring to it, "sie oder er" (she or he) and "ihrer oder seiner" (her or his), are gender-equal formulations.</Paragraph>
      <Paragraph position="8"> (6) Die Beamtin oder der Beamte auf Lebenszeit oder auf Zeit ist in den Ruhestand zu versetzen, wenn sie oder er infolge eines körperlichen Gebrechens oder wegen Schwäche ihrer oder seiner körperlichen oder geistigen Kräfte zur Erfüllung ihrer oder seiner Dienstpflichten dauernd unfähig (dienstunfähig) ist.</Paragraph>
      <Paragraph position="9"> The kurd rule gegendert removes these marks. The description in lines 2 to 5 matches a conjunction of two marked agents (style=agent) which share the same lexeme ls=_L but differ in gender. This latter constraint is expressed by the two variables ehead={g=_G} and ehead={g~=_G}, which only unify if the gender features "g" have non-identical sets of values.</Paragraph>
      <Paragraph position="10"> The rule allows the conjunctions "und", "oder", "bzw." and "/".</Paragraph>
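The core of gegendert can be approximated in Python as follows. Assumptions: tokens are dictionaries with form, ls, g (a set of gender values) and style; determiners may be skipped between the conjunction and the second agent, as needed for "die Beamtin oder der Beamte"; and the g~=_G constraint is read literally as the two gender value sets being different.

```python
from typing import Any, Dict, List

CONJUNCTIONS = {"und", "oder", "bzw.", "/"}
# Assumption: determiners that may intervene between conjunction and second agent.
DETERMINERS = {"der", "die", "das", "den", "dem", "des"}


def gegendert(tokens: List[Dict[str, Any]]) -> None:
    """Erase marks on coordinated agents with the same lexeme but different gender."""
    n = len(tokens)
    for i in range(n - 2):
        first = tokens[i]
        if first.get("style") != "agent" or tokens[i + 1]["form"] not in CONJUNCTIONS:
            continue
        # Skip determiners such as "der" in "die Beamtin oder der Beamte".
        j = i + 2
        while j != n and tokens[j]["form"].lower() in DETERMINERS:
            j += 1
        if j == n:
            continue
        second = tokens[j]
        if (second.get("style") == "agent"
                and first["ls"] == second["ls"]    # same lexeme (ls=_L)
                and first["g"] != second["g"]):    # non-identical gender value sets
            first["style"] = None
            second["style"] = None


sentence = [
    {"form": "Die"},
    {"form": "Beamtin", "ls": "beamte", "g": {"f"}, "style": "agent"},
    {"form": "oder"},
    {"form": "der"},
    {"form": "Beamte", "ls": "beamte", "g": {"m"}, "style": "agent"},
    {"form": "auf"},
]
gegendert(sentence)
print(sentence[1]["style"], sentence[4]["style"])  # None None
```

Requiring a shared lexeme keeps unrelated coordinations such as "Beamtin und Richter" marked, since only a feminine and masculine form of the same noun count as a gendered pair.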
      <Paragraph position="11"> d) Some nouns are erroneously marked even though no gender-equal formulation is possible. For instance, words such as "Mensch" (human being), "Gast" (guest) and "Flüchtling" (refugee) are masculine in gender, yet there is no corresponding female form in German. These words are included in an exclude list which works similarly to the include list discussed previously.</Paragraph>
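The exclude list can mirror the include sketch: an end-anchored regular expression over lexemes, this time used to remove marks. The lower-cased lexeme strings are hypothetical. Anchoring at the end means that compounds headed by an excluded word (e.g. Kurgast) are also exempt, while a compound that merely starts with one (Gastwirt, which does have the female form Gastwirtin) stays marked.

```python
import re
from typing import Dict, List, Optional

# Nouns without a female counterpart, taken from the examples in the text.
EXCLUDE_RE = re.compile("(?:mensch|gast|flüchtling)$")


def apply_exclude(tokens: List[Dict[str, Optional[str]]]) -> None:
    """Remove agent marks from nouns that have no female form."""
    for tok in tokens:
        if tok.get("style") == "agent" and EXCLUDE_RE.search(tok.get("ls") or ""):
            tok["style"] = None


tokens = [
    {"form": "Kurgast", "ls": "kurgast", "style": "agent"},
    {"form": "Gastwirt", "ls": "gastwirt", "style": "agent"},
]
apply_exclude(tokens)
print([t["style"] for t in tokens])  # [None, 'agent']
```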
      <Paragraph position="12"> a) Currently, we do not mark compound nouns which have an agent as their modifier and a non-agent as their head. However, words such as "Rednerpult" (talker desk = lectern) and "Teilnehmerliste" (participants list = list of participants) are also suitable for gender mainstreaming and should be spelled "Redepult" (talk desk) and "Teilnehmendeliste" (participating list).</Paragraph>
      <Paragraph position="13"> b) We do not mark articles and adjectives which precede the marked noun. This would be troublesome in constructions like example (7), where the article "der" (the) and the corresponding noun "Dezernent" (head of department) are separated by an intervening adjectival phrase.</Paragraph>
      <Paragraph position="14"> (7) Den Vorsitz führt der jeweils für die Aufgaben zuständige Dezernent.</Paragraph>
      <Paragraph position="15"> c) It is currently impossible to look beyond the sentence boundary. As a consequence, the reference of an agent cannot be detected if it occurs in the preceding sentence. For instance, "Herr Müller" is the reference of "Beamte" in the second sentence of example (8).</Paragraph>
      <Paragraph position="16"> (8) Herr Müller hat die Dienstvorschrift verletzt. Der Beamte ist somit zu entlassen.</Paragraph>
      <Paragraph position="17"> The word "Beamte" will be erroneously marked because information from the preceding sentence is not available to resolve the reference.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Class 2: Pronouns
</SectionTitle>
      <Paragraph position="0"> Personal pronouns, possessive pronouns, relative pronouns and indefinite pronouns are also marked. The strategy is similar to the one for agents above: first all pronouns are marked, and in a second step markings in correct formulations are erased.</Paragraph>
      <Paragraph position="1"> With the exception of indefinite pronouns ("Mancher", "Jemand", "Niemand" etc.), a marked referent agent must be available in the same sentence. Three different rules are used to mark relative pronouns, personal pronouns and possessive pronouns.</Paragraph>
      <Paragraph position="2">  marked agent in line 2. Lines 3 and 4 search for the next comma that follows the marked agent, and line 5 matches the relative pronoun that immediately follows the comma. The relative pronoun must agree in gender with the agent (ehead={g=_G}). As we shall see in section 5, this is an error-prone approximation to reference resolution.</Paragraph>
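The comma heuristic can be sketched in Python. The feature value sc=rel for relative pronouns and the set-valued g feature are assumptions, and agreement (ehead={g=_G}) is approximated as a non-empty intersection of the two gender value sets. The sketch inherits the weakness noted above: whatever relative pronoun follows the first comma is taken to refer to the agent.

```python
from typing import Any, Dict, List, Optional

Token = Dict[str, Any]


def next_comma(tokens: List[Token], start: int) -> Optional[int]:
    """Index of the first comma at or after start, or None."""
    for j in range(start, len(tokens)):
        if tokens[j]["form"] == ",":
            return j
    return None


def mark_relative_pronouns(tokens: List[Token]) -> None:
    for i, agent in enumerate(tokens):
        if agent.get("style") != "agent":
            continue
        k = next_comma(tokens, i + 1)
        if k is None or k + 1 == len(tokens):
            continue
        rel = tokens[k + 1]  # must immediately follow the comma
        # Agreement approximated as overlapping gender value sets.
        if rel.get("sc") == "rel" and agent.get("g", set()).intersection(rel.get("g", set())):
            rel["style"] = "pronoun"


sentence = [
    {"form": "Der"},
    {"form": "Beamte", "style": "agent", "g": {"m"}},
    {"form": ","},
    {"form": "der", "sc": "rel", "g": {"m"}},
    {"form": "dort"},
]
mark_relative_pronouns(sentence)
print(sentence[3].get("style"))  # pronoun
```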
      <Paragraph position="3"> b) Personal and possessive pronouns are only marked if they refer to a male agent. The two rules MarkPersonalPronomen and MarkPossesivPronomen work in a similar fashion: in line 2 the marked masculine reference is matched. Lines 3 and 4 match the following personal pronoun (c=w,sc=pers) or possessive pronoun (c=w,sc=poss), respectively. In line 5, the pronouns are marked.</Paragraph>
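MarkPersonalPronomen and MarkPossesivPronomen can be imitated by a single loop that, once a marked masculine agent has been seen, marks later pronouns carrying c=w and sc=pers or sc=poss. How the masculinity of a pronoun is read off the analysis is an assumption here: the sketch compares the pronoun's base lexeme against er (personal) and sein (possessive).

```python
from typing import Any, Dict, List

# Assumed base lexemes of the masculine third-person pronouns.
MASC_LEXEME = {"pers": "er", "poss": "sein"}


def mark_personal_and_possessive(tokens: List[Dict[str, Any]]) -> None:
    seen_masc_agent = False
    for tok in tokens:
        if tok.get("style") == "agent" and "m" in tok.get("g", set()):
            seen_masc_agent = True  # the marked masculine reference
        elif (seen_masc_agent
              and tok.get("c") == "w"
              and tok.get("sc") in MASC_LEXEME
              and tok.get("ls") == MASC_LEXEME[tok["sc"]]):
            tok["style"] = "pronoun"


# (2a) Der Beamte muss seine Wohnung ... suchen.
sentence = [
    {"form": "Der"},
    {"form": "Beamte", "style": "agent", "g": {"m"}},
    {"form": "muss"},
    {"form": "seine", "c": "w", "sc": "poss", "ls": "sein"},
    {"form": "Wohnung"},
]
mark_personal_and_possessive(sentence)
print(sentence[3]["style"])  # pronoun
```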
      <Paragraph position="4"> After the marking step, pronoun marks are filtered. Filtering of pronouns is similar to the previously discussed rule gegendert.</Paragraph>
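A pronoun filter in the spirit of gegendert might remove the mark from a masculine pronoun that is coordinated with its feminine counterpart, as in "sie oder er" and "ihrer oder seiner" in example (6). The pairing table of base lexemes below is an assumption, and only the directly adjacent pattern pronoun, conjunction, pronoun is handled.

```python
from typing import Any, Dict, List

CONJUNCTIONS = {"und", "oder", "bzw.", "/"}
# Assumed pairing of masculine pronoun lexemes with their feminine counterparts.
FEMININE_OF = {"er": "sie", "sein": "ihr"}


def filter_pronoun_pairs(tokens: List[Dict[str, Any]]) -> None:
    for i in range(len(tokens) - 2):
        left, conj, right = tokens[i], tokens[i + 1], tokens[i + 2]
        if conj["form"] not in CONJUNCTIONS:
            continue
        # Accept both orders: "sie oder er" and "er oder sie".
        for masc, fem in ((right, left), (left, right)):
            if (masc.get("style") == "pronoun"
                    and masc.get("ls") in FEMININE_OF
                    and fem.get("ls") == FEMININE_OF[masc["ls"]]):
                masc["style"] = None


tokens = [
    {"form": "ihrer", "ls": "ihr"},
    {"form": "oder"},
    {"form": "seiner", "ls": "sein", "style": "pronoun"},
]
filter_pronoun_pairs(tokens)
print(tokens[2]["style"])  # None
```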
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Class 3: Predicative Noun
</SectionTitle>
      <Paragraph position="0"> Missing agreement between subject and predicative noun is detected with the following kurd  Lines 2 and 3 detect the marked subject. Notice that noun groups are marked with the feature mark=np by a previous chunking module. Lines 5 to 7 match the predicative noun. Both parts of the sentence are connected by the copula "sein" (be). Similar to the rule gegendert, the rule only applies if both parts differ in gender.</Paragraph>
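Over chunked input the rule reduces to a three-item window. In this sketch the sentence is a list of items: noun groups carry mark=np together with a gender set g and possibly the agent mark, and other words carry their lexeme in ls. The flat chunk representation, the contiguous window and the returned list of subject indices are simplifying assumptions, not the format produced by the chunking module.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def predicative_mismatches(items: List[Item]) -> List[int]:
    """Indices of agent subjects whose predicative noun differs in gender."""
    hits = []
    for i in range(len(items) - 2):
        subj, cop, pred = items[i], items[i + 1], items[i + 2]
        if (subj.get("mark") == "np" and subj.get("style") == "agent"
                and cop.get("ls") == "sein"      # the copula
                and pred.get("mark") == "np"
                and subj["g"] != pred["g"]):
            hits.append(i)
    return hits


# (3a) Ihr Ansprechpartner ist Frau Müller.
s3a = [
    {"mark": "np", "form": "Ihr Ansprechpartner", "g": {"m"}, "style": "agent"},
    {"form": "ist", "ls": "sein"},
    {"mark": "np", "form": "Frau Müller", "g": {"f"}},
]
print(predicative_mismatches(s3a))  # [0]
```

Applied to the corrected sentence (3b), where the subject Ansprechpartnerin is feminine, the same check finds no mismatch.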
    </Section>
  </Section>
</Paper>