File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/05/p05-1054_concl.xml
Size: 1,718 bytes
Last Modified: 2025-10-06 13:54:43
<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1054"> <Title>A Quantitative Analysis of Lexical Differences Between Genders in Telephone Conversations</Title> <Section position="7" start_page="440" end_page="441" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> We have presented evidence of linguistic differences between genders using a large corpus of telephone conversations. We have approached the issue from a purely computational perspective and have shown that the differences are profound enough that we can classify the transcript of a conversation side according to the gender of the speaker with accuracy close to 93%. Our computational tools have allowed us to show quantitatively that the gender of one speaker influences the linguistic patterns of the other speaker. Specifically, same-gender conversations can be classified with almost perfect accuracy, while in cross-gender conversations we observed evidence of some convergence of male and female linguistic patterns. An analysis of the features revealed that the most characteristic features for males are swear words, while for females they are family-relation words. Leveraging these differences in simple gender-dependent language models does not yield an improvement, but this does not imply that more sophisticated language model training methods cannot help. For example, instead of conditioning every word in the vocabulary on gender, we can choose to do so only for the top-N words, as ranked by KL divergence or information gain (IG). The probability estimates for the rest of the words would then be tied across both genders. Future work will examine empirical differences in other features such as dialog acts or turn-taking.</Paragraph> </Section> </Paper>