<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3403"> <Title>Computational Measures for Language Similarity across Time in Online Communities</Title> <Section position="6" start_page="18" end_page="19" type="evalu"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"> The tools we employ approach document similarity quite differently; we therefore compare findings as a way of triangulating on the nature of entrainment in the Junior Summit online community.</Paragraph> <Section position="1" start_page="18" end_page="18" type="sub_section"> <SectionTitle> 5.1 Pairwise Comparisons over Time </SectionTitle> <Paragraph position="0"> First, we hypothesized that messages between individuals in a given topic group would demonstrate more similarity over time. Our findings did not support this claim; in fact, they show the opposite.</Paragraph> <Paragraph position="1"> All three tests show slight convergence between time periods one and two, some variation, and then divergence between time periods four, five and six.</Paragraph> </Section> <Section position="2" start_page="18" end_page="19" type="sub_section"> <Paragraph position="0"> Spearman's Correlation Coefficient (SCC) demonstrates a steady decline in similarity. As shown in Figure 1, the differences between time periods were all significant, F(5,1375) = 21.475, p<.001, where N=1381 (N represents user pairs across all six time periods).</Paragraph> <Paragraph position="1"> Zipping also shows a significant difference between each time period, F(5,1190) = 39.027, p<.001, N=1196, demonstrating a similar, though less steady, decline in similarity. See Figure 2. LSA demonstrates the same divergent trend over time. While the dip at time 3 is more pronounced than for SCC and Zipping, it is still consistent with the overall findings of the other measures. 
See Figure 3.</Paragraph> <Paragraph position="3"> Because of these surprising findings, we examined the influence of demographic variables, such as leadership (those chosen as delegates from each topic group to the in-person forum), gender, and the topic group to which the individuals belonged. We divided delegate pairs into (a) pairs where both individuals are delegates; (b) pairs where both individuals are non-delegates; and (c) mixed pairs of delegates and non-delegates. Similarly, gender pairs were divided into same-sex (e.g., male-male, female-female) and mixed-sex pairs. For topic groups, we re-ran our analyses on each of the 20 topic groups separately. Overall, both delegate pairs and gender pairs demonstrate the same divergent trends as the group as a whole. However, not all tests showed significant differences when comparing these pairs.</Paragraph> <Paragraph position="4"> For instance, Spearman's Correlation Coefficient found a significant difference in similarity among the three groups, F(2,273) = 6.804, p<.001, n=276, such that delegate-delegate pairs demonstrate higher similarity scores than non-delegate pairs and mixed pairs. LSA found the same result, F(2,280) = 11.122, p<.001, n=283. By contrast, Zipping did not find a significant difference, F(2,226) = 2.568, p=.079, n=229.</Paragraph> <Paragraph position="5"> In terms of the potential effect of gender on similarity scores, Zipping showed a significant difference between the three groups, F(2,236) = 3.546, p<.05, n=239, such that female-female pairs and mixed-sex pairs demonstrate more similarity than male-male pairs. LSA found the same relationship, F(2,280) = 4.79, p<.005, n=283. 
By contrast, Spearman's Correlation Coefficient does not show a significant between-groups difference, F(2,273) = .699, p=.498, n=276.</Paragraph> <Paragraph position="6"> In terms of differences among the topic groups, we did indeed find differences: some topic groups demonstrated the fairly linear slope of decreasing similarity shown above, while others demonstrated dips and rises resulting in a level of similarity at T6 quite similar to that at T1. There is no neat way to statistically measure the differences in these slopes, but this variation does indicate that future analyses need to take topic group into account. In sum, we did not find leadership or gender to mediate language similarity in this community. Topic group, on the other hand, did play a role; however, no topic groups showed increasing similarity across time.</Paragraph> </Section> <Section position="3" start_page="19" end_page="19" type="sub_section"> <SectionTitle> 5.2 Similarity and Temporal Proximity </SectionTitle> <Paragraph position="0"> Our second hypothesis concerned the gradual change of language over time such that temporal proximity of time periods would correlate with mean similarity. In other words, we expected that messages in close time periods (e.g., adjacent weeks) should be more similar than messages from more distant time periods. 
To examine this, we performed two individual tests, in which our predictions can be described as follows: (a) the similarity between texts in one time period and texts in the neighboring time period is greater than the similarity between texts in one time period and texts from two periods earlier, S(M_t, M_{t-1}) > S(M_t, M_{t-2});</Paragraph> <Paragraph position="2"> and (b) the similarity between texts in one time period and texts in the neighboring time period is greater than the similarity between texts in one time period and texts in the very first time period, S(M_t, M_{t-1}) > S(M_t, M_1).</Paragraph> <Paragraph position="4"> As shown in Table 1, SCC and Zipping tests confirm these hypotheses, while none of the LSA tests revealed significant differences.</Paragraph> </Section> </Section> </Paper>