On the Role of Social Identity and Cohesion in Characterizing Online Social Communities


Two prevailing theories for explaining social group or community structure are cohesion and identity. The social cohesion approach posits that social groups arise out of an aggregation of individuals that have mutual interpersonal attraction as they share common characteristics. These characteristics can range from common interests to kinship ties and from social values to ethnic backgrounds. In contrast, the social identity approach posits that an individual is likely to join a group based on an intrinsic self-evaluation at a cognitive or perceptual level. In other words, group members typically share an awareness of a common category membership. In this work we seek to understand the role of these two contrasting theories in explaining the behavior and stability of social communities on Twitter. A specific focal point of our work is to understand the role of these theories in disparate contexts ranging from disaster response to socio-political activism. We extract social identity and social cohesion features of interest for large-scale datasets of five real-world events and examine the effectiveness of such features in capturing behavioral characteristics and the stability of groups. We also propose a novel measure of social group sustainability based on the divergence in group discussion. Our main findings are: 1) Sharing of social identities (especially physical location) among group members has a positive impact on group sustainability, 2) Structural cohesion (represented by high group density and low average shortest path length) is a strong indicator of group sustainability, and 3) Event characteristics play a role in shaping group sustainability, as social groups in transient events behave differently from groups in events that last longer.


💡 Research Summary

Introduction and Motivation
The rapid expansion of online social networks has made it essential to understand how digital communities form, evolve, and persist. Two classic theories from social psychology, social identity theory and social cohesion theory, offer competing explanations. Identity theory argues that individuals join groups because they perceive themselves as members of a shared category, whereas cohesion theory posits that mutual interpersonal attraction and the resulting network structure are the primary drivers of group formation. While both have been extensively studied in offline settings, their applicability to large‑scale, real‑time platforms such as Twitter remains under‑explored. This paper seeks to bridge that gap by operationalizing both theories on Twitter data and evaluating their predictive power for group behavior and sustainability.

Data Collection and Event Selection
The authors selected five real‑world events that differ in duration, social significance, and participant composition: Hurricane Irene (natural disaster), Hurricane Sandy (natural disaster), India Anti‑Corruption protests (political activism), Occupy Wall Street (social movement), and Anti‑SOPA (technology‑policy community). For each event, a seed list of keywords and hashtags was built, and a custom crawler used Twitter’s Streaming API to collect all tweets containing those terms. The Search API was also employed to retrieve historical tweets, limited to the most recent 1,500 per query. Metadata such as user location, description, follower/following counts, and the full follower graph of participants were stored. Privacy‑protected accounts were excluded. Tweets were partitioned into daily time slices to enable temporal analyses. Table 1 in the paper reports that the datasets range from 183 K tweets (Irene) to 4.9 M tweets (Sandy), with user counts from 77 K to 1.8 M.
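
The daily partitioning step can be illustrated with a minimal sketch. This is not the authors' crawler or storage code: the record layout (`id`, `user`, `created_at` as an ISO 8601 string) and the function name `partition_by_day` are assumptions made for illustration, as is the choice to bucket by UTC calendar date.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(tweets):
    """Group tweet records into daily slices keyed by UTC calendar date.

    Each record is assumed (hypothetically) to carry an ISO 8601
    timestamp with a UTC offset under the key "created_at".
    """
    slices = defaultdict(list)
    for tweet in tweets:
        created = datetime.fromisoformat(tweet["created_at"])
        # Normalize to UTC so tweets from different time zones
        # land in a consistent daily bucket.
        day = created.astimezone(timezone.utc).date()
        slices[day].append(tweet)
    return dict(slices)


# Hypothetical sample records, not data from the paper.
sample_tweets = [
    {"id": 1, "user": "a", "created_at": "2012-10-29T23:50:00+00:00"},
    {"id": 2, "user": "b", "created_at": "2012-10-30T00:10:00+00:00"},
    {"id": 3, "user": "c", "created_at": "2012-10-29T21:30:00-05:00"},
]

daily = partition_by_day(sample_tweets)
for day in sorted(daily):
    print(day, [t["id"] for t in daily[day]])
```

Bucketing by UTC date is only one possible convention; an analysis focused on a single region might prefer local time instead.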

Community Detection
To study “social groups” beyond mere keyword co‑occurrence, the authors constructed an interaction graph where vertices represent users and edges indicate at least one retweet, mention, or reply between the two users during the entire observation window. A multi‑level graph clustering algorithm (reference
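
The interaction graph described above, together with the two structural cohesion indicators named in the abstract (group density and average shortest path length), can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `(source, target, kind)` record layout is hypothetical, the graph is treated as undirected, and the average path length is taken only over pairs of users that are actually connected.

```python
from collections import deque


def build_interaction_graph(interactions):
    """Build an undirected adjacency map from (source, target, kind) records.

    An edge exists if the two users interacted at least once, whatever
    the kind (retweet, mention, or reply). Repeated interactions collapse
    into a single edge.
    """
    graph = {}
    for source, target, _kind in interactions:
        if source == target:
            continue  # assumption: self-interactions are ignored
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    return graph


def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edge_count = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)


def average_shortest_path_length(graph):
    """Mean hop distance over all connected pairs, computed with BFS."""
    total = 0
    pairs = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than start
    return total / pairs if pairs else 0.0


# Hypothetical interaction records, not data from the paper.
interactions = [
    ("alice", "bob", "retweet"),
    ("bob", "carol", "mention"),
    ("alice", "bob", "reply"),
    ("carol", "dave", "reply"),
]

graph = build_interaction_graph(interactions)
print(density(graph), average_shortest_path_length(graph))
```

Running an all-pairs BFS is fine for small groups; for graphs at the scale reported in the paper, a graph library or sampling-based estimate would likely be needed.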

