Big-Five Personality Prediction Based on User Behaviors at Social Network Sites
Social Network Sites (SNSs) already offer many customer services, including user recommendation and media interaction. There is a strong desire to provide online users with more dedicated, personalized services that fit individual needs, which usually depend strongly on the user's inner personality. However, little has been done to conduct the psychological analysis needed to explain a user's outward behavior in terms of inner personality. In this paper, we propose an approach intended to facilitate this line of research by directly predicting the so-called Big-Five personality traits from a user's SNS behaviors. In contrast to conventional inventory-based psychological analysis, we demonstrate through experimental studies that users' personalities can be predicted with reasonable precision from their online behaviors. Besides confirming some earlier behavior-personality correlation results, our experiments show that extraversion is positively related to one's status republishing proportion and that neuroticism is positively related to the proportion of one's angry blogs (blogs that make people angry).
💡 Research Summary
The paper investigates whether a user’s personality, as defined by the Big‑Five model, can be inferred automatically from observable behavior on a social networking site (SNS). The authors focus on Renren, a Chinese Facebook‑like platform, and collect both behavioral logs and self‑reported personality scores from participants.
Data collection is carried out through a custom web application (Dao) that authenticates users via Renren’s API. Once permission is granted, the system retrieves basic profile information (gender, age, hometown), usage statistics (friend count, status updates, blog posts, photo/video uploads), and the textual content of recent status messages and blogs. In parallel, each participant completes the 44‑item Big Five Inventory (BFI) developed by the Berkeley Personality Lab, yielding continuous scores (1–5) for Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness.
To transform raw logs into predictive inputs, the authors design 41 features grouped into five categories: (1) basic demographic and profile data (5 features), (2) general SNS usage metrics (28 features), (3) time‑related usage (3 features) such as counts of posts in the last month, (4) emotion‑related features (2 features) derived from an emotion classifier that tags each blog post as angry, funny, surprised, or moving, and (5) combined time‑and‑emotion features (3 features) that capture the dominant emotion of the most recent status and its duration. The emotion classifier is a Naïve Bayes model enhanced with an emotion lexicon, achieving over 80 % accuracy on a large text corpus.
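The emotion-tagging step described above can be illustrated with a minimal sketch. It assumes a multinomial Naïve Bayes model with add-one smoothing whose vocabulary is restricted to lexicon words; the summary does not say how the authors actually combined the lexicon with the classifier. The class name `LexiconNaiveBayes`, the helper `angry_blog_proportion`, the whitespace tokenizer, and the English toy lexicon are all hypothetical (real Renren text would need a Chinese word segmenter).

```python
import math
from collections import Counter, defaultdict


class LexiconNaiveBayes:
    """Multinomial Naive Bayes that only looks at words found in an emotion lexicon.

    Assumption: restricting the vocabulary is one possible reading of
    "enhanced with an emotion lexicon"; the paper's exact method is not given.
    """

    def __init__(self, lexicon):
        self.lexicon = set(lexicon)
        self.class_counts = Counter()            # documents seen per emotion
        self.word_counts = defaultdict(Counter)  # lexicon-word counts per emotion

    def _tokens(self, text):
        # Hypothetical tokenizer: lowercase, split on whitespace, keep lexicon words.
        return [w for w in text.lower().split() if w in self.lexicon]

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(self._tokens(text))
        return self

    def predict(self, text):
        tokens = self._tokens(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.lexicon)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            n_words = sum(self.word_counts[label].values())
            # Log prior plus add-one-smoothed log likelihood of each token.
            score = math.log(self.class_counts[label] / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def angry_blog_proportion(model, blogs):
    """Share of a user's blogs that the model tags as 'angry' (0.0 if no blogs)."""
    if not blogs:
        return 0.0
    return sum(model.predict(b) == "angry" for b in blogs) / len(blogs)
```

A per-user value such as the one returned by `angry_blog_proportion` is the kind of quantity that could serve as an emotion-related feature.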
Because the BFI scores are continuous, the authors discretize each dimension into three classes: low (1 → μ − σ), medium (μ − σ → μ + σ), and high (μ + σ → 5), where μ and σ are the mean and standard deviation of the scores for that trait. This yields class distributions such as 62 low, 92 medium, and 55 high samples for Extraversion. For a second set of experiments, the middle class is removed, turning the problem into a binary classification (low vs. high).
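The discretization rule above can be sketched as follows. This is a minimal illustration, assuming the population standard deviation and assigning scores that fall exactly on a boundary to the medium class; the summary specifies neither choice. The function names `discretize` and `drop_medium` are hypothetical.

```python
from statistics import mean, pstdev


def discretize(scores):
    """Map continuous trait scores (1-5) to 'low', 'medium', or 'high'.

    Cut points are mu - sigma and mu + sigma for the given trait.
    Assumptions: population standard deviation; boundary values count as medium.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    lower, upper = mu - sigma, mu + sigma
    labels = []
    for s in scores:
        if s < lower:
            labels.append("low")
        elif s > upper:
            labels.append("high")
        else:
            labels.append("medium")
    return labels


def drop_medium(scores, labels):
    """Keep only low/high samples, as in the binary experiments."""
    return [(s, lab) for s, lab in zip(scores, labels) if lab != "medium"]
```

Because μ and σ are computed per trait, the same raw score can land in different classes for different traits.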
Multiple machine-learning algorithms are evaluated (Naïve Bayes, Support Vector Machine, Decision Tree, etc.). The C4.5 decision-tree algorithm consistently outperforms the others. Using 10-fold cross-validation on the three-class task, the average F-measure per trait ranges from 0.697 (Openness) to 0.723 (Agreeableness). When the problem is reduced to two classes, performance improves markedly, with F-measures reaching 0.839 for Extraversion, 0.825 for Conscientiousness, and 0.749 for Neuroticism.
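To make the evaluation protocol concrete, here is a minimal sketch of the two ingredients named above: a 10-fold split and an averaged F-measure. It assumes a support-weighted average of per-class F1 scores and a simple interleaved split without shuffling or stratification; the summary states neither how the folds were drawn nor how per-class scores were averaged. The function names are hypothetical, and the classifier itself (C4.5 in the paper) is left out.

```python
from collections import Counter


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved folds (no shuffling; an assumption)."""
    return [list(range(i, n, k)) for i in range(k)]


def weighted_f_measure(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in sorted(support):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support[label] / total
    return score
```

In each round, one fold is held out for testing and the remaining nine are used for training; the reported figure would then be the F-measure aggregated over the ten held-out folds.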
The authors also examine specific feature-trait relationships. Extraversion shows a positive correlation with the proportion of status updates that are reposted, while Neuroticism correlates positively with the proportion of "angry" blogs. The authors report these two relationships alongside their confirmation of previously published behavior-personality correlations.
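A feature-trait relationship of this kind can be checked with a correlation coefficient. The sketch below assumes Pearson's r, which the summary does not confirm as the statistic used; the function name `pearson_r` is hypothetical, and the numbers in the usage lines are synthetic values for illustration, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between a behavioral feature and a trait score."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Synthetic example: proportion of reposted statuses vs. Extraversion score.
repost_proportion = [0.10, 0.25, 0.40, 0.55]
extraversion = [2.1, 2.9, 3.4, 4.2]
r = pearson_r(repost_proportion, extraversion)  # positive for this toy data
```

A positive r on real data would be consistent with the reported direction of the relationship, though it says nothing about its statistical significance.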
Despite promising results, the study has several limitations. The participant pool consists of 209 active Renren users (average age 23.8 years), primarily university students, which restricts the generalizability of the model to broader populations. Moreover, the F-measure hovers around 0.70 for three-class classification, suggesting that the behavioral features used do not fully capture the nuanced nature of personality. The reliance on a single SNS platform also raises concerns about cross-platform applicability.
Future work suggested by the authors includes: (1) aggregating data from multiple SNSs (e.g., Facebook, Twitter) to test the robustness of the model across cultural and platform differences; (2) incorporating deep‑learning‑based text, image, and possibly audio sentiment analysis to enrich emotion‑related features; (3) moving from classification to regression to predict continuous personality scores directly, thereby avoiding information loss caused by discretization; and (4) expanding the sample size and demographic diversity to improve external validity.
In conclusion, the paper demonstrates that a systematic combination of SNS usage logs and emotion analysis can predict Big‑Five personality traits with reasonable precision. This bridges a gap between psychological assessment and computational social science, opening avenues for personalized recommendation systems, targeted advertising, and mental‑health monitoring that are grounded in automatically inferred personality profiles.