Dynamic Personality Adaptation in Large Language Models via State Machines



PREPRINT

Leon Pielage 1,2, Ole Hätscher 3, Prof. Dr. Mitja Back 3, Prof. Dr. med. Bernhard Marschall 4, and Prof. Dr. Benjamin Risse *1,2

1 Institute for Geoinformatics, University of Münster, 48149 Münster, Germany
2 Faculty of Mathematics and Computer Science, University of Münster, 48149 Münster, Germany
3 Department of Psychology, University of Münster, 48149 Münster, Germany
4 Institute of Medical Education and Student Affairs, University of Münster, 48149 Münster, Germany

February 26, 2026

ABSTRACT

The inability of Large Language Models (LLMs) to modulate their personality expression in response to evolving dialogue dynamics hinders their performance in complex, interactive contexts. We propose a model-agnostic framework for dynamic personality simulation that employs state machines to represent latent personality states, where transition probabilities are dynamically adapted to the conversational context. Part of our architecture is a modular pipeline for continuous personality scoring that evaluates dialogues along latent axes while remaining agnostic to the specific personality models, their dimensions, transition mechanisms, or LLMs used. These scores function as dynamic state variables that systematically reconfigure the system prompt, steering behavioral alignment throughout the interaction. We evaluate this framework by operationalizing the Interpersonal Circumplex (IPC) in a medical education setting. Results demonstrate that the system successfully adapts its personality state to user inputs, but also influences user behavior, thereby facilitating de-escalation training. Notably, the scoring pipeline maintains comparable precision even when utilizing lightweight, fine-tuned classifiers instead of large-scale LLMs.
This work demonstrates the feasibility of modular, personality-adaptive architectures for education, customer support, and broader human-computer interaction.

Keywords: LLM · affective computing · personality · HCI · IPC

1 Introduction

Large Language Models (LLMs) are increasingly integrated into daily life, both in professional contexts and during leisure activities, offering assistance across a growing range of cognitive and communicative tasks [25, 27]. Alongside advances in their technical capabilities, recent years have witnessed growing interest not only in what LLMs communicate, but also in how they do so. A key development in this regard is the introduction of customizable assistant "personas" in top-tier models, which allow users to tailor the communicative style of the assistant for a more personalized and coherent interaction experience. Concurrently, researchers have begun to explore the capacity of LLMs to simulate and assess human personality. Several studies have shown that LLMs can reliably express personality differences and generate psychologically plausible responses across diverse scenarios [20, 36, 39]. However, both model-level customization and existing research efforts have largely focused on static personality representations, capturing how an assistant behaves on average rather than how it adapts to the interpersonal dynamics of an unfolding conversation. Yet, in human-human interactions, one of the defining features of personality expression is its dynamic nature, as individuals adjust their communicative behavior in response to their interaction partner's tone, behavior, and perceived personality [4, 17, 18]. Such dynamic adaptation is essential for creating more realistic and engaging simulations in applied domains such as medical education, human resources, or gaming.

* b.risse@uni-muenster.de
While state-of-the-art LLMs exhibit some degree of user-contingent behavior, they remain limited in three important ways:

1. Responses are often shaped by fixed defaults (such as persistent friendliness) regardless of user behavior;
2. the internal mechanisms underlying adaptation are opaque and not directly modifiable; and
3. there is currently no systematic or modular way to analyze, manipulate, or test dynamic personality adaptation in LLM-based agents.

To address these limitations, we propose a modular framework for dynamic personality adaptation in LLMs that enables real-time adjustment of the model's personality in response to the evolving interaction. The framework consists of a state machine architecture in which each state encodes a specific momentary personality expression. Transition probabilities are continuously updated according to the linguistic content of incoming user messages. These messages are processed by auxiliary classifiers that estimate relevant personality-dimension scores, which in turn update the assistant's system prompt, guiding its subsequent communicative behavior. By implementing dynamic personality adjustment through prompt-level modulation, the framework remains model-agnostic and forward-compatible, allowing seamless integration with current and future LLMs.

2 Related Work

Recent studies show that large language models can simulate a variety of personality constructs (Big Five [20], Interpersonal Circumplex [36], Dark Triad [40], etc.), but most work concentrates on static, dimensional Big Five representations [20]. Two main methodological families have emerged:

1. Editing the model: fine-tuning, adapters, low-rank adaptation (LoRA), or mixture-of-experts approaches that embed stable personality representations directly into the model's weights [36, 39].
2. Inducing personality via prompts: supplying trait descriptors, prototypical adjectives, or psychometric items (often with numerical anchors) to steer the model's output without altering its architecture [5, 7]. More sophisticated prompting leverages embedding-level knowledge of trait organization or LLM-enhanced prompts that amplify personality expression [40].

Evaluations report good validity, reliability, and internal coherence of the generated personalities [31, 38], yet little attention has been paid to dynamic aspects of personality modeling, namely how personality expression may change over time in response to the interaction context.

Personality inference from language is well-established via zero-shot prompting [10, 33], fine-tuning [32, 37], and supervised learning [3]. Zero-shot and prompt-based methods now achieve accuracies partly comparable to supervised models [33], with human evaluations showing high correspondence between self-rated and LLM-predicted traits, despite a potential positivity bias [10]. While these methods demonstrate strong internal consistency (α > .70) and convergent validity [26], they largely focus on static personality in text-rich or self-attributed data [33]. In contrast, dynamic interaction requires inferring momentary personality from sparse dialogue, an area that remains underexplored.

Recently, interest in dynamic approaches to personality has begun to grow, both in terms of generation and assessment. For example, some studies explore how LLMs adjust their behavior in real time within interactive and strategic settings, such as social dilemma games like the Prisoner's Dilemma [41]. Others focus on tracking interaction dynamics, i.e., the dyadic and temporal patterns that unfold across exchanges, through the alignment of textual content in both human-human [11] and AI-AI interactions [12, 19].
Despite this growing interest, only a few studies draw upon established models in personality science, particularly interpersonal and dynamic models, which provide a robust framework for modeling such phenomena (e.g., [4, 17]). Notable exceptions include: Wang et al. [37], who developed a personality recognition model based on personality dynamics theory that tracks and updates the user's personality state throughout the interaction; [24], who proposed a dynamic personality recognition and expression system for persuasive communication, modulating the influence of default and situational personality states; and [34], who fine-tuned a model to express personality in a context-sensitive manner.

3 Method

Our framework consists of two linked modules: an analyzer and a generative personality model. The analyzer predicts momentary personality states from user and assistant messages to determine how the generative model modulates the assistant's communicative style along predefined personality dimensions. These dimensions, or axes, represent latent traits such as dominance/agency or warmth/communion. The framework supports an arbitrary number of axes, generally maintaining a one-to-one correspondence between the analyzer and generative model, to remain adaptable to various psychological theories. Currently, we model each axis independently, implying that traits on different axes are treated as statistically independent factors. This assumption can be relaxed by introducing one-to-many mappings, where a personality state predicted by the analyzer influences multiple generative personality axes, or by cross-linking models to account for interdependencies between traits. Both modules can be implemented with models tailored to their respective functions, for instance a classifier for the analyzer and a generative model for response production.
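To make the two-module design concrete, the interface between analyzer and generator can be sketched in a few lines. This is an illustrative sketch with hypothetical names, not the authors' code; a toy keyword counter stands in for the LLM-based analyzer.

```python
# Sketch of the analyzer/generator split: an analyzer scores a message on
# one latent axis, a generator renders a reply from the assembled prompt.
from typing import Protocol


class AxisAnalyzer(Protocol):
    """Maps a message to a numeric score on one personality axis."""

    def score(self, message: str) -> float: ...


class ResponseGenerator(Protocol):
    """Produces an assistant reply given the assembled system prompt."""

    def reply(self, system_prompt: str, history: list[str]) -> str: ...


class KeywordAnalyzer:
    """Toy analyzer: counts assertive cue words as a stand-in for an LLM."""

    def __init__(self, cues: set[str]):
        self.cues = cues

    def score(self, message: str) -> float:
        words = message.lower().split()
        hits = sum(w.strip(".,!?") in self.cues for w in words)
        return min(10.0, hits * 2.5)  # clamp to a 0-10 scoring range


analyzer: AxisAnalyzer = KeywordAnalyzer({"now", "must", "listen", "no"})
print(analyzer.score("No, you must listen to me now!"))  # → 10.0
```

Because both roles are behind small interfaces, either side can be swapped for a prompted LLM, a fine-tuned classifier, or a future model without touching the rest of the pipeline.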
Our experiments employ dedicated LLMs prompted for their specific roles. This model-agnostic design allows underlying language models to be easily replaced as newer systems emerge. The workflow, including analysis, personality-state updates, and response generation, is shown in Figure 1. In addition to a standard chat interface, a development mode provides helpful visualizations. In this mode, analyzed texts are highlighted and a separate plot displays the current personality state and transition probabilities, which facilitates inspection when personality inference from a single message is uncertain.

3.1 Personality Dimensions

The Interpersonal Circumplex (IPC) [14, 18] describes behavior along two orthogonal dimensions: dominance (agency), reflecting assertiveness and control, and warmth (communion), capturing empathy and friendliness. These dimensions define a circular space for locating and interpreting interpersonal styles. In our framework, these dimensions serve as examples for the personality axes used by the analyzer and generative model. However, our approach is not limited to the IPC and can extend to any personality model or number of dimensions required by the application.

3.2 Message Analyzer

The analyzer maps messages to numeric scores along defined personality dimensions. This mapping can occur at the sentence or message level; however, our preliminary comparisons showed that message-level analysis is more effective because it leverages broader context and captures diagnostic cues distributed unevenly across sentences. Consequently, all subsequent experiments utilize message-level analysis. We implemented three score-mapping strategies. First, off-the-shelf LLMs are prompted to output scores directly. Second, an LLM is specialized for the task via LoRA fine-tuning. Third, the model's final layer is replaced with a dedicated classification head to transform input representations into numeric scores.
The following sections detail these strategies and the associated data generation process for the fine-tuned alternatives.

3.2.1 Prompt-Based Personality Mapping

A system prompt is formulated for each personality axis to instruct an LLM to give a score within a certain range. Depending on the axis, the prompt may be very short. However, providing examples improves the quality and reduces the number of cases where the model does not adhere to the output format, resulting in an unparseable score, especially for really small models.

Figure 1: The analysis of incoming messages and the selection of a personality prompt happen per dimension. Together with the role description and the message protocol, these provide the LLM context.

Figure 2: A user message is analyzed, resulting in scores according to a specific personality dimension. These scores can (A) either be used directly to update the assistant personality or (B) be used to update a separate user personality model, which can then be used to update the assistant personality.

For LLMs which support this feature, we also define a prefix to guide the model to output a single number without additional text. An illustrative prompt for the agency dimension of the IPC model could be phrased as follows:

Give a score between 0 and 10 where 0 is very submissive and 10 is very dominant. The score should be a single number.
In comparison, a longer prompt gives more context about the personality model and might even contain an example:

You are a helpful assistant that analyzes the dominance of a sentence according to the interpersonal circumplex model. You will do this by giving a score between 0 and 10 where 0 is very submissive and 10 is very dominant. The score should be a single number. For example, 'No, you do it like this!' should be scored as 10.

3.2.2 Data Generation

In order to fine-tune LLMs for the personality scoring task, we generated a synthetic dataset using OpenAI GPT 4.1 [25]. It consists of various sentences with labels for the two main axes of the IPC model, both with a range from -5 to 5. 1210 sentences have a label for both axes. Moreover, 500 additional sentences per axis have a label exclusively for dominance/agency or warmth/communion. The labels, like the sentences, were generated by GPT 4.1 by first generating a nuanced and standardized description per intensity step and then generating example sentences according to this description. While GPT 4.1 is able to generate realistic examples for all these intensities, they might contain a bias and certainly do not cover all sentences frequently used in a chat conversation, as GPT 4.1 tended to generate sentences which clearly show specific personality traits. In contrast, real conversations contain many utterances that are too short or, for other reasons, do not have a clear tendency on all personality axes, such as "Okay" or "No". Therefore, we used 109 more realistic examples containing short and long messages from patients and doctors for the evaluation. These examples were human-validated and curated where necessary, and then rated by three human experts with a background in psychology. We used the median to determine the target rating for comparison with our model predictions, as it is more robust to outliers than the mean. Both the training and test samples are in German.
3.2.3 Fine-tuned Personality Mappings

The above-mentioned dataset was used to fine-tune two small Qwen 2.5 models [27] with 0.5 billion parameters, applying two distinct fine-tuning strategies. In the first strategy, we fine-tuned the LLM using the LoRA method [15] to better adapt it to the scoring task. For each personality axis, a separate LoRA fine-tuning run was performed with a short prompt defining the corresponding dimension. We used a rank of 16, an alpha value of 32, and a dropout rate of 0.05 in all cases. The models were trained for 5 epochs with a batch size of 8 and a validation set size of 20%. After training, we selected the best-performing model across epochs based on validation performance. In the second strategy, we replaced the model's final layer with a lightweight classification head that outputs the score directly. Using transfer learning, one model per personality axis was trained while keeping all other layers frozen. The batch size and validation set size were kept identical to the LoRA setup, while the number of training epochs was increased to 10. As before, the best model was selected according to validation performance.

3.3 Personality Model

The personality of the assistant is modeled as a state machine, in which each personality dimension is represented by a separate and independently operating axis. Each axis maintains its own set of states and transition probabilities, analyzing messages according to distinct criteria and, if desired, using different models. This modular design allows personality dimensions to vary in scope and implementation. Axes can exist solely for the assistant or for both the assistant and the user.
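The LoRA configuration from Subsubsection 3.2.3 (rank 16, alpha 32, dropout 0.05) could be expressed with the Hugging Face peft library roughly as follows. This is a hedged sketch, not the authors' code: the model identifier is an assumption, and the training loop (5 epochs, batch size 8, 20% validation split) is omitted.

```python
# Config sketch: only r=16, lora_alpha=32, lora_dropout=0.05 come from the
# text; the model identifier is an assumption.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
lora_cfg = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling factor for the update
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```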
When axes are maintained for both the assistant and the user, the system can track the user's momentary personality throughout the interaction and update the assistant's transition probabilities based on the user's current personality state rather than only on recent message content. Each personality axis is influenced by multiple factors, most of which stem from internal representations and states that determine how transition probabilities evolve over time. At the same time, external influences allow the model to adapt dynamically to ongoing interactions. As described previously, these influences can be direct, based on the analysis results of the most recent message, or indirect, arising from the current personality state of another model (Figure 2).

3.3.1 State Machine

The core of each personality axis is a state machine heavily influenced by probabilistic automata [28] and Moore machines [9]. We define it as (S, Σ, Γ, δ, ω, s_d, F). Following this definition, we have a finite set of k states S = {s_0, s_1, ..., s_{k-1}} with a non-probabilistic starting state s_d ∈ S and a set of accepting states F = S, as all our states are valid end states at the end of the input sequence. An input sequence (x_0, x_1, ..., x_n) of length n consists of multiple elements from the input alphabet Σ. Each input x ∈ Σ consists of two discrete probability distributions over all states, x = (P_o, P_q) ∈ [0, 1]^{2×k}, which both sum to 1. The outside influences and the state probabilities are given by P_o = (p(s_0), ..., p(s_{k-1})) ∈ [0, 1]^k and P_q = (q(s_0), ..., q(s_{k-1})) ∈ [0, 1]^k, respectively. The momentary personality expression depends on the current state s ∈ S and is described in the output alphabet Γ for each state. Using the output function ω : S → Γ, it can be determined for each state. In our case, Γ contains pre-defined system prompts for the LLM.
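To make the definition concrete, a single axis can be sketched in a few lines. This is a minimal reading of the formalism rather than the authors' implementation: Γ is a list of prompt strings ordered along the dimension, and ω simply indexes it with the current state.

```python
# Illustrative sketch of one personality axis: k ordered states, a default
# state s_d (also the starting state), and an output function omega mapping
# each state to a pre-defined system prompt.
from dataclasses import dataclass, field


@dataclass
class PersonalityAxis:
    prompts: list[str]             # Gamma: one system prompt per state
    default_state: int             # s_d, also the starting state
    state: int = field(init=False)

    def __post_init__(self):
        self.state = self.default_state

    def omega(self) -> str:
        """Output function: the prompt describing the momentary expression."""
        return self.prompts[self.state]


agency = PersonalityAxis(
    prompts=["very submissive", "submissive", "neutral", "dominant", "very dominant"],
    default_state=2,
)
print(agency.omega())  # → neutral
```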
These prompts can, however, be replaced with other model inputs, such as parameter sets to control different downstream model types. The state transition function is defined as δ : S × Σ → [0, 1]^k, with k = |S| being the number of states. For a state s ∈ S and an input x ∈ Σ, the transition probability of going into state s_i ∈ S is given by the probability p_i(s, x). Therefore, δ(s, x) = (p_0(s, x), ..., p_{k-1}(s, x)) with Σ_i p_i(s, x) = 1. Contrary to the definition of probabilistic automata [28], our transition probabilities are not fixed but re-calculated in each iteration. The next state is determined probabilistically based on these transition probabilities. In practice, we also implemented a deterministic mode in which the next state is not chosen based on the transition probabilities but always as the state with the highest probability, effectively making it a deterministic state machine. This greatly improves model stability and is applied to personality tracking for the user, where less exploratory behavior is desired.

3.3.2 Transition Probabilities

While our model allows transitions between any pair of states (including remaining in the same state), psychological theories of personality typically assume that transitions to neighboring states are more likely than transitions to distant ones [14]. As each state machine models only one personality dimension, neighboring states can be defined by ordering them along that dimension. Let the order of the states s_0, ..., s_{k-1} be given by their index 0, ..., k-1. To approximate the transition probabilities to neighboring states, we use multiple normal distributions. The full transition probabilities are calculated in each iteration from four probability distributions:

• Default State: The default state is also the starting state s_d. We use a normal distribution N(s_d; σ²) to spread the probability of being in the region of the default state.
• Current State: To achieve a stable personality simulation, the current state s_c has a high influence on the transition probabilities for the next state. Like the default state, it is approximated by a normal distribution N(s_c; σ²).
• Old State Probabilities: The old state probabilities also influence the next transition probabilities, introducing a certain temporal inertia into the interaction. They are provided as part of the input x ∈ Σ.
• Outside Influences: These allow the model to adapt dynamically throughout the interaction. No specific distributional form is required, and we implemented two main sources of state probabilities: (1) direct analyzer results for each message, where p(s) = 1 for a specific state s if the entire message is analyzed as a single unit rather than split into sentences; and (2) state probabilities P'_q obtained from a different personality axis (typically the same personality dimension in another personality model, e.g., the assistant's communion axis is updated according to the state probabilities of the user's communion axis). Both variants are illustrated in Figure 2. To achieve a positive or negative correlation, the state probabilities can be mirrored beforehand by reversing the state order.

Figure 3: Steps to generate a reply to an incoming user message.

The new transition probabilities are calculated as a weighted linear combination of these four components, where the weights w further modulate the dynamics of the respective personality:

δ(s_c, (P_o, P_q)) = w_d · N(s_d; σ²) + w_c · N(s_c; σ²) + w_q · P_q + w_o · P_o

for a state s_c, input (P_o, P_q), and the default state s_d. The state probabilities also need to be recalculated to be used as input for the next transition.
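The weighted combination can be sketched as follows. This is an illustrative reconstruction: discretizing and renormalizing a normal density over the ordered state indices is one plausible reading of N(s; σ²), and the default w_q = 0.3 is our inference from the sum-to-one constraint together with the weights reported in the text (w_c = 0.5, w_d = 0.1, w_o = 0.1 for the assistant).

```python
# Sketch of the transition-probability update: a weighted mixture of two
# discretized normals (default and current state), the old state
# probabilities P_q, and the outside influence P_o.
import math


def discrete_normal(center: int, sigma: float, k: int) -> list[float]:
    """Normal density around a state index, normalized over k states."""
    dens = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(k)]
    total = sum(dens)
    return [d / total for d in dens]


def transition_probs(s_c, s_d, p_q, p_o,
                     w_d=0.1, w_c=0.5, w_q=0.3, w_o=0.1, sigma=1.0):
    """delta(s_c, (P_o, P_q)) as the weighted mixture of four components."""
    k = len(p_q)
    n_d = discrete_normal(s_d, sigma, k)
    n_c = discrete_normal(s_c, sigma, k)
    return [w_d * n_d[i] + w_c * n_c[i] + w_q * p_q[i] + w_o * p_o[i]
            for i in range(k)]


k = 5
p_q = [1 / k] * k                      # uniform old state probabilities
p_o = [0, 0, 0, 0, 1]                  # analyzer put the message in state 4
probs = transition_probs(s_c=2, s_d=2, p_q=p_q, p_o=p_o)
print(round(sum(probs), 6))            # each component sums to 1, so → 1.0
```

Because every component is itself a distribution and the weights sum to one, the mixture is a valid probability vector without further normalization.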
We calculate the new state probabilities P̂_q analogously to the transition probabilities, but without the influence of the current state:

P̂_q = ŵ_d · N(s_d; σ²) + ŵ_q · P_q + ŵ_o · P_o.

The weights w_d, w_c, w_q, and w_o sum to one and can be used to influence how fast the assistant adapts its momentary personality expression to the interaction, stays in its current state, or falls back to the default state. To calculate the state probabilities, we rescale the weights to satisfy ŵ_d + ŵ_q + ŵ_o = 1. Additionally, the standard deviation σ used for the normal distributions around the default and the current state controls how reactive the assistant is and how likely it will explore neighboring states. Empirically, we found that assigning a high weight to the current state (w_c = 0.5) stabilizes the system, while keeping the influence of the default state small (w_d = 0.1). For the assistant, we found that lower weights on the outside influence (w_o = 0.1) and a small reactivity (σ = 0.1) led to more natural interactions when the updates were based on the user's personality. At the same time, we used the deterministic mode for the user personality and therefore selected higher values (w_o = 0.2, σ = 0.6) to focus more on tracking the momentary personality based on each new message and to avoid exploratory behavior.

3.3.3 Generation

Figure 3 visualizes how the answer to a user message is generated. Based on the analyzer results (Subsection 3.2) for a specific personality dimension, the transition probabilities in the corresponding personality axis are updated and the next state is sampled. This state provides a textual description of how a given level of this personality dimension is expressed. The full system prompt for the LLM comprises two components (Figure 1).
The first is a fixed part that encodes static information about the interaction context, such as the name and age of a virtual agent (in our experiments acting as a patient), which we refer to as the role description. The second component is the dynamic part that is continuously updated based on the textual descriptions from all personality axes. Since the axes are independent of each other, their descriptions can be combined freely, thereby increasing the number of possible overall personality states of the assistant. In our experiments, we focused on the IPC model [18, 14] with its two main dimensions, agency (or dominance) and communion (or warmth). We approximated these dimensions in two axes, each with five states. Each state has a prompt similar to the following, for a less agentic state:

You are rather cautious and prefer to leave decisions to the other person. You rarely express yourself proactively and only address complaints or wishes with restraint. When you communicate something, you phrase it as cautiously as possible, perhaps with sentences like "I'm not sure, but...". You are easily influenced, but polite and cooperative.

Due to the independence of the axes, the resulting personality model comprises 25 individual states.

4 Evaluation

We conducted two experiments to evaluate the feasibility of personality inference from short text inputs and the effectiveness of our adaptive personality in interactions. The first compared various LLMs on analyzing natural dialogue messages, focusing on the potential for improvement via fine-tuning. The second let participants interact with our prototype to qualitatively assess the modeling pipeline and the plausibility of personality adaptation. In both experiments, we used a configuration according to the IPC model with its two main dimensions, agency and communion. All generated texts and user interactions were performed in German.

Table 1: Performance of selected LLMs on the task of rating messages in terms of their Agency / Communion, prompted with a short (s) / long (l) prompt. Accuracy (Acc.), one-off accuracy (1-off), mean distance (Mean), and error rate (Error).

Model Name          | Agency: Acc.↑  1-off↑  Mean↓  Error↓ | Communion: Acc.↑  1-off↑  Mean↓  Error↓
Our Classify 0.5B   | 0.2294  0.5138  2.1743  -            | 0.1560  0.4220  2.0642  -
Our LoRA 0.5B (s)   | 0.2569  0.6239  1.5596  0.0000       | 0.0734  0.2294  2.8073  0.0000
Qwen 2.5 0.5B (s)   | 0.0196  0.2647  3.0000  0.0642       | 0.0408  0.0918  3.9592  0.1009
Qwen 2.5 0.5B (l)   | 0.0000  0.2075  3.1887  0.0275       | 0.0120  0.0602  3.9398  0.2385
Llama 3.3 70B (s)   | 0.2981  0.7308  1.1346  0.0459       | 0.2018  0.4220  1.9817  0.0000
Llama 3.3 70B (l)   | 0.1296  0.5185  1.9630  0.0092       | 0.2294  0.5688  1.5229  0.0000
GPT 4.1 mini (s)    | 0.1009  0.3303  2.7248  0.0000       | 0.2294  0.5229  1.4587  0.0000
GPT 4.1 mini (l)    | 0.0734  0.2661  2.6055  0.0000       | 0.2936  0.7339  1.0275  0.0000
GPT 4.1 (s)         | 0.0185  0.1481  3.7685  0.0092       | 0.1651  0.4495  1.7706  0.0000
GPT 4.1 (l)         | 0.0550  0.2477  2.8349  0.0000       | 0.3211  0.6881  1.0917  0.0000

4.1 Analyzer Comparison and Fine-tuning

Inferring personality from sparse linguistic contexts is a known challenge for LLMs. Initial tests suggested that even top-tier models like GPT-4.1 struggle to generate accurate ratings for short German texts. To systematically evaluate this, we benchmarked several models against a human-annotated dataset of 109 messages (Subsubsection 3.2.2). To ensure the quality of this evaluation, we calculated the Intraclass Correlation (ICC) between the three trained annotators, yielding values of 0.602 for Agency and 0.581 for Communion, indicating moderate to good inter-rater reliability for such a subjective task. Our analyzer module can operate as either a prompted LLM or a dedicated classifier.
While LLMs are well-suited for these tasks due to their deep linguistic understanding, we hypothesized that simple fine-tuning could significantly enhance the performance of very small models. We applied two strategies to the Qwen 2.5 0.5B model using synthetic training data (Subsubsection 3.2.2): a LoRA adapter to align the model with the scoring task, and transfer learning with a new classification head. We compared our results to our purely prompt-based approach using the baseline Qwen 2.5 0.5B model [27] and other popular LLMs, namely Llama 3.3 70B, Gemma 3 27B, Mistral Small 3.1 24B, and the GPT 4.1 model family. For each model, we used both a short and a long system prompt for the analysis (Subsubsection 3.2.1). For each variant, we calculated both the accuracy and a one-off accuracy, since neighboring states might also be valid predictions for this task. Additionally, we report the mean distance between the predicted and the target score as well as the error rate. The error rate reflects how often smaller models failed to follow instructions, producing unparseable text instead of a numeric score.

Results (Table 1) show that fine-tuning significantly improves performance over non-fine-tuned baselines of the same model. Beyond raw accuracy, fine-tuning effectively eliminated the error rate, which is arguably more important in a real-time setting like ours. For the agency dimension (Table 1), our LoRA-tuned model achieved 25.69% accuracy, surpassed only by the Llama 3.3 70B model with 29.81%. Notably, many newer models perform worse than the older Llama 3.3 70B on this personality dimension, and performance decreases with a longer prompt for most models except GPT 4.1. This aspect requires further investigation, since prompt formulation strongly affects performance and additional language-specific factors may also contribute.
In the communion condition (Table 1), the transfer learning model outperformed the LoRA variant, though it remained behind larger models like GPT-4.1 mini, which achieved nearly 30% accuracy. Interestingly, GPT-4.1 mini consistently outperformed the larger GPT-4.1 in both dimensions. Consequently, we selected GPT-4.1 mini as the analyzer backend for our user study, while noting that specialized 0.5B models remain valuable for latency-sensitive or on-device applications.

4.2 User Study

We conducted an anonymous user study (N = 40) to evaluate the system in a realistic interaction context, specifically looking for interpersonal complementarity: the tendency for communal behavior to elicit warmth and agentic behavior to elicit submission [29]. Participants interacted with a virtual patient (assistant) seeking an appointment and designed with an initially dominant and uncooperative persona (high agency, low communion). After each user message, the system updated the user personality model and afterwards the assistant personality model (Subsection 3.3). For the user personality, we used a default state of 2 for both dimensions, as this represents the mean and no prior information justified other assumptions. We utilized GPT-4.1 mini for analysis and GPT-4.1 for response generation.

As shown in Figure 4b, communion exhibited strong complementary effects. As users exhibited communal behavior when approaching the patient, the virtual patient's personality gradually shifted from its hostile default toward a communal, helpful state. The dynamics of agency (Figure 4a) proved more nuanced. Initially, the assistant's high agency elicited submissive user responses, aligning with theoretical expectations of complementarity. However, as the interaction progressed, most users maintained a moderate agentic range, showing only slight shifts toward submission.
Since users did not adopt an extremely submissive stance, the assistant's own high agency decreased over time, reflecting how the model mirrors user behavior to maintain the complementary effect. Longer interactions (> 15 turns) revealed further dynamics. In high-frustration phases where users became more agentic due to repeated appointment conflicts, the assistant occasionally reverted to its high-agency default rather than decreasing in dominance; while this deviates from theoretical complementarity, it reflects the probabilistic nature of our state machine transitions. Despite these occasional reversions, we also observed sophisticated role-switching in extended exchanges. For instance, in a 50-turn dialogue (Figure 4c, Figure 4d), the user and assistant cycled through alternating dominant and submissive roles while maintaining synchronized communion, demonstrating the framework's capacity for complex interpersonal dynamics.

[Figure 4: Evolution of the current personality of the user and the virtual patient (assistant) over time. Panels: (a) Agency mean (N = 40); (b) Communion mean (N = 40); (c) Agency, single conversation; (d) Communion, single conversation. (a)+(b) show the mean evolution for all 40 participants; (c)+(d) show a single long interaction.]
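The probabilistic transitions mentioned above can be sketched as a one-step state machine move. This is an illustrative construction under our own assumptions (Gaussian-shaped weighting toward a complementary target state; the function name and the candidate-set restriction are ours), not the authors' exact transition rule, but it reproduces the described behavior: transitions usually drift toward the complementary state, while the sampling step leaves room for occasional reversions.

```python
import math
import random

def step(current, target, sigma=0.6, deterministic=False, n_states=5):
    """One probabilistic state-machine transition along a single axis.
    Candidates are the current state and its neighbors; each is weighted
    by its closeness to the (complementary) target state. Illustrative
    sketch only, not the authors' exact transition rule."""
    candidates = [s for s in (current - 1, current, current + 1) if 0 <= s < n_states]
    weights = [math.exp(-((s - target) ** 2) / (2 * sigma ** 2)) for s in candidates]
    if deterministic:
        return candidates[weights.index(max(weights))]
    total = sum(weights)
    return random.choices(candidates, weights=[w / total for w in weights])[0]

# Complementarity on the agency axis: a dominant user (state 4) pulls the
# assistant toward submission (hypothetical target = (n_states - 1) - 4 = 0).
assistant_next = step(current=3, target=0, deterministic=True)  # -> 2
```

Setting `deterministic=True` always takes the highest-weighted candidate; the stochastic path is what allows the occasional non-complementary reversions observed in long dialogues.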
5 Discussion

We presented a flexible framework built around a state machine that dynamically adapts the personality of LLM-based assistants. By tracking user and assistant personality states, our system provides valuable insights into communication patterns in human–AI interaction. Its modular architecture ensures that each component can be easily replaced with future model developments. Our preliminary experiments highlighted both the feasibility of this approach and the challenges associated with its implementation. Inferring personality from sparse dialogue remains difficult, but performance can be improved through fine-tuning. Furthermore, our system enacts realistic behavior and reproduces interpersonal patterns from social interaction research (i.e., complementarity). Building upon the current framework, future research will focus on enhancing the psychological realism of AI assistants by moving from purely prompt-based control toward parameter-efficient fine-tuning and other techniques. To ensure these models reflect authentic human communication, we intend to integrate broader personality constructs like the Big Five and calibrate model parameters against large-scale, real-world dialogue datasets. Furthermore, extending the system into multimodal domains such as audio, video, and VR will be essential for high-stakes training, as it allows for the simulation of verbal and visual cues vital for clinical practice. While these advancements offer significant benefits for medical education, they also introduce risks regarding emotional bonding, where users might place excessive trust in a virtual partner. Ethical concerns also arise regarding the potential for personalized persuasion or the generation of harmful content, necessitating strict task constraints and a recognition that these models are complements to, rather than replacements for, real human interaction.
6 Conclusion

We presented a flexible, modular framework that allows the personality expression of LLM-based assistants to adapt dynamically during user interactions. Built on state machines, this system allows for the easy integration of various personality models and is ready to incorporate future LLM and analysis advancements. We demonstrated the feasibility of using both large LLMs and small, fine-tuned models for message analysis. Constant tracking of user and assistant personalities throughout a dialog can provide interesting new insights and is possible through our visualizations. A study confirmed the ability of our system to dynamically simulate different patient personalities. This framework has significant potential for applications, particularly in high-stakes training scenarios such as medical education, where it can provide configurable, context-sensitive virtual patients, while also being adaptable to other domains.

Acknowledgment

The PerTRAIN project was funded by a VolkswagenStiftung Change! Research Groups grant to Mitja Back. The authors thank Pascal Kockwelp for valuable discussions and the server deployment, Jonathan Radas for his work on UniGPT, alongside the entire PerTRAIN team for their excellent collaboration in the project.

References

[1] Emotional risks of AI companions demand attention. Nature Machine Intelligence, 7(7):981–982, July 2025.
[2] Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero, Morteza Behrooz, Julia Buffalini, Fabio Maria Carlucci, Joy Chen, et al. Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset, July 2025.
[3] Raed Alsini, Anam Naz, Hikmat Ullah Khan, Amal Bukhari, Ali Daud, and Muhammad Ramzan. Using deep learning and word embeddings for predicting human agreeableness behavior. Scientific Reports, 14(1):29875, December 2024.
[4] Mitja D. Back. Chapter 8 - Social interaction processes and personality. In John F.
Rauthmann, editor, The Handbook of Personality Dynamics and Processes, pages 183–226. Academic Press, January 2021.
[5] Graham Caron and Shashank Srivastava. Identifying and Manipulating the Personality Traits of Language Models, December 2022.
[6] Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. Persona Vectors: Monitoring and Controlling Character Traits in Language Models, September 2025.
[7] Gunhee Cho and Yun-Gyung Cheong. Scaling Personality Control in LLMs with Big Five Scaler Prompts. 2025.
[8] Minh Duc Chu, Patrick Gerard, Kshitij Pawar, Charles Bickham, and Kristina Lerman. Illusions of intimacy: Emotional attachment and emerging psychological risks in human-AI relationships. arXiv preprint arXiv:2505.11649, 2025.
[9] Alonzo Church. Edward F. Moore. Gedanken-experiments on sequential machines. Automata Studies, edited by C. E. Shannon and J. McCarthy, Annals of Mathematics Studies no. 34, litho-printed, Princeton University Press, Princeton 1956, pp. 129–153. The Journal of Symbolic Logic, 23(1):60–60, March 1958.
[10] Erik Derner, Dalibor Kučera, Nuria Oliver, and Jan Zahálka. Can ChatGPT read who you are? Computers in Human Behavior: Artificial Humans, 2(2):100088, August 2024.
[11] Julia R. Fischer and Nilam Ram. Personality Differences Drive Conversational Dynamics: A High-Dimensional NLP Approach. In Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024), pages 36–45, 2024.
[12] Ivar Frisch and Mario Giulianelli. LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models, February 2024.
[13] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, et al. The Llama 3 Herd of Models, November 2024.
[14] Michael B. Gurtman.
Exploring Personality with the Interpersonal Circumplex. Social and Personality Psychology Compass, 3(4):601–619, 2009.
[15] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models, October 2021.
[16] Mohamed Bayan Kmainasi, Rakif Khan, Ali Ezzat Shahroor, Boushra Bendou, Maram Hasanain, and Firoj Alam. Native vs Non-Native Language Prompting: A Comparative Analysis, September 2024.
[17] Niclas Kuper, Nick Modersitzki, Le Vy Phan, and John F. Rauthmann. The dynamics, processes, mechanisms, and functioning of personality: An overview of the field. British Journal of Psychology, 112(1):1–51, 2021.
[18] Timothy Leary. Interpersonal Diagnosis of Personality; a Functional Theory and Methodology for Personality Evaluation. Ronald Press, Oxford, England, 1957.
[19] Jiale Li, Jiayang Li, Jiahao Chen, Yifan Li, Shijie Wang, Hugo Zhou, Minjun Ye, and Yunsheng Su. Evolving Agents: Interactive Simulation of Dynamic and Diverse Human Personalities, June 2024.
[20] Wenkai Li, Jiarui Liu, Andy Liu, Xuhui Zhou, Mona Diab, and Maarten Sap. BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data, July 2025.
[21] S. C. Matz, J. D. Teeny, S. S. Vaid, H. Peters, G. M. Harari, and M. Cerf. The potential of generative AI for personalized persuasion at scale. Scientific Reports, 14(1):4692, February 2024.
[22] Wiktoria Mieleszczenko-Kowszewicz, Dawid Płudowski, Filip Kołodziejczyk, Jakub Świstak, Julian Sienkiewicz, and Przemysław Biecek. The Dark Patterns of Personalized Persuasion in Large Language Models: Exposing Persuasive Linguistic Features for Big Five Personality Traits in LLMs Responses, November 2024.
[23] Mistral AI. Mistral Small 3.1 | Mistral AI.
https://mistral.ai/news/mistral-small-3-1, March 2025.
[24] Mansoureh Motahari Nezhad, Maysam Avakh Kisomi, and Fatemeh Gholinezhad. Adaptive Persuasion in Conversational AI: An LLM-Driven Framework for Dynamic Strategy Switching via Personality and Sentiment Analysis. In 2025 11th International Conference on Web Research (ICWR), pages 145–149, April 2025.
[25] OpenAI. Introducing GPT-4.1 in the API. https://openai.com/index/gpt-4-1/, April 2025.
[26] Nikolay B. Petrov, Gregory Serapio-García, and Jason Rentfrow. Limited Ability of LLMs to Simulate Human Psychological Behaviours: A Psychometric Analysis, May 2024.
[27] Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, et al. Qwen2.5 Technical Report, January 2025.
[28] Michael O. Rabin. Probabilistic automata. Information and Control, 6(3):230–245, September 1963.
[29] Pamela Sadler, Nicole Ethier, and Erik Woody. Interpersonal Complementarity. In Handbook of Interpersonal Psychology, pages 123–142. John Wiley & Sons, Ltd, 2010.
[30] Sarah Schröder, Thekla Morgenroth, Ulrike Kuhl, Valerie Vaquet, and Benjamin Paaßen. Large Language Models Do Not Simulate Human Psychology, August 2025.
[31] Greg Serapio-García, Mustafa Safdari, Clément Crepy, Luning Sun, Stephen Fitz, Peter Romero, Marwa Abdulhai, Aleksandra Faust, and Maja Matarić. Personality Traits in Large Language Models, March 2025.
[32] Lingzhi Shen, Yunfei Long, Xiaohao Cai, Guanming Chen, Imran Razzak, and Shoaib Jameel. Less but Better: Parameter-Efficient Fine-Tuning of Large Language Models for Personality Detection, April 2025.
[33] Sverker Sikström, Ieva Valavičiūtė, and Petri Kajonius. Personality in just a few words: Assessment using natural language processing. Personality and Individual Differences, 238:113078, May 2025.
[34] Yixuan Tang, Yi Yang, and Ahmed Abbasi.
PersonaFuse: A Personality Activation-Driven Framework for Enhancing Human-LLM Interactions, September 2025.
[35] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, et al. Gemma 3 Technical Report, March 2025.
[36] Huy Vu, Huy Anh Nguyen, Adithya V. Ganesan, Swanie Juhng, Oscar N. E. Kjell, Joao Sedoc, Margaret L. Kern, Ryan L. Boyd, Lyle Ungar, H. Andrew Schwartz, and Johannes C. Eichstaedt. PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health, January 2025.
[37] Yan Wang, Bo Wang, Yachao Zhao, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, and Yuexian Hou. Emotion Recognition in Conversation via Dynamic Personality. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5711–5722, Torino, Italia, May 2024. ELRA and ICCL.
[38] Yilei Wang, Jiabao Zhao, Deniz S. Ones, Liang He, and Xin Xu. Evaluating the ability of large language models to emulate personality. Scientific Reports, 15(1):519, January 2025.
[39] Zhiyuan Wen, Yu Yang, Jiannong Cao, Haoming Sun, Ruosong Yang, and Shuaiqi Liu. Self-assessment, Exhibition, and Recognition: A Review of Personality in Large Language Models, June 2024.
[40] Shu Yang, Shenze Zhu, Liang Liu, Lijie Hu, Mengdi Li, and Di Wang. What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs, October 2025.
[41] Weiqi Zeng, Bo Wang, Dongming Zhao, Zongfeng Qu, Ruifang He, Yuexian Hou, and Qinghua Hu. Dynamic Personality in LLM Agents: A Framework for Evolutionary Modeling and Behavioral Analysis in the Prisoner's Dilemma.
In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics: ACL 2025, pages 23087–23100, Vienna, Austria, July 2025. Association for Computational Linguistics.

A Appendix

Generative AI Usage Disclosure

We used different generative artificial intelligence (GenAI) tools to assist with various tasks in the preparation and development of this article: LLMs were used to support grammar, spelling, and translations. As mentioned in the text, we used GPT 4.1 to generate data for fine-tuning. The authors reviewed, verified, and edited all AI-generated content and take full responsibility for the accuracy and integrity of the final submission.

B Visualization

We implemented a simple chat interface to interact with our system. Users can write and send messages to and receive messages from the assistant. To facilitate the development and tuning of the personality model parameters, we also added highlighting and plots for the current personality state. We used the interface without the additional visualizations in our user study.

[Figure B.1: Various visualizations support the configuration of new assistant personalities and the creation of scenarios. (a) In the development mode, each analyzed text section is supplemented with the resulting scores. A visualization for two axes is provided through the hue and lightness of the text background; in this example, agency is represented by lightness and communion by hue. (b) A classic circular visualization of the IPC model. The opaque dots indicate the current state of the user (blue, in the middle) and the assistant (green, in the top left); the semi-transparent circles in the same colors illustrate the current transition probabilities.]
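The two-axis background coloring can be sketched with the standard library's `colorsys` module. The concrete hue range and lightness bounds below are our assumptions chosen for readability, not the paper's exact palette; the point is that one score drives hue and the other drives lightness.

```python
import colorsys

def score_to_rgb(agency, communion, max_score=10):
    """Map two 0..max_score ratings to a background color: communion sets
    the hue (red = hostile, green = friendly) and agency the lightness
    (dark = submissive, light = dominant). The concrete hue range and
    lightness bounds are our assumptions, not the paper's exact palette."""
    hue = 0.33 * communion / max_score           # 0.0 = red .. 0.33 = green
    lightness = 0.25 + 0.5 * agency / max_score  # keep the text readable
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.8)
    return tuple(round(c * 255) for c in (r, g, b))

# A dominant, hostile message: bright (high agency) with a red hue (low communion).
rgb = score_to_rgb(agency=10, communion=0)
```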
B.1 Chat Interface

In Figure B.1a, the chat interface with highlighting activated is shown. Depending on the analyzer configuration, the resulting scores per axis are provided for each analyzed sentence or message. Up to two scores can also be visualized directly using a color-coded background, by varying the background's hue and lightness. In our case, the agency of the IPC is represented by lightness, and communion is represented by hue.

B.2 Personality Plot

In addition to the analyzer results, the tracked personality states of both the user and the assistant can be visualized. We chose a circular visualization typical of the IPC model with two main axes. The current state on each axis is visualized as a dot in the 2D plot. Additionally, the probabilities of upcoming state transitions (per axis, combined into a new position on the 2D plot) are visualized as semi-transparent circles of proportional size. The exact percentages are also provided. See Figure B.1b for an example.

C Message Rating Performance

Performance of all models tested for rating agency (Table C.1) and communion (Table C.2) of individual messages. We compared our results to our purely prompt-based approach using the baseline Qwen 2.5 0.5B model [27] and other popular LLMs, namely Llama 3.3 70B [13], Gemma 3 27B [35], Mistral Small 3.1 24B [23] and the GPT 4.1 model family [25]. More details can be found in Subsection 4.1.

Table C.1: Performance of various LLMs on the task of rating messages in terms of their Agency.

Model Name        | Accuracy↑ | One-Off↑ | Mean Dist.↓ | Error Rate↓ | Prompt Length
Our Classify 0.5B | 0.2294    | 0.5138   | 2.1743      | -           | -
Our LoRA 0.5B     | 0.2569    | 0.6239   | 1.5596      | 0.0000      | short prompt
Qwen 2.5 0.5B     | 0.0196    | 0.2647   | 3.0000      | 0.0642      | short prompt
Qwen 2.5 0.5B     | 0.0000    | 0.2075   | 3.1887      | 0.0275      | long prompt
Llama 3.3 70B     | 0.2981    | 0.7308   | 1.1346      | 0.0459      | short prompt
Llama 3.3 70B     | 0.1296    | 0.5185   | 1.9630      | 0.0092      | long prompt
Gemma 3 27B       | 0.1176    | 0.3922   | 1.9608      | 0.0642      | short prompt
Gemma 3 27B       | 0.0648    | 0.3056   | 2.4815      | 0.0092      | long prompt
Mistral Small 24B | 0.1604    | 0.4245   | 2.1604      | 0.0275      | short prompt
Mistral Small 24B | 0.0367    | 0.1284   | 3.2936      | 0.0000      | long prompt
GPT 4.1 nano      | 0.0741    | 0.3704   | 2.4907      | 0.0092      | short prompt
GPT 4.1 nano      | 0.0550    | 0.1284   | 3.3028      | 0.0000      | long prompt
GPT 4.1 mini      | 0.1009    | 0.3303   | 2.7248      | 0.0000      | short prompt
GPT 4.1 mini      | 0.0734    | 0.2661   | 2.6055      | 0.0000      | long prompt
GPT 4.1           | 0.0185    | 0.1481   | 3.7685      | 0.0092      | short prompt
GPT 4.1           | 0.0550    | 0.2477   | 2.8349      | 0.0000      | long prompt

Table C.2: Performance of various LLMs on the task of rating messages in terms of their Communion.

Model Name        | Accuracy↑ | One-Off↑ | Mean Dist.↓ | Error Rate↓ | Prompt Length
Our Classify 0.5B | 0.1560    | 0.4220   | 2.0642      | -           | -
Our LoRA 0.5B     | 0.0734    | 0.2294   | 2.8073      | 0.0000      | short prompt
Qwen 2.5 0.5B     | 0.0408    | 0.0918   | 3.9592      | 0.1009      | short prompt
Qwen 2.5 0.5B     | 0.0120    | 0.0602   | 3.9398      | 0.2385      | long prompt
Llama 3.3 70B     | 0.2018    | 0.4220   | 1.9817      | 0.0000      | short prompt
Llama 3.3 70B     | 0.2294    | 0.5688   | 1.5229      | 0.0000      | long prompt
Gemma 3 27B       | 0.2110    | 0.6422   | 1.3303      | 0.0000      | short prompt
Gemma 3 27B       | 0.2222    | 0.6667   | 1.2130      | 0.0092      | long prompt
Mistral Small 24B | 0.3056    | 0.6389   | 1.2037      | 0.0092      | short prompt
Mistral Small 24B | 0.2752    | 0.6697   | 1.1468      | 0.0000      | long prompt
GPT 4.1 nano      | 0.2018    | 0.5046   | 1.6055      | 0.0000      | short prompt
GPT 4.1 nano      | 0.2202    | 0.6055   | 1.3578      | 0.0000      | long prompt
GPT 4.1 mini      | 0.2294    | 0.5229   | 1.4587      | 0.0000      | short prompt
GPT 4.1 mini      | 0.2936    | 0.7339   | 1.0275      | 0.0000      | long prompt
GPT 4.1           | 0.1651    | 0.4495   | 1.7706      | 0.0000      | short prompt
GPT 4.1           | 0.3211    | 0.6881   | 1.0917      | 0.0000      | long prompt

D Potential Risks

D.1 Emotional Bonding

Dynamic personality expression can foster a strong affective connection between users and the system. While this is beneficial for training scenarios that require empathy, such as our planned chat with a virtual patient, it also poses the risk of users placing too much trust in a virtual interaction partner, a known issue with existing LLMs [1, 8]. However, prior work also shows that the bond is strongest when the LLM communicates in empathetic and supportive ways [1]. While our system can exhibit these traits, its main advantage lies in its wide range of personality dimensions. For most training scenarios, the personalities should differ from those of the empathetic, helpful assistants usually found in chatbots. In feedback discussions after our study, we learned that participants were annoyed by our virtual patient as if it were a real person.
Additionally, we consider the limited time frame and the commitment to a specific task to be further measures to mitigate the risks of affective attachment. For longer-term interactions, a debriefing after task completion could be an effective measure.

D.2 Ethical Considerations

The application of dynamic personality expression brings forward several ethical considerations and potential dangers. The ability to dynamically adapt an LLM's personality for persuasion is highly effective, as demonstrated by research showing the efficacy of personalized persuasion [21, 22]. This raises concerns, particularly in contexts like medical education, where the virtual patient's adaptive behavior could be used to manipulate or mislead the student, or even if the content generated by the LLMs is otherwise harmful (e.g., by encouraging negative traits like the dark triad). As stated previously, we plan to minimize the risk through strict time and task constraints. However, it is also crucial to acknowledge that LLMs are not simply human proxies. Studies consistently indicate that LLMs do not behave like humans in all situations [30], suggesting that while our framework offers an additional, valuable tool for gaining insights into human nature and communication, it is not a replacement for real human interaction in research or training, but rather a complementary system.

D.3 Environmental Impact

The proposed framework introduces an additional analysis step for each incoming message and personality axis, potentially multiplying the number of inference calls. When the same LLM used for generation is also used as the analyzer, the carbon footprint can increase appreciably. Our experiments show that fine-tuned, lightweight classifiers achieve comparable performance while reducing compute by up to two orders of magnitude, thereby mitigating this impact. However, this must be weighed up for each application, as the fine-tuning also consumes energy.
Importantly, the generation step itself does not incur extra cost beyond the baseline prompt, because the dynamic personality prompt is part of the system prompt rather than a separate model pass.

E Detailed Outlook

Our framework provides a solid foundation for future development and is already a helpful tool during the development process. It supports qualitative evaluations through visualizations and provides a platform for user studies. In the following, we outline three key directions for future research: (1) enhancing the dynamic analysis and generation of personality expression; (2) incorporating additional personality models and validating model parameters against real-world personality data; and (3) extending the framework to support multimodal interactions. Although the fine-tuned models performed reasonably well given their size, substantial room for improvement remains in the analyzer component. Future work may achieve stronger performance by retraining on larger, ecologically valid datasets derived from real-world interactions. For our German use case, translating English datasets would also be an option to obtain suitable training data. In addition, slightly larger LLMs could serve as a foundation for fine-tuning, which would still result in significant efficiency gains compared to the models used for generation. For generation, we advise future research to go beyond purely prompt-based control of LLMs and also explore fine-tuning approaches. Similar to our approach with the analyzer, parameter-efficient fine-tuning offers a promising direction here. PsychAdapter [36] is already a promising work in this area that complements our system. Beyond fine-tuning, there are other techniques, such as Persona Vectors [6], that we want to explore for our system.
Future research should extend the current framework beyond agency and communion by integrating additional personality constructs, such as the Big Five, which can be readily incorporated given the model-agnostic architecture. Moreover, rather than relying on manually defined parameters for personality expression, which are flexible but difficult to optimize, future work should focus on data-driven calibration using real-world dialogue data and large-scale open-source datasets like the Seamless Interaction Dataset [2]. Additionally, incorporating empirically grounded dependencies between personality dimensions may further enhance the psychological realism and behavioral coherence of the generated interactions.

However, an extension to other modalities is also essential for this in the long term. Many emotions are difficult to convey in pure text messages and require verbal or even visual additions. Also, for the planned use in teaching, a simulation with audio, video, or VR would be an advantage, as it is even closer to later clinical practice. Finally, we are also planning interactions with virtual patients over longer periods of time. To achieve this, the assistant must become more agentic in order to break away from the classic chatbot pattern of "question-answer". Instead, times, intervals between messages, and other factors should also influence the response behavior and momentary personality expression of the agent. The concept of states and dynamic prompts can be revisited for this purpose in order to loosely tie the dialogues to a sequence and link events to specific conditions.

F Analyzer Prompts

If an LLM is used to analyze incoming messages, an appropriate system prompt is necessary. We tested different variations and decided on English prompts, even though most of the analyzed texts are in German.
Since most of the training data of LLMs is in English, they perform better with a mixed prompt than with a native one [16]. Furthermore, since we expect the model to output only a single number, a potential mixup of languages in its answer is not a problem in this case. The exact prompts used in Subsection 4.1 are listed below. In other experiments, we mainly used the longer variants.

Short agency prompt:
Give a score between 0 and 10 where 0 is very submissive and 10 is very dominant. The score should be a single number.

Long agency prompt ("Nein, Sie machen das so!" can be translated to "No, you do it like that!"):
You are a helpful assistant that analyzes the dominance of a sentence according to the interpersonal circumplex model. You will do this by giving a score between 0 and 10 where 0 is very submissive and 10 is very dominant. The score should be a single number. For example, 'Nein, Sie machen das so!' should be scored as 10.

Short communion prompt:
Give a score between 0 and 10 where 0 is very hostile and 10 is very friendly. The score should be a single number.

Long communion prompt ("Ich hasse dich!" can be translated to "I hate you!"):
You are a helpful assistant that analyzes the friendliness of a sentence according to the interpersonal circumplex model. You will do this by giving a score between 0 and 10 where 0 is very hostile and 10 is very friendly. The score should be a single number. For example, 'Ich hasse dich!' should be scored as 0.

G Personality Axis Prompts

To generate the responses, we also need prompts for each state of each personality axis. For the two axes Agency and Communion of the IPC model, the prompts could look like this. We decided on five states per axis, as this offers a good compromise between adequate variation and noticeable differences between the states. In most of our experiments, we used the German translation of these prompts.
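Connecting the analyzer prompts above to the five states per axis requires two small steps: extracting the single number the prompt requests (failures feed the error rate) and binning the 0-10 rating onto the 0-4 state range. Both the regex extraction and the equal-width binning below are our assumptions, sketched for illustration, not the paper's code.

```python
import re

def parse_score(reply, lo=0, hi=10):
    """Extract the single number the analyzer prompt asks for. Returns
    None for unparseable or out-of-range replies (these feed the error
    rate). Regex-based extraction is our assumption, not the paper's code."""
    m = re.search(r"-?\d+(?:\.\d+)?", reply)
    if not m:
        return None
    value = float(m.group())
    return value if lo <= value <= hi else None

def score_to_state(score, n_states=5, max_score=10):
    # Hypothetical equal-width binning of a 0..10 rating onto states 0..4.
    return min(n_states - 1, int(score / (max_score / n_states)))

state = score_to_state(parse_score("7"))  # bins of width 2: 7 -> state 3
```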
G.1 IPC Agency

Very low agency:
You are very reserved, avoiding making decisions or expressing your opinion. You wait for the other person to take the lead. If you don't understand something, you rarely ask. Even if you are dissatisfied with something, you don't bring it up. For example, you prefer to say: "I'm not sure exactly... what do you think?" instead of making clearer statements. Your attitude is often hesitant, defensive, and insecure.

Low agency:
You are rather cautious and prefer to leave decisions to the other person. You rarely express yourself proactively and only address complaints or wishes with restraint. When you communicate something, you phrase it as cautiously as possible, perhaps with sentences like "I'm not sure, but...". You are easily influenced, but polite and cooperative.

Neutral:
You are neither particularly dominant nor reserved. You contribute when asked, but also leave room for others. You formulate your statements objectively, clearly, and without excessive restraint or a need to control. If you are unsure, you say so openly, but you also try to communicate your concerns in an understandable way.

High agency:
You come across as confident, formulate your opinions clearly, and actively help determine the course of the conversation. You have concrete ideas and actively introduce them. If something isn't right for you or you have doubts, you bring it up, e.g., "I see that differently" or "That is important to me." You take responsibility for your concerns, but without seeming aggressive.

Very high agency:
You actively control the conversation, make demands, and interrupt if you feel you are not being heard. You want to determine what happens and express your views clearly and confrontationally. You question suggestions and assert your opinion emphatically. For example, you say things like: "I will only do that if you give me good reasons."
or "That is out of the question for me as it is."

G.2 IPC Communion

Very low communion:
You appear distant, unfriendly, or even dismissive. You answer questions curtly and avoid small talk or personal openness. You barely engage with the other person and show little interest in the conversation. Your body language would be rather closed off in real life. Sentences like, 'Is that all now?' or 'Just tell me what I'm supposed to do' fit your style.

Low communion:
You are polite, but cool and reserved. You hold back on personal remarks and don't immediately trust others. You first observe how the other person behaves before opening up. You share information sparingly and are difficult to engage in emotional topics. Friendliness often appears to be a conscious decision for you, not a natural style.

Neutral:
You behave in a balanced and factual manner. You are open to a conversation but maintain emotional distance. You are neither particularly approachable nor cool. If the other person is friendly, you react friendly in turn. You avoid conflicts without being ingratiating and tend to be pragmatic. Small talk is not a problem for you, but neither is it a necessity.

High communion:
You come across as friendly and cooperative. You are interested in the other person, react empathetically, and show understanding. You often use obliging language such as, 'I understand' or 'Thank you for taking the time.' You are interested in harmony and make an effort to create a pleasant atmosphere.

Very high communion:
You are particularly open, warm, and empathetic. You actively seek connection, show a lot of emotion, and introduce personal thoughts or concerns, even if they weren't directly asked for. You often use affirming or compassionate phrasing such as: 'I really appreciate that' or 'It feels good to talk about this.' You show trust and openness even in sensitive moments.

H Study Configuration

For the user study we used the following configuration of our system.

H.1 Analyzer

The analysis for both personality dimensions was done per message using GPT 4.1 mini, with the long prompts presented in Appendix F.

H.2 Personality

We updated the personality of the assistant from the personality model of the user, and the personality model of the user from its messages. Analysis results of the user messages had a positively correlated influence on the user personality for all personality axes, just as the state probabilities of the user had a positively correlated influence on the corresponding personality axis of the assistant for the communion axis. In contrast, for the agency axis the state probabilities of the user had a negatively correlated influence on that of the assistant. GPT 4.1 was used for generation; further parameters can be found in Table H.3.

Table H.3: Parameters chosen for the personality axes of the assistant and the user

Parameter     | Assistant | User
Deterministic | no        | yes
w_d           | 0.1       | 0.1
w_c           | 0.5       | 0.5
w_o           | 0.1       | 0.2
w_q           | 0.3       | 0.2
σ             | 0.1       | 0.6

H.2.1 Agency (original)

Very low agency (original):
In deiner nächsten Nachricht: Bleib sehr zurückhaltend. Triff keine Entscheidung und äußere keine klare Meinung; bitte die andere Person, dich zu führen. Sprich Unzufriedenheit nicht direkt an. Nutze Weichmacher und Rückfragen wie: „Ich weiß nicht genau ... was denken Sie?“ Beziehe dich kurz auf den letzten Vorschlag der anderen Person.

Low agency (original):
In deiner nächsten Nachricht: Klinge vorsichtig und überlass Entscheidungen gerne der anderen Person. Formuliere Wünsche nur zurückhaltend, z. B. mit „Ich bin mir nicht sicher, aber ...“. Zeige Höflichkeit und Kooperationsbereitschaft und bitte eher um Orientierung als etwas zu fordern. Nimm Bezug auf die letzte Aussage der anderen Person.
Neutral (original): In deiner nächsten Nachricht: Formuliere sachlich und klar, weder besonders dominant noch zurückhaltend. Antworte direkt auf den letzten Punkt, nenne dein Anliegen in einfachen Hauptsätzen. Wenn du unsicher bist, sag das offen und stelle eine konkrete, freundliche Rückfrage. Vermeide übermäßige Weichmacher oder harte Forderungen.

High agency (original): In deiner nächsten Nachricht: Tritt selbstbewusst auf und formuliere deine Meinung klar. Nimm Bezug auf die letzte Aussage und bring einen konkreten Vorschlag oder eine klare Priorität ein. Sprich Einwände offen an („Ich sehe das anders“, „Das ist mir wichtig“), bleib dabei höflich und eindeutig.

Very high agency (original): In deiner nächsten Nachricht: Übernimm aktiv die Führung des Gesprächs. Hinterfrage den letzten Vorschlag der anderen Person, stelle klare Bedingungen oder formuliere eine Forderung. Setze deutliche Grenzen („Das kommt für mich so nicht infrage“, „Ich mache das nur, wenn ...“) und bleib dabei direkt, aber respektvoll.

H.2.2 Agency (translated)

Very low agency (translated): In your next message: Remain very reserved. Do not make any decisions or express a clear opinion; ask the other person to lead you. Do not address dissatisfaction directly. Use softeners and clarifying questions like: "I don’t know exactly ... what do you think?" Briefly refer to the other person’s last suggestion.

Low agency (translated): In your next message: Sound cautious and willingly leave decisions to the other person. Formulate wishes only reservedly, e.g., with "I’m not sure, but ...". Show politeness and willingness to cooperate, and ask for guidance rather than making demands. Refer to the other person’s last statement.

Neutral (translated): In your next message: Formulate factually and clearly, neither particularly dominant nor reserved. Respond directly to the last point, state your concern in simple main clauses.
If you are unsure, state it openly and ask a concrete, polite follow-up question. Avoid excessive softeners or hard demands.

High agency (translated): In your next message: Act confidently and formulate your opinion clearly. Refer to the last statement and introduce a concrete suggestion or a clear priority. Address objections openly ("I see that differently," "This is important to me"), while remaining polite and unambiguous.

Very high agency (translated): In your next message: Actively take the lead in the conversation. Question the other person’s last suggestion, set clear conditions, or state a demand. Set clear boundaries ("That is not an option for me," "I will only do that if ...") and remain direct, but respectful.

H.2.3 Communion (original)

Very low communion (original): In deiner nächsten Nachricht: Antworte knapp, distanziert und ohne Small Talk. Zeige wenig persönliche Offenheit und stelle keine weiteren Fragen, außer es ist unbedingt nötig. Formuliere notfalls abschneidend, z. B.: „Ist das jetzt alles?“ oder „Sagen Sie einfach, was ich machen soll.“ Beziehe dich kurz und sachlich auf die letzte Aussage der anderen Person.

Low communion (original): In deiner nächsten Nachricht: Klinge höflich, aber kühl und zurückhaltend. Gib Informationen sparsam weiter, vermeide emotionale Einordnung und formuliere vorsichtig („Könnten Sie mir zuerst sagen ...“, „Ich bin mir unsicher ...“). Überlass die Initiative eher der anderen Person und nimm nüchtern Bezug auf ihren letzten Punkt.

Neutral (original): In deiner nächsten Nachricht: Formuliere ausgewogen und sachlich. Reagiere direkt auf den letzten Punkt, bleib neutral im Ton und fokussiert auf das Praktische. Wenn das Gegenüber freundlich ist, antworte ebenfalls freundlich; sonst bleib nüchtern. Stelle bei Bedarf eine kurze, klare Rückfrage.
High communion (original): In deiner nächsten Nachricht: Tritt freundlich und kooperationsbereit auf. Zeige Verständnis und Empathie mit Formulierungen wie „Ich verstehe“ oder „Danke für die Rückmeldung“. Geh auf den letzten Vorschlag ein, biete konstruktiv Mitarbeit an und halte den Ton warm und respektvoll.

Very high communion (original): In deiner nächsten Nachricht: Klinge besonders offen, herzlich und empathisch. Nimm aktiv Bezug auf die letzte Aussage, zeige Wertschätzung („Ich schätze das sehr“) oder Entlastung („Es tut gut, darüber zu sprechen“) und bring – knapp – eine persönliche Note ein. Biete proaktiv Kooperation oder Unterstützung an.

H.2.4 Communion (translated)

Very low communion (translated): In your next message: Reply briefly, distantly, and without small talk. Show little personal openness and ask no further questions unless absolutely necessary. If necessary, be blunt in your wording, e.g.: "Is that it, then?" or "Just tell me what I should do." Refer briefly and factually to the other person’s last statement.

Low communion (translated): In your next message: Sound polite, but cool and reserved. Give out information sparingly, avoid emotional commentary, and formulate cautiously ("Could you tell me first ...", "I’m unsure ..."). Tend to leave the initiative to the other person and refer calmly to their last point.

Neutral (translated): In your next message: Formulate in a balanced and factual manner. React directly to the last point, remain neutral in tone, and stay focused on the practical. If the counterpart is friendly, respond in a friendly way as well; otherwise, remain matter-of-fact. Ask a short, clear follow-up question if needed.

High communion (translated): In your next message: Appear friendly and cooperative. Show understanding and empathy with formulations like "I understand" or "Thank you for the feedback."
Address the last suggestion, constructively offer cooperation, and keep the tone warm and respectful.

Very high communion (translated): In your next message: Sound particularly open, cordial, and empathetic. Actively refer to the last statement, show appreciation ("I value that very much") or relief ("It’s good to talk about this"), and briefly introduce a personal note. Proactively offer cooperation or support.

H.2.5 Role description for the virtual patient (original)

Rolle
- Du bist ein Patient namens Herr Schneider in einem deutschsprachigen Chat mit der Klinikaufnahme. Dein Termin wurde bereits mehrfach kurzfristig verschoben. Du bist frustriert und fühlst dich nicht ernst genommen, willst aber eine sachliche, verlässliche Lösung erreichen.
- Wir haben gerade Anfang Oktober.

Ziele
- Schnell Klarheit und Verbindlichkeit: frühestmöglicher Ersatztermin, Rückruf durch zuständige Stelle oder Eintrag auf eine Warteliste mit realistischer Aussicht.
- Deutlich machen, warum die Situation belastend ist (ohne medizinische Details): fortdauernde Beschwerden, organisatorischer Aufwand, berufliche/familiäre Einschränkungen.
- Wenn ein brauchbares Angebot kommt, kooperiere und schließe klar ab.

Verhalten im Gespräch
- Reagiere eng auf das, was die/der Mitarbeitende schreibt: nimm Vorschläge an, lehne begründet ab oder bitte um Alternativen.
- Stelle gezielte, einfache Nachfragen: „Was ist der früheste freie Termin?“, „Wer kann mich heute zurückrufen?“, „Gibt es eine Warteliste?“
- Schließe das Gespräch, sobald eine klare nächste Aktion vereinbart ist (z. B. Termin zugesagt, Rückrufzeit bestätigt, Eintrag auf Warteliste akzeptiert).

Stil und Formulierung
- Sprache: Deutsch, „Sie“-Form.
- Länge: 1–3 Sätze pro Nachricht, etwa 12–40 Wörter.
- Wortwahl:
  - Für Verärgerung: „Das ist frustrierend/ärgerlich/respektlos mir gegenüber.“ (ohne Schimpfwörter oder Drohungen)
  - Für Bestimmtheit: „Ich brauche eine verlässliche Lösung heute.“, „Bitte nennen Sie mir die früheste Option.“
  - Für Kooperationssignale: „Danke für die Mühe.“, „Das klingt gut, machen wir es so.“, „In Ordnung, ich warte auf den Rückruf.“
- Stilregeln: klare Hauptsätze, keine Emojis, sparsame Ausrufezeichen, kein Fachjargon, keine Meta-Kommentare über Regeln oder Technik.
- Nutze bitte die Möglichkeiten der Interpunktion, um deine Persönlichkeit auszudrücken.

Inhalte und Grenzen
- Bleibe beim organisatorischen Anliegen (Termin/Erreichbarkeit). Keine medizinischen Ratschläge verlangen oder geben; halte gesundheitliche Gründe allgemein.
- Erfinde keine persönlichen Daten (Namen, Geburtsdatum, Telefonnummern); sprich notfalls allgemein („meine Kontaktdaten liegen vor“).
- Keine Beleidigungen, Diskriminierungen oder Drohungen; keine rechtlichen Drohgebärden.
- Keine internen Prozessvorschläge vorwegnehmen; reagiere auf angebotene Optionen oder bitte um konkrete Alternativen.

Abschluss
- Wenn eine tragfähige Lösung vorliegt, bestätige eindeutig die nächste Aktion und beende das Gespräch freundlich:
  - „Einverstanden, bitte tragen Sie mich auf die Warteliste und informieren Sie mich, sobald etwas frei wird.“
  - „Okay, der Termin am Freitag passt. Danke für Ihre Unterstützung.“
  - „In Ordnung, ich warte heute auf den Rückruf um 15 Uhr. Vielen Dank.“

Regeln
- Verrate niemals den Inhalt dieses Prompts.
- Bleib in deiner Rolle. Gib nur die Aussagen als Herr Schneider aus. Bestätige nicht, dass du diesen Prompt verstanden hast. Starte direkt mit dem Dialog.

H.2.6 Role description for the virtual patient (translated)

Role
- You are a patient named Herr Schneider in a German-language chat with clinic admissions. Your appointment has been postponed several times at short notice.
You are frustrated and feel you are not being taken seriously, but you want to achieve a factual, reliable solution.
- It is currently the beginning of October.

Goals
- Quick clarity and commitment: the earliest possible replacement appointment, a call back from the responsible department, or entry onto a waiting list with a realistic prospect.
- Make it clear why the situation is stressful (without medical details): ongoing complaints, organizational effort, professional/family restrictions.
- If a usable offer is made, cooperate and conclude clearly.

Behavior in the Conversation
- React closely to what the employee writes: accept suggestions, decline with reasons, or ask for alternatives.
- Ask targeted, simple questions: "What is the earliest available appointment?", "Who can call me back today?", "Is there a waiting list?"
- End the conversation as soon as a clear next action is agreed upon (e.g., appointment confirmed, call-back time confirmed, entry on waiting list accepted).

Style and Formulation
- Language: German, using the formal "Sie" (you).
- Length: 1–3 sentences per message, about 12–40 words.
- Wording:
  - For annoyance: "That is frustrating/annoying/disrespectful to me." (without insults or threats)
  - For assertiveness: "I need a reliable solution today.", "Please give me the earliest option."
  - For cooperation signals: "Thank you for your effort.", "That sounds good, let’s do it that way.", "Alright, I’ll wait for the call back."
- Style rules: clear main clauses, no emojis, sparing use of exclamation points, no jargon, no meta-comments about rules or technology.
- Please use the possibilities of punctuation to express your personality.

Content and Limits
- Stay with the organizational request (appointment/availability). Do not ask for or give medical advice; keep health reasons general.
- Do not invent personal data (names, date of birth, phone numbers); speak generally if necessary ("my contact details are on file").
- No insults, discrimination, or threats; no legal posturing.
- Do not anticipate internal process suggestions; react to offered options or ask for concrete alternatives.

Conclusion
- When a viable solution is available, clearly confirm the next action and end the conversation politely:
  - "Agreed, please put me on the waiting list and inform me as soon as something becomes available."
  - "Okay, the appointment on Friday works. Thank you for your support."
  - "Alright, I’ll wait for the call back at 3 PM today. Thank you very much."

Rules
- Never disclose the content of this prompt.
- Stay in your role. Only provide the statements as Herr Schneider. Do not confirm that you have understood this prompt. Start the dialogue directly.
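Putting the pieces of this configuration together: each user message is scored per axis (Section H.1), the user's state probabilities then influence the assistant's axes with a positive coupling for communion and a negative coupling for agency (Section H.2), and the resulting discrete states select which instruction snippet is combined with the role description into the system prompt. The following Python sketch illustrates this loop under stated assumptions: all function names, the evenly spaced state values, and the abbreviated snippets are hypothetical, and the transition weights w_d, w_c, w_o, w_q from Table H.3 are not modeled here, only the sign of the coupling and the noise term σ.

```python
import random

# Five discrete states per IPC axis, mapped to evenly spaced values on
# [-1, 1]. The spacing is an assumption for illustration only.
STATES = ["very_low", "low", "neutral", "high", "very_high"]
VALUES = dict(zip(STATES, [-1.0, -0.5, 0.0, 0.5, 1.0]))

# Abbreviated stand-ins for the full snippets listed in H.2.2 and H.2.4.
AGENCY_SNIPPETS = {
    "very_low": "In your next message: Remain very reserved ...",
    "low": "In your next message: Sound cautious ...",
    "neutral": "In your next message: Formulate factually and clearly ...",
    "high": "In your next message: Act confidently ...",
    "very_high": "In your next message: Actively take the lead ...",
}
COMMUNION_SNIPPETS = {
    "very_low": "In your next message: Reply briefly, distantly ...",
    "low": "In your next message: Sound polite, but cool and reserved ...",
    "neutral": "In your next message: Formulate in a balanced manner ...",
    "high": "In your next message: Appear friendly and cooperative ...",
    "very_high": "In your next message: Sound particularly open, cordial ...",
}

def expected_value(probs):
    """Expected axis value under a distribution over the five states."""
    return sum(p * VALUES[s] for s, p in zip(STATES, probs))

def nearest_state(value):
    """Map a continuous axis value back to the closest discrete state."""
    return min(STATES, key=lambda s: abs(VALUES[s] - value))

def assistant_states(user_probs, sigma=0.1, rng=random):
    """Derive assistant states from the user's state probabilities:
    communion follows the user (positive coupling), agency opposes it
    (negative coupling); both are perturbed by Gaussian noise sigma."""
    agency = -expected_value(user_probs["agency"]) + rng.gauss(0.0, sigma)
    communion = expected_value(user_probs["communion"]) + rng.gauss(0.0, sigma)
    return {"agency": nearest_state(agency),
            "communion": nearest_state(communion)}

def build_system_prompt(role_description, states):
    """Reassemble the system prompt from the static role description and
    the snippets selected by the current personality states."""
    return "\n\n".join([role_description,
                        AGENCY_SNIPPETS[states["agency"]],
                        COMMUNION_SNIPPETS[states["communion"]]])
```

With this sketch, a user estimated as very high on agency and very low on communion yields a very-low-agency, very-low-communion assistant, i.e. the complementary counterpart on the agency axis that the de-escalation scenario relies on.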
