Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

To meet the ever-increasing demands of the cybersecurity workforce, AI tutors have been proposed for personalized, scalable education. While AI tutors have shown promise in introductory programming courses, no work has evaluated their use in the hands-on exploration and exploitation of systems (e.g., "capture-the-flag" exercises) commonly used to teach cybersecurity. Thus, despite growing interest and need, no work has evaluated how students use AI tutors or whether they benefit from their presence in real, large-scale cybersecurity courses. To answer this, we conducted a semester-long observational study of an embedded AI tutor with 309 students in an upper-division introductory cybersecurity course. By analyzing 142,526 student queries sent to the AI tutor across 396 cybersecurity challenges spanning 9 core cybersecurity topics, together with an accompanying set of post-semester surveys, we find (1) what queries and conversational strategies students use with AI tutors, (2) how these strategies correlate with challenge completion, and (3) students' perceptions of AI tutors in cybersecurity education. In particular, we identify three broad conversational styles among AI tutor users: Short (bounded, few-turn exchanges), Reactive (repeatedly submitting code and errors), and Proactive (driving problem-solving through targeted inquiry). We also find that the use of these styles significantly predicts challenge completion, and that this effect increases as materials become more advanced. Furthermore, students valued the tutor's availability but reported that it became less useful for harder material. Based on this, we provide practical suggestions for security educators and developers on AI tutor use.


💡 Research Summary

The paper presents a semester‑long, in‑situ evaluation of an AI‑driven tutoring system (named SENSAI) deployed in a required upper‑division cybersecurity course at Arizona State University. A total of 309 students tackled 396 capture‑the‑flag style challenges spanning nine core security topics (Linux, data encoding, web & SQL, web security, computer architecture, network security, cryptography, reverse engineering, and binary security). The AI tutor was integrated into the course platform so that, whenever a student clicked a “Help” button, the system automatically bundled the current terminal history, the most recent file, the conversation context, and a Socratic system prompt before sending the request to GPT‑4o. This context‑aware design mimics a teaching assistant’s ability to “look over the shoulder,” allowing the model to give targeted guidance without exposing the solution.
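The context-bundling step described above can be sketched roughly as follows. The summary does not publish SENSAI's actual implementation, so the class, field names, and prompt text below are illustrative assumptions, as is the chat-message format (modeled on a typical GPT‑4o chat-completions payload):

```python
# Hypothetical sketch of SENSAI-style context bundling. All names,
# the prompt wording, and the message layout are assumptions for
# illustration, not the paper's real code.
from dataclasses import dataclass, field

SOCRATIC_PROMPT = (
    "You are a Socratic cybersecurity tutor. Guide the student with "
    "targeted questions; never reveal the flag or the full solution."
)

@dataclass
class HelpRequest:
    terminal_history: str          # what the student has run so far
    recent_file: str               # most recently edited file contents
    conversation: list = field(default_factory=list)  # prior chat turns

    def to_messages(self) -> list:
        """Bundle terminal, file, and chat context into a message list."""
        context = (
            f"Terminal history:\n{self.terminal_history}\n\n"
            f"Most recent file:\n{self.recent_file}"
        )
        return ([{"role": "system", "content": SOCRATIC_PROMPT},
                 {"role": "system", "content": context}]
                + self.conversation)

# Example: what one "Help" click might assemble before the API call.
req = HelpRequest(
    terminal_history="$ nc challenge.local 1337",
    recent_file="exploit.py: ...",
    conversation=[{"role": "user", "content": "Why does my payload crash?"}],
)
messages = req.to_messages()  # ready to send to a GPT-4o chat endpoint
```

Bundling the context server-side, rather than asking the student to paste it, is what gives the tutor its "look over the shoulder" quality without the student having to explain their environment.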

Over the 15‑week term, students generated 142,526 queries and responses, which were analyzed alongside post‑semester survey data. The authors first built a taxonomy of help‑seeking behavior from 25,261 conversation logs, extracting features such as turn count, code inclusion, and abstraction level. Clustering revealed three dominant conversational styles:

  1. Short – brief, bounded exchanges focused on quick clarifications.
  2. Reactive – repetitive submissions of code or error messages, often “context‑dumping” without deeper inquiry.
  3. Proactive – goal‑oriented, Socratic questioning that breaks the problem into sub‑tasks.
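As a toy illustration of how such a taxonomy might separate conversations, the rule-based sketch below assigns a style from the kinds of features the authors extracted (turn count, code inclusion, abstraction-level cues). The thresholds are invented for illustration; the paper derived its styles by clustering, not hand-set rules:

```python
# Simplified, rule-based stand-in for the paper's clustering step.
# Feature names mirror the summary; all thresholds are made up.
def classify_style(turns: int, code_fraction: float,
                   asks_subgoal_questions: bool) -> str:
    """Assign one of the three conversational styles to a conversation.

    turns: number of message exchanges in the conversation
    code_fraction: share of student turns that are mostly code/errors
    asks_subgoal_questions: whether the student breaks the problem
        into targeted sub-questions (a Proactive marker)
    """
    if turns <= 3:
        return "Short"            # brief, bounded exchange
    if code_fraction > 0.5 and not asks_subgoal_questions:
        return "Reactive"         # context-dumping code and errors
    return "Proactive"            # goal-driven, inquiry-led dialogue

print(classify_style(2, 0.0, False))    # → Short
print(classify_style(8, 0.8, False))    # → Reactive
print(classify_style(10, 0.2, True))    # → Proactive
```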

Statistical testing (χ² = 168.18, p < 0.001) showed that conversational style significantly predicts challenge completion. Logistic regression indicated that the Proactive and Short styles increase the odds of successful completion by roughly 12–18% compared with the Reactive style. Moreover, an interaction analysis demonstrated that this predictive gap widens for more advanced modules (χ²(16) = 46.67, p < 0.001), suggesting that meta-cognitive, self-directed inquiry becomes increasingly critical as problem complexity grows.
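The style-versus-completion association above rests on a Pearson chi-square test over a styles × completion contingency table. A minimal sketch of that computation, with invented counts rather than the study's data:

```python
# Pearson chi-square statistic for a contingency table, from scratch.
# The counts below are fabricated for illustration only; they are not
# the paper's data (which yielded chi-square = 168.18).
def chi_square(table: list) -> float:
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: Short, Reactive, Proactive; columns: completed, not completed.
table = [[120, 80],
         [90, 110],
         [150, 50]]
print(round(chi_square(table), 2))  # → 37.5
```

In practice one would compare the statistic against the chi-square distribution with (rows − 1)(cols − 1) degrees of freedom (e.g., via `scipy.stats.chi2_contingency`) to obtain the p-value.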

Survey results (78% response rate) revealed nuanced student perceptions. Although 84% of respondents praised the tutor's 24/7 availability, students rated its usefulness for "understanding concepts" and "debugging assistance" at an average of 4.1/5, but only 2.8/5 for "providing specific technical direction" on harder challenges. Compared with human TAs, the AI tutor scored lower on trust, motivation, and perceived expertise. Open-ended comments highlighted that the tutor was helpful for quick hints but often failed to guide students through the higher-level reasoning required for complex exploits.

From these findings, the authors recommend that educators explicitly teach students how to engage in Proactive dialogue—formulating precise, goal‑driven questions—while positioning AI tutors as complementary to human assistance rather than replacements. System design should retain optional context toggles to prevent over‑reliance and to preserve learner agency. Continuous monitoring of conversational style metrics could provide real‑time feedback to students, encouraging adaptive learning strategies.

Overall, this work constitutes the first large‑scale, empirical study of AI tutoring within hands‑on cybersecurity education, demonstrating that conversational style is a strong predictor of learning outcomes and offering actionable guidance for integrating generative AI tools into security curricula.

