Not Everyone Wins with LLMs: Behavioral Patterns and Pedagogical Implications for AI Literacy in Programmatic Data Science

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

LLMs promise to democratize technical work in complex domains like programmatic data analysis, but not everyone benefits equally. We study how students with varied experiences use LLMs to complete Python-based data analysis in computational notebooks in a graduate course. Drawing on homework logs, recordings, and surveys from 36 students, we ask: Which experience matters most, and how does it shape AI use? Our mixed-methods analysis shows that technical experience – not AI familiarity or communication skills – remains a significant predictor of success. Students also vary widely in how they leverage LLMs, struggling at stages of forming intent, expressing inputs, interpreting outputs, and assessing results. We identify success and failure behaviors, such as providing context or decomposing prompts, that distinguish effective use. These findings inform AI literacy interventions, highlighting that lightweight demonstrations improve surface fluency but are insufficient; deeper training and scaffolds are needed to cultivate resilient AI use skills.


💡 Research Summary

This paper investigates whether large language models (LLMs) truly democratize technical work in the context of graduate‑level data‑science education, and if not, what behavioral patterns and pedagogical interventions can bridge the gap. The authors conducted an IRB‑approved mixed‑methods study in a Spring 2025 “Data Science for Product Management” course at Carnegie Mellon University. Thirty‑six students from diverse backgrounds (computer science, business, product management) completed a series of four assignments—data cleaning, exploratory data analysis, lightweight machine learning, and data‑driven storytelling—using Google Colab notebooks with the built‑in Gemini assistant. The study collected detailed interaction logs (≈7,000 events), screen recordings, and pre‑ and post‑assignment surveys, allowing the authors to examine both outcomes (grades, completion time) and process (how students used the LLM).

Three research questions guided the work. RQ1 asked whether LLMs close the performance gap between students with differing experience. Regression analyses showed that technical experience (programming and data‑science background) remained the strongest predictor of higher grades, even after controlling for AI familiarity and communication skills. Under tight time constraints, LLM assistance reduced the gap modestly, but when students had ample time, the advantage of technical expertise re‑emerged.

RQ2 explored how experience translates into AI‑use behaviors. By annotating logs with a novel LLM‑driven schema, the authors segmented each workflow into four stages: intent formation, input expression, output interpretation, and result assessment. Technically experienced students crafted clearer, context‑rich prompts, proactively asked the model for better visualizations, and used the assistant for planning. Novices tended to rely on the model for immediate error fixing, produced vague prompts, and often fell into “prompt rabbit holes” where repeated refinements failed to converge. They also showed weaker metacognitive practices during output interpretation and result validation.

RQ3 examined what can be taught to improve AI usage. A lightweight in‑class demonstration followed by extended task time improved prompt quality for many students, but skills related to evaluating model outputs and integrating results into a coherent analysis showed little change, suggesting that deeper, feedback‑rich practice is required.
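The four-stage segmentation can be sketched as a classifier over interaction-log events. The paper's annotation pipeline is LLM-driven; the keyword heuristic and event-type names below are hypothetical simplifications meant only to make the stage schema concrete.

```python
# Simplified sketch of stage annotation. The paper uses an LLM-driven
# schema; this rule-based heuristic over made-up event types only
# illustrates the four-stage segmentation, not the actual pipeline.

def classify_event(event: dict) -> str:
    """Map one interaction-log event to a workflow stage (heuristic)."""
    kind = event["type"]
    if kind in {"open_assignment", "read_instructions", "plan_note"}:
        return "intent_formation"
    if kind in {"prompt_submitted", "prompt_edited"}:
        return "input_expression"
    if kind in {"response_viewed", "code_copied"}:
        return "output_interpretation"
    if kind in {"cell_executed", "output_inspected", "test_run"}:
        return "result_assessment"
    raise ValueError(f"unknown event type: {kind}")

# Hypothetical slice of one student's log.
log = [
    {"type": "read_instructions"},
    {"type": "prompt_submitted"},
    {"type": "response_viewed"},
    {"type": "code_copied"},
    {"type": "cell_executed"},
]
stages = [classify_event(e) for e in log]
print(stages)
```

Segmenting a workflow this way makes patterns such as "prompt rabbit holes" visible as long runs of repeated `input_expression` events with no intervening `result_assessment`.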

Based on these findings, the authors propose an AI‑use competency framework spanning conceptual, procedural, metacognitive, and dispositional knowledge. They argue that AI literacy is not merely familiarity with a tool but a transferable set of competencies for effective human‑AI collaboration. Pedagogical recommendations include: (1) embedding scaffolded prompts that guide students through the four workflow stages; (2) providing iterative feedback on both prompt construction and output evaluation; (3) designing assessments that reward strategic AI use rather than just final results; and (4) integrating reflective activities that make students aware of their own AI‑interaction patterns.
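Recommendation (1) could be operationalized as a prompt scaffold that walks students through the workflow stages before they query the assistant. The template and field names below are hypothetical, a minimal sketch rather than anything proposed in the paper.

```python
# Hypothetical scaffold for recommendation (1): require students to
# articulate each workflow stage before the prompt is sent to the LLM.
SCAFFOLD = """\
Goal (intent): {goal}
Data context (input): {context}
Question for the assistant: {question}
How I will check the answer (assessment): {check}
"""

def build_prompt(goal: str, context: str, question: str, check: str) -> str:
    """Assemble a context-rich prompt; empty fields signal missing scaffolding."""
    fields = {"goal": goal, "context": context,
              "question": question, "check": check}
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"fill in: {', '.join(missing)}")
    return SCAFFOLD.format(**fields)

prompt = build_prompt(
    goal="Find duplicate customer records before analysis",
    context="pandas DataFrame `orders` with columns id, email, amount",
    question="How do I flag rows sharing an email but differing in id?",
    check="Compare row counts before/after and spot-check flagged pairs",
)
print(prompt)
```

Rejecting empty fields is the scaffolding step: it forces the intent-formation and result-assessment thinking that, per the study, novices most often skip.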

The paper contributes (i) empirical evidence that LLMs do not fully equalize performance across experience levels, (ii) a publicly released log‑annotation pipeline and codebase for future research, and (iii) concrete design implications for curricula and tools aimed at cultivating resilient AI collaborators. In sum, while LLMs can lower entry barriers, sustained success in programmatic data science still hinges on solid technical foundations and deliberate instruction in AI‑augmented problem solving.

