Uncertainty-Aware Knowledge Tracing Models

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Research on Knowledge Tracing (KT) models has focused chiefly on improving predictive accuracy. Yet these models are most often wrong precisely when students choose a distractor, so student errors go undetected. We present an approach that adds a new capability to KT models by capturing predictive uncertainty, and we demonstrate that larger predictive uncertainty aligns with incorrect model predictions. We show that uncertainty in KT models is informative and that this signal would be pedagogically useful in educational learning platforms, including limited-resource settings where understanding student ability is essential.


💡 Research Summary

The paper addresses a critical gap in Knowledge Tracing (KT) research: while most efforts focus on improving predictive accuracy, they overlook the confidence of those predictions, especially when students select distractor options. Incorrect predictions on distractors often go unnoticed, leaving student misconceptions undetected. To remedy this, the authors propose augmenting existing KT models with uncertainty quantification using Monte Carlo Dropout (MC Dropout), a Bayesian approximation technique that enables estimation of predictive uncertainty without redesigning model architectures.
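The core of MC Dropout is simply keeping dropout active at inference and running several stochastic forward passes. The sketch below illustrates that pattern in pure Python; the tiny two-class "model", its weights, the input features, and the dropout rate are all illustrative assumptions, not the paper's architecture.

```python
import math
import random

def dropout(x, p, rng):
    """Inverted dropout: zero each activation with probability p,
    scale survivors by 1/(1-p) so the expected value is unchanged."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def stochastic_forward(features, weights, p, rng):
    """One stochastic pass: dropout stays ACTIVE, unlike standard inference."""
    hidden = dropout(features, p, rng)
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(logits)

rng = random.Random(0)
features = [0.8, -0.3, 1.2]           # toy student-interaction features (assumed)
weights = [[0.5, 1.0, -0.2],           # toy 2-class output layer (assumed)
           [-0.4, 0.3, 0.9]]

# M stochastic forward passes, matching the paper's setup of M = 4
samples = [stochastic_forward(features, weights, p=0.5, rng=rng) for _ in range(4)]
predictive = [sum(s[c] for s in samples) / len(samples) for c in range(2)]
print(predictive)  # averaged class probabilities; they still sum to 1
```

Because each pass samples a different dropout mask, the spread of the M outputs carries the uncertainty signal that a single deterministic pass would hide.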

Four representative KT models are examined: Deep Knowledge Tracing (DKT), Self‑Attentive Knowledge Tracing (SAKT), Attentive Knowledge Tracing (AKT), and a newly introduced LLM‑Transformer model that leverages Qwen‑3 0.6B embeddings for question text. For each model, dropout is kept active during inference, and multiple stochastic forward passes (M = 4) are performed. The resulting class‑wise probability vectors are averaged to obtain a predictive distribution. Two uncertainty metrics are derived: (1) total Shannon entropy of the averaged distribution, and (2) standard deviation across the Monte Carlo samples. Higher entropy or standard deviation indicates greater model uncertainty.
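The two uncertainty metrics can be computed directly from the M class-probability vectors. The sketch below assumes binary correct/incorrect classes and made-up sample values; it is not the paper's code, only a minimal illustration of the definitions.

```python
import math

def mc_uncertainty(prob_samples):
    """Given M class-probability vectors from stochastic forward passes,
    return (total Shannon entropy of the averaged distribution,
            mean per-class standard deviation across the M samples)."""
    m = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean = [sum(s[c] for s in prob_samples) / m for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    std = [math.sqrt(sum((s[c] - mean[c]) ** 2 for s in prob_samples) / m)
           for c in range(n_classes)]
    return entropy, sum(std) / n_classes

# Confident model: the four passes agree (values assumed for illustration)
h_low, s_low = mc_uncertainty([[0.95, 0.05]] * 4)
# Uncertain model: the passes disagree
h_high, s_high = mc_uncertainty([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
assert h_high > h_low and s_high > s_low  # both metrics spike together here
```

Note the two metrics are not interchangeable: entropy measures how spread out the averaged prediction is, while the standard deviation measures how much the individual stochastic passes disagree with one another.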

Experiments are conducted on a dataset comprising up to 100 question‑answer pairs per student (approximately 20 quiz sessions, each containing five progressively harder questions). The authors first demonstrate that mean entropy is significantly larger for mis‑classified instances than for correct ones. This pattern holds especially when students choose an incorrect option, which is a minority class in the data and inherently harder to predict. Standard deviation shows a similar trend, reinforcing that uncertainty spikes coincide with model errors.
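The misclassified-versus-correct comparison amounts to grouping per-instance entropies by whether the model's prediction matched the label and comparing group means. A minimal sketch, with toy entropy values that are assumed rather than taken from the paper:

```python
def mean_entropy_by_outcome(records):
    """records: list of (entropy, predicted_label, true_label) tuples.
    Returns (mean entropy over correct predictions,
             mean entropy over misclassified predictions)."""
    correct = [h for h, pred, true in records if pred == true]
    wrong = [h for h, pred, true in records if pred != true]
    return sum(correct) / len(correct), sum(wrong) / len(wrong)

# Toy per-instance entropies (illustrative values only)
records = [(0.12, 1, 1), (0.18, 0, 0), (0.55, 1, 0), (0.61, 0, 1), (0.10, 1, 1)]
mean_correct, mean_wrong = mean_entropy_by_outcome(records)
assert mean_wrong > mean_correct  # errors coincide with higher uncertainty
```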

When comparing models, overall entropy averages are comparable across all four, suggesting that the magnitude of uncertainty in the final aggregated prediction is similar. However, standard deviation reveals distinct behaviors: DKT exhibits the highest variance, reflecting instability in its LSTM‑based predictions, while the LLM‑Transformer model shows the lowest variance, indicating more stable and confident outputs. Moreover, analysis of uncertainty by question position reveals that DKT’s entropy is especially high for the first question, implying that recurrent models may over‑fit to question bias early on. Attention‑based models (SAKT, AKT, LLM‑Transformer) display decreasing standard deviation as more questions are observed, suggesting that they become more confident with additional interaction history. A periodic “spike” in entropy aligns with the quiz structure (every fifth, more difficult question), confirming that harder items raise both student error rates and model uncertainty.

The pedagogical implications are substantial. By flagging high‑uncertainty predictions, an intelligent tutoring system can defer decisions to human experts, provide targeted remedial content, or adjust the difficulty of subsequent items. Uncertainty can thus serve as a trigger for dialogue‑based diagnosis, prerequisite reinforcement, or adaptive learning path adjustments (lower difficulty during spikes, higher difficulty as confidence grows). Importantly, the proposed MC Dropout approach can be retrofitted onto existing deployed KT models, offering a practical route to incorporate uncertainty without extensive re‑engineering.
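One way to operationalize the deferral policy described above is a simple threshold gate on per-prediction entropy. The threshold value and the action names below are hypothetical, chosen only to make the control flow concrete:

```python
def route_prediction(entropy, threshold=0.5):
    """Hypothetical tutoring-system policy: act automatically on confident
    predictions; escalate uncertain ones to a human or a diagnostic dialogue.
    The 0.5 threshold is an assumed value, not one from the paper."""
    return "defer_to_expert" if entropy > threshold else "auto_adapt"

assert route_prediction(0.7) == "defer_to_expert"
assert route_prediction(0.1) == "auto_adapt"
```

In practice the threshold would be calibrated on held-out data, e.g. chosen so that a target fraction of deferred predictions are actual model errors.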

In conclusion, the study demonstrates that predictive uncertainty in KT models is both informative and actionable. Higher uncertainty reliably co‑occurs with incorrect predictions, offering a signal to mitigate the risk of undetected learning gaps. While total entropy is similar across models, variance across Monte Carlo samples differentiates them, with LLM‑based embeddings delivering the most calibrated confidence. These findings motivate the integration of uncertainty estimation into KT pipelines and suggest that attention‑driven, text‑rich models provide the most reliable signals for instructional decision‑making in mathematics education.

