Mapping data literacy trajectories in K–12 education

ROBERT WHYTE, Raspberry Pi Foundation, United Kingdom
MANNI CHEUNG, Raspberry Pi Foundation, United Kingdom
KATHARINE CHILDS, Raspberry Pi Foundation, United Kingdom
JANE WAITE, Raspberry Pi Foundation, United Kingdom
SUE SENTANCE, University of Cambridge, United Kingdom

Data literacy skills are fundamental in computer science education. However, understanding how data-driven systems work represents a paradigm shift from traditional rule-based programming. We conducted a systematic literature review of 84 studies to understand K–12 learners' engagement with data across disciplines and contexts. We propose the data paradigms framework, which categorises learning activities along two dimensions: (i) logic (knowledge-based or data-driven systems), and (ii) explainability (transparent or opaque models). We further apply the notion of learning trajectories to visualize the pathways learners follow across these distinct paradigms. We detail four distinct trajectories as a provocation for researchers and educators to reflect on how the notion of data literacy varies depending on the learning context. We suggest these trajectories could be useful to those concerned with the design of data literacy learning environments within and beyond CS education.

CCS Concepts: • Social and professional topics → K-12 education; Computing literacy; • Computing methodologies → Artificial intelligence.

Additional Key Words and Phrases: Data literacy, K–12, literature review, data-driven, learning trajectories

1 Introduction

Data literacy is increasingly positioned as a foundational competency in a data-driven society, spanning domains such as visualization, cognitive science, artificial intelligence (AI) and education [7].
In computer science (CS) education, the rise of data-driven technologies (e.g., AI and machine learning) has led to calls for CS to expand to incorporate relevant data literacy skills [21]. However, traditional programming instruction, which focuses on rule-based logic, is insufficient for helping students master the data-driven nature of these emerging technologies [36]. Likewise, what data literacy skills are needed is uncertain. Across curricula, learners may be introduced to concepts and skills relating to coding, database work, statistical analysis, and data ethics, yet the emphasis placed on these skills and how they relate to one another can differ by context. Cruickshank et al. argue that data science's cross-disciplinary nature can produce a "fractured perspective" [4, p. 248] in which connections between contributing domains (e.g., statistics, computer science, mathematics) remain underdeveloped or invisible. To address this, they propose a more integrated framework that makes core competencies, such as domain understanding, problem formulation, and data management, explicit across the data lifecycle, and clarifies how these competencies align across disciplines. On the other hand, Olari and Romeike [26, 27] outline a set of concepts and practices they argue are fundamental to a 'data-centric' CS education. These framings suggest that 'mapping the field' is an ongoing project that warrants an interdisciplinary view, and that further theorisation is needed of what data literacy looks like in CS education and how instructional experiences should be structured into meaningful progressions. This position paper is therefore guided by the question: How can we map the learning experiences taken by K–12 students across interdisciplinary data literacy activities through a common framework?
Authors' Contact Information: Robert Whyte, bobby.whyte@raspberrypi.org, Raspberry Pi Foundation, Cambridge, United Kingdom; Manni Cheung, manni.cheung@raspberrypi.org, Raspberry Pi Foundation, Cambridge, United Kingdom; Katharine Childs, katharine@raspberrypi.org, Raspberry Pi Foundation, Cambridge, United Kingdom; Jane Waite, jane.waite@raspberrypi.org, Raspberry Pi Foundation, Cambridge, United Kingdom; Sue Sentance, ss2600@cam.ac.uk, University of Cambridge, Cambridge, United Kingdom.

2 Theoretical framework

Prior research on AI literacy in K–12 settings has often included learning about ML, yet in practice this relies on students' capability to reason with data, for example by trying out different examples and labels and observing how the model's outputs change [36]. However, evidence remains limited on how students and teachers develop accurate mental models of data-driven systems and the specific concepts and skills that support that development. Strengthening this line of research likely requires interdisciplinary grounding, particularly from mathematics and statistics education [33]. We propose two dimensions we argue are central to incorporating data science activities into K–12 computing education.

2.1 Knowledge-based vs data-driven models

In CS education, students typically solve problems through algorithmic solutions (e.g., programming). This approach, termed rule-based (or knowledge-based, or symbolic), is based on the idea of "logical proof of correctness" [33, p. 27] and aligns with a traditional algorithmic view of problem solving in which a solution is formalised and implemented [36, 37]. This contrasts with a data-driven approach where students instead train and evaluate models using data and evaluate systems through a "statistical demonstration of effectiveness" [33, p. 27]. Shapiro et al. describe these as two notional machines, namely the "classical logical computer" and the "statistical model" [33, p. 28].

2.2 Explainability of models

In many ML systems and tools, certain processes are often abstracted away (i.e., black-boxed), so either the inner workings of the learning algorithm and/or the role of data in shaping model outcomes may be less visible to learners [24]. A pedagogical challenge for educators, then, is the extent to which data-driven models can be explained [15]. Explainability varies across models, including how transparent, interpretable, and understandable a model's internal logic is to end users [2]. Explainability can decrease as dimensionality increases, since humans cannot readily inspect or reason over all components of a high-dimensional model at once [5]. For example, a low-dimensional linear regression can be considered highly explainable because its decisions can be accounted for in terms of explicit parameters and relationships [5]. By contrast, data-driven models such as recurrent neural networks (RNNs) or random forests are typically harder to explain because the pathways by which they produce decisions are comparatively opaque [2]. Strengthening the explainability of such models, especially where decision-making processes are not transparent, is frequently presented as important for establishing trust and supporting reliable use [20]. Within explainable AI (XAI), post-hoc techniques aim to provide explanations of otherwise opaque models [5]. Common examples include feature importance methods, which indicate which inputs most influenced a particular output, and counterfactual explanations, which illustrate the smallest changes needed to obtain a different prediction [15]. By contrast, ante-hoc explanations, sometimes termed "intrinsic explainability" [30, p. 2], are associated with models whose structure is itself transparent, such as "linear regression, decision tree models, k-nearest neighbors" [2, p. 348], where input–output relationships are more directly legible. From an educational perspective, this introduces a tension: if learners' early experiences with rule-based programming formed an expectation that computational systems are transparent and verifiable, they may struggle to reconcile that expectation with the limited transparency often characteristic of more opaque data-driven systems [33, p. 28].

2.3 Summary

As CS education evolves to incorporate data literacy skills, questions remain over what and how to teach, particularly given the interdisciplinary nature of data science. To address this "fractured" issue [4, p. 248], we focus on two dimensions that could provide a common vocabulary to describe learning environments across an emerging landscape: (i) logic (i.e., knowledge-based versus data-driven), and (ii) explainability (i.e., transparent versus opaque models). We review K–12 interventions through these dimensions to characterise existing data literacy experiences and identify shared data literacy pathways.

3 Method

3.1 Systematic literature review

We conducted a systematic review of literature relating to the teaching and learning of data literacy skills in K–12 settings [42]. Further details on the screening process and criteria, as well as a copy of the literature review data, are available on the study website for further reading [29].
Following the PRISMA 2020 guidelines [28], we sourced peer-reviewed empirical studies that (i) related to data literacy in K–12 settings; (ii) were published between 2019 and 2024; (iii) were published in computing education and other relevant fields (e.g., mathematics education, STEM fields, the learning sciences), as data literacy is interdisciplinary in nature; and (iv) took place in classroom-based activities or non-formal contexts (e.g., data camps).

3.2 Data analysis

Data paradigms: We analysed learning activities against two dimensions: (i) whether students engaged in creating knowledge-based or data-driven models [26], and (ii) whether these models were transparent or opaque [2, 5]. Crossing these two dimensions led to four distinct quadrants (or data paradigms) in which student activities took place (see the data paradigms framework in Table 1). We were also interested in sourcing interdisciplinary literature (computing education, statistics and mathematics, and domain-specific applications) and considering how terminology and language differ across disciplinary boundaries.

Table 1. Data paradigms framework

Transparent (T), explainable by design:
  KB+T: Rule-based models which are explainable by design (e.g., rule-based decision trees, manual classification)
  DD+T: Data-driven models which are explainable by design (e.g., linear regression, k-nearest neighbour)

Opaque (O), explainable through additional (e.g., post-hoc) methods:
  KB+O: Rule-based models which are explainable through additional methods
  DD+O: Data-driven models (e.g., neural networks, random forest) which are explainable through additional (e.g., post-hoc) methods

Similar to the quadrant model [17], we ascribed learning activities to each paradigm (or 'quadrant') based on students' engagement with knowledge-based or data-driven systems, and the extent to which these were explainable, within learning activities. With the advancement of post-hoc explanation methods, we recognise that this classification may evolve over time and activities could be reclassified. Interventions were generally designed for secondary-aged students (12–18 years old) (n=191, 66.8% of instances), with fewer reported for primary-aged students or younger (3–11 year olds) (n=89, 31.1%); several studies provided no age data (n=6). Fewer examples of interventions were designed for 18-year-old students, though this likely reflects when tertiary education begins in most country contexts.

Data literacy trajectories: As some papers featured multiple activities that were classified across multiple paradigms, we wanted to consider how learners progressed from knowledge-based to data-driven, and from transparent to more opaque, systems. We therefore visualised activities across the four quadrants proposed in our data paradigms framework [42] to consider whether and how interventions moved between each paradigm. Drawing on the notion of learning trajectories in mathematics education [3] and computing education [31], we inductively determined what current trajectories, in terms of the underlying learning goals, are employed within the analysed interventions. Three authors qualitatively coded the 84 papers until consensus was reached on the final coding scheme, and two authors coded the remaining papers. Though some activities were more challenging to categorize and required discussion to reach consensus, we calculated inter-rater reliability with a Cohen's kappa value of κ = 0.85, indicating strong agreement [23].
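Cohen's kappa compares the coders' observed agreement with the agreement expected by chance given each coder's marginal label counts. As a minimal illustration (the quadrant codes below are invented for the sketch, not the study's data), it can be computed in a few lines of Python:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the chance agreement from the marginal label counts."""
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical paradigm codes assigned by two coders to ten activities
a = ["DD-T", "DD-T", "DD-O", "KB-T", "DD-O", "DD-T", "KB-T", "DD-O", "DD-T", "DD-T"]
b = ["DD-T", "DD-T", "DD-O", "KB-T", "DD-T", "DD-T", "KB-T", "DD-O", "DD-T", "DD-T"]
print(round(cohens_kappa(a, b), 2))  # → 0.83
```

Nine of the ten codes agree (p_o = 0.9), but because both coders use DD-T heavily, chance agreement is substantial (p_e = 0.4), so kappa lands below raw agreement.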
This process resulted in four distinct learning trajectories (see Figures 1, 2, 3, and 4) that characterised student learning within interventions. These are described in detail in the following section.

4 Findings

Most studies were conducted in the United States (n=46, 54.8%), followed by Germany (n=6, 7.1%) and Finland (n=5, 6.0%). Australia, Brazil, Denmark, Hong Kong, Israel, Japan, Nigeria, South Korea, and Taiwan each contributed two studies (n=2, 2.4% each), and Austria, China, Colombia, India, Singapore, Thailand, the Netherlands, Spain, and Switzerland accounted for one study each (n=1, 1.2% each). In the following sections, we outline the trajectories that were identified across the reviewed literature.

4.1 Single paradigm activities

Most of the reviewed studies describe learning experiences that remain within a single paradigm (n=57; KB-T=1; DD-T=36; DD-O=20) [42]. Activities here include discipline-specific activities, such as modelling in science [9] and data analysis using R in mathematics [43]. Many examples focus on 'inspectable' data practices, such as visualising and interpreting datasets, within the data-driven/transparent paradigm (DD-T) [11], whereas others centre on data-driven/opaque (DD-O) activities, such as image classification in computing education, where model reasoning is not made directly inspectable [40]. Single-paradigm designs also appear in discipline-specific implementations, where the activity is tightly aligned with subject-specific goals and tools (e.g., data analysis using R in mathematics [43]), which can prioritise domain learning outcomes over cross-paradigm comparison or progression.

4.2 Trajectory #1: Keeping transparent

In this trajectory, learners move from the knowledge-based/transparent (KB-T) to the data-driven/transparent (DD-T) paradigm (see Figure 1). Eleven of the 84 studies in the SLR followed this pathway (n=11) [42].
For example, [39] begin with unplugged everyday phenomena framed as hand-authored rules (e.g., making pizzas via an "Input-Output Algorithm" [39, p. 8]) before moving to training classifiers using Teachable Machine (KB-T → DD-T), which replace these rules with data-driven models. In another example, Jiang et al. use StoryQ, a web-based ML and text mining tool, for young learners to move from identifying sentiment cues in text (e.g., dessert reviews) to refining features to improve classification model accuracy through error analysis [13]. Similarly, Kajiwara et al. justify decision trees as "white-box machine learning", in contrast to "black-box" approaches such as K-Nearest Neighbor (KNN) and neural networks, positioning rule-based 'transparency' as a scaffold for an interpretable data-driven classifier [14, p. 4].

Fig. 1. Trajectory #1: Keeping transparent
Fig. 2. Trajectory #2: Keeping data-driven

4.3 Trajectory #2: Keeping data-driven

In this trajectory, learners move from data-driven/transparent (DD-T) to data-driven/opaque (DD-O) (see Figure 2) (n=5). Students typically begin with exploratory analysis or simple supervised models (e.g., scatter plots [16], regression/classification prior to CNNs [18], linear regression on self-collected temperature data [22]) and then progress to less transparent classifiers or deep learning for prediction or recognition (e.g., K-Nearest Neighbor (KNN) and/or Support Vector Machine (SVM) on historical weather data [22], or KNN followed by image recognition based on deep learning [6]). In Lin et al., the conversational agent Zhorai shifts from an inspectable, user-data-driven construction to a less transparent classification step summarised via a histogram of word-similarity scores [19].
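The contrast these studies trade on, between a model whose parameters can be read off directly and one whose prediction emerges from stored examples with no explicit rule, can be sketched with toy data (all values and names below are invented for illustration):

```python
# DD-T: least-squares line fit. The learned slope and intercept *are* the
# model, so its reasoning is inspectable.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Toward DD-O: k-nearest-neighbour classification. There is no explicit rule
# to inspect, only a vote among the stored labelled examples.
def knn_predict(train, query, k=3):
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

hours_of_sun = [1, 2, 3, 4, 5, 6]
temp_c = [12, 14, 15, 18, 19, 22]
slope, intercept = fit_line(hours_of_sun, temp_c)
print(f"temp = {slope:.2f} * sun + {intercept:.2f}")  # inspectable parameters

labelled = [(11, "cold"), (13, "cold"), (17, "warm"), (21, "warm")]
print(knn_predict(labelled, 18))  # → warm, but no rule to point at
```

The first model can be explained by quoting two numbers; the second can only be explained by appeal to which neighbours happened to be closest, which is the pedagogical gap the reviewed interventions navigate.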
4.4 Trajectory #3: Jumping

In this trajectory, learners jump from the knowledge-based/transparent (KB-T) to the data-driven/opaque (DD-O) paradigm (see Figure 3) (n=9) [8, 12, 32, 34, 39, 41, 44–46]. One activity moved from 'teaching computers through programming' (i.e., creating rules in Scratch; KB-T) to "teaching computers to learn" [34, p. 377] (i.e., machine learning; DD-O). Another distinguished between traditional CT practices (e.g., decomposition) and "ML CT practices" [32, p. 4] (e.g., feature selection): students construct a "rule-driven ML system" consisting of logic gates and truth tables (KB-T), whereas in the Learning ML by teaching course, students create a "data-driven ML system" using Machine Learning for Kids for classification (DD-O). Irgens et al. discussed their "Critical Machine Learning" education program, where one activity moved from human-created rules (a pizza algorithm) to thinking critically about and interacting with data-driven systems through QuickDraw! and Teachable Machine, by discussing the algorithm's performance limitations, specifically its inability to recognise every drawing [12]. In another example, students take part in a workshop based on ML-based conversational AI, where simple rule-based conversational agents are introduced to "provide a segue" into developing more complex (and increasingly opaque) ML-based agents [41]. Across these studies, the jump is typically framed as a shift from explicit, human-authored rules and 'classical' CT to training systems whose behaviour must be inferred from data rather than inspected directly.

4.5 Trajectory #4: Bridging

Moving students through more than two paradigms (Figure 4) was not common, with only two studies exemplifying this [42]. A shared pattern is that opaque ML work is not introduced as a conceptual jump, as in the Jumping trajectory, but is sequenced from transparent reasoning and/or inspectable data work.
In Broll and Grover, learners begin with rule-based explanations through Denial-of-Service (DoS) attacks (KB-T), then explore a Twitter dataset using CODAP to visualise the data and provide ideas for classification (DD-T), before training a simple form of generative adversarial network [1, p. 15995] (DD-O) to explore the ideas of the generator and discriminator.

Fig. 3. Trajectory #3: Jumping
Fig. 4. Trajectory #4: Bridging

In another example, Napierala et al. introduce a curriculum for supervised ML which transitions students through different paradigmatic concepts. Initially, students engage with a set of leaf image training data (DD-T) to discover features, which is followed by the manual creation of a decision tree (KB-T) to develop explicit classification rules. Students then reflect on the principles and limitations of Seek, a leaf identification application (DD-O), completing a journey that bridges transparent, rule-based reasoning with data-driven, opaque activities [25]. These curricula are structured so that learning experiences bridge between transparent and opaque, through a transparent data-driven activity, drawing attention to how concepts differ across the paradigms.

5 Discussion

Researchers have argued that data-driven technologies (e.g., ML) necessitate new data literacy skills [21, 36]. In order to critically engage with these technologies, educators and resource developers need to consider what emerging data literacy skills are needed [27]. In CS education, we are particularly interested in exploring how we transition educators and learners from traditional knowledge-based (or rule-based) logic towards data-driven (e.g., ML) reasoning.
To that end, we have articulated how the data paradigms framework [42] and associated trajectories could provide a starting point for researchers, educators and resource developers to compare how data literacy learning experiences position learners with respect to knowledge-based versus data-driven reasoning and transparent versus opaque systems.

5.1 Articulating a shared language for data literacy goals

One enduring challenge, particularly within CS education, is the lack of a shared vocabulary when discussing the nature of data literacy and deciding on common goals. In our examples, we found discrepancies in how approaches are discussed: transparent knowledge-based approaches (KB+T) are defined as "rule-driven learning" [32, p. 2], "teaching computers through programming" [34, p. 375] or even "rule-based AI" [41, p. 15657]. Conversely, opaque data-driven approaches (DD+O) are referred to as "ML-based data-driven thinking" [32, p. 1], "ML-based AI" [41, p. 15657] or even "CT 2.0" [36]. These diverse conceptualizations highlight the need for shared vocabularies and evaluative approaches that travel across adjacent literacies and disciplinary contexts [7]. They also underscore concerns that data science is often fragmented across disciplinary traditions [4]. Making paradigms and trajectories explicit may therefore function as a translation device for comparing interventions across diverse disciplines and literatures. We recognize that our proposed trajectories are just that, propositions, and welcome further discussion on how they should be defined, what language is useful in characterizing the distinction between paradigms, and how learning differs across trajectories.
5.2 Mapping common data literacy trajectories

In this position paper, we suggest that the two proposed dimensions, logic (knowledge-based vs data-driven) [36] and explainability (transparent vs opaque) [2], could serve to delineate the epistemic boundaries of diverse data literacy activities. We have also argued that attention be given to what approaches to scaffolding [35] are needed to move learners between these paradigms. Different trajectories may be appropriate depending on instructional aims and the kinds of use cases students need. Rather than prescribing a trajectory, we propose that these dimensions make visible what should be taught explicitly for a given paradigm and trajectory to support thorough understanding. For example, introducing opaque models requires more explicit attention to concepts such as model confidence [1, 12] and evaluating model performance using confusion matrices [13, 16]. Conversely, introducing transparent models requires a greater focus on practices such as feature weighting [13] or visualising relationships between variables using a heatmap of correlation coefficients [16]. Exposing students to more 'transparent' (i.e., glass-box) processes has been argued to be necessary to support later engagement with more 'opaque' systems [10]. Few examples within the Bridging trajectory were found, which suggests that the distinction between rule-based and data-driven paradigms is not commonly taught [e.g., 1, 25, 36]. Activities that purposefully move learners between multiple paradigms, including those that revisit earlier concepts and skills, might support a deeper understanding of the epistemic differences across paradigms.
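As a minimal illustration of one evaluation practice mentioned above, a confusion matrix simply tallies actual labels against predicted labels; the classifier outputs below are made up for the sketch:

```python
def confusion_matrix(actual, predicted, labels):
    """Build a confusion matrix: rows are actual labels, columns predicted."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical image-classifier results
actual = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat", "cat"]
cm = confusion_matrix(actual, predicted, ["cat", "dog"])
for row_label, row in cm.items():
    print(row_label, row)
# cat {'cat': 2, 'dog': 1}
# dog {'cat': 1, 'dog': 2}
```

Even this toy table makes visible what a single accuracy figure hides: which classes the model confuses with which, which is precisely the kind of reasoning about opaque models the reviewed interventions aim to teach.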
For example, the Keeping transparent trajectory [13, 39] seemingly leverages learners' prior expectations that rule-based systems are inspectable and predictable, gradually shifting from hand-authored rules to data-driven models while preserving explainability. Likewise, the Keeping data-driven trajectory foregrounds data-driven reasoning while decreasing the level of transparency. These approaches may require additional scaffolding as learners' ability to understand how opaque systems work diminishes. In the Jumping trajectory, learners 'jump' from handling rule-based systems to engaging with opaque data-driven models, including neural networks. Without an understanding of how data-driven models work (e.g., through explainable tools and approaches), learners may have incorrect assumptions about the level of 'correctness' of data-driven systems and may ascribe certainty to their outputs. We present the framework and associated trajectories to provoke debate on their usefulness for mapping the progression of learning across data paradigms. As we have not advocated for any particular trajectory as 'optimal', especially when moving from teaching knowledge-based to data-driven systems, we are interested in how to define these trajectories and which will serve educators most effectively. By being explicit and juxtaposing differences in reasoning across paradigms, students can understand the benefits and trade-offs associated with various levels of explainability as well as the underlying logic (i.e., data-driven vs rule-based). Though few in number, the cross-paradigm instances we found were encouraging, though they also point to specific curricular and/or pedagogical challenges, including context demands, a lack of suitable tools, and the technical complexity involved as learners and educators move toward less transparent systems.
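Likewise, the transparent practice of quantifying how strongly two variables relate, noted earlier in this section as a focus when introducing transparent models, reduces to the Pearson correlation coefficient; a short sketch with invented data:

```python
def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Illustrative (made-up) data: hours of study vs quiz score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
print(round(pearson_r(hours, scores), 3))  # strong positive correlation
```

Because every intermediate quantity (means, deviations, covariance) is visible, learners can trace exactly why the coefficient comes out near 1, in contrast to the opaque systems discussed above.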
6 Limitations and future work

Our results are limited to the extent that we analysed early examples of research-led interventions rather than curricula or grey literature. As such, the results are likely to contain fewer examples of progressions of learning than lengthier materials would. To that end, we found that most interventions remained within a single paradigm (n=57/84), most often in DD-T or DD-O activities (see Section 4.1), such as classification with ML tools [38], with few examples of instructional sequences that moved between these paradigms. Likewise, our understanding of what data literacy skills are needed in CS education is still limited. As curricula are being readily developed to support data literacy, future work could apply the data paradigms framework to curricula and associated learning materials to identify further trajectories and progressions between different paradigms. Further, research could focus on articulating and enacting different learning sequences and using the data paradigms framework to map their trajectories.

7 Conclusion

In this position paper, we present the data paradigms framework and propose a set of learning trajectories as a tool to map the landscape of K–12 data literacy initiatives. We suggest these tools could provide a starting point for researchers and resource developers when considering the design of data literacy learning environments. Our findings demonstrate how some environments successfully scaffold learners from transparent rules to transparent data models (Keeping transparent). However, we found few instances of the Bridging trajectory, in which all three critical domains are touched upon: knowledge-based logic, then transparent (or inspectable) data analysis, before encountering opaque ML systems. The Jumping trajectory also indicates a pedagogical challenge, as learners 'bypass' critical skills in understanding the influence of data on model behaviour.
We argue that these skills are necessary to prepare students for encountering 'black-boxed' systems in later learning experiences. Beyond CS education, we argue the data paradigms framework and associated trajectories may offer a shared vocabulary for learning scientists and CS researchers to map the epistemic shifts encountered by learners as they move across contexts and disciplines. We hope to discuss this work further and to consider how we might align data literacy goals across disciplines and better prepare students for a data-driven future.

References

[1] Brian Broll and Shuchi Grover. 2024. Beyond Black-Boxes: Teaching Complex Machine Learning Ideas through Scaffolded Interactive Activities. AAAI 37, 13 (Jul. 2024), 15990–15998. doi:10.1609/aaai.v37i13.26898
[2] Aishwarya Budhkar, Qianqian Song, Jing Su, and Xuhong Zhang. 2025. Demystifying the black box: A survey on explainable artificial intelligence (XAI) in bioinformatics. Computational and Structural Biotechnology Journal 27 (2025), 346–359. doi:10.1016/j.csbj.2024.12.027
[3] Douglas H. Clements and Julie Sarama. 2004. Hypothetical Learning Trajectories: A Special Issue of Mathematical Thinking and Learning (1st ed.). Routledge. doi:10.4324/9780203063279
[4] Iain J Cruickshank, Nathaniel D Bastian, Jean R.S. Blair, Christa M Chewar, and Edward Sobiesk. 2024. Seeing the Whole Elephant - A Comprehensive Framework for Data Education. In SIGCSE. ACM, 248–254. doi:10.1145/3626252.3630922
[5] Montgomery Flora, Corey Potvin, Amy McGovern, and Shawn Handler. 2022. Comparing Explanation Methods for Traditional Machine Learning Models Part 1: An Overview of Current Methods and Quantifying Their Disagreement. arXiv:2211.08943 [stat.ML] doi:10.48550/arxiv.2211.08943
[6] Satoshi Fujishima, Kazuya Takemata, and Akiyuki Minamide. 2022. Practical Examples of Basic Data Science Course for Junior High and High School Students in Club Activity.
In Mobility for Smart Cities and Regional Development - Challenges for Higher Education, Michael E. Auer, Hanno Hortsch, Oliver Michler, and Thomas Köhler (Eds.). Springer International Publishing, Cham, 449–455.
[7] Lily W. Ge, Michael S. Horn, Duri Long, Judith E. Fan, and Matthew Kay. 2026. Data Literacy for the 21st Century: Perspectives from Visualization, Cognitive Science, Artificial Intelligence, and Education. In Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (Barcelona, Spain) (CHI EA '26). Association for Computing Machinery, New York, NY, USA, 1–5. doi:10.1145/3772363.3778701
[8] Sara Guerreiro-Santalla, Alma Mallo, Tamara Baamonde, and Francisco Bellas. 2022. Smartphone-Based Game Development to Introduce K12 Students in Applied Artificial Intelligence. AAAI 36, 11 (Jun. 2022), 12758–12765. doi:10.1609/aaai.v36i11.21554
[9] Candice Guy-Gaytán, Julia S. Gouvea, Chris Griesemer, and Cynthia Passmore. 2019. Tensions Between Learning Models and Engaging in Modeling. Science & Education 28, 8 (01 Oct 2019), 843–864. doi:10.1007/s11191-019-00064-y
[10] Tom Hitron, Yoav Orlev, Iddo Wald, Ariel Shamir, Hadas Erel, and Oren Zuckerman. 2019. Can Children Understand Machine Learning Concepts? The Effect of Uncovering Black Boxes. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI '19). ACM, New York, NY, USA, 1–11. doi:10.1145/3290605.3300645
[11] Golnaz Arastoopour Irgens, Danielle Herro, Ashton Fisher, Ibrahim Adisa, and Oluwadara Abimbade. 2024. Bop or Flop?: Integrating Music and Data Science in an Elementary Classroom. The Journal of Experimental Education 92, 2 (2024), 262–286. doi:10.1080/00220973.2023.2201570
[12] Golnaz Arastoopour Irgens, Hazel Vega, Ibrahim Adisa, and Cinamon Bailey. 2022.
Characterizing children's conceptual knowledge and computational practices in a critical machine learning educational program. IJCCI 34 (2022), 100541. doi:10.1016/j.ijcci.2022.100541
[13] Shiyan Jiang, Amato Nocera, Cansu Tatar, Michael Miller Yoder, Jie Chao, Kenia Wiedemann, William Finzer, and Carolyn P. Rosé. 2022. An empirical analysis of high school students' practices of modelling with unstructured data. British Journal of Educational Technology 53, 5 (2022), 1114–1133. doi:10.1111/bjet.13253
[14] Yusuke Kajiwara, Ayano Matsuoka, and Fumina Shinbo. 2023. Machine learning role playing game: Instructional design of AI education for age-appropriate in K-12 and beyond. Computers and Education: Artificial Intelligence 5 (2023), 100162. doi:10.1016/j.caeai.2023.100162
[15] Jenia Kim, Henry Maathuis, and Danielle Sent. 2024. Human-centered evaluation of explainable AI applications: a systematic review. Frontiers in Artificial Intelligence 7 (2024). doi:10.3389/frai.2024.1456486
[16] Shin-Yu Kim, Inseong Jeon, and Seong-Joo Kang. 2024. Integrating Data Science and Machine Learning to Chemistry Education: Predicting Classification and Boiling Point of Compounds. Journal of Chemical Education 101, 4 (2024), 1771–1776. doi:10.1021/acs.jchemed.3c01040
[17] David Klahr. 2019. Learning Sciences Research and Pasteur's Quadrant. Journal of the Learning Sciences 28, 2 (2019), 153–159. doi:10.1080/10508406.2019.1570517
[18] Siu-Cheung Kong, William Man-Yin Cheung, and Olson Tsang. 2023. Evaluating an artificial intelligence literacy programme for empowering and developing concepts, literacy and ethical awareness in senior secondary students. Education and Information Technologies 28, 4 (01 Apr 2023), 4703–4724. doi:10.1007/s10639-022-11408-7
[19] Phoebe Lin, Jessica Van Brummelen, Galit Lukin, Randi Williams, and Cynthia Breazeal. 2020.
Zhorai: Designing a Conversational Agent for Children to Explore Machine Learning Concepts. AAAI 34, 09 (Apr. 2020), 13381–13388. doi:10.1609/aaai.v34i09.7061
[20] Zachary C. Lipton. 2018. The mythos of model interpretability. Commun. ACM 61, 10 (Sept. 2018), 36–43. doi:10.1145/3233231
[21] Duri Long and Brian Magerko. 2020. What is AI Literacy? Competencies and Design Considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–16. doi:10.1145/3313831.3376727
[22] Wen-Yen Lu and Szu-Chun Fan. 2023. Developing a weather prediction project-based machine learning course in facilitating AI learning among high school students. Computers and Education: Artificial Intelligence 5 (2023), 100154. doi:10.1016/j.caeai.2023.100154
[23] Mary L McHugh. 2012. Interrater reliability: the kappa statistic. Biochemia medica 22, 3 (2012), 276–282.
[24] Luis Morales-Navarro and Yasmin B Kafai. 2024. Unpacking Approaches to Learning and Teaching Machine Learning in K-12 Education: Transparency, Ethics, and Design Activities. In WiPSCE ’24. ACM, Article 3, 10 pages. doi:10.1145/3677619.3678117
[25] Stephan Napierala, Jan Grey, Torsten Brinda, and Inga Gryl. 2023. What Type of Leaf is It? – AI in Primary Social and Science Education. In Towards a Collaborative Society Through Creative Learning, Therese Keane, Cathy Lewin, Torsten Brinda, and Rosa Bottino (Eds.). Springer Nature Switzerland, 233–243.
[26] Viktoriya Olari and Ralf Romeike. 2024. Data-related Concepts for Artificial Intelligence Education in K-12. Computers and Education Open 7, 100196 (July 2024). doi:10.1016/j.caeo.2024.100196
[27] Viktoriya Olari and Ralf Romeike. 2024. Data-related Practices for Creating Artificial Intelligence Systems in K-12. WiPSCE ’24, Article 5 (Sept. 2024), 10 pages.
doi:10.1145/3677619.3678115
[28] Matthew J Page, Joanne E McKenzie, Patrick M Bossuyt, Isabelle Boutron, Tammy C Hoffmann, Cynthia D Mulrow, Larissa Shamseer, Jennifer M Tetzlaff, Elie A Akl, Sue E Brennan, Roger Chou, Julie Glanville, Jeremy M Grimshaw, Asbjørn Hróbjartsson, Manoj M Lalu, Tianjing Li, Elizabeth W Loder, Evan Mayo-Wilson, Steve McDonald, Luke A McGuinness, Lesley A Stewart, James Thomas, Andrea C Tricco, Vivian A Welch, Penny Whiting, and David Moher. 2021. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ (Clinical research ed.) 372 (March 2021), n71.
[29] Raspberry Pi Foundation. 2025. Exploring the role of data science in K–12 computing education. http://rpf.io/datascience
[30] Carl O. Retzlaff, Alessa Angerschmid, Anna Saranti, David Schneeberger, Richard Röttger, Heimo Müller, and Andreas Holzinger. 2024. Post-hoc vs ante-hoc explanations: xAI design guidelines for data scientists. Cognitive Systems Research 86 (2024), 101243. doi:10.1016/j.cogsys.2024.101243
[31] Kathryn M. Rich, Carla Strickland, T. Andrew Binkowski, Cheryl Moran, and Diana Franklin. 2017. K-8 Learning Trajectories Derived from Research Literature: Sequence, Repetition, Conditionals. In Proceedings of the 2017 ACM Conference on International Computing Education Research (Tacoma, Washington, USA) (ICER ’17). Association for Computing Machinery, New York, NY, USA, 182–190. doi:10.1145/3105726.3106166
[32] Gilad Shamir and Ilya Levin. 2022. Teaching machine learning in elementary school. IJCCI 31 (2022), 100415. doi:10.1016/j.ijcci.2021.100415
[33] R Benjamin Shapiro, Rebecca Fiebrink, and Peter Norvig. 2018. How machine learning impacts the undergraduate computing curriculum. Commun. ACM 61, 11 (Oct. 2018), 27–29.
[34] Katharina Simbeck and Yannick Kalff. 2024. Understanding how Computers Learn: AI Literacy for Elementary School Learners. In Proceedings of Mensch Und Computer 2024. ACM, 375–380.
doi:10.1145/3670653.3677511
[35] Iris Tabak and Eleni A. Kyza. 2018. Research on Scaffolding in the Learning Sciences: A Methodological Perspective. In International Handbook of the Learning Sciences (1st ed.), Frank Fischer, Clark A. Chinn, Maria Punziano, and Peter Reimann (Eds.). Routledge, New York, 191–200. doi:10.4324/9781315617572-19
[36] Matti Tedre, Peter Denning, and Tapani Toivonen. 2021. CT 2.0. In 21st Koli Calling. ACM, Article 3, 8 pages. doi:10.1145/3488042.3488053
[37] Matti Tedre, Henriikka Vartiainen, Juho Kahila, Tapani Toivonen, Ilkka Jormanainen, and Teemu Valtonen. 2020. Machine Learning Introduces New Perspectives to Data Agency in K–12 Computing Education. In FIE. IEEE, 1–8. doi:10.1109/FIE44824.2020.9274138
[38] Mary Theofanos, Yee-Yin Choong, and Theodore Jensen. 2024. AI Use Taxonomy: A Human-Centered Approach. Technical Report AI 200-1. National Institute of Standards and Technology (NIST), Gaithersburg, MD. doi:10.6028/NIST.AI.200-1
[39] Tolulope Famaye, Golnaz Arastoopour Irgens, and Ibrahim Adisa. 2025. Shifting roles and slow research: children’s roles in participatory co-design of critical machine learning activities and technologies. Behaviour & Information Technology 44, 5 (2025), 912–933. doi:10.1080/0144929X.2024.2313147
[40] Tiffany Tseng, Matt J. Davidson, Luis Morales-Navarro, Jennifer King Chen, Victoria Delaney, Mark Leibowitz, Jazbo Beason, and R. Benjamin Shapiro. 2024. Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices. TOCE 24, 2, Article 25 (April 2024), 37 pages. doi:10.1145/3641552
[41] Jessica Van Brummelen, Tommy Heng, and Viktoriya Tabunshchyk. 2021. Teaching Tech to Talk: K-12 Conversational Artificial Intelligence Literacy Curriculum and Development Tools. AAAI 35, 17 (May 2021), 15655–15663. doi:10.1609/aaai.v35i17.17844
[42] Robert Whyte, Manni Cheung, Katharine Childs, Jane Waite, and Sue Sentance. 2026.
Analysing data paradigms in K–12 data science activities: A systematic literature review [In publication]. In 20th WiPSCE Conference (WiPSCE ’26). ACM, New York, NY, USA, 10. doi:XXXXXXX.XXXXXXX
[43] Kenia Wiedemann, Jie Chao, Benjamin Galluzzo, and Eric Simoneau. 2020. Mathematical modeling with R: embedding computational thinking into high school math classes. ACM Inroads 11, 1 (Feb. 2020), 33–42. doi:10.1145/3380956
[44] Randi Williams, Sanah Ali, Nisha Devasia, Daniella DiPaola, Jenna Hong, Stephen P. Kaputsos, Brian Jordan, and Cynthia Breazeal. 2023. AI + Ethics Curricula for Middle School Youth: Lessons Learned from Three Project-Based Curricula. IJAIE 33, 2 (01 Jun 2023), 325–383. doi:10.1007/s40593-022-00298-y
[45] Randi Williams, Hae Won Park, and Cynthia Breazeal. 2019. A is for Artificial Intelligence: The Impact of Artificial Intelligence Activities on Young Children’s Perceptions of Robots. In CHI. ACM, 1–11. doi:10.1145/3290605.3300677
[46] Helen Zhang, Irene Lee, Sanah Ali, Daniella DiPaola, Yihong Cheng, and Cynthia Breazeal. 2023. Integrating Ethics and Career Futures with Technical Learning to Promote AI Literacy for Middle School Students: An Exploratory Study. IJAIE 33, 2 (01 Jun 2023), 290–324. doi:10.1007/s40593-022-00293-3
