Integrating data science ethics into an undergraduate major: A case study

Benjamin S. Baumer∗ (Statistical & Data Sciences, Smith College), Randi L. Garcia (Psychology and Statistical & Data Sciences, Smith College), Albert Y. Kim (Statistical & Data Sciences, Smith College), Katherine M. Kinnaird (Computer Science and Statistical & Data Sciences, Smith College), and Miles Q. Ott (Statistical & Data Sciences, Smith College)

February 1, 2022

Abstract

We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for integrating ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-side through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable such that they can be flexibly incorporated into existing courses at different levels of instruction with minimal disruption to syllabi. We connect our efforts to a growing body of literature on the teaching of data science ethics, present assessments of our effectiveness, and conclude with next steps and final thoughts.

∗Benjamin S. Baumer is Associate Professor, Statistical & Data Sciences, Smith College, Northampton, MA 01063 (e-mail: bbaumer@smith.edu). This work was not supported by any grant. The authors thank numerous colleagues and students for their support.
Keywords: data ethics, education, case studies, undergraduate curriculum

“The potential consequences of the ethical implications of data science cannot be overstated.”
—National Academies of Sciences, Engineering, and Medicine (2018)

1 Introduction

Data ethics is a rapidly-developing yet inchoate subfield of research within the discipline of data science, which is itself rapidly-developing (Wender & Kloefkorn 2017). Within the past few years, awareness that ethical concerns are of paramount importance has grown. In the public sphere, the Cambridge Analytica episode (Rosenberg et al. 2018) revealed that the large-scale harvesting of Facebook user data without user consent was not only possible, but permissible, and could be weaponized for political advantage (Davies 2015). Facebook CEO Mark Zuckerberg initially characterized “the idea that fake news on Facebook influenced the [2016 United States Presidential] election in any way” as “pretty crazy”—comments he later regretted (Levin 2017). Nevertheless, the subsequent tongue-lashing and hand-wringing has led to substantive changes in the policies of several large social media platforms, including the banning of several prominent public figures. Popular books like O’Neil (2016), Eubanks (2018), Noble (2018), Fry (2018), and D’Ignazio & Klein (2020) have highlighted how algorithmic bias (when automated systems systematically produce unfair outcomes) can render even well-intentioned data science products profoundly destructive. These incidents have revived a sense among tech professionals and the public at large that ethical considerations are of vital importance. In light of this, it seems clear that indifference to ethics in data science is not an informed position. As academics, it is our responsibility to educate our students about ethical considerations in statistics and data science before they graduate (Utts 2021).
To that end, recent work by Elliott et al. (2018) addresses how to teach data science ethics. The machine learning community convenes the ACM Conference on Fairness, Accountability, and Transparency (which includes Twitter as a sponsor), which focuses on ethical considerations in machine learning research and development. The AI Now Institute at New York University publishes research and policy resources surrounding the use of artificial intelligence and algorithmic accountability. (The Data Science department at Stanford University, for example, lists “Ethics and Data Science” as one of its research areas: https://datascience.stanford.edu/research/research-areas/ethics-and-data-science.) Some of the first wave of data science textbooks include chapters on ethics (Baumer et al. 2021). Most specifically, the National Academies of Sciences, Engineering, and Medicine Roundtable on Data Science Postsecondary Education devoted one of its twelve discussions to “Integrating Ethics and Privacy Concerns into Data Science Education” (Wender & Kloefkorn 2017). National Academies of Sciences, Engineering, and Medicine (2018) includes the following recommendations for undergraduate programs in data science:

Ethics is a topic that, given the nature of data science, students should learn and practice throughout their education. Academic institutions should ensure that ethics is woven into the data science curriculum from the beginning and throughout.

The data science community should adopt a code of ethics; such a code should be affirmed by members of professional societies, included in professional development programs and curricula, and conveyed through educational programs. The code should be reevaluated often in light of new developments.
In the major in statistical and data sciences at Smith College, we have incorporated discussions of ethics (in one form or another) into all of our classes, including the senior capstone, in which about 25% of the content concerns data science ethics. The default position of indifference prevalent in the tech community is exactly the problem we are trying to help our students recognize and solve. In this sense, indifference to ethics in data science is counter to the mission of our program, and in a larger sense to our profession. Especially in light of concerns about academic freedom, we wish to stress that this treatment is not about indoctrinating students about what to think, but rather about forcing students to grapple with the often not-so-obvious ramifications of their data science work and to develop their own compasses for navigating these waters (Heggeseth 2019). It is not a political stance—it is an educational imperative, as stressed by recommendations 2.4 and 2.5 in National Academies of Sciences, Engineering, and Medicine (2018).

In this paper, we present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. In Section 2, we review and delineate notions of ethics in data science. In Section 3, we discuss departmental-level initiatives designed to meet the NAS recommendation for integrating ethics into the curriculum from top-to-bottom, as well as from side-to-side through co-curricular programming. In Section 4, we provide six different modules that focus on data science ethics and that have been incorporated into five different courses. The modules are designed for portability and are publicly available at our website (https://bit.ly/2v2cf8n). We review evidence of our progress in Section 5. Section 6 concludes the paper with next steps and final thoughts.
2 Ethical considerations in statistics and data science

Ethical considerations in statistics have been taught for decades, going back to the classic treatment of misleading data visualization techniques in Huff (1954). However, there are additional nuances to ethical concerns in data science. In defining “data ethics,” Floridi & Taddeo (2016) propose that:

data ethics can be defined as the branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values). This means that the ethical challenges posed by data science can be mapped within the conceptual space delineated by three axes of research: the ethics of data, the ethics of algorithms and the ethics of practices.

In this section, we review the literature on teaching data science ethics under the three categories outlined by Floridi & Taddeo (2016). While the distinctions made by Floridi & Taddeo (2016) are helpful, we use the term “data science ethics” to encompass the full suite of subjects defined therein. Often, it is the interplay and dependence between two or more of these categories that provide the richest ethical dilemmas. Our discussions, especially in Section 6, touch upon possible inter-divisional synergies in the teaching of data science ethics in a liberal arts context. That said, while a liberal arts setting may provide some advantages, much of what we propose should be portable to other types of institutions (see Section 6.2).
2.1 Ethics of practices

From a legal perspective, the General Data Protection Regulation (European Parliament 2018)—which became enforceable in 2018—provides Europeans with greater legal protection for personal data stored online than is present in the United States. This discrepancy highlights the distinction between ethical and legal considerations—the former should be universal, but the latter are patently local. At some level, laws reflect the ethical values of a country, but a profession cannot abdicate its ethical responsibilities to lawmakers. As O’Neil notes: “it is unreasonable to expect the legal system to keep pace with advances in data science” (Wender & Kloefkorn 2017). This is not to say that government agencies are not involved. The United Kingdom now offers guidance to practitioners via its Data Ethics Framework. For oversight, Germany is considering recommendations for a data science ethics review board (Tarran 2019).

Major professional societies, including the American Statistical Association (ASA) (Committee on Professional Ethics 2018b), the Association for Computing Machinery (ACM) (Committee on Professional Ethics 2018a), and the National Academy of Sciences (NAS) (Committee on Science, Engineering, and Public Policy 2009), publish guidelines for conducting research. These documents focus on topics like professionalism, proper treatment of data, negligence, and conflicts of interest. Similarly, Tractenberg (2019a), Tractenberg (2019b), and Gunaratna & Tractenberg (2016) explore ethics in statistical practice but do not mention newer concepts like algorithmic bias. Loukides et al. (2018a) focuses on industry and identifies five framing guidelines for building data products: consent, clarity, consistency, control, and consequences. Their related blog post (Loukides et al.
2018b) promotes the use of checklists over oaths, and is the inspiration for the command line tool deon. Canney & Bielefeldt (2015) present a framework for evaluating ethical development in engineers. Washington & Kuo (2020) examine how these ethical codes often protect the interests of corporations and professional associations at the expense of vulnerable populations.

A broader discussion of professional ethics in statistics and data science would include issues surrounding reproducibility and replicability, which would in turn include concepts like transparency, version control, and p-hacking (Wasserstein et al. 2016, 2019). What is more, inappropriate statistical analysis remains a problem in many fields, including biostatistics (Wang et al. 2018).

2.2 Ethics of data

Within statistics, a major ethical focus has been on human subjects research. The Belmont Report is still required reading by institutional review boards (IRBs) (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research 1978). It posits three major ethical principles (respect for persons, beneficence, and justice) and outlines three major applications (informed consent, assessment of risks and benefits, and selection of subjects). Yet just as we reject the argument that all legal data science projects are ethical, we question the supposition that all IRB-approved data science projects are ethical. Many IRBs have not been able to keep pace with the rapid development of data science research, and have little authority over research fueled by data collected by corporations. For example, Facebook data scientists manipulated the news feeds of 689,003 users in order to study their “emotional contagion” (Kramer et al. 2014).
While Facebook did not break the law—because users relinquished the use of their data for “data analysis, testing, [and] research” when they agreed to the terms of service—many ethical questions were subsequently raised, notably whether informed consent was legitimately obtained. Moreover, Cornell University IRB approval was obtained only after the data had been collected, meaning that the approval covered the analysis of the data, not the collection or the design of the experiment. This example illustrates how many university IRBs are ill-equipped to regulate “big data” studies (Meyer 2014).

More modern manifestations of data ethics are brought on by “big data.” These include ethical concerns when scraping data from the web, storing personal data online, de-identifying and re-identifying personal data, and large-scale experimentation by internet companies in what Zuboff (2018) terms “surveillance capitalism.”

2.3 Ethics of algorithms

While the ethics of practices and of data remain crucially important—and continue to play a role in our curriculum—much of our focus is on examining the impact of deployed data science products. Most notably, we center questions of algorithmic bias (which are not simply reducible to the use of biased data). The machine learning community is having intense debates about the extent to which data or algorithms are ultimately most responsible for bias in facial recognition and other AI-driven products (Cai 2020). The impact of data science products upon people having marginalized identities (Vakil 2018), particularly with respect to race and gender (Gebru 2020), is a growing focus of inquiry. In addition to bias, Bender et al. (2021) also raise questions about the environmental impact of Google’s large language models.
2.4 Data science ethics, broadly construed

These ethical areas are obviously informed by longstanding ethical principles, but are distinct in the way that computers, the Internet, databases, and tech companies have transformed the way we live (Hand 2018). Our focus areas mostly intersect with those identified by National Academies of Sciences, Engineering, and Medicine (2018) as needed by data scientists:

• Ethical precepts for data science and codes of conduct,
• Privacy and confidentiality,
• Responsible conduct of research,
• Ability to identify “junk” science, and
• Ability to detect algorithmic bias.

This paper offers examples for implementing these focus areas. For example, Section 4.6 contains a module that has students apply ethical codes in context (ethics of practices). The modules in Sections 4.1 and 4.3 explore notions of privacy and confidentiality (ethics of data). Sections 4.5 and 4.3 provide modules that illuminate notions of responsibility when conducting research (ethics of practices). Sections 4.2 and 4.6 present modules that encourage students to detect algorithmic bias in action (ethics of algorithms). Yet we also go beyond these key areas. The module in Section 4.4 explores boundaries between legal and ethical considerations (ethics of data).

Table 1: Summary of ethical modules described. Categories correspond to those identified by Floridi and Taddeo. ‘Bloom’ refers to Bloom’s taxonomy.

Section | Topic                       | Categories            | Bloom
4.1     | OkCupid                     | data                  | Application
4.2     | StitchFix                   | algorithms            | Application
4.3     | Grey’s Anatomy              | practices, data       | Application
4.4     | Copyrighting music          | practices             | Evaluation
4.5     | Coding race                 | practices, data       | Synthesis
4.6     | Weapons of Math Destruction | practices, algorithms | Evaluation
In other activities not presented here, we engage students in our senior capstone and machine learning courses with deep questions about the impact that actions by large-scale internet companies have on our lives (ethics of data and algorithms). Our program’s conception of data science ethics is broad and inclusive. Thus, while published ethical guidelines can be used to inform ethical judgments, we encourage our students to consider these guidelines as non-exhaustive, especially given their nascent and rapidly evolving nature. One exercise in the capstone course asks students to consider what ethical precepts are missing from various ethical guidelines. Table 1 summarizes these modules and links them to the relevant categories as defined by Floridi & Taddeo (2016) and the taxonomy of learning defined by Bloom et al. (1956).

2.5 Approaches to teaching data science ethics

While discussion about data science ethics abounds, there are few successful models for how statisticians and data scientists can teach it (Schlenker 2019). Indeed, relevant work on teaching data science by Donoho (2017), Hicks & Irizarry (2018), Baumer (2015), Hardin et al. (2015), and Kaplan (2018) barely mentions ethics, if at all. The first reading of criteria for accreditation in data science published by ABET did not include ethics (although the second reading does) (Blair et al. 2021). Despite recommending the inclusion of ethics into data science curricula, even the National Academies of Sciences, Engineering, and Medicine (2018) report does not include explicit recommendations for how to do so. One of the primary challenges is that while educators are typically well-trained in the ethics of human subjects research, few have specific training in, say, algorithmic bias, or even general ethical philosophy. But why should a lack of training prevent us from teaching our students?
As Bruce (2018) points out, ethical issues are not really a technical problem, but rather “a general issue with the impact of technology on society,” to which we all belong. We might make up for our lack of training by partnering with philosophers and ethicists to develop a robust ethical curriculum (Bruce 2018). Echoing Bruce (2018) that “there is a long history of scholars and practitioners becoming interested in ethics when faced with new technologies,” Gotterbarn et al. (2018) argue forcefully that the recent uptick in interest in “computing ethics” is merely the most recent star turn for a longstanding and valued component of the computer science curriculum. While this is surely true at some level and important to keep in mind, it hardly seems like the renewed attention on ethics is unwarranted. Moreover, Gotterbarn et al. (2018)’s focus is on artificial intelligence driven systems like self-driving cars, whereas our focus is on ethical questions concerning data collected about people.

Several examples of how to teach ethics in statistics, data science, and (mostly) computer science exist. Hoffmann & Cross (2021) summarize efforts in computer science and engineering education, including discussion of specific teaching strategies. Neff et al. (2017) takes a broad view of data science ethics, bringing tools from critical data studies to bear on the practice of actually doing data science. Burton et al. (2018) outlines a strategy for teaching computer science ethics through the use of science fiction literature. Elliott et al. (2018) provides a framework for reasoning about ethical questions through the dual prisms of Eastern (mainly Confucianism) and Western ethical philosophies. We found this inclusive approach to be particularly valuable given the large presence of international (particularly Chinese) students in our classes.
Perhaps presaging many recent scandals, Zimmer (2010) analyzes a Facebook data release through an ethical lens. Chivukula et al. (2021) and Shapiro et al. (2020) discuss approaches to teaching data ethics through human-computer interaction. Fiesler analyzes ethical topics in a variety of computer science courses (Saltz et al. 2019, Fiesler et al. 2020, Skirpan et al. 2018). Grosz et al. (2019) describes how ethics education is integrated into the computer science curriculum at Harvard. Barocas teaches an undergraduate elective course on data science ethics at Cornell (Wender & Kloefkorn 2017). The University of Michigan now offers a “Data Science Ethics” course through both Coursera and edX. Carter & Crockett (2019) provide a set of ethics “labs” for computer science that might complement the ones we present here. Through 2021, the Mozilla Responsible Computer Science Challenge has awarded $3.5 million for the development of “curricula that integrate ethics with undergraduate computer science training.”

These articles illustrate the need to further advance the teaching of data science ethics in different institutional contexts. In this paper, we present six additional concrete modules for teaching data science ethics. We also outline departmental initiatives for fully integrating ethics into an undergraduate data science curriculum and culture.

3 Department-level initiatives

At Smith, every department periodically reviews and updates a list of learning goals for their major. The major in statistical and data sciences (SDS) is designed to cover a broad range of topics to produce versatile future statisticians and data scientists.
Our learning goals include skills like: fitting and interpreting statistical models, programming in a high-level language, working with a wide variety of data types, understanding the role of uncertainty in inference, and communicating quantitative information in written, oral, and graphical forms. Most recently, we added the following learning goal:

Assess the ethical implications to society of data-based research, analyses, and technology in an informed manner. Use resources, such as professional guidelines, institutional review boards, and published research, to inform ethical responsibilities.

In support of this learning goal, we have taken measures to:

• incorporate ethics into all of our classes, culminating in a thorough treatment in the senior capstone course;
• support student engagement in extra-curricular and co-curricular events that touch on data science ethics;
• bring a diverse group of speakers to campus to give public lectures that often focus on ethical questions;
• include a candidate’s ability to engage with data science ethics as a criterion in hiring; and
• increase inclusion at every level of our program.

We discuss six specific modules for courses in Section 4. In this section, we discuss approaches for the other measures. We recognize that not every institution has the curricular flexibility and resources that we have at Smith, nor is our student body representative of those at different types of institutions (e.g., R1s or two-year colleges). We discuss some specific considerations related to teaching data science ethics in a liberal arts context in Section 6.1. Nevertheless, most of the modules we present can fit into a single class period, which should provide instructors at any institution with a reasonable opportunity to incorporate some of this material.
3.1 Student engagement in ethics

Our students are very interested in ethical questions in data science (see Section 5.2). As digital natives, they bring an importantly different perspective to questions about, for example, sharing one’s personal data online. Many of them have never seriously considered the ramifications of this. The notion that “if you’re not paying for the product, then you are the product” is new, scary, challenging, relevant, personal, and engaging to them in a way that helps them see data science as more than just a battery of technical skills (Fitzpatrick 2010). Thus, teaching ethics in data science is another way to foster student interest in the discipline. Framing ethical questions in data science as unsolved problems helps students imagine themselves making meaningful contributions to the field in a way that may seem too remote of a possibility in, say, estimation theory.

In particular, algorithmic bias intersects with questions about inclusion and diversity with which students are already grappling on a daily basis. During the past few years, we have applied for (and received) funds from the community engagement center and the Provost’s office to support student engagement with the Data for Black Lives conference (Milner 2019). In 2018, the first year of the Data for Black Lives conference, we hosted a remote viewing party on campus. In 2019, one of us attended the conference with five students. This experience led to a student inviting Data for Black Lives founder Yeshimabeit Milner to campus for a public lecture entitled “Abolish Big Data.” (Milner is not against the use of data per se, but identifies unethical data science products powered by the large-scale collection of user data as newfangled examples of longstanding systemic racism.)
These experiences help students connect what they are learning in the classroom to larger movements in the real world, and give them the sense that their skills might be used to effect positive change in the world—a powerful motivator.

We are fortunate that our institution provides generous funding for bringing outside speakers to campus, and we have taken full advantage of its largesse over the past two years. We welcomed BlackRock data scientist Dr. Rachel Schutt to give a talk titled “A Humanist Approach to Data Science,” in which she underscored the importance of recognizing the people behind the numbers, and highlighted examples of recently published research that raised profound ethical dilemmas. Dr. Terry-Ann Craigie of Connecticut College (now faculty at Smith College) came to talk about the intersections of race, data science, and public policy. Dr. Emma Benn of Mount Sinai discussed how her intersectional social identity has informed her work as a biostatistician. Alumna Gina DelCorazon spoke about her experiences as Director of Data & Analytics at the National Math and Science Initiative in her talk “From Interesting to Actionable: Why good context matters as much as good code.” At the invitation of a student group, Dr. Alisa Ainbinder, an alumna working locally in data science, discussed ethical considerations in her work in non-profit accelerator programs. Hearing from professionals about the ethical considerations in their work helps reinforce the messaging we give them in class.

3.2 Programmatic efforts in ethics

The SDS major at Smith includes an “application domain” requirement. One of the purposes of this requirement is to ensure that students understand that all data and analyses have a context. Conducting ethical data analysis requires knowledge of the context in which the data are being used.
For example, only through having some understanding of the history of racial/ethnic groups in the United States can data scientists hope to code and use race appropriately in their analyses (see Section 4.5).

The SDS major at Smith requires every student to take one course that focuses explicitly on communication. Another simple initiative was to allow students to fulfill this requirement by taking the “Statistical Ethics and Institutions” course taught at nearby Amherst College by Andreas V. Georgiou, the former President of the Hellenic Statistical Authority (Langkjær-Bain 2017). Georgiou has faced criminal charges in Greece for his role in reporting economic statistics around the time of the Greek debt crisis. The American Statistical Association—among other organizations—has defended Georgiou’s actions (Pierson & Wilkinson 2021). Although the course did not explicitly focus on communication, we made an exception to our policy to allow students to have this unique opportunity to learn about statistical ethics from the person at the center of a world-famous episode. Moreover, ethics and communication are intertwined, in that conveying ethical subtleties requires a different skill set than, say, explaining a statistical model.

Finally, we take small steps to ensure that incoming faculty are capable of supporting our program in meeting this newest learning goal. They cannot be dismissive of ethical concerns in data science. In the same way that a candidate who didn’t understand correlation would not be hireable, we consider whether a candidate who seemed ignorant of data science ethics would be hireable. To assess this, we might ask a question about data science ethics during a first-round or on-campus interview. We might ask candidates to submit a separate statement on data science ethics as part of their application, or to discuss ethical considerations in their teaching and/or research statements.
To be clear, we cannot and do not infringe upon the candidate’s academic freedom by assessing what they think about data science ethics. Rather, we are merely trying to assess how deeply they have thought about data science ethics, and thus whether they are sufficiently prepared to help the program meet our learning goals.

4 Modules for teaching data science ethics

In this section, we present six modules for teaching ethics in data science that are used in a variety of courses. Here, we give a brief description of each module, its learning goals, and the context of the course in which it is delivered. In our supplementary materials, we provide more complete teaching materials.

4.1 Three uses of OkCupid data

OkCupid is a free online dating service whose data have been scraped on at least three known occasions. Kim & Escobedo-Land (2015) presented scraped data on nearly 60,000 OkCupid users in the early 2010s for use in the classroom. Around that same time, Chris McKinlay created 12 fake OkCupid accounts and wrote a Python script that harvested data from around 20,000 women from all over the country (Poulsen 2014). In 2016, Kirkegaard & Bjerrekær (2016) published a paper in an open-access psychology journal investigating a variety of hypotheses about OkCupid users—along with the corresponding data from 70,000 users. Arising from the same underlying data source, these three incidents provide fertile ground for substantive discussions about the corresponding ethical considerations. Some further detail reveals fascinating disparities:

• Kim & Escobedo-Land (2015) obtained explicit permission from OkCupid CEO Christian Rudder before publishing the data in a statistics education journal. Their goal was to illuminate statistical phenomena using data that were relevant to undergraduate students. In addition, the authors removed usernames from the data as a modest attempt at de-identifying the users.
Only later were the authors alerted to the fact that even though usernames had been stripped, the full text of the essay field often contains personally identifying information like Facebook and Instagram handles. The original publication was subsequently corrected in May 2021 following the suggestions of Xiao & Ma (2021) to the editor of the Journal of Statistics and Data Science Education (Kim & Escobedo-Land 2021).

• McKinlay did not publish the data he collected; his goal was personal. Essentially, he trained his own models on the data he collected to find his own match. It worked: he is now engaged to the woman he met. Only after his story was published were questions raised about whether he had violated the Computer Fraud and Abuse Act.

• Kirkegaard & Bjerrekær (2016) included username, age, gender, and sexual orientation in the data set. This meant that users were easily identifiable and particularly vulnerable. While the blowback in this case was immediate, Kirkegaard insisted that the data were already public and his actions were legal. (Zimmer (2010) sheds light on a similar episode involving Facebook.)

Collectively, these episodes raise issues about informed consent, data privacy, terms of use, and the distinction between laws and ethics. One could use these incidents to motivate coverage of technical concepts such as k-anonymity (Sweeney 2002) and differential privacy (Dwork et al. 2006). In our senior capstone course (see Section 4.6), we ask students to break into three groups and discuss the relevant ethical issues involved in each case. Then, we bring students together to write a coherent response. Some students elect to use these incidents as the subject of a longer essay, as described in Section 4.6.
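The privacy concepts mentioned above can be made concrete for students with very little code. As a minimal sketch (in Python, using an invented toy data set rather than any scraped records), k-anonymity simply asks how small the smallest group of records sharing the same quasi-identifier values is:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """The k-anonymity of a data set: the size of the smallest group of
    records that share identical values on all quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Invented toy records loosely mimicking a dating-profile release
records = [
    {"age": 22, "sex": "f", "town": "Northampton"},
    {"age": 22, "sex": "f", "town": "Northampton"},
    {"age": 23, "sex": "m", "town": "Amherst"},
]

# The third record is unique on (age, sex, town), so k = 1: that person
# is re-identifiable by anyone who knows those three attributes.
print(k_anonymity(records, ["age", "sex", "town"]))  # 1
```

A release like Kirkegaard & Bjerrekær's, which retained username alongside age, gender, and sexual orientation, fails even this weak standard; differential privacy (Dwork et al. 2006) goes further by adding calibrated noise to query answers rather than publishing records at all.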
4.2 Algorithmic bias in machine learning

Discussions on the perniciousness of "algorithmic bias" in machine learning and artificial intelligence have become more prevalent of late, both in the news media and in academic circles (Noble 2018, Eubanks 2018, O'Neil 2016). However, few of these ideas have been incorporated into the classroom. For example, in James et al. (2013), a popular introductory textbook on machine learning, the Credit dataset is often used as an example (it is available in the companion ISLR R package (James et al. 2021)). Readers are encouraged to apply various predictive algorithms to predict the credit card debt of 400 individuals using demographic predictors like Age, Gender (encoded as binary), and Ethnicity with levels African American, Asian, and Caucasian. While the data are simulated, one must still wonder what kind of thinking we are tacitly encouraging in students by using ethnicity to predict debt, and thus perhaps credit score. This is especially fraught in light of existing inequalities in access to credit that fall along demographic lines. In other words, to quote Milner (2019), "What are we optimizing?"

In this module, we propose a hands-on in-class activity to help students question the supposed objectivity of machine learning algorithms and serve as a gateway to discussions on algorithmic bias. The activity centers around StitchFix, an online clothing subscription service that uses machine learning to predict which clothes consumers will purchase. New users are asked to complete either a men's or women's "Style Profile" quiz, whose responses are then used as predictor information for the company's predictive algorithms. However, the two quizzes differ significantly in the types of questions asked, how the questions are asked, in which order they are asked, and what information and visual cues are provided.
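The concern about the Credit example can be demonstrated in a few lines. The sketch below (in Python, with fabricated stand-in values, not the actual ISLR data) shows why simply deleting a sensitive column is not enough: a seemingly neutral variable that is correlated with ethnicity reproduces the same group disparity.

```python
from statistics import mean

# Fabricated stand-ins (NOT the ISLR Credit data): debt, an ethnicity label,
# and a "neutral" proxy (e.g., a zip-code cluster) correlated with ethnicity.
rows = [
    {"debt": 900, "ethnicity": "A", "zip_group": 1},
    {"debt": 950, "ethnicity": "A", "zip_group": 1},
    {"debt": 400, "ethnicity": "B", "zip_group": 2},
    {"debt": 450, "ethnicity": "B", "zip_group": 2},
]

def group_mean_prediction(rows, feature):
    """Predict debt as the mean within each level of a single feature --
    the simplest possible model, but enough to expose a proxy effect."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r["debt"])
    return {level: mean(debts) for level, debts in groups.items()}

# The proxy reproduces the ethnicity-level disparity exactly, even though
# ethnicity was never used as a predictor.
print(group_mean_prediction(rows, "ethnicity"))
print(group_mean_prediction(rows, "zip_group"))
```

Removing Gender and Ethnicity, as the later ISLR2 release of the data does, therefore addresses only part of Milner's question about what we are optimizing.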
(Note: the subsequent ISLR2 (James et al. 2022) update to the ISLR package removes the Gender and Ethnicity variables from the Credit dataset.)

Figure 1 (current as of December 16, 2019) presents one example relating to clothing style preferences, specifically jean cut. The prompt in the men's quiz shows photographs of an individual actually wearing jeans, whereas the women's quiz presents the options in a much more abstract fashion. On top of differences relating to clothing style and fit, many differences exist in how demographic information is collected. Figure 2 presents an example of a question pertaining to age. While both groups are asked the same question of "When is your birthday?", individuals completing the women's quiz are primed with a "We won't tell! We need this for legal reasons!" statement, whereas those completing the men's are not. One has to suspect such a difference was not coincidental, but rather reflects a prior belief of the quiz designers as to the manner in which one should ask about age. Other differences include questions pertaining to parenting and occupation. Many of these differences can be attributed to prevailing biases and beliefs on the nature of gender and thus can serve as fertile ground for student discussions on what algorithmic bias is. While the stakes in purchasing clothing are lower than for criminal recidivism or residential lending, it does present students with a clear example in which human biases in clothing preferences influence the outcome of a (purportedly objective) machine learning algorithm (O'Neil 2016). Thus, this algorithm has the potential to reinforce these biases.

Figure 1: Example difference between men's (left) and women's (right) StitchFix Style Quizzes: Question on jean preferences.

Specifically, this module can satisfy three goals. First, it provides students with an example of algorithmic bias to which they can directly relate. This example stands in
contrast to more abstract and less accessible examples discussed in academic readings and news media, such as facial recognition software (Kantayya & Buolamwini 2020). Second, it asks students to view the statistical, mathematical, and machine learning topics covered in class through a sociological lens, in particular relating to the nature of gender (Gebru (2020) develops this further). Third, it gives students the opportunity to think about statistical models in a rich, real, and realistic setting, in particular what predictor variables are being collected and what modeling method/technique is being used. These three goals tie into the greater goal of imbuing students with ethical thinking by encouraging them to think about the implications of their model and algorithm design choices beyond the strict goal of maximizing prediction accuracy.

Figure 1 (continued): Contrast the abstract presentation of jeans shown to women with a picture of someone actually wearing jeans shown to men (current as of 2019-12-16).

Figure 2: Example difference between men's (left) and women's (right) StitchFix Style Quizzes: Question about age. We note the disclaimer present for women is omitted for men (current as of 2019-12-16).

4.3 Social networks

Perhaps in part thanks to the aptly-named Facebook movie (The Social Network), social networks are intuitive to students. The relatively simple mathematical formulation of networks (i.e., graphs) makes them easy to understand, but the complex relationships and behaviors in such networks lead to profound research problems. Moreover, analyzing social network data leads to thorny ethical questions.
A 300-level course on statistical analysis of social network data has as its primary objective for students to "learn how to answer questions by manipulating, summarizing, visualizing, and modeling network data while being vigilant to protect the people who are represented in that data." Thus, ethical concerns surrounding privacy and confidentiality are woven directly into the main course objective.

The primary textbook is Kolaczyk & Csárdi (2014), which provides a thorough treatment of both the theoretical and applied aspects of social network analysis. However, supplementary readings are especially important, since Kolaczyk & Csárdi (2014) fails to address the many complex ethical issues that arise for these data. We employ supplemental readings to address data ethics on topics including:

• collecting social network data
• informed consent for social network surveys
• data identifiability and privacy in social networks
• link prediction
• data ethics specific to social networks

In our supplementary materials we present a module applied during the first week of class in which we use an example from popular culture (the television show Grey's Anatomy) to motivate ethical issues in social network analysis (recall Burton et al. (2018)). It has several goals:

• Prime students to always think about how the data were collected
• Prime students to think about the benefits and risks of each data collection, analysis, or visualization
• Encourage students to create their own understanding of how data ethics pertain to social network data, as opposed to being provided with data ethics rules. This encourages critical thinking which can then be transferred to other topics and types of data.
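To make the identifiability concern concrete, one can show students that stripping names from a network is not the same as anonymizing it. This sketch (in Python; the five-node network is invented for illustration, not drawn from the course) flags nodes whose degree alone makes them structurally unique:

```python
from collections import Counter

# Invented "anonymized" friendship network: names replaced by IDs,
# but the structure itself is published untouched.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]

def degree_counts(edges):
    """Number of connections for each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

deg = degree_counts(edges)
degree_frequency = Counter(deg.values())

# A node whose degree is unique in the network can be re-identified by
# anyone who knows how many friends their target has.
unique_nodes = sorted(n for n, d in deg.items() if degree_frequency[d] == 1)
print(unique_nodes)  # [0, 4]
```

Real re-identification attacks use richer structural signatures than degree, but the principle is the one students should internalize: in network data, the relationships themselves can identify people.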
It is especially important to introduce ethical considerations on the first day of the course to set the tone and give students the message that data ethics is inextricable from the rest of the content of the course.

4.4 Copyrighted music and academic research

Ethical usage of data can come into conflict with copyright law. Music usage, for example, is heavily protected by copyright laws. The field of Music Information Retrieval (MIR) seeks to address questions about music, such as finding all covers of a particular song or detecting the genre of a song. In MIR, access to music is critical to conducting research, and that access is governed by copyright laws (which are themselves a frequent topic when teaching tech ethics (Fiesler et al. 2020)).

Music is also a medium that has a fraught history navigating the line between sharing and violating copyright. This history is complicated by the power dynamics at play between recording companies and artists, and between recording companies and listeners. Today, music is often consumed through streaming services, distorting our understanding of music ownership. Since music is heavily protected by copyright but remains omnipresent in our lives, conversations about data access require nuance about ownership, sharing, and the subtleties of ethical vs. legal considerations.

To explore data access and copyright, we provide a module in which students hold a debate about whether music copyright laws should be softened for those conducting MIR research. This debate is not as simple as whether to relax these laws; instead, one side defends the role and purpose of copyright laws for music, while the other side not only advocates for relaxing these laws but also proposes how to accomplish this.
This requirement of proposing a solution requires students to weigh the responsibilities of a researcher who has broad access to data against the ease with which we can share music (and data). Understanding that the goal of copyright is to protect artists, and contrasting that with students' own experiences of accessing and digesting music, this debate's overarching goal is to have students navigate legal considerations (i.e., copyright) and ethical considerations (i.e., when to share or not share data) in the contexts of pushing research forward and of the capitalist motivations of the music industry. The legal restrictions of copyright and the ethical responsibilities of a researcher to protect and appropriately use (and share) their data provide a fascinating grey area in which to have this debate. The generational experience of our current students informs their notions of morality and access, which in turn leads them to confront legal restrictions in an interesting way that differs from previous generations.

This debate activity was originally part of a senior seminar introducing students to the field of MIR, but it could be done in any course where data provenance, data usage, or data access is discussed. Students were randomly assigned to one side of the debate. In preparation for the debate, students were required to submit a position paper (due just before the debate) that presented a coherent argument well supported by the literature. Students were also barred from sharing arguments with each other (even if assigned to the same side of the debate). However, they could share resources with each other (just not their opinions of these resources). This structure of a preparatory paper followed by a debate required students to engage with the research process at a deep level. For the actual debate, each side was given opportunities to present their ideas and offer rebuttals to the other side.
This meant that not only did they have to find resources and digest them, they had to discuss the ideas both in written text and orally in a debate setting. Requiring students to engage in this kind of "perspective-taking" may be valuable in its own right (Muradova 2021, Giroux et al. 2016).

4.5 Teaching about race and ethnicity data

In an upper-level research seminar on intergroup relationships cross-listed in the psychology department, students learn the psychology of close relationships between people who have differing social group identities (e.g., racial/ethnic and gender group identities). In addition, students learn to analyze dyadic data through multilevel modeling (i.e., mixed linear modeling), and to write reproducible research reports in APA format with the R package papaja (Aust & Barth 2022). This course attracts a diverse group of students in terms of majors, professional goals, interests, statistical preparation, and personal identities.

In this ethics module, we describe a discussion and data cleaning activity used to get students thinking in a more careful and nuanced way about the use of race and ethnicity data in their analyses. The instructor provides psychological data from her own research program, and the overarching focus of the course is to form research questions answerable through the analysis of data that has already been collected. Since the focus is on analyzing existing data (in addition to talking about race), we also discuss:

• how to transparently communicate one's use of confirmatory versus exploratory analyses
• the philosophical differences between inductive and deductive reasoning
• the prevention of p-hacking (Wasserstein et al. 2016, Wasserstein et al.
2019) and HARKing (Hypothesizing After the Results are Known; Kerr 1998)

On the first day of this course, we have a class discussion about how we will try to create a climate of psychological safety (Edmondson 1999) together. This initial discussion helps to set the tone of respect and generosity that we will need in order to have fruitful discussions about race and ethnicity data. In the first half of the course, class sessions alternate between discussions about assigned readings (from psychology) and the statistical and data science instruction students need to complete their projects. In the second half of the course, class sessions are mainly used for actively working on their projects. The two parts of this ethics module (discussion and data cleaning) might be split across two class sessions.

The activity described in this module consists of a class discussion about race and a race/ethnicity data cleaning activity in the context of a psychology article about interracial roommate contact (Shook & Fazio 2008). The structure of this activity invites students to discuss the article first in small groups, and then as a class. The larger class discussion portion of this activity is designed to evolve into a broader discussion about the coding and use of race and ethnicity data in quantitative research. Some important revelations that might be pulled from the discussion include:

• Researchers studying interracial interactions make choices about whom to focus on, and, in the past, this choice has often been to focus on white participants only. An acknowledgement of white privilege, and of who, historically, has been asking the research questions, might come out as well.
• A person's personal racial/ethnic identity may be different from how they are perceived by another person (roommate).
• The choice to use a person's own racial/ethnic identity data or someone's perception of their race depends, in part, on the research question. When is identity or perception more important for the specific research context?
• Race is not as clear a categorical variable as we think it is. Can we think of other instances of this, for example, with gender categorization?
• Are there times when it could serve a social good to use race in our analyses and, in contrast, are there ways in which using race and ethnicity data in analyses might reify socially constructed racial categories?
• If you decide to use race in your analyses, what might you do in smaller samples if there are very small numbers of ethnic minority groups relative to White/European-Americans? Is it ever OK to collapse racial/ethnic categories? What immediate consequences do these choices have for the interpretation of your analysis, and what broader consequences might these choices have when your results are consumed by your intended audience?

The second part of this activity asks students to code raw race/ethnicity data into a new categorical variable called race_clean. They do this part in pairs. Then, in small groups, they discuss the decisions they made when completing this task and also any feelings they had during the task, as those feelings reflect the hard realities that researchers must confront in their work. The raw data come in check-all-that-apply and free-response formats. Students will find this task quite difficult, and perhaps uncomfortable. The goal is not to have them finish, but to get them to recognize the ambiguity inherent in the construction of categorical race/ethnicity variables. They may have used the clean version of race/ethnicity variables in the past without thinking much of it. Lastly, the module also contains notes on closing thoughts the instructor might offer their students after this activity.
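To give readers a flavor of the recoding task (the course data are not public; the responses and the coding rule below are invented for illustration), here is a sketch of one possible race_clean mapping, and of the records that resist it:

```python
# Invented check-all-that-apply and free-response data, NOT the course data.
raw = [
    {"id": 1, "checked": ["White"], "free_text": ""},
    {"id": 2, "checked": ["Black", "White"], "free_text": ""},
    {"id": 3, "checked": [], "free_text": "Puerto Rican and Italian"},
]

def race_clean(record):
    """One possible recoding rule; every branch embodies a judgment call."""
    if len(record["checked"]) == 1 and not record["free_text"]:
        return record["checked"][0]
    if len(record["checked"]) > 1:
        return "Multiracial"   # collapses distinct combinations into one label
    return "NEEDS REVIEW"      # free text resists any fixed category scheme

print([race_clean(r) for r in raw])  # ['White', 'Multiracial', 'NEEDS REVIEW']
```

Even this tiny example surfaces the discussion questions above: respondent 2's specific identity is flattened into a generic label, and respondent 3 cannot be coded at all without a human decision that the pair must make and then defend to their group.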
It is very important not to skip the wrap-up for this activity. Let students know that this is not the end of the discussion. As future data scientists, they can play an active role in creating ethical guidelines for moving towards more appropriate use of race and ethnicity data, a critical area of need (Gebru 2020, Benjamin 2019, Milner 2019, Kantayya & Buolamwini 2020).

4.6 Weapons of Math Destruction in the senior capstone

In the senior capstone course, roughly 25% of the course is devoted to learning about data science ethics. During the first half of the semester, we spend every other class period discussing ethical considerations that arise from weekly readings of O'Neil (2016). These readings introduce students to episodes in which often well-intentioned data science products have had harmful effects on society (e.g., criminal sentencing algorithms, public school teacher evaluations, US News and World Report college rankings, etc.). These episodes are accessible to students and provide many opportunities to engage students in thoughtful conversation.

The material in O'Neil (2016) also intersects with a wide variety of statistical topics, such as modeling, validation, optimization, Bayesian statistics, A/B testing, Type I/II errors, sensitivity and specificity, reliability and accuracy, Simpson's paradox, multicollinearity, confounding, and decision trees. A clever instructor could probably build a successful course entirely around these topics.

Moreover, the ethical considerations that O'Neil (2016) raises about algorithmic bias, informed consent, transparency, and privacy also touch on hot-button social questions surrounding structural racism, gender equity, software licensing, cheating, income inequality, propaganda, fake news, scams, fraud, pseudoscience, and policing bias.
Situated in the fallout from the 2008 global financial crisis, but presaging Cambridge Analytica and fake news, the book feels simultaneously dated and relevant. Our students lived through the global financial crisis, but most were too young to understand it; for many of them the book allows them to grapple with these events for the first time as adults.

The first major goal of the module is to have students interrogate the manifold ethical considerations in data science. Reading O'Neil (2016) and in-class active learning activities help accomplish this learning goal. We employ a variety of techniques, including think-pair-share, breakout groups, student-led discussions, and even lecturing, to keep students engaged in class. This work helps students achieve the low-level "identification" thinking from Bloom et al. (1956).

However, the second major goal is to have students interpret the actions of data scientists in real-world scenarios and communicate their evaluations in writing. To this end, more structured readings are needed (these are the "resources, such as professional guidelines" that are alluded to in our learning goal). We present students with two frameworks for thinking critically about data science ethics: Data Values and Principles (Gershkoff et al. 2019) and the Hippocratic Oath for Data Science (National Academies of Sciences, Engineering, and Medicine 2018). The former defines four values (inclusion, experimentation, accountability, and impact) and twelve principles that, "taken together, describe the most effective, ethical, and modern approach to data teamwork." The latter provides a data science analog to the oath that medical doctors have taken for centuries. Students then complete two assignments that require high-level "evaluative" thinking (again from Bloom).
First, each student writes an essay in which they analyze a data science episode, perhaps drawn from O'Neil (2016), in the context of one of the aforementioned frameworks. This assignment forces students to assess whether the actions of specific data scientists were ethical, using published resources for guidance. (The Oxford-Munich Code of Conduct for Professional Data Scientists is another similar effort: http://www.code-of-ethics.org/code-of-conduct/.) Second, each project group (consisting of 3–5 students) writes an ethical statement about their semester-long project, in which they collectively describe any ethical issues related to their work, foresee possible negative ramifications of their work, and justify the value of their project.

Together, these assignments not only impress upon students the importance of ethics in data science, but also give them tools and experience to reason constructively about data science ethics in the future. The goal is to produce students who have fully integrated ethics into their understanding of statistics and data science and who possess the knowledge to evaluate actors in the field.

5 Assessment of our ethical curriculum

Early returns suggest that our emphasis on teaching data science ethics is having an impact and producing graduates with the ability to translate what they have learned into action. To support this claim we relate five concrete anecdotes, analyze results from an anonymous student survey, and reflect on the experience of faculty in our program.

5.1 Data science ethics in action

There is very little research on the impact of data science ethics instruction on undergraduate students. Thus, we present the following five examples, which, while anecdotal, illustrate how our (former) students are putting their knowledge of data science ethics into practice. In each case, we tie their actions to the taxonomy in Bloom et al. (1956) and our learning goal.
Two students in the senior capstone course were so engaged with the discussion of ethics surrounding the OkCupid data from Kim & Escobedo-Land (2015) that they independently wrote a letter to the editor calling for a review and recommending specific modifications (Xiao & Ma 2021). This letter formed the basis of the correction to the original article published in May 2021 (Kim & Escobedo-Land 2021). While their efforts had the full support of the authors, this was not part of the capstone course and they received no credit for it. Their work necessitates high-level thinking about data science ethics in context (evaluation and synthesis) and demonstrates attainment of our learning goal.

Another student used her experience with data science ethics directly in a summer internship with an anonymous company to help draft the company's heretofore nonexistent policies around ethical data use (Conway Center for Innovation and Entrepreneurship 2019):

"[She] was also the first data scientist to work in the [company] space. Until her arrival, [company]'s businesses lacked clear guidelines for collecting data and ways for using that data to generate insights. Surprised by this, [she] first initiated conversations with the [company] team around ethical concerns in data collection. Drawing on lessons from her academic work, and discussions with her Smith mentors, she helped to develop policies for [company] businesses to ethically collect, manage, and act on customer data moving forward."

We note that the connection between data science ethics in practice and her academic coursework was made explicit by the student. Again, this generative work represents high-level thinking that meets our learning goal.

In 2021, a current SDS major won a national prize in a Human Rights Essay Contest sponsored by the American Association for the Advancement of Science.
Although her essay was about telehealth, not normally considered a data science topic, the student connected her success directly to the treatment of ethics in her SDS courses (Solow 2021): "Something else Smith does really well is include the ethical element–understanding how you make choices based on your own biases," she adds. "Every statistics class I've had here has included a discussion of ethics." The student's experience supports our claim that ethics is woven into the curriculum, and her essay constitutes high-level thinking that meets our learning goal.

Another anecdote involves a student group supporting students in our major that held their annual "Data Science Day" on November 9th, 2019. At the open house portion of the event, in addition to operating booths on data visualization and machine learning, students set up a "data ethics" booth with handouts posing ethical and philosophical questions about the use of data (see Figure 3). While this event was sponsored by the program, programming for the event was entirely determined by students. The inclusion of the booth suggests that students see ethics as an integral component of data science, on par with data visualization and machine learning. Although these actions do not meet our learning goal, they do reflect an ability to formulate ethical conundrums in data science that indicates high-level thinking.

Figure 3: The SDS student group chose to staff a "data ethics" booth at Data Science Day 2019.

Furthermore, in the wake of discussions on racism and white supremacy spurred by the murder of George Floyd in May 2020, two students created a Data Science Resources for Change website.
They state: "In order to be thoughtful, effective, and inclusive data scientists, we believe it is important to understand the ways in which bias can play a dangerous role within our field, to understand the ways in which data can be used to either reinforce/exacerbate or fight oppression, and to support the inclusion of voices of color within the community." To this end, this website includes numerous resources such as reading lists, videos and podcasts, organizations to support, and notable people to follow. These actions reflect lower-level thinking and a powerful desire to be part of a solution to complex problems.

We interpret all of these as early signs of our program's success at producing more ethical data scientists.

Figure 4: Student self-assessment of their ethical capabilities, and the importance of data science ethics in their education, from an anonymous survey of 23 students. We note that nearly all respondents saw the inclusion of data science ethics as an important enhancement to their education, although they were less certain of their own capabilities in analyzing ethical concerns. Created using ggplot2 (Wickham et al. 2021) for R (R Core Team 2021).

5.2 Analysis of survey responses

We conducted an anonymous online survey during the summer of 2019, in which 23 students participated. The results in Figure 4 reveal that students are interested in learning more about data science ethics and feel that it is an important part of their education. However, they are less certain that they have achieved our stated learning goal. Unfortunately, none of the respondents had taken the capstone course (see Section 4.6), and so these results almost certainly undersell the effectiveness of our ethical curriculum.

The first panel in Figure 4 reflects self-assessments from students about two aspects of our major learning goal.
The questions reflect both the ability of a student to assess the ethical implications of data science work, as well as their ability to draw on published materials to inform their thinking. (The survey was approved by the Smith College IRB, protocol 18-111.) These ideas are most explicitly and thoroughly tackled in the senior capstone, and so the lack of respondents with that course under their belt renders this picture incomplete.

The second panel addresses the importance of ethics to a student's data science education. Here, students universally believe that data science ethics is important to them in their education, with most responding that it is "very important." This finding supports the recommendation of National Academies of Sciences, Engineering, and Medicine (2018).

Finally, the third panel in Figure 4 makes plain that no students feel that the inclusion of data science ethics detracts from their data science education, with most students seeing the inclusion as an enhancement. We encourage data science programs contemplating adding ethical content to consider this point particularly. That is, the respondents to this survey did not see the inclusion of data science ethics as a distraction from more important, interesting, technical, or valuable content. Rather, learning about data science ethics enhances that curriculum.

5.3 Self-reflection

The faculty in SDS remains committed to our goal of developing data scientists who can assess the ethical implications of their work. At the same time, the greater emphasis on ethics is not without its challenges.

The emphasis on data science ethics has permeated our departmental culture in a positive way.
Because we are all attuned to the ways in which ethics intersect with our teaching, research, and curriculum development, discussions of ethical considerations are natural, and no one feels like they always have to be the one person to surface ethical concerns. Raising awareness about data science ethics often means bringing news items into the classroom, which is good practice that helps us stay current and connect our academic work to current events. Seeing our students put their ethical training into practice (see Section 5.1) is particularly rewarding. Many of our students seem particularly drawn to ethical questions in data science, and seeing the ways in which they are able to integrate what they learn in our classes into their other classes, as well as their personal and professional lives, is gratifying.

The biggest challenge for faculty is navigating our role in the field of data science ethics. Staying current with notable ethical breaches enables us to raise awareness among students, but doesn't make us experts in the field of data science ethics. Most of us do not consider data science ethics to be among our fields of research: do we possess the knowledge to teach these topics at sufficient depth? As noted in Section 2.5, most of us have little to no formal training in applied ethics: are we capable of helping students reason about why certain actions are ethical or unethical? For the most part, our training in the humanities is at the undergraduate level: how well can we assess the critical thinking of our students in essay form? These questions reflect (what we hope is a healthy) academic tension between "our careers as we imagined them in graduate school" and "our careers as we think they should be now." In Section 6.1, we provide some greater institutional context for these questions.

6 Conclusion

The landscape of data science ethics continues to evolve.
We conclude with some next steps, institutional considerations, and final thoughts.

6.1 Next steps and institutional context

A central challenge noted in Sections 2.5 and 5.3 is the lack of formal training in ethics among the SDS faculty. We are pursuing several avenues to improve the richness of our ethical curriculum, taking advantage of our liberal arts setting where possible.

First, while changing the expertise of the faculty is a long-term process, bringing considerations of data science ethics into our hiring practices has already been helpful (see Section 3.2). Our most recent tenure-track hire has expertise in data ethnography that both broadens the scope of the ethical questions we can present to students, and perhaps more importantly, significantly increases the depth to which students can pursue their interest in data science ethics. This person is offering a new course in data ethnography and a first-year seminar (FYS 189) on intersections between data and social justice. (Elisa Raffaghelli (2020) discusses the relationship between social justice and related courses on data literacy.)

Second, we are currently exploring potential points of intersection between SDS and the philosophy department. While the minor in ethics at Smith was recently decommissioned (due to unreplaced retirements among faculty with scholarly expertise in ethics), those who remain have increasingly directed their courses towards applied ethics, often in the context of data. A recent first-year seminar (FYS 105) is titled "Ethics of Big Data." These first-year seminars—which all entering students are required to take as part of our liberal arts curriculum and which are often interdisciplinary—are a useful mechanism for helping students—who might not otherwise have a fully-developed interest in data science—connect data science to larger issues in society.
A standalone course on data science ethics, possibly co-taught by members of both SDS and philosophy, is another possible innovation under discussion.

Third, at the same time that the ethics program is closing, interest in ethics among scientists is only increasing. A group of chemists and biologists is designing a course called "Ethics and Scientific Research." Our colleague in computer science is teaching a course called "Responsible Computing" (recall the Mozilla Responsible Computer Science Challenge). A colleague in government teaches a course called "The Politics of Data" that discusses Zuboff (2018) and counts towards the communication requirement for the SDS major. The proximity to data science-adjacent scholars, as well as philosophers trained in applied ethics, is one advantage of reforming curriculum in a liberal arts environment such as Smith. Leveraging these efforts to improve our own major is something we are actively exploring.

Finally, more rigorous assessment of our efforts is necessary, as we build towards a curricular mapping exercise (this coming year) and a decennial review (in three years).

6.2 Portability

While all institutions are unique, we have designed our ethical modules to be portable. Any instructor should be able to put our modules into practice in their own classroom with a minimal amount of customization and preparation. Some of the departmental initiatives described in Section 3 require money, curricular flexibility, or exchanges with nearby colleges that may not exist at other institutions. However, these initiatives merely implement the recommendations of National Academies of Sciences, Engineering, and Medicine (2018), so all institutions should be able to use that call to action to push for greater institutional support, where necessary.
Likely the biggest obstacle to replicating our work at your institution is the potentially large discrepancy between the student culture at Smith (a selective, liberal arts college for women) and yours. We note that the vast majority of authors in the data ethics space happen to be women, many of whom established research programs in foundational machine learning and data science before examining data and algorithmic ethics. Researchers pushing the bounds of both technical and ethical considerations can be important role models in a field with too many examples of technical work being used to maintain or exacerbate inequities. We encourage readers to consider the possible benefits of including ethics in your data science curriculum, particularly in terms of retaining student interest from introductory courses through senior capstone courses.

6.3 Final thoughts

The long-term health of data science as a discipline relies on public trust. Ethical lapses, or gross indifference to ethics, have resulted in the deployment of data science products that are harmful to society, due to biases that we now recognize. Our students are part of the generation of data scientists who will address these issues and restore faith in data-driven applications. In order to do this, they need to not only recognize ethical considerations as integral to data science, but also have the ability to assess the ethical behavior of data scientists. We present our approach to achieving this in the hopes that others will emulate and refine what we have started.

References

Aust, F. & Barth, M. (2022), papaja: Prepare reproducible APA journal articles with R Markdown. R package version 0.1.0.9997. URL: https://github.com/crsh/papaja

Baumer, B. S. (2015), 'A data science course for undergraduates: Thinking with data', The American Statistician 69(4), 334–342. URL: http://dx.doi.org/10.1080/00031305.2015.1081105

Baumer, B.
S., Kaplan, D. T. & Horton, N. J. (2021), Modern Data Science with R, 2nd edn, Chapman and Hall/CRC Press: Boca Raton. URL: https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498

Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. (2021), On the dangers of stochastic parrots: Can language models be too big?, in 'FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency', pp. 610–623. URL: https://doi.org/10.1145/3442188.3445922

Benjamin, R. (2019), Race After Technology: Abolitionist Tools for the New Jim Code, Polity: Cambridge.

Blair, J. R., Jones, L., Leidig, P., Murray, S., Raj, R. K. & Romanowski, C. J. (2021), Establishing ABET accreditation criteria for data science, in 'Proceedings of the 52nd ACM Technical Symposium on Computer Science Education', pp. 535–540. URL: https://doi.org/10.1145/3408877.3432445

Bloom, B. S. et al. (1956), 'Taxonomy of educational objectives. Vol. 1: Cognitive domain', New York: McKay 20(24), 1.

Bruce, K. B. (2018), 'Five big open questions in computing education', ACM Inroads 9(4), 77–80. URL: https://dl.acm.org/citation.cfm?id=3230697

Burton, E., Goldsmith, J. & Mattei, N. (2018), 'How to teach computer ethics through science fiction', Communications of the ACM 61(8), 54–64. URL: https://dl.acm.org/citation.cfm?id=3154485

Cai, F. (2020), 'Yann LeCun quits Twitter amid acrimonious exchanges on AI bias', Synced: AI Technology & Industry Review. URL: https://syncedreview.com/2020/06/30/yann-lecun-quits-twitter-amid-acrimonious-exchanges-on-ai-bias/

Canney, N. & Bielefeldt, A. (2015), 'A framework for the development of social responsibility in engineers', International Journal of Engineering Education 31(1B), 414–424. URL: https://dialnet.unirioja.es/servlet/articulo?codigo=6922074

Carter, L. & Crockett, C.
(2019), An ethics curriculum for CS with flexibility and continuity, in '2019 IEEE Frontiers in Education Conference (FIE)', IEEE, pp. 1–9. URL: https://doi.org/10.1109/FIE43999.2019.9028356

Chivukula, S. S., Li, Z., Pivonka, A. C., Chen, J. & Gray, C. M. (2021), 'Surveying the landscape of ethics-focused design methods', arXiv preprint arXiv:2102.08909. URL: https://arxiv.org/abs/2102.08909

Committee on Professional Ethics (2018a), ACM Code of Ethics and Professional Conduct, Association for Computing Machinery, Inc. URL: https://www.acm.org/binaries/content/assets/about/acm-code-of-ethics-booklet.pdf

Committee on Professional Ethics (2018b), Ethical guidelines for statistical practice, Technical report, American Statistical Association. URL: http://www.amstat.org/asa/files/pdfs/EthicalGuidelines.pdf

Committee on Science, Engineering, and Public Policy (2009), On being a scientist: a guide to responsible conduct in research, 3 edn, Washington, DC: National Academies Press. URL: https://www.ncbi.nlm.nih.gov/pubmed/25009901

Conway Center for Innovation and Entrepreneurship (2019), 'One data scientist's experience innovating at [company]', The Jill Ker Conway Innovation and Entrepreneurship Center. Article is no longer available online.

Davies, H. (2015), 'Ted Cruz campaign using firm that harvested data on millions of unwitting Facebook users', The Guardian. URL: https://www.theguardian.com/us-news/2015/dec/11/senator-ted-cruz-president-campaign-facebook-user-data

D'Ignazio, C. & Klein, L. F. (2020), Data Feminism, Boston: MIT Press. URL: https://mitpress.mit.edu/books/data-feminism

Donoho, D. (2017), '50 years of data science', Journal of Computational and Graphical Statistics 26(4). URL: https://amstat.tandfonline.com/doi/full/10.1080/10618600.2017.1384734

Dwork, C., McSherry, F., Nissim, K. & Smith, A.
(2006), Calibrating noise to sensitivity in private data analysis, in S. Halevi & T. Rabin, eds, 'Theory of Cryptography', Springer, pp. 265–284. URL: https://link.springer.com/chapter/10.1007/11681878_14

Edmondson, A. (1999), 'Psychological safety and learning behavior in work teams', Administrative Science Quarterly 44(2), 350–383. URL: https://doi.org/10.2307/2666999

Elisa Raffaghelli, J. (2020), 'Is data literacy a catalyst of social justice? A response from nine data literacy initiatives in higher education', Education Sciences 10(9), 233. URL: https://doi.org/10.3390/educsci10090233

Elliott, A. C., Stokes, S. L. & Cao, J. (2018), 'Teaching ethics in a statistics curriculum with a cross-cultural emphasis', The American Statistician 72(4), 359–367. URL: https://www.tandfonline.com/doi/abs/10.1080/00031305.2017.1307140

Eubanks, V. (2018), Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, St. Martin's Press.

European Parliament (2018), Regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/EC (data protection directive), Technical report, European Union. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679

Fiesler, C., Garrett, N. & Beard, N. (2020), What do we teach when we teach tech ethics? A syllabi analysis, in 'Proceedings of the 51st ACM Technical Symposium on Computer Science Education', pp. 289–295. URL: https://dl.acm.org/doi/10.1145/3328778.3366825

Fitzpatrick, J. (2010), 'If you're not paying for it; you're the product'. URL: https://lifehacker.com/if-youre-not-paying-for-it-youre-the-product-5697167

Floridi, L. & Taddeo, M. (2016), 'What is data ethics?', Philosophical Transactions of the Royal Society A. URL: http://dx.doi.org/10.1098/rsta.2016.0360

Fry, H.
(2018), Hello World: Being Human in the Age of Algorithms, New York: WW Norton & Company. URL: https://wwnorton.com/books/Hello-World

Gebru, T. (2020), Race and gender, in M. D. Dubber, F. Pasquale & S. Das, eds, 'The Oxford Handbook of Ethics of AI', Oxford University Press, pp. 251–269. URL: https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780190067397.001.0001/oxfordhb-9780190067397-e-16

Gershkoff, A., Therriault, A., Satyanarayan, A., Jones, B., Burg, B., Hurt, B., Granger, B., Jacob, B., Doig, C., Fryar, C., Ramanan, D., Bhargava, D., Perez, F., Greenleigh, I., Feng, J., Loyens, J., Morgan, J., Ram, K., Green, L., Barba, L., Colaco, M., Rocklin, M., Jamei, M., Horn, M., Harris, N. E., Elprin, N., Kaldero, N., Chopra, N., McGarry, P., Todkar, R., Jurney, R., Brener, S., Couture, T., Thibodeaux, T. & McKinney, W. (2019), 'Data values and principles'. URL: https://datapractices.org/manifesto/

Giroux, M. E., Coburn, P. I., Connolly, D. A. & Bernstein, D. M. (2016), Perspective-taking abilities across the lifespan: A review of hindsight bias and theory of mind, in M. E. Toplak & J. Weller, eds, 'Individual Differences in Judgement and Decision-Making', 1 edn, London: Psychology Press, pp. 157–175. URL: https://doi.org/10.4324/9781315636535

Gotterbarn, D., Wolf, M. J., Flick, C. & Miller, K. (2018), 'Thinking professionally: The continual evolution of interest in computing ethics', ACM Inroads 9(2), 10–12. URL: https://dl.acm.org/citation.cfm?id=3204466

Grosz, B. J., Grant, D. G., Vredenburgh, K., Behrends, J., Hu, L., Simmons, A. & Waldo, J. (2019), 'Embedded EthiCS: Integrating ethics across CS education', Commun. ACM 62(8), 54–61. URL: https://doi.org/10.1145/3330794

Gunaratna, N. S. & Tractenberg, R. E.
(2016), Ethical reasoning with the 2016 revised ASA ethical guidelines for statistical practice, in 'Proceedings of the Joint Statistical Meetings', American Statistical Association. URL: https://www.researchgate.net/publication/313309250_Ethical_Reasoning_with_the_2016_Revised_ASA_Ethical_Guidelines_for_Statistical_Practice

Hand, D. J. (2018), 'Aspects of data ethics in a changing world: where are we now?', Big Data 6(3), 176–190. URL: https://doi.org/10.1089/big.2018.0083

Hardin, J., Hoerl, R., Horton, N. J., Nolan, D., Baumer, B. S., Hall-Holt, O., Murrell, P., Peng, R., Roback, P., Temple Lang, D. & Ward, M. D. (2015), 'Data science in statistics curricula: Preparing students to "think with data"', The American Statistician 69(4), 343–353. URL: https://doi.org/10.1080/00031305.2015.1077729

Heggeseth, B. (2019), 'Intertwining data ethics in intro stats', Symposium on Data Science and Statistics. URL: https://drive.google.com/file/d/1GXzVMpb6GVNfWPS6bd9jggtqq1C77Wsc/view

Hicks, S. C. & Irizarry, R. A. (2018), 'A guide to teaching data science', The American Statistician 72(4), 382–391. URL: https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2017.1356747

Hoffmann, A. L. & Cross, K. A. (2021), 'Teaching data ethics: Foundations and possibilities from engineering and computer science ethics education'. URL: http://hdl.handle.net/1773/46921

Huff, D. (1954), How to Lie with Statistics, WW Norton & Company, Inc.

James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013), An Introduction to Statistical Learning: with Applications in R, Springer. URL: https://faculty.marshall.usc.edu/gareth-james/ISL/

James, G., Witten, D., Hastie, T. & Tibshirani, R. (2021), ISLR: Data for an Introduction to Statistical Learning with Applications in R. R package version 1.4. URL: https://www.statlearning.com

James, G., Witten, D., Hastie, T. & Tibshirani, R.
(2022), ISLR2: Introduction to Statistical Learning, Second Edition. R package version 1.3-1. URL: https://www.statlearning.com

Kantayya, S. & Buolamwini, J. (2020), 'Coded bias', 7th Empire Media. URL: https://www.codedbias.com/

Kaplan, D. (2018), 'Teaching stats for data science', The American Statistician 72(1), 89–96. URL: https://amstat.tandfonline.com/doi/full/10.1080/00031305.2017.1398107

Kerr, N. L. (1998), 'HARKing: Hypothesizing after the results are known', Personality and Social Psychology Review 2(3), 196–217. URL: https://doi.org/10.1207/s15327957pspr0203_4

Kim, A. Y. & Escobedo-Land, A. (2015), 'OkCupid data for introductory statistics and data science courses', Journal of Statistics Education 23(2). URL: https://amstat.tandfonline.com/doi/abs/10.1080/10691898.2015.11889737

Kim, A. Y. & Escobedo-Land, A. (2021), 'Correction to OkCupid data for introductory statistics and data science courses', Journal of Statistics and Data Science Education 29(2), 216–216. URL: https://doi.org/10.1080/26939169.2021.1924516

Kirkegaard, E. O. & Bjerrekær, J. D. (2016), 'The OkCupid dataset: A very large public dataset of dating site users', Open Differential Psychology 46.

Kolaczyk, E. D. & Csárdi, G. (2014), Statistical Analysis of Network Data with R, Vol. 65, Springer.

Kramer, A. D. I., Guillory, J. E. & Hancock, J. T. (2014), 'Experimental evidence of massive-scale emotional contagion through social networks', Proceedings of the National Academy of Sciences 111(24), 8788–8790. URL: https://www.pnas.org/content/111/24/8788

Langkjær-Bain, R. (2017), 'Trials of a statistician', Significance 14(4), 14–19. URL: https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2017.01052.x

Levin, S. (2017), 'Mark Zuckerberg: I regret ridiculing fears over Facebook's effect on election', The Guardian.
URL: https://www.theguardian.com/technology/2017/sep/27/mark-zuckerberg-facebook-2016-election-fake-news

Loukides, M., Mason, H. & Patil, D. (2018a), Ethics and Data Science, Sebastopol, CA: O'Reilly Media. URL: https://www.oreilly.com/library/view/ethics-and-data/9781492043898/

Loukides, M., Mason, H. & Patil, D. (2018b), 'Of oaths and checklists', O'Reilly. URL: https://www.oreilly.com/radar/of-oaths-and-checklists/

Meyer, R. (2014), 'Everything we know about Facebook's secret mood manipulation experiment', The Atlantic. URL: https://www.theatlantic.com/technology/archive/2014/06/everything-we-know-about-facebooks-secret-mood-manipulation-experiment/373648/

Milner, Y. (2019), 'Data for Black Lives II'. URL: http://d4bl.org/conference.html

Muradova, L. (2021), 'Seeing the other side? Perspective-taking and reflective political judgements in interpersonal deliberation', Political Studies 69(3), 644–664. URL: https://doi.org/10.1177/0032321720916605

National Academies of Sciences, Engineering, and Medicine (2018), Data Science for Undergraduates: Opportunities and Options, National Academies Press. URL: http://sites.nationalacademies.org/cstb/currentprojects/cstb_175246

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1978), The Belmont report: Ethical principles and guidelines for the protection of human subjects of research, Technical Report 0012, Department of Health, Education, and Welfare. URL: https://videocast.nih.gov/pdf/ohrp_belmont_report.pdf

Neff, G., Tanweer, A., Fiore-Gartland, B. & Osburn, L. (2017), 'Critique and contribute: A practice-based framework for improving critical data studies and data science', Big Data 5(2), 85–97. URL: https://www.liebertpub.com/doi/full/10.1089/big.2016.0050

Noble, S. U.
(2018), Algorithms of Oppression: How Search Engines Reinforce Racism, NYU Press. URL: http://www.jstor.org/stable/j.ctt1pwt9w5

O'Neil, C. (2016), Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, New York: Crown. URL: https://weaponsofmathdestructionbook.com/

Pierson, S. & Wilkinson, L. (2021), 'ASA, international community continue to decry Georgiou persecution', AMSTAT News. URL: https://magazine.amstat.org/blog/2021/05/01/asa-decries-georgiou-persecution/

Poulsen, K. (2014), 'How a math genius hacked OkCupid to find true love'. URL: https://www.wired.com/2014/01/how-to-hack-okcupid/

R Core Team (2021), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/

Rosenberg, M., Confessore, N. & Cadwalladr, C. (2018), 'How Trump consultants exploited the Facebook data of millions', The New York Times. URL: https://www.nytimes.com/2018/03/17/us/politics/cambridge-analytica-trump-campaign.html

Saltz, J., Skirpan, M., Fiesler, C., Gorelick, M., Yeh, T., Heckman, R., Dewar, N. & Beard, N. (2019), 'Integrating ethics within machine learning courses', ACM Transactions on Computing Education (TOCE) 19(4), 1–26. URL: https://dl.acm.org/doi/10.1145/3341164

Schlenker, L. (2019), 'The ethics of data science*', Towards Data Science. URL: https://towardsdatascience.com/the-ethics-of-data-science-e3b1828affa2

Shapiro, B. R., Meng, A., O'Donnell, C., Lou, C., Zhao, E., Dankwa, B. & Hostetler, A. (2020), Re-shape: A method to teach data ethics for data science education, in 'Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems', pp. 1–13. URL: https://doi.org/10.1145/3313831.3376251

Shook, N. J. & Fazio, R. H.
(2008), 'Interracial roommate relationships: An experimental field test of the contact hypothesis', Psychological Science 19(7), 717–723. URL: https://doi.org/10.1111/j.1467-9280.2008.02147.x

Skirpan, M., Beard, N., Bhaduri, S., Fiesler, C. & Yeh, T. (2018), Ethics education in context: A case study of novel ethics activities for the CS classroom, in 'Proceedings of the 49th ACM Technical Symposium on Computer Science Education', pp. 940–945. URL: https://dl.acm.org/doi/10.1145/3159450.3159573

Solow, B. (2021), 'Quinn White '23: Studying trends through an ethical lens', Grécourt Gate. URL: https://www.smith.edu/news/quinn-white-23-studying-trends-through-an-ethical-lens

Sweeney, L. (2002), 'k-anonymity: A model for protecting privacy', International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(5), 557–570. URL: https://www.worldscientific.com/doi/abs/10.1142/S0218488502001648

Tarran, B. (2019), 'German commission calls for risk-based regulation of algorithmic systems', Significance 16(6), 4–5. URL: https://doi.org/10.1111/j.1740-9713.2019.01329.x

Tractenberg, R. E. (2019a), 'Strengthening the practice and profession of statistics and data science using ethical guidelines'. URL: https://osf.io/preprints/socarxiv/93wuk

Tractenberg, R. E. (2019b), 'Teaching and learning about ethical practice: The case analysis'. URL: https://osf.io/preprints/socarxiv/58umw/download

Utts, J. (2021), 'Enhancing data science ethics through statistical education and practice', International Statistical Review 89(1), 1–17. URL: https://doi.org/10.1111/insr.12446

Vakil, S. (2018), 'Ethics, identity, and political vision: Toward a justice-centered approach to equity in computer science education', Harvard Educational Review 88(1), 26–52. URL: https://doi.org/10.17763/1943-5045-88.1.26

Wang, M. Q., Yan, A. F. & Katz, R. V.
(2018), 'Researcher requests for inappropriate analysis and reporting: A US survey of consulting biostatisticians', Annals of Internal Medicine 169(8), 554–558. URL: https://doi.org/10.7326/M18-1230

Washington, A. L. & Kuo, R. (2020), Whose side are ethics codes on? Power, responsibility and the social good, in 'Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency', pp. 230–240. URL: https://dl.acm.org/doi/abs/10.1145/3351095.3372844

Wasserstein, R. L., Lazar, N. A. et al. (2016), 'The ASA's statement on p-values: context, process, and purpose', The American Statistician 70(2), 129–133. URL: https://doi.org/10.1080/00031305.2016.1154108

Wasserstein, R. L., Schirm, A. L. & Lazar, N. A. (2019), 'Moving to a world beyond "p < 0.05"', The American Statistician 73(sup1), 1–19. URL: https://doi.org/10.1080/00031305.2019.1583913

Wender, B. & Kloefkorn, T. (2017), 'Roundtable on data science postsecondary education, Meeting #5 highlights', The National Academies of Sciences, Engineering, and Medicine. URL: https://www.nationalacademies.org/event/12-08-2017/docs/D8EE65EFC7F4B0C368D267EDAD10E5AB1BAFBE3369D2

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H. & Dunnington, D. (2021), ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.3.5. URL: https://CRAN.R-project.org/package=ggplot2

Xiao, T. & Ma, Y. (2021), 'A letter to the Journal of Statistics and Data Science Education—a call for review of "OkCupid data for introductory statistics and data science courses" by Albert Y. Kim and Adriana Escobedo-Land', Journal of Statistics and Data Science Education pp. 1–2. URL: https://doi.org/10.1080/26939169.2021.1930812

Zimmer, M. (2010), '"But the data is already public": on the ethics of research in Facebook', Ethics and Information Technology 12(4), 313–325.
URL: https://link.springer.com/article/10.1007/s10676-010-9227-5

Zuboff, S. (2018), The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power, 1 edn, New York: PublicAffairs.