The benefits of artificial intelligence (AI)-human partnerships, evaluating how AI agents enhance expert human performance, are increasingly studied. Though rarely evaluated in healthcare, an inverse approach is possible: AI benefiting from the support of an expert human agent. Here, we investigate both human-AI clinical partnership paradigms in the magnetic resonance imaging-guided characterisation of patients with brain tumours. We reveal that human-AI partnerships improve accuracy and metacognitive ability not only for radiologists supported by AI, but also for AI agents supported by radiologists. Moreover, the greatest patient benefit was evident with an AI agent supported by a human one. Synergistic improvements in agent accuracy, metacognitive performance, and inter-rater agreement suggest that AI can create more capable, confident, and consistent clinical agents, whether human or model-based. Our work suggests that the maximal value of AI in healthcare could emerge not from replacing human intelligence, but from AI agents that routinely leverage and amplify it.
Integrating artificial intelligence (AI) systems into clinical practice represents one of the most promising technological advances in modern medicine [1-7]. But whereas much initial research and development has focused on tools that might ultimately reduce, or even replace, clinician involvement in a patient's care [8-13], research demonstrating AI's capacity to enhance existing practice through partnerships between AI agents and human agents is now actively growing [3,4,6,14-19]. Whilst many of these studies offer illuminating, field-advancing discoveries, it is notable that all adopt a unidirectional approach to human-AI partnership: they measure the human agent's performance gains in a clinical task when supported by an AI agent [3,4,6,14,17-19]. For example, Wu et al. recently conducted a randomised controlled trial comparing diagnostic accuracy among ophthalmologists with and without AI support, reporting superior accuracy when the model was used [17].
Although rarely evaluated in healthcare, an inverse paradigm is possible, in which AI performance is guided by a human expert (Extended Data Fig. 1). Parallels can be drawn with autopilot systems in the aviation and automotive industries [20-22]. A vehicle may have driver-assistance features, such as lane-departure warnings, that activate when its human operator steers beyond a lane's boundaries. Alternatively, a vehicle may have autopilot systems that entirely control its path, yet it still contains a yoke or steering wheel should human support of the pilot or driver be required (Extended Data Fig. 2). Providing the best possible care for the individual patient should always be our primary concern [23], but because this alternative human-AI partnership paradigm is understudied, we remain in the dark about whether the greatest patient benefits come from clinical human agents supported by AI or, rather, from AI agents supported by human ones.
Our task is to evaluate these two alternative paradigms in a challenging healthcare setting. We examine the specialist evaluation of brain tumour imaging data, evaluating the benefits not only for the patient but also for the practitioner and healthcare provider. Brain tumours are heterogeneous in appearance, varying widely in size, shape, signal characteristics, and location throughout the brain, as assessed with magnetic resonance imaging (MRI) [24-28]. MRI assessment typically includes (though this is highly centre-dependent) a variety of structural sequences, including T1-weighted, T2-weighted, Fluid-Attenuated Inversion Recovery (FLAIR), and post-contrast T1-weighted sequences [29] (Extended Data Fig. 3). Post-contrast imaging (most commonly a post-contrast T1-weighted sequence) is acquired following the intravenous injection of gadolinium contrast medium. The post-contrast T1 often, though not always, yields particularly informative and clinically actionable data, for example for the neurosurgeon planning and undertaking resection [30,31] or the oncology team delineating the lesion for stereotactic radiotherapy [32]. Nonetheless, administering gadolinium is not always desirable (or feasible). This may be due to contraindications such as allergy or renal impairment, or it may simply be less desirable, particularly in patients who undergo frequent follow-up imaging, including children [33-39].
Requiring a radiologist to decide whether to administer gadolinium and acquire post-contrast sequences is, however, a challenging task. By definition, it requires the clinician to make a best guess (or, perhaps more aptly, a 'gamble') as to whether, given the non-contrast imaging sequences, additional post-contrast data will yield added clinical benefit [40,41]. It should come as no surprise that guesswork and gambles should be avoided at all costs in healthcare [23], and especially in high-stakes settings such as neuro-oncology [25], where calculated, data-driven decision-making ought always to be preferred. This naturally raises the possibility that AI-enabled systems might help here. Multiple groups have recently demonstrated that deep learning models can reasonably predict, from the non-contrast sequences alone, whether a patient's brain tumour scan will contain enhancing disease (typically seen only with post-contrast imaging) [24,42-47].
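The prediction task these models address can be framed as binary classification: given features derived from the non-contrast sequences, estimate the probability that post-contrast imaging would reveal enhancing disease. A minimal sketch of this framing follows, using synthetic feature vectors and a simple logistic-regression classifier as an illustrative stand-in for the deep learning models cited above; all data, dimensions, and thresholds here are hypothetical, not those of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: each "scan" is a feature vector summarising the
# non-contrast sequences (e.g. T1, T2, FLAIR intensity statistics).
n, d = 200, 8
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
# Label 1 = enhancing disease would be visible on post-contrast imaging.
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit a logistic-regression classifier by gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / n

# Predicted probability of enhancement per scan; thresholding (here at
# 0.5) turns the probability into a recommend/withhold-gadolinium call.
probs = sigmoid(X @ w)
accuracy = ((probs > 0.5) == (y == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

In practice the cited models operate on full image volumes rather than hand-crafted feature vectors, and the operating threshold would be chosen to reflect the asymmetric clinical costs of missing enhancing disease versus administering gadolinium unnecessarily.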
Here, we investigate the utility of such a tool in a multi-case, multi-reader, randomised crossover study of agent performance among expert radiologists who ordinarily provide frontline care across some of the UK's largest and most specialist hospitals. In the largest known study of its kind, spanning ten datasets, four countries, and five neuro-oncology disease categories, we evaluate the ability of both human agents, in the form of board-certified, experienced radiologists, and an AI agent to predict whether a patient's MR