A dual mode adaptive basal-bolus advisor based on reinforcement learning

Qingnan Sun, Marko V. Jankovic, João Budzinski, Brett Moore, Peter Diem, Christoph Stettler, Stavroula G. Mougiakakou, Member, IEEE

JBHI-00967-2018.R1

Abstract: Self-monitoring of blood glucose (SMBG) and continuous glucose monitoring (CGM) are commonly used by type 1 diabetes (T1D) patients to measure glucose concentrations. The proposed adaptive basal-bolus algorithm (ABBA) supports inputs from either SMBG or CGM devices to provide personalised suggestions for the daily basal rate and prandial insulin doses on the basis of the patients' glucose level on the previous day. The ABBA is based on reinforcement learning (RL), a type of artificial intelligence, and was validated in silico with an FDA-accepted population of 100 adults under different realistic scenarios lasting three simulated months. The scenarios involve three main meals and one bedtime snack per day, along with different variabilities and uncertainties for insulin sensitivity, mealtime, carbohydrate amount, and glucose measurement time. The results indicate that the proposed approach achieves comparable performance with CGM or SMBG as input signals, without influencing the total daily insulin dose. The results are a promising indication that AI algorithmic approaches can provide personalised adaptive insulin optimisation and achieve glucose control, independently of the type of glucose monitoring technology.

Index Terms: Diabetes, insulin treatment personalisation, reinforcement learning, artificial intelligence, adaptive system

Manuscript received August 30, 2018. This research was carried out within the framework of the MyTreat research and development project, supported by the Swiss Commission of Technology and Innovation (CTI) under Grant 18172.1 PFLS-LS.
Q. Sun is with the ARTORG Center for Biomedical Engineering Research, University of Bern, 3008 Bern, Switzerland (e-mail: qingnan.sun@artorg.unibe.ch).
M. Jankovic is with the ARTORG Center for Biomedical Engineering Research, University of Bern, 3008 Bern, Switzerland, and the Department of Emergency Medicine, Bern University Hospital "Inselspital", 3010 Bern, Switzerland (e-mail: marko.jankovic@artorg.unibe.ch).
J. Budzinski is with Debiotech S.A., 1004 Lausanne, Switzerland (e-mail: j.budzinski@debiotech.com).
B. Moore is with St. Mary's University, School of Science, Engineering, and Technology, San Antonio, Texas, United States of America (e-mail: brett.moore@gmail.com).
P. Diem is with the Insel Gruppe AG, Freiburgstrasse 18, 3010 Bern, Switzerland (e-mail: peter.diem@insel.ch).
C. Stettler is with the Division of Diabetes, Endocrinology, Clinical Nutrition and Metabolism, Bern University Hospital "Inselspital", 3010 Bern, Switzerland (e-mail: christoph.stettler@insel.ch).
S. Mougiakakou* is with the ARTORG Center for Biomedical Engineering Research, University of Bern, 3008 Bern, Switzerland, and the Division of Diabetes, Endocrinology, Clinical Nutrition and Metabolism, Bern University Hospital "Inselspital" (e-mail: stavroula.mougiakakou@artorg.unibe.ch).

I. INTRODUCTION

MOST cases of diabetes can be broadly classified as type 1, where no insulin is secreted due to destruction of the pancreatic beta cells, or type 2, where either the pancreas does not produce enough insulin or the body does not effectively use the insulin produced. The main goal of diabetes management is to maintain glucose levels within a healthy range, and this objective may require glucose monitoring and insulin therapy via pumps or injections. Diabetes patients may use two different approaches for glucose monitoring: devices for the self-monitoring of blood glucose (SMBG), which measure glucose with one drop of finger blood several times during the day, or continuous glucose monitoring (CGM) systems, which use a subcutaneous miniaturised sensor to measure glucose levels every few minutes. In the case of adults with type 1 diabetes using SMBG, it is recommended to test glucose levels at least four times a day, i.e. before each meal and before going to bed [1]. Relatively few diabetic patients use CGMs [2], although this is expected to increase worldwide in the near future.

Insulin pumps deliver basal and bolus insulin. Basal insulin maintains glucose concentration at consistent levels during periods of fasting, while bolus insulin compensates for the effects of meal intake. Basal insulin is usually adjusted by the attending physician after reviewing the patient's glucose records, while the bolus dose is calculated using a bolus advisor. Bolus advisors use simple algorithms to estimate the insulin dose on the basis of the carbohydrate (CHO) content of the meal, the current blood glucose concentration (usually obtained from an SMBG device), the patient's personal settings (e.g. correction factor, insulin-to-carbohydrate ratio, CIR), and the insulin on board [3]. As insulin sensitivity changes during the day, CIR and basal rate (BR) should be updated over time.

Since the daily activities of diabetic patients tend to be repetitive, e.g. with respect to meal timing and meal amount, Owen et al. [4] incorporated a Run-to-Run (R2R) algorithm into a controller, which led to a more advanced bolus advisor. The advisor updated the bolus daily, using two postprandial SMBG measurements at 60 min and 90 min after the start of the respective meal. The advisor was clinically evaluated and gave promising results [5]. It has been proposed that bolus insulin could be estimated from CGM data, as supported by case-based reasoning (CBR) and R2R [6], [7]. The clinical safety of the algorithm has been presented in a single-arm pilot study [8]. The method has been extended to adjust the BR [9]. The in silico results indicated that this approach is of potential value.

The adaptation of BR was initially investigated by Palerm et al. [5], using an R2R approach similar to the one presented in [4] and including five properly timed SMBG measurements. More recently, Toffanin et al. [10] adjusted the daily basal therapy using a number of well-established clinical indices, e.g. as derived from CGM data, and an R2R algorithm. The algorithm performed well in an in silico diabetic population.

Fig. 2. Main meals and the corresponding CIRs.

The adaptation of BR and/or CIR has also been proposed within the framework of an artificial pancreas (AP). The AP provides an autonomous option for controlled insulin treatment by combining an insulin pump, a CGM, and a control algorithm. Proportional-integral-derivative controllers (PIDs), model predictive controllers (MPCs), and fuzzy logic (FL) methods have been traditionally employed for clinically validated APs [11]. The MPC algorithm may, for example, be tuned by employing an R2R approach. This adapts the BR during the night and the CIR during the day [10], [12]. An R2R approach, together with CBR, was used within a closed-loop controller to adapt the CIR [13]. The in silico results were promising, but a clinical trial is needed for confirmation.

To address the challenges related to the inter- and intra-patient variabilities and achieve personalisation of the insulin treatment, reinforcement learning (RL) has been introduced [14]. RL is a branch of machine learning (ML) that allows systems to develop self-learning abilities and thus to interact within uncertain environments. Moore et al. [15] introduced RL for optimal control of propofol-induced hypnosis.
In a subsequent study in healthy human volunteers, the RL agent demonstrated clinically appropriate performance [16], [17]. In [18], a model-free RL-based control algorithm was implemented and validated in silico for its ability to deal with inter- and intra-patient variability and environmental uncertainties. The diabetic population in this study wore CGM and was treated with an insulin pump. The algorithm updated the BR and CIR each day on the basis of the patients' glucose level the day before. The algorithm's tuning was personalised automatically on the basis of the transfer entropy (TE) from insulin to glucose signals [19].

The research presented here is a continuation of our previous studies [14], [18] and targets the entire insulin pump population of adults with type 1 diabetes, independently of the technology used for glucose monitoring. The algorithm allows daily adjustment of the insulin infusion profile to compensate for fluctuation in the patient's glucose level. Information from SMBG or CGM provides input to the algorithm, which outputs the daily BR and three CIRs per day, one value for each of the three main meals. The self-learning approach is adaptable and personalises the daily insulin values to ensure glucose control, despite the inter- and intra-patient variabilities. The approach is data-driven, real-time and of low computational cost. To validate the newly introduced algorithm, an FDA-accepted diabetes simulator was used.

II. METHODOLOGY

The structure of the proposed dual mode adaptive basal-bolus advisor (ABBA), along with its inputs and outputs, is illustrated in Fig. 1. Each day, ABBA provides one constant BR and three CIRs. Laimer et al. [20] analysed the BR profiles of 3118 female and 2427 male patients, and concluded that BR profiles with higher variability are associated with an increased frequency of acute complications in adults with type 1 diabetes.
The study considered the dawn phenomenon as a factor influencing intra-day variability in insulin sensitivity, while the effect of intensive physical exercise was not taken into account. Furthermore, Bouchonville et al. [21] found that, for patients with insulin pumps, changing the basal rate in the early morning could not reduce the influence of the dawn phenomenon, but increased the risk of hypoglycaemia. Thus, in this study, the BR was considered as constant within a single day. To address the intra-day variation in insulin sensitivity (SI) during different meal timings, three different CIRs for breakfast, lunch and dinner were considered (Fig. 2).

ABBA employs the Actor-Critic (AC) method, a branch of RL, for updating BR and CIRs. In actor-only methods, the gradient of the performance with respect to the policy parameters is estimated directly by simulation, and the parameters are updated in the direction of improvement. Critic-only methods rely exclusively on approximation of the value function and aim to learn an approximate solution to the Bellman equation, which will then hopefully prescribe a near-optimal policy [22]. The AC method was selected because it combines the strong points of the actor-only and critic-only methods. In comparison with the critic-only method, for which convergence is guaranteed in limited settings, the AC method may converge in wider settings. On the other hand, it can achieve more rapid convergence than actor-only methods. In the next section, we will give a brief introduction to the AC method.

Fig. 1. Structure of ABBA with inputs and outputs.

A. Actor-Critic (AC) method

The critic uses an approximation architecture and simulation to learn a value function, which is then used to update the actor's policy parameters in the direction of performance improvement [22].
The AC method was introduced to minimise the average cost function $\bar{\alpha}(\theta)$ as defined by:

$$\bar{\alpha}(\theta) = \sum_{x \in X,\, u \in U} c(x,u)\,\eta_\theta(x,u), \tag{1}$$

where $c(x,u)$ is the local cost, $\eta_\theta(x,u)$ is the stationary probability of the Markov chain $\{X_k, U_k\}$, $x$ is the state, and $u$ is the control action.

The critic agent evaluates the current control policy through the approximation of the long-term expected cost. The critic provides the temporal difference (TD) error to the actor for policy optimisation. The value function is defined by

$$V_\theta(x) = E\left[\sum_{k=0}^{\infty} \gamma^{k}\, c(x_k,u_k) \,\middle|\, x_0 = x\right], \tag{2}$$

which can be formalised as:

$$V_\theta(x) = c(x,u) + \gamma V_\theta(y), \tag{3}$$

where $\gamma \in (0,1)$ is a discount factor and $y$ is the next state. Linear approximation was used for the parameterised function:

$$\tilde{V}_r(x) = r^{T}\varphi(x) = \sum_{i} r_i\,\varphi_i(x), \tag{4}$$

where $r^{T}$ is the transpose of the parameter vector $r$ and $\varphi(x)$ is a vector of basis functions. The estimation of the TD error $d$ can then be defined as:

$$d = c(x,u) + \gamma \tilde{V}_r(y) - \tilde{V}_r(x). \tag{5}$$

The parameter vector $r$ is updated with the TD error:

$$r_{k+1} = r_k + \kappa_k\, d_k\, z_k, \tag{6}$$

where $\kappa_k$ is a positive non-increasing learning rate sequence and $z_k$ is the eligibility vector, with trace-decay factor $\lambda$, updated according to:

$$z_{k+1} = \lambda z_k + \varphi(x_{k+1}). \tag{7}$$

The update for approximation of the action-value function follows a similar approach.

The actor agent aims to optimise the control policy in order to achieve the final goal of the AC method, i.e. to minimise the average cost function $\bar{\alpha}(\theta)$ shown in equation (1).
The policy gradient method is employed for this purpose:

$$\theta_{k+1} = \theta_k - \beta_k \nabla_\theta \bar{\alpha}(\theta), \tag{8}$$

where $\beta_k$ is the learning rate and $\nabla_\theta \bar{\alpha}(\theta)$ the gradient of $\bar{\alpha}(\theta)$ with respect to the policy parameter vector $\theta$, as calculated by:

$$\nabla_\theta \bar{\alpha}(\theta) = d_t\,\psi_\theta(x_t,u_t), \tag{9}$$

where $d_t$ is the TD error at time $t$ and $\psi_\theta(x,u)$ is the basis function for the action-value function.

B. SMBG version of ABBA

1) SMBG measurements as system inputs

The SMBG version of ABBA (ABBA_SMBG) was designed to determine the "system status" (features) using four blood glucose measurements: before breakfast, lunch, dinner, and bedtime. This feature vector was used to update the control policy. Specifically, a day's glycaemic profile was described by two types of features, $F_{\mathrm{hyper}}$ and $F_{\mathrm{hypo}}$, which were related to the system's hyperglycaemic and hypoglycaemic status, respectively. For calculation of these two features, we used the lower and upper border of the tight target range, i.e. $G_L = 90$ mg/dL and $G_H = 150$ mg/dL, as thresholds:

$$F^{BR}_{\mathrm{hyper},k} = \frac{1}{N_H}\sum_{j=1}^{N_H}\left(g^{H}_{j} - G_H\right) \tag{10}$$

$$F^{BR}_{\mathrm{hypo},k} = \frac{1}{N_L}\sum_{j=1}^{N_L}\left(G_L - g^{L}_{j}\right), \tag{11}$$

where $F^{BR}_{\mathrm{hyper},k}$ and $F^{BR}_{\mathrm{hypo},k}$ are the features in the k-th day for the updated BR for the next day, and $g^{H}_{j}$ and $g^{L}_{j}$ are the SMBG values that are above $G_H$ and below $G_L$, respectively. $N_H$ and $N_L$ are the numbers of $g^{H}_{j}$ and $g^{L}_{j}$ in the k-th day. If $N_H$ or $N_L$ is 0, the corresponding feature will have the value 0. The feature calculation of $F^{CIR_i}_{\mathrm{hyper},k}$ and $F^{CIR_i}_{\mathrm{hypo},k}$ for the three CIRs follows a similar approach as for BR. The $i$ in the superscript enumerates the corresponding CIRs (1: breakfast, 2: lunch or 3: dinner). Using different features for BR and CIRs, it is possible to update the BR and CIRs in a relatively independent manner.
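The feature calculation for BR can be illustrated with a minimal sketch. It assumes that each feature is the mean excursion of the day's readings beyond the corresponding threshold, which is one reading of equations (10) and (11) that agrees with the zero-count rule stated above; the function name `daily_features` is hypothetical, and the normalisation into [0, 1] is not covered because the text does not specify it.

```python
from statistics import mean

G_L = 90.0   # lower border of the tight target range in mg/dL (from the text)
G_H = 150.0  # upper border of the tight target range in mg/dL (from the text)


def daily_features(readings_mg_dl):
    """Return (F_hyper, F_hypo) for one day of glucose readings.

    ASSUMPTION: each feature is the mean distance of the out-of-range
    readings from the threshold they violate.
    """
    above = [g - G_H for g in readings_mg_dl if g > G_H]
    below = [G_L - g for g in readings_mg_dl if g < G_L]
    # A feature is 0 when no reading lies beyond its threshold.
    f_hyper = mean(above) if above else 0.0
    f_hypo = mean(below) if below else 0.0
    return f_hyper, f_hypo


# Four SMBG readings: before breakfast, lunch, dinner and bedtime.
print(daily_features([80, 160, 170, 120]))
```

For the CIR features, the same function would be applied only to the readings that fall in the time window of the respective meal.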
In previous work [18], the same features were used for both BR and CIR, and the basal and bolus insulin always changed simultaneously in the same direction, i.e. the algorithm always offered increased basal insulin along with increased bolus insulin and vice versa. In order to overcome this limitation, we introduced three CIRs with different features and different update rules for BR and CIRs, as explained in the next section. Both for BR and CIRs, the features were normalised into the range [0, 1], and the normalised feature could be presented in vector format:

$$\mathbf{F}_k = \left(\bar{F}_{\mathrm{hyper},k},\ \bar{F}_{\mathrm{hypo},k}\right)^{T}. \tag{12}$$

With these features, a local cost $c$ could be defined as:

$$c_k = a_{\mathrm{hyper}}\,\bar{F}_{\mathrm{hyper},k} + a_{\mathrm{hypo}}\,\bar{F}_{\mathrm{hypo},k}, \tag{13}$$

where $a_{\mathrm{hyper}}$ and $a_{\mathrm{hypo}}$ are the scale parameters for weighting the hyperglycaemic and hypoglycaemic features. The critic part of ABBA could be updated as described in [18].

2) Update process

The update of BR and CIR from day k-1 to day k considers the values of day k-1:

$$BR_k = BR_{k-1} + P^{BR}_{k}\,BR_{k-1} \tag{14}$$

$$CIR_{i,k} = CIR_{i,k-1} + P^{CIR}_{i,k}\,CIR_{i,k-1}, \tag{15}$$

where $P^{BR}_{k}$ and $P^{CIR}_{i,k}$ are the control actions for updating BR and CIRs on the k-th day. The subscript $i$ in (15) defines the type of meal for which the CIR is applied (1 for breakfast, 2 for lunch and 3 for dinner). To simplify the description of the equations, we introduced a new variable $S$ to represent $BR$ and $CIR_i$ from equations (14) and (15), and named the final control action $P_{e,k}$ to replace both $P^{BR}_{k}$ and $P^{CIR}_{i,k}$. Thus, equations (14) and (15) can be summarised as:

$$S_k = S_{k-1} + P_{e,k}\,S_{k-1}, \tag{16}$$

where $S_k$ is the value of BR and CIRs on day k, while $S_{k-1}$ is the value on day k-1. In order to achieve a smooth update of BR and CIRs, we introduced a fusion value of $S_k$ and $S_{k-1}$:

$$S^{f}_{k} = w\,S_{k-1} + (1-w)\,S_k. \tag{17}$$

The value of $w$ was experimentally chosen to be 0.5. According to equations (16) and (17), the fused value was defined as:

$$S^{f}_{k} = w\,S_{k-1} + (1-w)\left(S_{k-1} + P_{e,k}\,S_{k-1}\right). \tag{18}$$

For the BR update, the final BR was identical to the fused BR value:

$$BR_k = BR^{f}_{k}. \tag{19}$$

In order to avoid simultaneous increase/decrease of basal insulin and bolus insulin, an additional rule was established for updating CIRs:

$$CIR_{i,k} = l\,CIR^{f}_{i,k} + (1-l)\,CIR_{i,k-1}, \tag{20}$$

where $l$ is a switch parameter (0 or 1) that specifies whether the final CIR should be the same as the fused value or the previous value. The $l$ parameter is defined by the following equations:

$$l = \begin{cases} \cdots \end{cases} \tag{21}$$

[Equation (21): the case values and conditions are illegible in the source.]

A further constraint was considered to limit the maximum change from $S_{k-1}$ to $S_k$ within 5%. As in [14], the $P_{e,k}$ in this work consists of three parts: the linear deterministic control action $P_a$, the supervisory control action $P_s$ and the exploratory part $N(0, \sigma)$, which could be presented as Gaussian noise with zero mean and standard deviation $\sigma$. $\sigma$ is calculated as follows:

$$\sigma = \cdots, \tag{22}$$

[Equation (22): the right-hand side is illegible in the source.]

where the coefficient has value 0.05. The value of $\sigma$ depends on the performance of the controller in the previous iteration. If ABBA achieves an optimised policy, i.e. the feature is zero, the exploration for the next iteration is reduced correspondingly. The calculation of $P_e$ can be described as:

$$P_e = h\,P_a + (1-h)\,P_s + N(0,\sigma), \tag{23}$$

where $h = 0.5$ is a weighting factor to balance the contribution of $P_a$ and $P_s$ to the final control action $P_e$.
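The composition of the control action in equation (23) and the smoothed update of equations (16) to (20) can be sketched as follows. This is only an illustration under stated assumptions: the control action is treated as a relative change of the previous value, the 5% limit is applied to the fused value, and the switch `l` is passed in as an argument because its defining rule is not reproduced here. All function names are hypothetical.

```python
import random


def control_action(p_a, p_s, sigma, h=0.5, rng=random):
    """Equation (23): weighted sum of P_a and P_s plus Gaussian exploration."""
    return h * p_a + (1.0 - h) * p_s + rng.gauss(0.0, sigma)


def fused_update(s_prev, p_e, w=0.5, max_change=0.05):
    """Smoothed update of one BR or CIR value (s_prev must be positive)."""
    # ASSUMPTION: P_e acts as a relative change of the previous value.
    s_new = s_prev + p_e * s_prev
    # Fusion of the previous and the new value with weight w = 0.5.
    s_fused = w * s_prev + (1.0 - w) * s_new
    # ASSUMPTION: the 5% limit is enforced on the fused value.
    lower = s_prev * (1.0 - max_change)
    upper = s_prev * (1.0 + max_change)
    return min(max(s_fused, lower), upper)


def final_cir(cir_prev, cir_fused, l):
    """Equation (20): l = 1 keeps the fused CIR, l = 0 keeps the previous CIR."""
    return l * cir_fused + (1 - l) * cir_prev


# Example: a basal rate of 1.0 U/h and a deterministic action (sigma = 0).
p_e = control_action(p_a=0.02, p_s=0.06, sigma=0.0)
print(fused_update(1.0, p_e))
```

With `w = 0.5`, only half of the proposed change reaches the patient on a given day, which is what makes the daily adaptation gradual.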
The sum of the first two terms in equation (23) could be named as $P_d$:

$$P_d = h\,P_a + (1-h)\,P_s. \tag{24}$$

Both $P_a$ and $P_s$ are calculated on the basis of the features $\mathbf{F}_k$. The linear deterministic control action $P_a$ is defined as the linear combination of the features and policy parameter vector $\theta$:

$$P_a = \mathbf{F}_k^{T}\,\theta. \tag{25}$$

In this work, the calculation of $P_s$ for CIR is similar to that described in [18], i.e.

$$P^{CIR_i}_{s} = \begin{cases} \cdots \end{cases} \tag{26}$$

[Equation (26): the case values and conditions are illegible in the source.]

where $i$ indicates the i-th CIR. $F^{CIR_i}_{\mathrm{hyper}}$ and $F^{CIR_i}_{\mathrm{hypo}}$ are the hyperglycaemic and hypoglycaemic features, respectively. The calculation of $P_s$ for BR was modified by evaluating the values of the measurements in different glucose level ranges:

$$P^{BR}_{s} = \begin{cases} \cdots \end{cases} \tag{27}$$

[Equation (27): the case values and conditions are illegible in the source.]

where Hyponumber is the number of measurements which are below 70 mg/dL, $N_1$ is the number of measurements below 80 mg/dL, and $N_2$ is the number of measurements above 130 mg/dL. The variables Hyponumber, $N_1$ and $N_2$ represent an overall trend of the glucose level of the previous day. Finally, the policy parameter update was defined as:

$$\theta_{k} = \cdots, \tag{28}$$

[Equation (28): the right-hand side is illegible in the source.]

where the actor learning rate has the value 0.5, and $d$ is the TD error.

Fig. 3. Illustration of in silico evaluation settings.

A one-week initialisation phase was applied before the normal control phase of ABBA. During the initialisation phase, the patients used their regular treatment. A CGM device was used for collecting blood glucose measurements for initialisation.
With the seven-day measurements, the control policy parameter is initialised with the TE method, as described in [19].

C. CGM version of ABBA

The CGM version of ABBA (ABBA_CGM) follows a similar approach to that of the SMBG version for "system status" calculation and update of BR and CIRs. The calculation of $P_s$ was modified as below:

$$P_s = \begin{cases} \cdots \end{cases} \tag{29}$$

[Equation (29): the case values and conditions, which carry a ± sign, are illegible in the source.]

The upper sign in (29) refers to the calculation for BR and the lower sign to CIR. Like the SMBG version of ABBA, a one-week initialisation phase with the TE method was applied before the control phase.

III. EXPERIMENTAL PROTOCOL

A. Simulation Environment

The two versions of ABBA were evaluated using the FDA-accepted adult population (100 virtual subjects) with the UVA/Padova T1DM simulator [23], [24]. The simulator's default pump was selected for both CGM and SMBG versions of ABBA. As regards the glucose monitoring devices, ABBA_CGM used the Dexcom50 CGM with a sampling time of 5 minutes. This CGM was also used during the initialisation of both versions of ABBA, while ABBA_SMBG used the default SMBG device during the operational period. In the in silico environment, the system defines the type of meal based on the meal time. In fact, the user announces the meal by providing the CHO content of the upcoming meal. In our experiment, no bolus insulin was considered for bedtime snacks.

B. Experimental Protocol

The proposed approach was tested in silico on 100 simulated adults of the FDA-accepted UVA/Padova simulator using a number of scenarios emulating an equivalent number of in silico clinical trials. Each trial lasted for 98 days (3 months and 1 week), excluding day 1 (no insulin on board is considered for day 1).
Each patient's data from day 2 (D2) to D8 were used to initialise the control policy parameters. An initialisation period of seven days was chosen to include the weekly cycle of insulin sensitivity change, since the patient may have different behaviours over weekdays and weekends. During the initialisation period, the BR and CIR provided by the simulator were used to simulate standard treatment (ST). From D9 to D98, a period of 3 months, ABBA was active. Dawn phenomenon and inter-day SI variability were considered until D90, while fixed SI was employed during the last 8 days (D91-D98). The last two weeks (D84-D90: Week 13 (W13) and D92-D98: W14) were used to evaluate the performance of ABBA against the ST period (D2-D8: W1). D91 was excluded from evaluation since it was the transition day from the period with SI variability to the period without SI variability. The experimental protocol is illustrated in Fig. 3.

1) Inter-day Variability of Insulin Sensitivity and Dawn Phenomenon

The inter-day variability of SI was simulated with a uniformly distributed variability of ±25%. The intra-day variability usually caused by the dawn phenomenon was also considered. Dawn phenomenon, originally described in [25], refers to periodic episodes of hyperglycaemia occurring in the early morning hours before and after breakfast [26]. In that work, SI dropped every day between 04:00 and 08:00 to 50% of its nominal value, and SI ramped up or down within a time-frame of 30 minutes.

2) Meal Protocol

Four meals of specific CHO content were considered for each day during the in silico trials: breakfast at 07:00 (50 g), lunch at 12:00 (60 g), dinner at 18:30 (80 g) and bedtime snack at 23:00 (15 g). Meal variability was introduced by considering a meal size variability of ±10 g for main meals and ±5 g for the bedtime snack, and a meal-time variability of ±15 minutes.
Furthermore, an uncertainty of ±50% in the CHO estimation was introduced. Both variabilities and uncertainties followed uniform distributions. In addition, the random skip of two main meals per week was considered (the corresponding insulin bolus was also skipped).

3) Glucose measurements

In the case of ABBA_SMBG, the four glucose measurements of the previous day were used to update the BR and CIRs. The three pre-meal measurements were considered 20 minutes before the main meals, while the bedtime measurement took place at 23:00. No pre- and postprandial measurements were taken for snacks and no bolus insulin infusion for bedtime snacks was required. All the measurements were used to estimate the "system status" (features) for BR, while for the case of CIRs only the measurements corresponding to the respective time window, i.e. the measurements until the next CHO announcement of a main meal, were taken into consideration.

TABLE I
GLUCOSE LEVELS (MEAN ± STANDARD DEVIATION)

D02-D08: Week 1 (standard treatment)
      % in target range   % in Hypo   % in Severe Hypo   % in Hyper   % in Severe Hyper
S1    89.9±8.7            2.5±3.0     1.5±3.3            6.1±8.2      0.0±0.1
S2
S3
S4    -

D84-D90: Week 13 (with SI variability)
      % in target range   % in Hypo   % in Severe Hypo   % in Hyper   % in Severe Hyper
S1    85.9±12.9           1.0±1.0     0.3±0.8            12.8±12.1    0.0±0.0
S2    84.2±12.8           0.5±0.8     0.2±0.6            15.2±12.4    0.0±0.0
S3    84.8±12.6           0.4±0.7     0.1±0.4            14.7±12.3    0.0±0.0
S4    78.4±15.2           0.1±0.3     0.0±0.1            21.5±15.0    0.1±0.5

D92-D98: Week 14 (without SI variability)
      % in target range   % in Hypo   % in Severe Hypo   % in Hyper   % in Severe Hyper
S1    89.8±7.9            0.3±0.9     0.1±0.5            9.8±7.5      0±0.1
S2    88.5±8.8            0.2±0.6     0.1±0.4            11.2±8.4     0.1±0.4
S3    88.7±8.7            0.2±0.7     0.1±0.4            11.0±8.5     0.1±0.4
S4    88.7±9.3            0.3±0.7     0.1±0.3            11.0±9.1     0.0±0.2

Fig. 5. Weekly LBGI and HBGI trends in 98-day trial.
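The percentages reported in Table I are time-in-range statistics. A minimal sketch of how such percentages can be computed from a glucose trace is given below; it uses the band limits listed under "Evaluation metrics" and assumes equally spaced samples (for instance a 5-minute CGM trace), so that the share of samples equals the share of time. The function name `time_in_ranges` and the band labels are hypothetical.

```python
def time_in_ranges(glucose_mg_dl):
    """Return the percentage of samples in each glycaemic band.

    ASSUMPTION: samples are equally spaced in time.
    """
    if not glucose_mg_dl:
        raise ValueError("at least one glucose sample is required")
    counts = {"severe_hypo": 0, "hypo": 0, "target": 0,
              "hyper": 0, "severe_hyper": 0}
    for g in glucose_mg_dl:
        if g < 50:
            counts["severe_hypo"] += 1      # < 50 mg/dL
        elif g < 70:
            counts["hypo"] += 1             # [50, 70) mg/dL
        elif g <= 180:
            counts["target"] += 1           # [70, 180] mg/dL
        elif g <= 300:
            counts["hyper"] += 1            # (180, 300] mg/dL
        else:
            counts["severe_hyper"] += 1     # > 300 mg/dL
    n = len(glucose_mg_dl)
    return {band: 100.0 * c / n for band, c in counts.items()}


trace = [45, 60, 70, 100, 180, 181, 300, 301, 120, 90]
print(time_in_ranges(trace))
```

Per-subject percentages computed this way over a week could then be summarised as mean ± standard deviation across the 100 subjects, which is the form used in Table I.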
In the case of ABBA_CGM, all the CGM measurements of the previous day were used to estimate the "system status" and update the BR, while for the new CIR_i, all the CGM measurements for CIR_i of the previous day, i.e. between the previous day's CHO announcement for CIR_i and its next CHO announcement, were considered. Whenever the last measurement of the day was available (announced by the patient in the case of ABBA_SMBG or at midnight in the case of ABBA_CGM), the new flat BR was estimated and activated to be used for the entire day. For the intra-day CIRs, whenever a new meal was announced, the current CIR was deactivated and the CIR for the upcoming meal was estimated and activated. The update process is visualised in Fig. 4.

4) Scenarios

Four in silico scenarios were considered:
• Scenario 1 (S1): Combined use of CGM, ABBA_CGM and insulin pump;
• Scenario 2 (S2): CGM for initialisation phase, SMBG, ABBA_SMBG and insulin pump;
• Scenario 3 (S3): Identical to S2 + uncertainty on SMBG measurement time;
• Scenario 4 (S4): Identical to S3 + skip of main meals.

In order to mimic real-life situations, an uncertainty of ±10 min on the standard glucose measurement time was considered in Scenarios 3 and 4.

5) Evaluation metrics

To evaluate and comparatively assess the performance of each approach, the following widely used metrics were implemented: percentage time in glucose target range [70, 180] mg/dL; percentage time in hypoglycaemia [50, 70) mg/dL; percentage time in severe hypoglycaemia <50 mg/dL; percentage time in hyperglycaemia (180, 300] mg/dL; and percentage time in severe hyperglycaemia >300 mg/dL. In addition, the low blood glucose index (i.e. risk of hypoglycaemia; LBGI), the high blood glucose index (i.e. risk of hyperglycaemia; HBGI), the mean amplitude of glycaemic excursion (MAGE), and the total daily insulin intake (TDI) in units of insulin were estimated.

IV. RESULTS AND DISCUSSION

Table I presents the in silico results observed in the tested scenarios. The ABBA_SMBG versions (S2 and S3) achieved comparable performance to ABBA_CGM (S1), although only a few SMBG measurements per day were available. In S2 and S3, the number of hypoglycaemic events was further reduced. The percentages in target range were slightly decreased, mainly due to the increase in hyperglycaemic events. This increase was anticipated, since ABBA was designed to give high priority to avoiding hypoglycaemia, the more dangerous metabolic state. Furthermore, as expected, the percentages in hyper- and hypoglycaemic ranges during W14 (evaluation phase without SI variability) were lower than during W13 (evaluation phase with SI variability). The comparison of W1 (standard treatment) to W13 indicates that both ABBA_CGM and ABBA_SMBG significantly decreased the percentage of time in hypo- and severe hypoglycaemic ranges (Wilcoxon tests, p<0.05), while the respective percentages for hyperglycaemia were increased.

Fig. 4. Update process of BR and CIRs for one day.

The weekly LBGI and HBGI [27] are illustrated in Fig. 5. In all scenarios, the LBGI value was decreased from low range (1.1-2.5) in W1 to minimal range (<1.1) in W13, while HBGI remained in minimal range (<5). After W3 (2nd week of ABBA), both LBGI and HBGI converged. During this two-week transition phase, ABBA progressively decreased the value of LBGI and kept HBGI within minimal range. The fact that HBGI was not increased over the trial period shows that the increase in hyperglycaemias in all scenarios remained within the acceptable range. Furthermore, LBGI in the case of ABBA_SMBG was lower than in the case of ABBA_CGM, while the opposite was observed for HBGI.

Fig. 6 presents the weekly mean value of mean amplitude of glycaemic excursions (MAGE) among the 100 subjects.
The MAGE value indicates diabetic instability; a small MAGE value indicates more stable blood glucose concentration [28]. In comparison to S1, both S2 and S3 slightly decreased the MAGE value. The MAGE value of S4 shows that blood glucose regulation in this scenario is not as stable as in the other scenarios, since two meals per week were randomly skipped. The box plot in Fig. 7 shows the distribution of weekly mean total daily insulin (TDI) of the 100 subjects during the 98-day trial. In each week, both S2 and S3 of ABBA_SMBG had similar median values and distributions to ABBA_CGM (S1). As for S4, since two meals per week along with the corresponding bolus insulin were randomly skipped, the TDI was clearly lower than in the other scenarios.

V. IMPLEMENTATION

Both versions of ABBA are easily applied to diabetic patients treated with insulin pumps. During the first seven operation days, ABBA provides the patient's standard treatment and, in parallel, collects the CGM and insulin pump data. For the case of SMBG users, the CGM can be provided by the attending physician. At the end of this period, the algorithm automatically estimates the TE and initialises the policy parameters, and is then ready to provide personalised insulin treatment with daily adaptation of BR and three CIRs based on either CGM or SMBG data. The patient can decide whether to accept or reject the suggested change. If the patient believes the change suggested by ABBA exceeds their own estimation and decides to reject the value, they can choose to use the previous value or manually enter a new value for BR or CIRs.

ABBA_CGM and ABBA_SMBG are implemented on an Android platform. On this Android platform, Debiotech's JewelPUMP application allows the patient to monitor and control the insulin pump.
A communication protocol between the ABBA and JewelPUMP applications was defined and implemented on the basis of standard Android Inter-Process Communication (IPC) mechanisms that allow communication between activities, as depicted in Fig. 8. In particular, the communication mechanism allows JewelPUMP to send messages to ABBA when a) there are basal profile changes, b) a bolus is infused, or c) SMBG measurements are performed. The communication mechanism also enables ABBA to inform JewelPUMP about BR or bolus updates. When the patient announces a meal or the last BG measurement of the day, data synchronisation is performed, in order to ensure that all messages were properly sent from JewelPUMP to ABBA, and to send any that were not. The implemented communication mechanism:
• Implements Inter-Process Communication (IPC) between JewelPUMP and ABBA.
• Allows JewelPUMP to send messages to ABBA.
• Ensures these messages are properly received by ABBA.
• Enables History Synchronisation.
• Allows synchronised bidirectional communication between JewelPUMP and ABBA.

Fig. 6. Weekly mean amplitude of glycaemic excursion (MAGE) of the 100 subjects.

Fig. 7. Weekly mean total daily insulin (TDI) of the 100 subjects.

With this communication protocol in place, the JewelPUMP application is able to send and receive information to and from the ABBA application, in order to propose the personalised CIR values and basal rates to the patient, and subsequently to apply these values when controlling the infusion through the insulin patch pump. The aforementioned implementation was conducted and tested on JewelCOM, an Android 4.4.4 based mobile platform. However, the ABBA application could be installed on smartphones with other Android versions as well. In that case, compatibility issues need to be considered.

VI. CONCLUSIONS

An RL-based adaptive basal-bolus advisor, ABBA, is proposed.
The advisor aims to minimise the risk of hypoglycaemia by providing personalised suggestions on daily BR and bolus dose on the basis of glucose measurements from either CGM or SMBG devices. The proposed approach was evaluated in silico on 100 adults from the FDA-accepted UVa/Padova Simulator under a number of challenging scenarios.

A wide variety of different scenarios have been published, with different meal schemes and variability in insulin sensitivity. These are often combined with disparate variabilities and uncertainties. Therefore, it is not straightforward to compare performance in the present study with other publications. To this end, we considered four scenarios to evaluate both ABBA SMBG and ABBA CGM, which were more challenging than those included in our previous research in the field. These scenarios consisted of complex meal protocols, including uncertainties about the size of the announced CHO and variabilities in meal announcement times, inter- and intra-day variabilities in insulin sensitivity and dawn phenomenon, as well as uncertainties about the time of SMBG glucose measurements.

The performance of ABBA SMBG and ABBA CGM converged after two weeks of operation, while, during the transition phase, both versions of ABBA progressively achieved better glucose control in terms of LBGI. The results indicate that, independent of the technology used for glucose measurement, the proposed RL approach is able to i) learn the patient's characteristics and ii) provide personalised suggestions on insulin treatment. The insulin suggestions virtually eliminated hypoglycaemias and maintained glucose in the target range most of the studied time, even in the case of extreme scenarios with uncertainties, variabilities, and skipped main meals.
Furthermore, the proposed approach relies on the standard medical treatment as starting point, is easily applied, and the SMBG version implements the NICE guidelines with respect to the minimum number of fasting glucose measurements per day. The two versions have already been integrated on Android smartphones that are able to communicate wirelessly with a patch pump. The next step is to conduct a feasibility study within the framework of a pilot clinical trial to confirm the in silico results.

Fig. 8. Communication between JewelPUMP and ABBA applications.

REFERENCES

[1] NICE guideline [NG17], Type 1 diabetes in adults: diagnosis and management. 2015. Available: https://www.nice.org.uk/guidance/ng17
[2] Global glucose monitoring system market forecast 2018-2026. Available: https://www.inkwoodresearch.com/reports/global-glucose-monitoring-system-market-forecast-2018-2026
[3] J. Walsh, R. Roberts, T. S. Bailey, and L. Heinemann, "Bolus Advisors: Sources of Error, Targets for Improvement," J. Diabetes Sci. Technol., vol. 12, no. 1, pp. 190–198, 2018.
[4] C. Owens, H. Zisser, L. Jovanovic, B. Srinivasan, D. Bonvin, F. J. Doyle, "Run-to-run control of blood glucose concentrations for people with type 1 diabetes mellitus," IEEE Trans Biomed Eng., vol. 53, no. 6, pp. 996–1005, Jun. 2006.
[5] C. C. Palerm, H. Zisser, W. C. Bevier, L. Jovanovič, F. J. Doyle, "Prandial Insulin Dosing Using Run-to-Run Control," Diabetes Care, vol. 30, no. 5, pp. 1131–1136, May 2007. DOI: 10.2337/dc06-2115.
[6] Herrero P, Pesl P, Reddy M, Oliver N, Georgiou P, Toumazou C. Advanced insulin bolus advisor based on run-to-run control and case-based reasoning. IEEE J Biomed Health Inform., vol. 19, no. 3, pp. 1087–1096, 2015.
[7] Herrero P, Pesl P, Bondia J, et al. Method for automatic adjustment of an insulin bolus calculator: in silico robustness evaluation under intra-day variability. Comput Methods Programs Biomed., vol. 119, no. 1, pp. 1–8, 2015.
[8] Reddy Monika, Pesl Peter, Xenou Maria, Toumazou Christofer, Johnston Desmond, Georgiou Pantelis, Herrero Pau, and Oliver Nick. Clinical Safety and Feasibility of the Advanced Bolus Calculator for Type 1 Diabetes Based on Case-Based Reasoning: A 6-Week Nonrandomized Single-Arm Pilot Study. Diabetes Technology & Therapeutics, vol. 8, no. 8, pp. 487–493, 2016. DOI: 10.1089/dia.2015.0413.
[9] Herrero P, Bondia J, Giménez M, Oliver N, Georgiou P. Automatic Adaptation of Basal Insulin Using Sensor-Augmented Pump Therapy. J Diabetes Sci Technol., vol. 12, no. 2, pp. 282–294, Mar. 2018. DOI: 10.1177/1932296818761752.
[10] Toffanin, C., Visentin, R., Messori, M., Di Palma, F., Magni, L., & Cobelli, C. Towards a Run-to-Run Adaptive Artificial Pancreas: In Silico Results. IEEE Transactions on Biomedical Engineering, 1–1. 2017. DOI: 10.1109/TBME.2017.2652062
[11] A. Haidar, "The Artificial Pancreas: How Closed-Loop Control Is Revolutionizing Diabetes," IEEE Control Systems, vol. 36, no. 5, pp. 28–47, Oct. 2016.
[12] Messori M, Kropff J, Del Favero S, et al. Individually adaptive artificial pancreas in subjects with type 1 diabetes: a one-month proof-of-concept trial in free-living conditions. Diabetes Technol Ther., vol. 19, pp. 560–571, 2017.
[13] Herrero P, Bondia J, Adewuyi O, et al. Enhancing automatic closed-loop glucose control in type 1 diabetes with an adaptive meal bolus calculator - in silico evaluation under intra-day variability. Comput Methods Programs Biomed., vol. 146, pp. 125–131, 2017. DOI: 10.1109/MCS.2016.2584318.
[14] Daskalaki E, Diem P, Mougiakakou SG. An Actor-Critic based controller for glucose regulation in type 1 diabetes. Comput Methods Programs Biomed., vol. 109, no. 2, pp. 116–25, Feb. 2013. DOI: 10.1016/j.cmpb.2012.03.002.
[15] B. L. Moore, A. G. Doufas, and L. D. Pyeatt, "Reinforcement learning: A novel method for optimal control of propofol-induced hypnosis," Anesth. Analg., vol. 112, no. 2, pp. 360–367, 2011.
[16] B. L. Moore, P. Panousis, V. Kulkarni, L. D. Pyeatt, and A. G. Doufas, "Reinforcement Learning for Closed-Loop Propofol Anesthesia: A Human Volunteer Study," Proc. Twenty-Second Innov. Appl. Artif. Intell. Conf., pp. 1807–1813, 2010.
[17] B. L. Moore, "Reinforcement Learning for Closed-Loop Propofol Anesthesia: A Study in Human Volunteers," J. Mach. Learn. Res., vol. 15, pp. 655–696, 2014.
[18] E. Daskalaki, P. Diem, S. G. Mougiakakou. Model-Free Machine Learning in Biomedicine: Feasibility Study in Type 1 Diabetes. PLoS ONE, vol. 11, no. 7, 2016. DOI: 10.1371/journal.pone.0158722
[19] Daskalaki, E., Diem, P., & Mougiakakou, S. G. Personalized tuning of a reinforcement learning control algorithm for glucose regulation. 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3487–3490, 2013. DOI: 10.1109/EMBC.2013.6610293.
[20] RM. Laimer, A. Melmer, J. K. Mader, I. Schütz-Fuhrmann, H.-R. Engels, G. Götz, R. W. Holl. Variability of Basal Rate Profiles in Insulin Pump Therapy and Association with Complications in Type 1 Diabetes Mellitus. PLoS ONE, vol. 11, no. 3, 2016. DOI: 10.1371/journal.pone.0150604
[21] M. Bouchonville, J. Jaghab, E. Duran-Valdez, R. Schrader, and D. Schade, "The Effectiveness and Risks of Programming an Insulin Pump to Counteract the Dawn Phenomenon in Type 1 Diabetes," Endocr. Pract., pp. 1–25, 2014.
[22] Konda, V. R., & Tsitsiklis, J. N. Actor-Critic Algorithms. SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003. DOI: 10.1137/S0363012901385691
[23] C. D. Man, F. Micheletto, D. Lv, M. Breton, B. Kovatchev, C. Cobelli (2014). The UVA/PADOVA Type 1 Diabetes Simulator: New Features. Journal of Diabetes Science and Technology, vol. 8, no. 1, pp. 26–34. DOI: 10.1177/1932296813514502
[24] M. Schiavon, C. D. Man, Y. C. Kudva, A. Basu, C. Cobelli. In Silico Optimization of Basal Insulin Infusion Rate during Exercise: Implication for Artificial Pancreas. Journal of Diabetes Science and Technology, vol. 7, no. 6, pp. 1461–1469, 2013.
[25] Schmidt, M. I., Hadji-Georgopoulos, A., Rendell, M., Margolis, S., & Kowarski, A. The dawn phenomenon, an early morning glucose rise: implications for diabetic intraday blood glucose variation. Diabetes Care, vol. 4, no. 6, pp. 579–585, Nov. 1981.
[26] O'Neal, T., & Luther, E. Dawn Phenomenon. StatPearls. StatPearls Publishing. 2017. Available: http://www.ncbi.nlm.nih.gov/pubmed/28613643
[27] Kovatchev, B. P., Straume, M., Cox, D. J., & Farhy, L. S. Risk Analysis of Blood Glucose Data: A Quantitative Approach to Optimizing the Control of Insulin Dependent Diabetes. Journal of Theoretical Medicine, vol. 3, no. 1, pp. 1–10, 2000. DOI: 10.1080/10273660008833060
[28] Baghurst, P. A. Calculating the Mean Amplitude of Glycemic Excursion from Continuous Glucose Monitoring Data: An Automated Algorithm. Diabetes Technology & Therapeutics, vol. 13, no. 3, pp. 296–302, 2011. DOI: 10.1089/dia.2010.0090
