Design and Analysis Strategies for Pooling in High Throughput Screening: Application to the Search for a New Anti-Microbial

A major public health issue is the growing resistance of bacteria to antibiotics. An important part of the needed response is the discovery and development of new antimicrobial strategies. These require the screening of potential new drugs, typically…

Authors: Byran Smucker, Benjamin Brennan, Emily Rego

Byran J. Smucker* (1,2,3), Benjamin Brennan (3), Emily Rego (4,5), Meng Wu (6,7), Zhihong Lin (6), Brian M. M. Ahmer (4,5), and Blake R. Peterson (6,7)

1 Department of Epidemiology & Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI, USA
2 Henry Ford Health + Michigan State University Health Sciences, Detroit, MI, USA
3 Department of Public Health Sciences, Henry Ford Health, Detroit, MI, USA
4 Department of Microbiology, The Ohio State University, Columbus, OH, USA
5 Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, USA
6 The Ohio State University Comprehensive Cancer Center – Arthur G. James Cancer Hospital and Richard J. Solove Research Institute, Columbus, OH, USA
7 Division of Medicinal Chemistry and Pharmacognosy, The Ohio State University, Columbus, OH, USA
* Corresponding author (smucker6@msu.edu)

Abstract

A major public health issue is the growing resistance of bacteria to antibiotics. An important part of the needed response is the discovery and development of new antimicrobial strategies. These require the screening of potential new drugs, typically accomplished using high-throughput screening (HTS). Traditionally, HTS is performed by examining one compound per well, but a more efficient strategy pools multiple compounds per well. In this work, we study several recently proposed pooling construction methods, as well as a variety of pooled high-throughput screening analysis methods, in order to provide guidance to practitioners on which methods to use. This is done in the context of an application of the methods to the search for new drugs to combat bacterial infection.
We discuss both an extensive pilot study as well as a small screening campaign, and highlight both the successes and challenges of the pooling approach.

KEYWORDS: experimental design, supersaturated experiment, regularized regression, antibiotic-resistant bacteria

1 Introduction

Pathogens are becoming increasingly resistant to antibiotics, and this fact has alarming implications for public health, including increased mortality and associated healthcare costs [1, 2]. The United Nations estimates that deaths due to antimicrobial resistance will be 10 million per year by 2050 [3]. For a specific example, the typhoidal strains of Salmonella enterica cause typhoid fever, resulting in more than 100,000 deaths per year [4, 5], and non-typhoidal strains are a leading cause of death from food-borne illness in the United States [6] and a contributor to the mortality in developing nations due to diarrhea [7, 8]. Because of the surge in antibiotic-resistant typhoidal and non-typhoidal Salmonella enterica strains, the CDC and the WHO have called for the development of new drugs to address this need [1].

In response to this problem, a new anti-microbial strategy has been suggested [9] with potential in many bacteria including Salmonella enterica. An important part of developing that strategy is drug discovery, and the initial phase of this search is accomplished via high-throughput screening (HTS). High-throughput screening is used widely in drug discovery, chemical biology, and many other scientific and industrial domains. HTS involves the placement of specified compounds into wells in 384- or even 1536-well plates. The plates are then assayed and checked for a desired indication. Normally, a single compound is placed in each well, but various authors have suggested and shown that pooling multiple compounds in each well can improve throughput and/or statistical efficiency.
Pooling has been controversial and contested, with both successes [e.g., 10, 11, 12, 13] and cautions [14, 15] in the literature. Recently, however, there have been reports of several successful pooled screening procedures [16, 17, 18]. Older methodologies, using approaches like orthogonal pooling [19] or poolHiTS [20], neither constructed their pools nor analyzed them using statistical methods, while the new approaches use statistical design ideas for pool construction [16, 18] and statistical regularization for analysis [16, 17, 18].

In this work, we add to this emerging pooling literature in several ways. First, we describe in some detail a particular application related to a search for antimicrobials discussed at the outset (Section 2). In Section 3, we describe a number of existing pool construction methods and pooled HTS analysis methods, and make a set of extensive comparisons between them in Section 4. Most of the methods we consider are from the literature, but we propose a new Lasso thresholding method that exploits our knowledge of effect directions. We also discuss a secondary analysis method to address a problem in these types of screens: they often produce too many false positives, which consume a large amount of resources. The secondary criterion severely reduces the number of compounds to be validated, while still detecting large effects. In Section 5 we then present an extensive description of the pilot study (Section 5.1) used to establish pooling as a viable approach in this setting, as well as results from an initial screen (Section 5.2) which identified several promising compounds while reducing the number of considered false positives. We finish with a Discussion in Section 6.

2 The Screening Problem

Schwieters et al. [9] report that the enzyme mannitol-1-phosphate 5-dehydrogenase (MtlD) offers antimicrobial potential in many bacteria.
This enzyme converts the compound mannitol-1-phosphate to fructose-6-phosphate. When mtlD mutant bacteria are provided mannitol, mannitol-1-phosphate accumulates, intoxicating the bacterium and leading to reduced growth and attenuated virulence in animal models. Thus, the goal is to identify a compound that inhibits MtlD in the presence of mannitol. Using a wild-type bacterium, inhibition of MtlD in the presence of mannitol will result in lack of growth. However, numerous compounds will inhibit the growth of a wild-type bacterium for reasons unrelated to MtlD inhibition. To eliminate these from consideration, a parallel screen is used in which the bacterium cannot be harmed by a MtlD inhibitor (because it is a mtlA mutant and cannot form mannitol-1-phosphate). Thus, the screening problem is to find a drug which inhibits growth of the wild-type (WT) bacteria but not of the mtlA mutant (MUT) when assayed against both. We call such a drug a true hit. A drug which inhibits both WT and MUT is called a pseudo-hit. Our goal is to identify true hits.

Previous single-replicate, one-compound-one-well (OCOW) screening of 10,000 compounds for this system yielded 140 that appeared to inhibit WT, and 40 of those did not appear to inhibit MUT. However, upon retest in duplicate, none of the 140 were validated as true hits. Contemplating a similar screen scaled up to hundreds of thousands of compounds, the investigators realized that even a false positive rate of 1% would be extremely expensive. This led them to consider pooling as a more efficient alternative, since it is a way to observe each compound more than once while using more compounds than wells. This yields the potential of larger true positive rates with the same or smaller false positive rates. The researchers agreed that eight compounds per pool was feasible.
From their previous testing, they expect the set of true hits to be extremely sparse within the space of compounds they anticipate searching. Statistically, this does have an advantage: interactions between compounds are not expected, unless two or more true hits end up in the same pool. Because of the level of sparsity, this is unlikely.

Despite its promise, pooling also presents challenges. Logistically, the compounds, stored in source plates, must be transferred to target plates in pools dictated by the experimental design. This is accomplished via automated liquid handling machines, but constructing the pooled plates is still more time-consuming than OCOW. It also requires careful preparation and communication between the statistician and the personnel implementing the design. We revisit this screening problem in Section 5.

3 Design and Analysis Methodologies

In addition to demonstrating an application of pooling in HTS and the associated data and statistical challenges, two additional objectives of this work are to (1) compare pooled HTS design strategies; and (2) compare pooled HTS analysis strategies. A basic question we seek to answer: If one wishes to undertake a pooled high-throughput screen, what design and analysis method will be most effective? In this section, we describe the designs and analysis methods we consider. Throughout, we use the designs and analysis methods to estimate the main effects model, which we assume to be true:

$$y = \beta_0 \mathbf{1} + X\beta + \epsilon, \qquad (1)$$

where $y$ is $n \times 1$, $X \in \{-1, 1\}^{n \times k}$, $\beta = (\beta_1, \ldots, \beta_k)^T$ and $\epsilon \sim N(\mathbf{0}, \sigma^2 I)$ with $\mathbf{0}$ an $n$-vector and $I$ the $n \times n$ identity matrix.

3.1 Designs

Following the model above, we assume $n$ wells to study $k$ compounds, with $n < k$. The design $X$ is coded such that entry $ij$ is $-1$ if the $j$th compound is absent in the $i$th well, and $+1$ if the compound is present.
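As a minimal sketch of the coding just described (not the authors' construction code), the Python below builds a small $\{-1, +1\}$ design from hypothetical pool assignments and simulates responses from model (1). The toy sizes, the effect values, and the function names `build_design` and `simulate_response` are assumptions chosen only for illustration.

```python
import random

def build_design(pools, k):
    """Return the n x k design: +1 if compound j is in well i, otherwise -1."""
    return [[1 if j in pool else -1 for j in range(k)] for pool in pools]

def simulate_response(X, beta0, beta, sigma, rng):
    """Draw y_i = beta0 + sum_j x_ij * beta_j + eps_i with eps_i ~ N(0, sigma^2)."""
    return [beta0 + sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, sigma)
            for row in X]

# Hypothetical toy screen: n = 4 wells, k = 6 compounds (n < k), 3 compounds per well.
pools = [{0, 1, 2}, {3, 4, 5}, {0, 3, 4}, {1, 2, 5}]
k = 6
X = build_design(pools, k)

# Number of +1 entries per row (compounds per well) and per column (wells per compound).
pool_sizes = [row.count(1) for row in X]
replication = [sum(row[j] == 1 for row in X) for j in range(k)]

# Assumed effects: one inhibitory compound (index 3); all other effects are zero.
beta = [0.0] * k
beta[3] = -0.5
y = simulate_response(X, beta0=1.0, beta=beta, sigma=0.05, rng=random.Random(0))

print(pool_sizes, replication)
```

Under this coding a row holding $c$ compounds sums to $2c - k$, and moving compound $j$ from absent to present shifts the mean response by $2\beta_j$.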
We represent the size of the $i$th pool by $c_i$, with $c_{\min}$ and $c_{\max}$ denoting the smallest and largest pool size in a design, respectively. If $c_{\min} = c_{\max}$, let $c$ denote the common size across all pools. Similarly, the number of times the $j$th compound appears in the design is $a_j$, with $a_{\min}$, $a_{\max}$, and $a$ defined analogously.

In our study, we primarily consider three types of designs: (1) the Constrained Row Screening designs of Smucker et al. [18], based on a criterion from the supersaturated design literature; (2) the matrix-augmented pooling strategy of Ji et al. [16], using ideas from the compressed sensing literature; and (3) a semi-random construction approach from Liu et al. [17].

3.1.1 Constrained Row Screening Designs

The Constrained Row Screening (CRowS) designs of Smucker et al. [18] are constructed using the $UE(s^2)$-criterion from the supersaturated design literature. In this design formulation, the maximum pool size $c_{\max}$ is specified, which constrains the number of $+1$'s in each row. Given the $\{-1, 1\}$ coding, CRowS designs therefore respect the constraint $\sum_{j=1}^{k} x_{ij} \le 2c_{\max} - k$ for all rows, $i = 1, 2, \ldots, n$. The $UE(s^2)$ criterion is $UE(s^2) = \sum_{i}$ […] $> \mu + r\sigma$, where $\mu$ is the average response for a well with no active compounds, $r$ is a user-specified value, and WLOG we assume positive effects are of interest. In our case, we considered $p_s \in \{0.75, 1\}$ and $r \in \{2, 3\}$. Practically, for the $k = 640$ CRowS design for which each compound appears in 4 wells, this means that we considered four secondary criteria: “3 Wells > 2 SD (above $\mu$)”, “4 Wells > 2 SD”, “3 Wells > 3 SD”, and “4 Wells > 3 SD”.

We did a simulation of this secondary analysis strategy, using CRowS with $P \equiv$ $\lambda$-specific Gauss-Lasso using $\tau_\lambda = \max(\hat{\beta}_\lambda)$.
The simulation settings were the same as with the previous simulations in Section 4; that is, with $k \in \{500, 640, 960, 1280\}$ and $\beta \in \{1, 2, 3, 4\}$. We show TPR and FPR for the secondary criteria compared with the results of $P$ without a secondary criterion in Figure 3. As expected, both the TPR and FPR plummet as the secondary criteria get more strict. If we focus on $k = 640$, which is the screen size we ended up using in the real experiments, we see that only the most strict “4 Wells > 3 SD” secondary criterion has unacceptably low power for the largest effect. However, this highlights that these secondary criteria are really only viable if the experimenter is willing to forgo the detection of small or even medium-sized effects. But the advantage is also important: the number of false positives is much more manageable. With an FPR of 0.0015 as in the analysis with no secondary criterion, investigators would be chasing 15 false positives for a 10,000 compound screen and 1,500 for a million-compound screen. But with the “3 Wells > 3 SD” secondary criterion with an FPR of around 0.00006, one would expect fewer than 1 and about 60 false positives, respectively.

[Figure 3 panels: True Positive Rate and False Positive Rate versus Effect Size (1 to 4), faceted by k = 500, 640, 960, 1280; methods shown: Primary, 3 Wells > 2 SD, 4 Wells > 2 SD, 3 Wells > 3 SD, 4 Wells > 3 SD.]

Figure 3: Comparison of FPR and TPR in CRowS designs for differing secondary criteria, using the $\lambda$-specific Gauss-Lasso with $\tau_\lambda = \max(\hat{\beta}_\lambda)$ primary thresholding criterion.

5 Designing and Analyzing the Anti-Microbial Screens

In this section, we describe the process by which our screens were designed, executed, validated, and used to identify promising compounds for the MtlD system.
First, we conducted several proof-of-concept screens to ensure that our methods could detect known hits, for different levels of screening boldness. Then, we report results of a small screening campaign, the goal of which was to identify new compounds that inhibit MtlD. Recalling Section 2, each plate must be measured using a MUT and WT assay in order to find a drug that inhibits the WT but not the MUT.

We used the basic methodology described in [18] to construct CRowS designs, then used the Lasso to obtain lists of potential hits. Details of our analysis are provided in the results below, but ultimately we used the $\lambda$-specific Gauss-Lasso along with the secondary criteria of Section 4.4 to construct hit lists. Note that we report on actual experiments conducted before we did the extensive testing described in Sections 3 and 4. Thus, our recommendations from the comparative work may not map precisely onto what we actually describe in this section. The purpose of reporting these results is to demonstrate a high-quality solution to the screening problem presented in Section 2, while providing insight and guidance for other HTS applications.

5.1 Proof-of-Concept Screening Results

A first step when considering pooling for a particular assay and for the type of compounds to be screened is to ensure that the pooling method can recognize known hits. This is important because it is possible for pooled HTS to fail. If there are large and/or numerous interactions, the ability to identify main effects may be degraded [see results referenced in the Discussion of 18]. There have also been reports of compounds reacting together in wells in therapeutically uninteresting ways [e.g., 15].
Thus, for the proof-of-concept screens described in this section, we spiked the screens with a known pseudohit (Carbenicillin, a drug known to hit both the WT and MUT assays, denoted CntP1A03), as well as a known true hit (Chloramphenicol, a drug known to hit WT but not MUT, denoted CntP1A04). We used compounds from the MedChem Express Diversity library (5,000 Scaffold Library; Cat. No.: HY-L902). We also wished to investigate the extent to which our screening method could be successfully stretched. This led us to execute the following pooled HTS experiments, using CRowS designs:

• ($n = 320$, $k = 500$, $c = 8$), with $a_{\min} = 5$ and $a_{\max} = 6$. That is, each compound appears in either 5 or 6 pools.
• ($n = 320$, $k = 640$, $c = 8$), with $a = 4$.
• ($n = 320$, $k = 960$, $c = 8$), with $a_{\min} = 2$ and $a_{\max} = 3$.
• ($n = 320$, $k = 1280$, $c = 8$), with $a = 2$.

The results of these experiments were promising. Examining the Lasso profile plots, along with the data broken out by the spiked compounds of interest, showed that pooling was working in both the $k = 500$ and $k = 640$ settings. Figures 4 and 5 show these plots for ($n = 320$, $k = 640$, $c = 8$). In Figure 4 observe that CntP1A03 is annotated as one of the most prominent among the profiles in both the WT and MUT assays, while CntP1A04 only appears prominent in the WT. This suggests, correctly, that CntP1A03 is a pseudohit and that CntP1A04 scores as a true inhibitor. This is visually confirmed in Figure 5, where CntP1A03 clearly inhibits both WT and MUT, but CntP1A04 inhibits only WT. In order to reduce the subjectivity in these judgments, we used the $\gamma$-specific Gauss-Lasso described in Section 4.3, with $r = 0.9$. Using this procedure, we found three compounds that hit WT but not MUT, including ControlP1A04, as expected. The two false positives are consistent with a false positive rate of about 0.3%.
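The replication counts in the four designs and the quoted false positive rate can be checked with simple bookkeeping. The sketch below assumes that every well holds exactly $c = 8$ compounds, so that the $n \times c$ well slots are shared among $k$ compounds, and that all compounds other than the one known true hit count as inactive when computing the rate; the function name `mean_replication` is hypothetical.

```python
from fractions import Fraction

def mean_replication(n_wells, pool_size, n_compounds):
    """Average number of pools per compound: total well slots divided by compounds."""
    return Fraction(n_wells * pool_size, n_compounds)

# (n, k, c) for the four proof-of-concept designs listed in the text.
designs = [(320, 500, 8), (320, 640, 8), (320, 960, 8), (320, 1280, 8)]
for n, k, c in designs:
    avg = mean_replication(n, c, k)
    print(f"k={k}: average replication {float(avg):.2f}")

# Two false positives among the 640 - 1 compounds assumed to be inactive.
fpr = 2 / (640 - 1)
print(f"false positive rate of roughly {fpr:.1%}")
```

A non-integer average (for $k = 500$ and $k = 960$) is why those designs have compounds appearing in either of two adjacent numbers of pools, while $k = 640$ and $k = 1280$ allow a common replication.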
We found similar results for ($n = 320$, $k = 500$, $c = 8$), with the profile plots, data, and $\gamma$-specific Gauss-Lasso all identifying ControlP1A04 as an inhibitor, and the latter procedure identifying in addition four presumed false positives.

The results from the more aggressive ($n = 320$, $k = 960$, $c = 8$) and ($n = 320$, $k = 1280$, $c = 8$) experiments were also promising, but more ambiguous. Visually, the profile plots yielded no hint of the expected pseudohit and true hit, and the $\gamma$-specific Gauss-Lasso failed to identify the compounds. However, plotting the compounds separately from the rest of the data for the experiments still suggested that CntP1A03 inhibited both, while CntP1A04 inhibited just the WT. Still, with just 2 or 3 pools per compound, there wasn't enough statistical information to clearly identify them. We have included the plots for the ($n = 320$, $k = 1280$, $c = 8$) experiment in the Supplementary document. Based on these results, we moved forward to the screening stage with plans to use the ($n = 320$, $k = 640$, $c = 8$) design.

[Figure 4 panels: Lasso coefficient profiles (Coefficients versus Log Lambda, with Degrees of Freedom on the top axis) for "640 compounds: Wildtype" and "640 compounds: Mutant".]

Figure 4: Lasso profile plots for ($n = 320$, $k = 640$, $c = 8$) proof-of-concept CRowS design. The plot annotates the top 10 compounds in terms of magnitude at the smallest $\lambda$.
[Figure 5 plot: Growth (OD600) by Well Type (WT Controls, WT Pools, MUT Pools, MUT Controls); legend: Chloramphenicol Present, Carbenicillin Present, Neither Present, No Compounds.]

Figure 5: Plots of the data for ($n = 320$, $k = 640$, $c = 8$) proof-of-concept CRowS design. The control wells consist of Carbenicillin as a positive control and DMSO (no compounds) as a negative control.

5.2 Screening Results

Following the proof-of-concept experiments, we conducted a small-scale screen consisting of 16 plates, called PLINGs. The compounds used in this screen were from the ChemBridge Macrocycle Library (10,000 Macrocycles; 2018 version; N1558-1). As per the test screens, each PLING included 640 unique compounds, which means the total number of compounds represented in this initial screen is 10,240. For this system, the expectation is that true hits may be as rare as 1 in 100,000. Thus, there is no certainty that in this first screen we will find a true hit. Still, we present the results that we have obtained to this point.

In the Supplementary Document, we have included the boxplots from the PLINGs. Predictably, with real data there are ambiguities. For instance, it is clear that in some plates there are a relatively small number of wells that demonstrate inhibition, and in others there are indications of potential activation. Also, some plates have more apparently inhibited wells than would be expected in such a sparse system. Finally, the MUT and WT assays were expected to have similar distributions, but in reality the MUT assay is consistently centered lower than the WT assay. (This is certainly due to the WT utilizing mannitol in the medium while the MUT cannot. This could be remedied in future assays by reducing the concentration of mannitol.) Despite these data challenges, each plate was analyzed visually via profile plots.
Though we don't include the profile plots from all of the PLINGs, in Figure 6 we provide four: (a) in PLING2 there is a promising compound S01F010 that appears to inhibit in WT but not in MUT; (b) in PLING3 there are no clear inhibitors for either assay; (c) in PLING7, there appear to be three pseudohits, S02H008, S01D009, and S01D022, because they hit on both assays; and (d) in PLING11 there are several possible true hits, S02N020, S02E007, and S01M005, which show up on WT but not MUT.

The preceding analysis is relatively subjective, so we use the $\lambda$-specific Gauss-Lasso with $r = 0.9$ to provide an objective hit list. To review, this Gauss-Lasso procedure sets any Lasso estimate to 0 if the estimate is greater than $-0.9 \times \max \hat{\beta}_\lambda$. That is, the only estimates that survive are the ones that are relatively large and negative. This is done for both the WT and MUT assays, for each of the PLINGs separately. Then, for each PLING, we take an initial hitlist to be those that hit on WT but not on MUT. Using the 90% $\lambda$-specific Gauss-Lasso yields 1.1% of the studied compounds as potential hits.
Though this seems to be a reasonable hit rate (even if most of them are false positives), even in this small initial screen there are more than 100 hits identified, and if scaled to hundreds of thousands of compounds such a hit rate would place a tremendous burden on secondary screens.

[Figure 6 panels: Lasso coefficient profiles (Coefficients versus Log Lambda, with Degrees of Freedom on the top axis) for the WT and MUT assays of (a) PLING2, (b) PLING3, (c) PLING7, and (d) PLING11, each annotating its most prominent compounds.]

Figure 6: Four profile plots.
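The thresholding and hit-list step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the compound names and estimates are invented, the function name `surviving_hits` is hypothetical, and the scale in the cutoff is assumed here to be the largest absolute estimate within each assay (the text writes $\max \hat{\beta}_\lambda$, so the scale is passed in as an argument rather than fixed).

```python
def surviving_hits(estimates, r, scale):
    """Return names whose estimate is at or below -r * scale.

    Every other estimate is treated as set to zero, so only relatively
    large negative (inhibitory) estimates survive.
    """
    cutoff = -r * scale
    return {name for name, b in estimates.items() if b <= cutoff}

# Hypothetical Lasso estimates at one lambda for a single plate.
wt = {"cpd_A": -0.040, "cpd_B": -0.038, "cpd_C": -0.004, "cpd_D": 0.006}
mut = {"cpd_A": -0.030, "cpd_B": -0.002, "cpd_C": 0.001, "cpd_D": 0.003}

# Assumption: the scale is the largest absolute estimate in each assay.
wt_hits = surviving_hits(wt, r=0.9, scale=max(abs(b) for b in wt.values()))
mut_hits = surviving_hits(mut, r=0.9, scale=max(abs(b) for b in mut.values()))

# Initial hit list: compounds that hit on WT but not on MUT.
candidates = wt_hits - mut_hits

# Screen-level count implied by the text: 16 plates x 640 compounds, 1.1% flagged.
n_compounds = 16 * 640
approx_flagged = 0.011 * n_compounds
print(sorted(candidates), n_compounds, round(approx_flagged))
```

In this toy example `cpd_A` survives in both assays and so behaves like a pseudohit, while `cpd_B` survives only in WT and enters the initial hit list. The final lines reproduce the arithmetic behind the "more than 100 hits" figure.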
In order to further screen the initial hitlists, we search among the potential hit compounds for those whose pools consistently exhibited substantial inhibition, using the method described in Section 4.4. For the ($n = 320$, $k = 640$, $c = 8$) design, each compound appears in four wells, so we looked for promising compounds with three or four pools more than 2 or 3 standard deviations away from the median of the pools not containing the compound under consideration. To estimate the standard deviation, we used the robust estimator of $1.48 \times MAD$, where $MAD$ is the median absolute deviation of the remaining points. This was done in case other hit compounds exist in the remaining pools.

Table 3 identifies 8 unique compounds, arbitrarily named, that meet the 2 SD secondary criterion, which is that at least three of a compound's four pools inhibit by 2 or more SDs. It also includes, naturally, the two compounds that meet the 3 SD secondary criterion. The secondary criteria eliminate the issue of compounds chosen by the method due to one or two outliers. Of these 8, we show in Figures 7 and 8 two compounds for which all four wells are 2 SD hits, as well as the two additional compounds which have three of four wells beyond 3 standard deviations. This methodology allows the researchers to visually focus on the very most promising compounds. So far, none of the promising compounds have been validated as true hits, but this is not unexpected in a system anticipated to be as sparse as this.

[Figure 7(a) plot: Growth (OD600) by Well Type (WT Controls, WT Pools, MUT Pools, MUT Controls); legend: No Compounds, Carbenicillin Only, S01G022 Absent, S01G022 Present.]

(a) Promising compound PLING7-S01G022 with all four of its pools beyond two standard deviations from the median for WT assay, while not inhibiting the MUT assay.
The solid line for the WT pools is an estimate of the median; the dashed line is an estimate of 2 SD below the median.

[Figure 7(b) plot: Growth (OD600) by Well Type (WT Controls, WT Pools, MUT Pools, MUT Controls); legend: No Compounds, Carbenicillin Only, S02N020 Absent, S02N020 Present.]

(b) Promising compound PLING11-S02N020 with all four of its pools beyond two standard deviations from the median for WT assay, while not obviously inhibiting the MUT assay. The solid line for the WT pools is an estimate of the median; the dashed line is an estimate of 2 SD below the median.

Figure 7: Two promising 2-SD compounds.

[Figure 8(a) plot: Growth (OD600) by Well Type (WT Controls, WT Pools, MUT Pools, MUT Controls); legend: No Compounds, Carbenicillin Only, S01E012 Absent, S01E012 Present.]

(a) Promising compound PLING5-S01E012 with three of its four pools beyond three standard deviations from the median, though it appears that two of its pools also inhibit the MUT assay. The solid line for WT is an estimate of the median; the dashed line is an estimate of 3 SD below the median.

[Figure 8(b) plot: Growth (OD600) by Well Type (WT Controls, WT Pools, MUT Pools, MUT Controls); legend: No Compounds, Carbenicillin Only, S02B003 Absent, S02B003 Present.]

(b) Promising compound PLING6-S02B003 with three of its four pools beyond three standard deviations from the median of the WT assay. The solid line is an estimate of the median for WT; the dashed line is an estimate of 3 SD below the median.

Figure 8: Two promising 3-SD compounds.

PLING   Compound   Number Beyond 2 SDs   Number Beyond 3 SDs
1       S01I013    3                     0
5       S01E012    3                     3
5       S01L010    3                     2
6       S02B003    3                     3
7       S01G022    4                     0
7       S02O006    3                     0
8       S01K008    3                     2
11      S02N020    4                     2

Table 3: For the initial screen, the eight compounds that met the 2 SD secondary criterion, including the two compounds that met the 3 SD secondary criterion. In this screen, each compound appears in four pools.
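The secondary criterion behind Table 3 can be sketched in a few lines. The code below is a hedged illustration rather than the authors' implementation: the OD600 values are invented, the function names are hypothetical, and it assumes that only deviations below the median count, since inhibition lowers growth. The constant 1.48 and the use of the pools not containing the compound follow the description in Section 5.2.

```python
from statistics import median

def robust_center_and_sd(values):
    """Median of the values and the robust SD estimate 1.48 * MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, 1.48 * mad

def pools_beyond(own_pools, other_pools, r):
    """Count the compound's pools more than r robust SDs below the median of the other pools."""
    center, sd = robust_center_and_sd(other_pools)
    return sum(1 for v in own_pools if v < center - r * sd)

def meets_criterion(own_pools, other_pools, min_wells, r):
    """True if at least min_wells of the compound's pools are beyond r SDs."""
    return pools_beyond(own_pools, other_pools, r) >= min_wells

# Hypothetical growth readings (OD600) for one plate and one assay.
other = [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.50]  # pools without the compound
own = [0.30, 0.42, 0.44, 0.50]                            # the compound's four pools

beyond_2sd = pools_beyond(own, other, r=2)
beyond_3sd = pools_beyond(own, other, r=3)
print(beyond_2sd, beyond_3sd)
print(meets_criterion(own, other, min_wells=3, r=2),
      meets_criterion(own, other, min_wells=3, r=3))
```

Estimating the center and spread from the pools that exclude the compound keeps the compound's own wells from inflating the spread, and the median-based estimates limit the influence of other active compounds elsewhere on the plate.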
6 Discussion

In this work, one major contribution is to provide practitioners of pooling in HTS guidance regarding pool construction methods and screen analysis methods. Along the way, we introduce two improvements to previously-described analysis methods that align with our goal of reducing false positives. The second contribution of our work is to apply these methods to a real problem in search of a solution. This problem is the growing resistance of bacteria to antibiotics, and a possible solution is the inhibition of an enzyme that neutralizes the toxic accumulation of the compound mannitol-1-phosphate in many bacteria. We show how the designs and analysis methods we studied can be used in practice, with both an extensive pilot study (in which controls are included) and a small-scale screening campaign. Such an extensive set of experiments provides insights into the challenges and compromises that must be faced in real world data.

Regarding the comparisons of design methods, we compared several recent pool construction strategies and concluded that the CRowS approach of [18] offers a good combination of effectiveness and flexibility. We note that when the design parameters allow balance in terms of pool sizes and compound replication, it appears that all three methods we compared perform similarly. Of course, we are limited in the generality of our conclusion by the specificity of our simulation scenarios. But we would tentatively state that for such balanced cases, there is likely little difference between CRowS, MAPS, and the randomly constructed pools. In fact, we conjecture that for those balanced cases, “random” pools are sufficiently constrained to yield CRowS designs, though we have not proved this.
We also compared a number of methods to analyze pooled HTS data and concluded that for the present application (in which it is critical to reduce the number of false positives to near zero) a new $\lambda$-specific Gauss-Lasso method provides the best balance of TPR and FPR. This method takes advantage of the fact that we know that any true effect will be inhibitory. We also described an additional, secondary analysis method that further reduces the number of false positives. For the analysis method comparisons, the preferred method strongly depends on the balance the experimenter wishes to strike between true and false positive rate. The $\lambda$-specific Gauss-Lasso, with its relatively low FPR but also lower TPR, aligns with the need of our application. But if it is important to identify more and smaller effects, the Elastic Net or the Gauss-Lasso with $\tau = 0.5 \times \max(|\hat{\beta}_{\lambda=0}|)$ may be preferable.

A critical assumption in pooling is that statistical interactions between compounds are neither numerous nor large. [18] briefly investigated this and showed that interactions can have a large impact on whether hits are detected. There is also concern regarding promiscuous aggregation [15] of compounds, in which molecules combine in unpredictable and unhelpful ways and prevent clean deconvolution of individual compound effects. It is beyond the scope of this work to investigate these things further, but the present work has demonstrated at least that strong hits are detectable in real-world, sparse settings such as the one investigated herein.

Supplementary Material

Supplementary Material consists of:

A. Supplementary document referred to in the main document. [Available at the end of this document]
B. Pooled and control data from the pilot experiments of Section 5.1, along with pooled and control data from the screening experiments of Section 5.2.
[Unavailable in arXiv version]

Acknowledgements

ChatGPT/Co-Pilot was used to assist in writing analysis, simulation and/or figure-generation code. We thank the Drug Discovery Shared Resource High Throughput Screening Lab at The Ohio State University Comprehensive Cancer Center for technical support.

Funding

BRP, MW, and ZL thank the OSU Comprehensive Cancer Center (2P30 CA016058) for financial support. MW acknowledges the NIH support of R50 CA243786. ER was supported by PHS grant NIH T32 AI165391.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The pooled data from Section 5 is available in the Supplementary Material. Raw data, control data, and/or codes for data processing and simulations are available upon request of the authors.

References

1. Evelina Tacconelli. Global priority list of antibiotic-resistant bacteria to guide research, discovery, and development. 2017.
2. Abi Manesh and George M Varghese. Rising antimicrobial resistance: an evolving epidemic in a pandemic. The Lancet Microbe, 2(9):e419–e420, 2021.
3. T Coque, David W Graham, Amy Pruden, A So, and Ed Topp. Bracing for superbugs: Strengthening environmental action in the one health response to antimicrobial resistance. 2023.
4. Vittal Mogasale, Brian Maskery, R Leon Ochiai, Jung Seok Lee, Vijayalaxmi V Mogasale, Enusa Ramani, Young Eun Kim, Jin Kyung Park, and Thomas F Wierzba. Burden of typhoid fever in low-income and middle-income countries: a systematic, literature-based update with risk-factor adjustment. The Lancet Global Health, 2(10):e570–e580, 2014.
5. Marina Antillón, Joshua L Warren, Forrest W Crawford, Daniel M Weinberger, Esra Kürüm, Gi Deok Pak, Florian Marks, and Virginia E Pitzer. The burden of typhoid fever in low- and middle-income countries: a meta-regression approach. PLoS neglected tropical diseases, 11(2):e0005376, 2017.
6. Elaine Scallan, Robert M Hoekstra, Frederick J Angulo, Robert V Tauxe, Marc-Alain Widdowson, Sharon L Roy, Jeffery L Jones, and Patricia M Griffin. Foodborne illness acquired in the united states—major pathogens. Emerging infectious diseases, 17(1):7, 2011.
7. Karen L Kotloff, James P Nataro, William C Blackwelder, Dilruba Nasrin, Tamer H Farag, Sandra Panchalingam, Yukun Wu, Samba O Sow, Dipika Sur, Robert F Breiman, et al. Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the global enteric multicenter study, gems): a prospective, case-control study. The Lancet, 382(9888):209–222, 2013.
8. Sara M Pires, Christa L Fischer-Walker, Claudio F Lanata, Brecht Devleesschauwer, Aron J Hall, Martyn D Kirk, Ana SR Duarte, Robert E Black, and Frederick J Angulo. Aetiology-specific estimates of the global and regional incidence and mortality of diarrhoeal diseases commonly transmitted through food. PloS one, 10(12):e0142927, 2015.
9. Andrew Schwieters, Allysa L Cole, Emily Rego, Chengyu Gao, Razieh Kebriaei, Vicki H Wysocki, John S Gunn, and Brian MM Ahmer. Mtld as a therapeutic target for intestinal and systemic bacterial infections. Journal of Bacteriology, 207(1):e00480–24, 2025.
10. Laura Wilson-Lingardo, Peter W Davis, David J Ecker, Normand Hebert, Oscar Acevedo, Kelly Sprankle, Thomas Brennan, Leslie Schwarcz, Susan M Freier, and Jacqueline R Wyatt. Deconvolution of combinatorial libraries for drug discovery: experimental comparison of pooling strategies. Journal of medicinal chemistry, 39(14):2720–2726, 1996.
11. Michael Snider. Screening of compound libraries... consomme or gumbo? Journal of Biomolecular Screening, 3(3):169–170, 1998.
12. Nuzhat Motlekar, Scott L Diamond, and Andrew D Napper. Evaluation of an orthogonal pooling strategy for rapid high-throughput screening of proteases. Assay and drug development technologies, 6(3):395–405, 2008.
13. LL Elkin, DG Harden, S Saldanha, H Ferguson, DL Cheney, SN Pieniazek, DP Maloney, J Zewinski, J O'Connell, and M Banks. Just-in-time compound pooling increases primary screening capacity without compromising screening quality. Journal of Biomolecular Screening, 20(5):577–587, 2015.
14. Thomas DY Chung. Screen compounds singly: why muck it up? Journal of Biomolecular Screening, 3(3):171–173, 1998.
15. Brian Y Feng and Brian K Shoichet. Synergy and antagonism of promiscuous inhibition in multiple-compound mixtures. Journal of medicinal chemistry, 49(7):2151–2154, 2006.
16. Hongchao Ji, Xue Lu, Shiji Zhao, Qiqi Wang, Bin Liao, Ludwig G Bauer, Kilian VM Huber, Ray Luo, Ruijun Tian, and Chris Soon Heng Tan. Target deconvolution with matrix-augmented pooling strategy reveals cell-specific drug-protein interactions. Cell Chemical Biology, 30(11):1478–1487, 2023.
17. Nuo Liu, Walaa E Kattan, Benjamin E Mead, Conner Kummerlowe, Thomas Cheng, Sarah Ingabire, Jaime H Cheah, Christian K Soule, Anita Vrcic, Jane K McIninch, et al. Scalable, compressed phenotypic screening using pooled perturbations. Nature Biotechnology, pages 1–13, 2024.
18. Byran J Smucker, Stephen E Wright, Isaac Williams, Richard C Page, Andor J Kiss, Surendra Bikram Silwal, Maria Weese, and David J Edwards. Large row-constrained supersaturated designs for high-throughput screening. Biometrics, 81(4):ujaf160, 2025.
19. Raghunandan M Kainkaryam and Peter J Woolf. Pooling in high-throughput drug screening. Current opinion in drug discovery & development, 12(3):339, 2009.
20. Raghunandan M Kainkaryam and Peter J Woolf. poolhits: A shifted transversal design based pooling strategy for high-throughput drug screening. BMC bioinformatics, 9:1–11, 2008.
21. Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320, 2005.
22. Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
23. Arthur E Hoerl and Robert W Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
24. Jonathan W Stallrich, Kade Young, Maria L Weese, Byran J Smucker, and David J Edwards. An optimal design framework for lasso sign recovery. Journal of the Royal Statistical Society Series B: Statistical Methodology, page qkaf026, 2025.
25. Frederick KH Phoa, Yu-Hui Pan, and Hongquan Xu. Analysis of supersaturated designs via the dantzig selector. Journal of Statistical Planning and Inference, 139(7):2362–2372, 2009.
26. Maria L Weese, Jonathan W Stallrich, Byran J Smucker, and David J Edwards. Strategies for supersaturated screening: Group orthogonal and constrained var(s) designs. Technometrics, 63(4):443–455, 2021.
27. W.W. Li and C. F. J. Wu. Columnwise-pairwise algorithms with applications to the construction of supersaturated designs. Technometrics, 1997.

Supplementary Document

CRowS vs. Orthogonal Pooling

Here we report results of CRowS vs. Orthogonal Pooling (Figure S1), where CRowS is analyzed using the Gauss-Lasso with $\tau_\lambda = \max(\hat{\beta}_\lambda)$ while the Orthogonal Pooling design is analyzed using a traditional, non-statistical modeling method described below.

[Figure S1: panels for k = 640 and k = 1280; x-axis: Effect Size (1 to 4); y-axes: True Positive Rate and False Positive Rate; methods: CRowS and OP.]

Figure S1: Comparison of FPR and TPR between CRowS and Orthogonal Pooling designs using the $\lambda$-specific Gauss-Lasso with $\tau_\lambda = \max(\hat{\beta}_\lambda)$ primary thresholding criterion for CRowS.
For orthogonal pooling designs, the data was generated in a manner identical to our primary simulations, with effect sizes $\beta = (1, 2, 3, 4)$ and error $\sigma^2 = 1$. The simulations were carried out with 10,000 iterations, and, at each iteration, a compound was designated a hit if the values of both observed wells $(y_i, y_j)$ in which the compound was present were greater than the 95th percentile $(y_{0.95})$ of all observed well values.

Proof-of-concept Screening Results for (n = 320, k = 1280, c = 8)

Here we provide a plot (Figure S2) of the results of the (n = 320, k = 1280, c = 8) proof-of-concept screen. Though there is some visual evidence that the screen identified the pseudo-hit and true hit, we were unable to verify this using the Gauss-Lasso modeling.

[Figure S2: Growth (OD600) by Well Type (WT Controls, WT Pools, MUT Pools, MUT Controls); colors: Chloramphenicol Present, Carbenicillin Present, Neither Present, No Compounds.]

Figure S2: Representation of the amount of cell growth for 32 WT Controls, the 320 WT pooled wells, the 320 MUT pooled wells, and the 32 MUT Controls. The pooled wells studied 1,280 compounds. The wells are colored by whether they included the pseudo-hit (Carbenicillin) or the true hit (Chloramphenicol). For the Controls, the Carbenicillin wells included only Carbenicillin; for the pooled wells, each well included 8 different compounds.

Screen Results

In Figure S3 we provide boxplots from the small screen described in Section 5.2 in the paper.
[Figure S3: sixteen boxplot panels, "Boxplots for PLING1" through "Boxplots for PLING16", each showing Value by Type (MUT, WT).]

Figure S3: Raw data from the screen described in Section 5.2 of the document.
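As an illustration of the orthogonal pooling hit rule described with Figure S1 (a compound is called a hit when both wells containing it exceed the 95th percentile of all well values), a minimal sketch follows. The 16 x 16 grid layout, the noise-free toy responses, and the names `call_hits_both_wells` and `wells_of_compound` are assumptions made for this example and are not taken from our simulation code.

```python
import numpy as np

def call_hits_both_wells(y, wells_of_compound, q=0.95):
    """Flag a compound as a hit when both of its wells exceed the q-th quantile."""
    y = np.asarray(y, dtype=float)
    wells = np.asarray(wells_of_compound)  # shape (k, 2): the two wells per compound
    cutoff = np.quantile(y, q)             # y_{0.95} over all observed wells
    return np.all(y[wells] > cutoff, axis=1)

# Toy layout (an assumption): 256 compounds on a 16 x 16 grid;
# wells 0-15 pool the rows and wells 16-31 pool the columns.
side = 16
rows, cols = np.divmod(np.arange(side * side), side)
wells_of_compound = np.column_stack([rows, side + cols])

# Noise-free toy responses: only compound 53 (row 3, column 5) is active.
y = np.zeros(2 * side)
y[wells_of_compound[53]] = 4.0

hits = call_hits_both_wells(y, wells_of_compound)
print(np.flatnonzero(hits))  # indices of compounds called as hits
```

With a percentile cutoff computed over all wells, at most about 5% of wells can exceed it, so this rule can only flag a compound when the plate has enough wells for two of them to clear the cutoff together.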
