Randomized Work-Competitive Scheduling for Cooperative Computing on $k$-partite Task Graphs

Chadi Kari (chadi@engr.uconn.edu), Alexander Russell (acr@cse.uconn.edu), Narasimha Shashidhar (karpoor@cse.uconn.edu)
Department of Computer Science and Engineering, University of Connecticut, Storrs, CT

1 Introduction

A fundamental problem in distributed computing is the problem of cooperatively executing a given set of tasks in a dynamic setting. The challenge is to minimize the total work done and to maintain efficiency in the face of dynamically changing processor connectivity. In this setting, work is defined as the total number of tasks performed (counting multiplicities) by all the processors during the course of the computation. We are given a set of $t$ tasks that must be completed in a distributed setting by a set of $p$ processors where the communication medium is subject to failures. We assume that the $t$ tasks are similar, in that they require the same number of computation steps to finish execution. We further assume that the tasks are idempotent: executing a task multiple times has the same effect as a single execution of the task. The tasks have a dependency relationship defined among them, captured by a task dependency graph.

The dynamics of the communication medium determine a processor's ability to communicate with other processors. Effectively, this partitions the processors into groups. Processors that can communicate with each other are said to belong to the same group. No communication is possible between processors in different groups. Each processor of a group is aware of all the tasks completed by the members of the group. A dynamic change in the communication medium leads to a reconfiguration, i.e., a new partition of processors into groups.
Each new group of processors shares knowledge of all the tasks that have been completed among its members so far and then continues executing the remaining tasks from its pool of incomplete tasks until the next reconfiguration.

This processor group reconfiguration and task execution may be treated as if they were determined by an adversary. Thus, the adversary in our model performs two basic operations: it reconfigures the processors into groups, and it allocates the work quota for each group of processors before the next reconfiguration. The work quota is the number of tasks that can be completed by the group before the next reconfiguration takes place. While the adversary controls the number of tasks that a group can perform, it does not dictate which tasks (the identity of the tasks) the group can perform.

In this setting, the tasks have dependencies defined among them, captured by a directed acyclic task graph ($t$-DAG) which is a $k$-partite task graph. Given a group of processors and the tasks known to be completed by them, an algorithm in this setting decides on the next incomplete task to be completed by this group. Each processor continues to execute tasks from the given set of $t$ tasks until it is aware that all tasks have been completed or it runs out of its allocated work limit. Given $p$ processors and $t$ tasks, any algorithm must execute at least $\Omega(t \cdot p)$ tasks in the scenario where all the processors are disconnected for the entire computation, while any reasonable algorithm would incur only $O(t)$ work in the completely connected case. Hence, we treat this problem in an on-line setting and pursue competitive analysis, where the performance of our algorithm is compared against that of the omniscient off-line algorithm, which has complete knowledge of all the future changes to the communication medium.
Our setting is a generalization of the problem in [3, 2] since the tasks are no longer independent but have dependencies defined among them. We show that for this setting more pessimistic bounds hold.

2 Our Results

Georgiou, Russell, and Shvartsman [2] performed competitive analysis and showed a simple randomized scheduling algorithm RS (Random Select) whose competitive ratio is tight. Their work also introduced a notion of computation width, which associates a natural number with a history of changes in the communication medium, and shows both upper and lower bounds on competitiveness in terms of this quantity. Specifically, they showed that their simple randomized scheduling algorithm obtains the competitive ratio $(1 + cw/e)$, where $cw$ is the computation width of the computation pattern determined by the dynamics of the communication medium.

We follow on the work done in [2]. We study a natural generalization of the problem where the tasks to be completed are not independent of each other but have a $k$-partite dependency relationship defined among them. Each partition of the vertices (tasks) of the $k$-partite task graph is said to belong to a level. Independent tasks belong to the first level, tasks dependent on the first level tasks are at the second level, and so on. The $k$-partite task graphs that we consider in our problem are a special kind of task graph where every task at level $l_{i+1}$ is dependent on every task at level $l_i$, $i = 1, \ldots, k-1$ (i.e., there is a complete set of directed edges from level $l_i$ to level $l_{i+1}$, $i = 1, \ldots, k-1$). We present a simple randomized algorithm for $p$ processors cooperating to perform $t$ known tasks where the dependencies between them are defined by a $k$-partite task dependency graph, with processors subject to a dynamic communication medium.
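The layered structure described above can be made concrete with a minimal Python sketch. The function name `build_level_task_graph` and the convention of numbering tasks consecutively from 1 are assumptions made for illustration; they do not come from the paper.

```python
from itertools import product


def build_level_task_graph(level_sizes):
    """Return (levels, edges) for a complete k-level task graph.

    levels[i] lists the task ids at level i+1 (ids are assigned
    consecutively starting at 1, an illustrative convention).
    edges holds a pair (u, v) for every task u at one level and
    every task v at the next level, meaning v depends on u.
    """
    levels, next_id = [], 1
    for size in level_sizes:
        levels.append(list(range(next_id, next_id + size)))
        next_id += size
    edges = [
        (u, v)
        for lower, upper in zip(levels, levels[1:])
        for u, v in product(lower, upper)
    ]
    return levels, edges


levels, edges = build_level_task_graph([2, 3, 1])
print(levels)      # [[1, 2], [3, 4, 5], [6]]
print(len(edges))  # 9, i.e. 2*3 + 3*1
```

Edges connect consecutive levels only, so a task at level 3 depends on level-1 tasks transitively rather than through a direct edge.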
We pursue competitive analysis and show that pessimistic bounds hold in this case.

Our algorithm Modified-RS extends the algorithm Random Select (RS) presented in [2]. Modified-RS is a simple randomized scheduling algorithm whose competitive ratio depends on the computation width [2] and the nature of dependencies among the tasks captured by the task graph. We show in Section 4.1.1 that algorithm Modified-RS is

$$\left(1 + cw\left(1 - \alpha + \frac{\alpha}{e^{\frac{1-\alpha}{\alpha} c + 1}}\right)\right)\text{-competitive}$$

for any computational $(p, t)$-DAG and for a 2-level task $t$-DAG, where $cw$ is the computation width of the computational pattern, $\alpha \in (0, 1]$ denotes the fraction of tasks in the first level $l_1$, and $c = \frac{1}{\frac{1}{e} + o(1)}$. This competitive ratio matches the lower bound we show in Section 4 and therefore is tight.

We then extend our analysis to any $k$-level task $t$-DAG. We show that Modified-RS is

$$\left(1 + cw\left((1 - \alpha_1) + \frac{\alpha_1}{e^{\frac{\alpha_k}{\alpha_1} c\, a_k + a_k}}\right)\right)\text{-competitive}$$

for any computational pattern and for any $k$-level task $t$-DAG, where $\alpha_i \in (0, 1]$, $c = \frac{1}{\frac{1}{e} + 1}$, and $a_i$, $i = 1, \ldots, k$, is a sequence defined as follows:

$$a_1 = 1, \qquad a_{i+1} = \frac{\alpha_i}{\alpha_1} c\, a_i + a_i.$$

Here, $\alpha_i \in (0, 1]$ is the fraction of tasks at level $l_i$, $i = 1, \ldots, k$, $cw$ stands for the computation width of the computational pattern, and $c_i > 0$. We also show that this result is tight, as it matches the lower bound we show before. When all the tasks given are independent, i.e., the task $t$-DAG has only one level ($\alpha = 1$), the competitive ratio collapses to $(1 + cw/e)$, the bound offered by [2]. Hence, our results subsume the results of [2].

3 Model and Definitions

The problem is defined in terms of $p$ asynchronous processors and $t$ tasks with unique identifiers, initially known to all processors. For our purposes the tasks are idempotent and similar, i.e., each task requires the same number of computation steps.

Definition 1.
A $t$-DAG is a directed acyclic $k$-partite graph $G = (V, E)$, where $V = \dot{\bigcup}_{l=1}^{k} V_l = [t] = \{1, \ldots, t\}$. Edge $e = (t_i^l, t_j^{l+1}) \in E$, $l = 1, \ldots, k-1$, $i \neq j$, if and only if task $t_j^{l+1}$ depends on task $t_i^l$. We write $t_i^l < t_j^{l+1}$ if task $t_j^{l+1}$ depends on task $t_i^l$. Here, $\dot{\bigcup}$ stands for disjoint union.

We only consider task graphs where a task on level $l_{i+1}$ depends on all tasks of level $l_i$. The computation pattern, i.e., the computational $(p, t)$-DAG defined below, captures the behavior of the adversary that determines both the partitioning and the number of tasks allocated to each group of the partition.

Definition 2. A computational $(p, t)$-DAG is a directed acyclic graph $C = (V, E)$ augmented with a weight function $h : V \to [t] \cup \{0\}$ and a labeling $g : V \to 2^{[p]} \setminus \{\emptyset\}$ so that:
1) For any maximal path $P = (v_1, v_2, \ldots, v_k)$ in $C$, $\sum_{i=1}^{k} h(v_i) \geq t$. (This guarantees that any algorithm terminates during the computation described by the DAG.)
2) $g$ possesses the following "initial conditions": $[p] = \dot{\bigcup}_{v : in(v) = 0}\, g(v)$.
3) $g$ respects the following "conservation law": there is a function $\phi : E \to 2^{[p]} \setminus \{\emptyset\}$ so that for each $v \in V$ with $in(v) > 0$, $g(v) = \dot{\bigcup}_{(u,v) \in E}\, \phi((u, v))$, and for each $v \in V$ with $out(v) > 0$, $g(v) = \dot{\bigcup}_{(v,u) \in E}\, \phi((v, u))$.

In the above definition, $in(v)$ and $out(v)$ denote the in-degree and out-degree of $v$, respectively. Finally, for two vertices $u, v \in V$, we write $u \leq v$ if there is a directed path from $u$ to $v$; we then write $u < v$ if $u \leq v$ and $u$ and $v$ are distinct.

Definition 3. Given a computational DAG $C = (V, E)$ and a vertex $v \in V$, we define the predecessor graph at $v$, denoted $P_C(v)$, to be the subgraph of $C$ that is formed by the union of all paths in $C$ terminating at $v$.
Likewise, the successor graph at $v$, denoted $S_C(v)$, is the subgraph of $C$ that is formed by the union of all the paths in $C$ originating at $v$.

Associated with any directed acyclic graph (DAG) $C = (V, E)$ is the natural vertex poset $(V, \leq)$ where $u \leq v$ if and only if there is a directed path from $u$ to $v$. Then the width of $C$, denoted $w(C)$, is the width of the poset $(V, \leq)$.

Definition 4. The computation width of a computational DAG $C = (V, E)$, denoted $cw(C)$, is defined as $cw(C) = \max_{v \in V} w(S_C(v))$.

Let OPT denote the optimal (off-line) algorithm. $W_{OPT}(C)$ and $W_R(C)$ are the work done by the optimal algorithm and by a randomized algorithm $R$, respectively. We treat randomized algorithms as distributions over deterministic algorithms; for a set $\Omega$ and a family of deterministic algorithms $\{D_r \mid r \in \Omega\}$ we let $R = R(\{D_r \mid r \in \Omega\})$ denote the randomized algorithm where $r$ is selected uniformly at random from $\Omega$ and scheduling is done according to $D_r$. For a real-valued random variable $X$, we let $E[X]$ denote its expected value. For OPT, we define for each $C$: $W_{OPT}(C) = \min_D W_D(C)$.

4 Lower bounds and Algorithm Modified-RS

In this section we give a lower bound on our problem for 2-level task graphs and we present the algorithm Modified-RS. We then show that for 2-level task graphs the competitive ratio of Modified-RS is tight.

Theorem 1. Let $A$ be a scheduling algorithm for 2-level task graphs and let $\alpha$ be the fraction of tasks at level $l_1$. Then,

$$W_A \geq \left(1 + cw\left((1 - \alpha) + \frac{\alpha}{e^{\frac{1-\alpha}{\alpha} e + 1}}\right)\right) W_{OPT}.$$

Proof. Consider the 2-level task $t$-DAG where $G_1$ is the set of tasks at level $l_1$ and $G_2$ is the set of tasks at level $l_2$, and the computation pattern described as follows. Initially, the computation pattern has $w$ groups, each consisting of a single processor. Let $t \gg w$ and $t \bmod w = 0$.
Each processor completes $\frac{\alpha t}{w}$ tasks before they are merged into a single group $g(S)$ and allowed to exchange information about completed tasks, before being split again into $w$ processors where each processor is allowed to complete $\frac{(1-\alpha) t}{w}$ tasks; at this point they are merged again into a single group $g(U)$ and then split into $w$ processors. For this computation pattern the optimal off-line algorithm completes all the $t$ tasks at the formation of the group $g(U)$ and accrues exactly $t$ work. Let $P_i \subset G_1$ denote the set of $\frac{\alpha t}{w}$ tasks for processor $i$. We analyze $A$ when the tuple $P = (P_1, \ldots, P_w)$ is selected uniformly at random among all such tuples. We will show that for any algorithm $A$ there is a configuration of the $P_i$ such that

$$W_A \geq \left(1 + (1 - o(1))\, cw\left((1 - \alpha) + \frac{\alpha}{e^{\frac{1-\alpha}{\alpha} e + 1}}\right)\right) t.$$

Due to space restrictions we only give a sketch of the proof and omit the details. We refer the reader to [5] for all the details. We first show that

$$E[|L_S|] \geq \alpha t \left(1 - \frac{1}{w}\right)^{w},$$

where $L_S$ is the random variable whose value is the number of tasks of $G_1$ left undone at the formation of group $g(S)$. We then proceed by bounding the actual number of tasks left undone $T$ using Azuma's inequality and we show that

$$E[|L_U|] \geq (1 - o(1))\, \frac{\alpha t}{e} \cdot \frac{1}{e^{\frac{1-\alpha}{\alpha} e (1 - o(1))}},$$

where $L_U$ is the random variable whose value is the number of tasks of $G_1$ left undone at the formation of group $g(U)$. In particular, we show there must exist a selection of the $P_i$ which achieves this bound. Note that after $g(U)$ the processors are split again into $w$ processors, where they will complete the remaining $(1 - o(1))\, \frac{\alpha t}{e} \cdot \frac{1}{e^{\frac{1-\alpha}{\alpha} e (1 - o(1))}}$ tasks of $G_1$ and the $(1 - \alpha) t$ tasks of $G_2$. This will give us the desired result.
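To get a feel for how the lower bound of Theorem 1 behaves, the following Python sketch evaluates its leading term numerically. It is an illustration only: the function name `two_level_lower_bound` is hypothetical, and the $(1 - o(1))$ factors are dropped, which is an assumption that only makes sense for large $t$ and $w$.

```python
import math


def two_level_lower_bound(cw, alpha):
    """Evaluate 1 + cw * ((1 - alpha) + alpha / e^(((1 - alpha) / alpha) * e + 1)).

    cw is the computation width and alpha in (0, 1] is the fraction of
    tasks at the first level. Lower-order (1 - o(1)) factors are ignored.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    exponent = (1 - alpha) / alpha * math.e + 1
    return 1 + cw * ((1 - alpha) + alpha * math.exp(-exponent))


# Tabulate the bound for a fixed width and a shrinking first level.
for alpha in (1.0, 0.75, 0.5, 0.25):
    print(alpha, round(two_level_lower_bound(4, alpha), 4))
```

With `alpha = 1.0` the exponent equals 1, so the expression reduces to `1 + cw / e`.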
Note that when the tasks are independent ($\alpha = 1$) the lower bound is $1 + (1 - o(1))\, \frac{cw}{e}$, which matches the result of [2], but the lower bound gets more pessimistic as the fraction of independent tasks gets smaller.

4.1 Description and Analysis of Modified-RS

In the following, $l(t) = i$, $i = 1, \ldots, k$, denotes that task $t$ belongs to level $l_i$. We are now ready to define Modified-RS (m-RS): a processor with knowledge that tasks in a set $K \subset V$ have been completed chooses the next task $\tau$ to be completed at random from $V \setminus K$ if and only if $\forall t \in V \setminus K$, $l(\tau) \leq l(t)$. In other words, $\tau$ is drawn from the incomplete tasks at the lowest level that still has incomplete tasks. In the following we analyze the competitive ratio of Modified-RS and we show it is tight by obtaining an upper bound on the work performed by our algorithm on any computation pattern $(p, t)$-DAG and a 2-level task $t$-DAG, which matches the lower bound of the previous section.

4.1.1 Upper Bound for m-RS on a 2-level task DAG

Theorem 2. Algorithm Modified-RS is

$$\left(1 + cw\left(1 - \alpha + \frac{\alpha}{e^{\frac{1-\alpha}{\alpha} c + 1}}\right)\right)\text{-competitive}$$

for any computational $(p, t)$-DAG and for a 2-level task $t$-DAG. Here, $cw$ stands for the computation width of the computational $(p, t)$-DAG, $\alpha \in (0, 1]$ ($\alpha$ is the fraction of tasks at level $l_1$) and $c = \frac{1}{\frac{1}{e} + o(1)}$.

Proof. Due to space constraints we give an overview of the proof; we refer the reader to [5] for full details. We say a vertex $v$ is unsaturated if $\sum_{u}$
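The selection rule of Modified-RS defined in Section 4.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `next_task`, the dictionary `level_of` mapping task ids to levels, and the use of a uniform random choice are all choices made here, not details confirmed by the text above.

```python
import random


def next_task(level_of, completed, rng=random):
    """Pick the next task under the Modified-RS selection rule.

    level_of maps each task id to its level (1-based); completed is the
    set K of tasks the processor's group knows to be done. The task is
    chosen uniformly at random (an assumption) among the incomplete
    tasks at the lowest level that still has incomplete tasks.
    Returns None when every task is complete.
    """
    remaining = [task for task in level_of if task not in completed]
    if not remaining:
        return None
    lowest = min(level_of[task] for task in remaining)
    candidates = [task for task in remaining if level_of[task] == lowest]
    return rng.choice(candidates)


level_of = {1: 1, 2: 1, 3: 2, 4: 2}
print(next_task(level_of, {1}))  # 2, the only incomplete level-1 task
```

Once both level-1 tasks are known to be complete, the same call returns either task 3 or task 4.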
