Hadoop Performance Models

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate the performance of MapReduce jobs as well as to find the optimal configuration settings to use when running the jobs.

Author: Herodotos Herodotou

Hadoop Performance Models
Herodotos Herodotou (hero@cs.duke.edu)
Technical Report CS-2011-05
Computer Science Department, Duke University

Abstract

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate the performance of MapReduce jobs as well as to find the optimal configuration settings to use when running the jobs.

The execution of a MapReduce job is broken down into map tasks and reduce tasks. Subsequently, map task execution is divided into the phases: Read (reading map inputs), Map (map function processing), Collect (serializing to buffer and partitioning), Spill (sorting, combining, compressing, and writing map outputs to local disk), and Merge (merging sorted spill files). Reduce task execution is divided into the phases: Shuffle (transferring map outputs to reduce tasks, with decompression if needed), Merge (merging sorted map outputs), Reduce (reduce function processing), and Write (writing reduce outputs to the distributed file system). Each phase represents an important part of the job's overall execution in Hadoop. We have developed performance models for each task phase, which are then combined to form the overall MapReduce job model.

1 Model Parameters

The performance models rely on a set of parameters to estimate the cost of a MapReduce job. We separate the parameters into three categories:

1. Hadoop Parameters: a set of Hadoop-defined configuration parameters that affect the execution of a job.
2. Profile Statistics: a set of statistics specifying properties of the input data and the user-defined functions (Map, Reduce, Combine).
3. Profile Cost Factors: a set of parameters that define the I/O, CPU, and network costs of a job execution.

| Variable | Hadoop Parameter | Default Value | Effect |
|---|---|---|---|
| pNumNodes | Number of Nodes | | System |
| pTaskMem | mapred.child.java.opts | -Xmx200m | System |
| pMaxMapsPerNode | mapred.tasktracker.map.tasks.max | 2 | System |
| pMaxRedPerNode | mapred.tasktracker.reduce.tasks.max | 2 | System |
| pNumMappers | mapred.map.tasks | | Job |
| pSortMB | io.sort.mb | 100 MB | Job |
| pSpillPerc | io.sort.spill.percent | 0.8 | Job |
| pSortRecPerc | io.sort.record.percent | 0.05 | Job |
| pSortFactor | io.sort.factor | 10 | Job |
| pNumSpillsForComb | min.num.spills.for.combine | 3 | Job |
| pNumReducers | mapred.reduce.tasks | | Job |
| pInMemMergeThr | mapred.inmem.merge.threshold | 1000 | Job |
| pShuffleInBufPerc | mapred.job.shuffle.input.buffer.percent | 0.7 | Job |
| pShuffleMergePerc | mapred.job.shuffle.merge.percent | 0.66 | Job |
| pReducerInBufPerc | mapred.job.reduce.input.buffer.percent | 0 | Job |
| pUseCombine | mapred.combine.class or mapreduce.combine.class | null | Job |
| pIsIntermCompressed | mapred.compress.map.output | false | Job |
| pIsOutCompressed | mapred.output.compress | false | Job |
| pReduceSlowstart | mapred.reduce.slowstart.completed.maps | 0.05 | Job |
| pIsInCompressed | Whether the input is compressed or not | | Input |
| pSplitSize | The size of the input split | | Input |

Table 1: Variables for Hadoop Parameters

Table 1 defines the variables that are associated with Hadoop parameters. Table 2 defines the necessary profile statistics specific to a job and the data it is processing.
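For experimenting with the formulas that follow, the defaults from Table 1 can be collected into a plain dictionary keyed by the report's variable names. This is an illustrative sketch (the dictionary name and the choice of plain Python are ours); cluster- and job-specific parameters with no defaults must be supplied by the user.

```python
# Default Hadoop parameter values from Table 1, keyed by the report's
# variable names. pNumNodes, pNumMappers, pNumReducers, pIsInCompressed,
# and pSplitSize have no defaults and are omitted here.
HADOOP_DEFAULTS = {
    "pTaskMem": 200 * 2**20,       # -Xmx200m, in bytes
    "pMaxMapsPerNode": 2,          # mapred.tasktracker.map.tasks.max
    "pMaxRedPerNode": 2,           # mapred.tasktracker.reduce.tasks.max
    "pSortMB": 100,                # io.sort.mb, in MB
    "pSpillPerc": 0.8,             # io.sort.spill.percent
    "pSortRecPerc": 0.05,          # io.sort.record.percent
    "pSortFactor": 10,             # io.sort.factor
    "pNumSpillsForComb": 3,        # min.num.spills.for.combine
    "pInMemMergeThr": 1000,        # mapred.inmem.merge.threshold
    "pShuffleInBufPerc": 0.7,      # mapred.job.shuffle.input.buffer.percent
    "pShuffleMergePerc": 0.66,     # mapred.job.shuffle.merge.percent
    "pReducerInBufPerc": 0.0,      # mapred.job.reduce.input.buffer.percent
    "pUseCombine": False,          # mapred.combine.class is null by default
    "pIsIntermCompressed": False,  # mapred.compress.map.output
    "pIsOutCompressed": False,     # mapred.output.compress
    "pReduceSlowstart": 0.05,      # mapred.reduce.slowstart.completed.maps
}
```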
| Variable | Description |
|---|---|
| sInputPairWidth | The average width of the input K-V pairs |
| sMapSizeSel | The selectivity of the mapper in terms of size |
| sMapPairsSel | The selectivity of the mapper in terms of number of K-V pairs |
| sReduceSizeSel | The selectivity of the reducer in terms of size |
| sReducePairsSel | The selectivity of the reducer in terms of number of K-V pairs |
| sCombineSizeSel | The selectivity of the combine function in terms of size |
| sCombinePairsSel | The selectivity of the combine function in terms of number of K-V pairs |
| sInputCompressRatio | The compression ratio for the input data |
| sIntermCompressRatio | The compression ratio for the intermediate map output |
| sOutCompressRatio | The compression ratio for the final output of the job |

Table 2: Variables for Profile Statistics

Table 3 defines the system-specific parameters needed for calculating I/O, CPU, and network costs. The I/O costs and the CPU costs related to compression are defined in terms of time per byte. The remaining CPU costs are defined in terms of time per K-V pair. The network cost is defined in terms of transfer time per byte.
| Variable | Description |
|---|---|
| cHdfsReadCost | The cost of reading from HDFS |
| cHdfsWriteCost | The cost of writing to HDFS |
| cLocalIOCost | The cost of performing I/O on the local disk |
| cNetworkCost | The network transfer cost |
| cMapCPUCost | The CPU cost of executing the map function |
| cReduceCPUCost | The CPU cost of executing the reduce function |
| cCombineCPUCost | The CPU cost of executing the combine function |
| cPartitionCPUCost | The CPU cost of partitioning |
| cSerdeCPUCost | The CPU cost of serialization |
| cSortCPUCost | The CPU cost of sorting on keys |
| cMergeCPUCost | The CPU cost of merging |
| cInUncomprCPUCost | The CPU cost of uncompressing the input data |
| cIntermUncomprCPUCost | The CPU cost of uncompressing the intermediate data |
| cIntermComprCPUCost | The CPU cost of compressing the intermediate data |
| cOutComprCPUCost | The CPU cost of compressing the output data |

Table 3: Variables for Profile Cost Factors

Let us define the identity function I as:

    I(x) = 1 if x exists or equals true, 0 otherwise    (1)

Initializations: In an effort to present concise formulas and avoid the use of conditionals as much as possible, we make the following initializations:

    If (pUseCombine == FALSE)
        sCombineSizeSel = 1
        sCombinePairsSel = 1
        cCombineCPUCost = 0
    If (pIsInCompressed == FALSE)
        sInputCompressRatio = 1
        cInUncomprCPUCost = 0
    If (pIsIntermCompressed == FALSE)
        sIntermCompressRatio = 1
        cIntermUncomprCPUCost = 0
        cIntermComprCPUCost = 0
    If (pIsOutCompressed == FALSE)
        sOutCompressRatio = 1
        cOutComprCPUCost = 0

2 Performance Models for the Map Task Phases

Map task execution is divided into five phases:

1. Read: reading the input split and creating the key-value pairs.
2. Map: executing the user-provided map function.
3. Collect: collecting the map output into a buffer and partitioning.
4. Spill: sorting, applying the combine function if any, performing compression if requested, and finally spilling to disk, creating spill files.
5. Merge: merging the spill files into a single map output file. Merging might be performed in multiple rounds.

2.1 Modeling the Read and Map Phases

During this phase, the input split is read (and uncompressed if necessary), the key-value pairs are created and passed as input to the user-defined map function.

    inputMapSize = pSplitSize / sInputCompressRatio    (2)

    inputMapPairs = inputMapSize / sInputPairWidth    (3)

The costs of this phase are:

    IOCost_Read = pSplitSize × cHdfsReadCost
    CPUCost_Read = pSplitSize × cInUncomprCPUCost + inputMapPairs × cMapCPUCost    (4)

If the MapReduce job consists only of mappers (i.e., pNumReducers = 0), then the Spill and Merge phases will not be executed and the map output will be written directly to HDFS.

    outMapSize = inputMapSize × sMapSizeSel    (5)

    IOCost_MapWrite = outMapSize × sOutCompressRatio × cHdfsWriteCost    (6)

    CPUCost_MapWrite = outMapSize × cOutComprCPUCost    (7)

2.2 Modeling the Collect and Spill Phases

The map function generates output key-value (K-V) pairs that are placed in the map-side memory buffer. The formulas regarding the map output are:

    outMapSize = inputMapSize × sMapSizeSel    (8)

    outMapPairs = inputMapPairs × sMapPairsSel    (9)

    outPairWidth = outMapSize / outMapPairs    (10)

The memory buffer is split into two parts: the serialization part, which stores the key-value pairs, and the accounting part, which stores metadata per pair. When either of these two parts fills up (based on the threshold value pSpillPerc), the pairs are partitioned, sorted, and spilled to disk.
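The dataflow through the Read, Map, and Collect phases (Eqs. (2)-(3) and (8)-(10) above) can be sketched directly in code; the function name is ours, not the report's:

```python
def map_dataflow(pSplitSize, sInputCompressRatio, sInputPairWidth,
                 sMapSizeSel, sMapPairsSel):
    """Dataflow through the Read, Map, and Collect phases,
    per Eqs. (2)-(3) and (8)-(10)."""
    inputMapSize = pSplitSize / sInputCompressRatio   # Eq. (2)
    inputMapPairs = inputMapSize / sInputPairWidth    # Eq. (3)
    outMapSize = inputMapSize * sMapSizeSel           # Eq. (8)
    outMapPairs = inputMapPairs * sMapPairsSel        # Eq. (9)
    outPairWidth = outMapSize / outMapPairs           # Eq. (10)
    return inputMapPairs, outMapSize, outMapPairs, outPairWidth
```

For example, a 1000-byte uncompressed split of 10-byte pairs, processed by a mapper that halves the bytes and keeps a quarter of the pairs, yields 25 output pairs of width 20.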
The maximum number of pairs for the serialization buffer is:

    maxSerPairs = ⌊ pSortMB × 2^20 × (1 − pSortRecPerc) × pSpillPerc / outPairWidth ⌋    (11)

The maximum number of pairs for the accounting buffer is:

    maxAccPairs = ⌊ pSortMB × 2^20 × pSortRecPerc × pSpillPerc / 16 ⌋    (12)

Hence, the number of pairs in, and the size of, the buffer before a spill will be:

    spillBufferPairs = Min{ maxSerPairs, maxAccPairs, outMapPairs }    (13)

    spillBufferSize = spillBufferPairs × outPairWidth    (14)

The overall number of spills will be:

    numSpills = ⌈ outMapPairs / spillBufferPairs ⌉    (15)

The number of pairs in, and the size of, each spill depend on the width of each K-V pair, the use of the combine function, and the use of intermediate data compression. Note that sIntermCompressRatio is set to 1 by default if intermediate compression is disabled, and that sCombineSizeSel and sCombinePairsSel are set to 1 by default if no combine function is used.

    spillFilePairs = spillBufferPairs × sCombinePairsSel    (16)

    spillFileSize = spillBufferSize × sCombineSizeSel × sIntermCompressRatio    (17)

The costs of this phase are:

    IOCost_Spill = numSpills × spillFileSize × cLocalIOCost    (18)

    CPUCost_Spill = numSpills × [ spillBufferPairs × cPartitionCPUCost
                  + spillBufferPairs × cSerdeCPUCost
                  + spillBufferPairs × log2( spillBufferPairs / pNumReducers ) × cSortCPUCost
                  + spillBufferPairs × cCombineCPUCost
                  + spillBufferSize × sCombineSizeSel × cIntermComprCPUCost ]    (19)

2.3 Modeling the Merge Phase

The goal of the Merge phase is to merge all the spill files into a single output file, which is written to local disk. The Merge phase will occur only if more than one spill file is created. Multiple merge passes might occur, depending on the pSortFactor parameter. We define a merge pass to be the merging of at most pSortFactor spill files.
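The spill arithmetic of Eqs. (11)-(15) above is easy to check numerically. A minimal sketch, assuming the default settings (pSortMB = 100, pSortRecPerc = 0.05, pSpillPerc = 0.8); the function name is ours:

```python
import math

def spill_model(pSortMB, pSortRecPerc, pSpillPerc, outPairWidth, outMapPairs):
    """Spill-buffer capacity and spill count, per Eqs. (11)-(15)."""
    bufBytes = pSortMB * 2**20
    # Eq. (11): capacity of the serialization part of the buffer
    maxSerPairs = math.floor(bufBytes * (1 - pSortRecPerc) * pSpillPerc
                             / outPairWidth)
    # Eq. (12): capacity of the accounting part (16 bytes of metadata/pair)
    maxAccPairs = math.floor(bufBytes * pSortRecPerc * pSpillPerc / 16)
    # Eqs. (13) and (15)
    spillBufferPairs = min(maxSerPairs, maxAccPairs, outMapPairs)
    numSpills = math.ceil(outMapPairs / spillBufferPairs)
    return maxSerPairs, maxAccPairs, spillBufferPairs, numSpills
```

With 100-byte pairs and 2,000,000 map output pairs, the accounting part is the bottleneck (262,144 pairs per spill), giving 8 spills.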
We define a merge round to be one or more merge passes that merge only spills produced by the Spill phase or by a previous merge round. For example, suppose numSpills = 30 and pSortFactor = 10. Then, 3 merge passes will be performed to create 3 new files. This is the first merge round. Then, the 3 new files will be merged together, forming the second and final merge round.

The final merge pass is unique in the sense that if the number of spills to be merged is greater than or equal to pNumSpillsForComb, the combiner will be used again. Hence, we treat the intermediate merge rounds and the final merge separately. For the intermediate merge passes, we calculate how many times (on average) a single spill will be read. Note that the remainder of this section assumes numSpills ≤ pSortFactor². In the opposite case, we must use a simulation-based approach in order to calculate the number of spills merged during the intermediate merge rounds as well as the total number of merge passes. The first merge pass is also unique because Hadoop will calculate the optimal number of spill files to merge first, so that all other merge passes will merge exactly pSortFactor files.
Since the reduce task also contains a similar Merge phase, we define the following three methods for later reuse:

    calcNumSpillsFirstPass(N, F) =
        N,                        if N ≤ F
        F,                        if (N − 1) MOD (F − 1) = 0
        (N − 1) MOD (F − 1) + 1,  otherwise    (20)

    calcNumSpillsIntermMerge(N, F) =
        0,                          if N ≤ F
        P + ⌊ (N − P) / F ⌋ × F,    if N ≤ F²
        where P = calcNumSpillsFirstPass(N, F)    (21)

    calcNumSpillsFinalMerge(N, F) =
        N,                                  if N ≤ F
        1 + ⌊ (N − P) / F ⌋ + (N − S),      if N ≤ F²
        where P = calcNumSpillsFirstPass(N, F), S = calcNumSpillsIntermMerge(N, F)    (22)

The number of spills read during the first merge pass is:

    numSpillsFirstPass = calcNumSpillsFirstPass(numSpills, pSortFactor)    (23)

The number of spills read during the intermediate merging is:

    numSpillsIntermMerge = calcNumSpillsIntermMerge(numSpills, pSortFactor)    (24)

The total number of merge passes will be:

    numMergePasses =
        0,  if numSpills = 1
        1,  if numSpills ≤ pSortFactor
        2 + ⌊ (numSpills − numSpillsFirstPass) / pSortFactor ⌋,  if numSpills ≤ pSortFactor²    (25)

The number of spill files for the final merge round is (first pass + intermediate passes + remaining spill files):

    numSpillsFinalMerge = calcNumSpillsFinalMerge(numSpills, pSortFactor)    (26)

The total number of records spilled is:

    numRecSpilled = spillFilePairs × [ numSpills + numSpillsIntermMerge + numSpills × sCombinePairsSel ]    (27)

The final map output size and number of K-V pairs are:

    useCombInMerge = (numSpills > 1) AND (pUseCombine) AND (numSpillsFinalMerge ≥ pNumSpillsForComb)    (28)

    intermDataSize = numSpills × spillFileSize × ( sCombineSizeSel if useCombInMerge, 1 otherwise )    (29)

    intermDataPairs = numSpills × spillFilePairs × ( sCombinePairsSel if useCombInMerge, 1 otherwise )    (30)

The costs of this phase are:

    IOCost_Merge = 2 × numSpillsIntermMerge × spillFileSize × cLocalIOCost    // intermediate merges
                 + numSpills × spillFileSize × cLocalIOCost                   // read for final merge
                 + intermDataSize × cLocalIOCost                              // write of final merge    (31)

    CPUCost_Merge = numSpillsIntermMerge × [ spillFileSize × cIntermUncomprCPUCost
                  + spillFilePairs × cMergeCPUCost
                  + ( spillFileSize / sIntermCompressRatio ) × cIntermComprCPUCost ]
                  + numSpills × [ spillFileSize × cIntermUncomprCPUCost
                  + spillFilePairs × cMergeCPUCost
                  + spillFilePairs × cCombineCPUCost ]
                  + ( intermDataSize / sIntermCompressRatio ) × cIntermComprCPUCost    (32)

2.4 Modeling the Overall Map Task

The above models correspond to the execution of a single map task. The overall costs for a single map task are:

    IOCost_Map = IOCost_Read + IOCost_MapWrite                  if pNumReducers = 0
                 IOCost_Read + IOCost_Spill + IOCost_Merge      if pNumReducers > 0    (33)

    CPUCost_Map = CPUCost_Read + CPUCost_MapWrite               if pNumReducers = 0
                  CPUCost_Read + CPUCost_Spill + CPUCost_Merge  if pNumReducers > 0    (34)

3 Performance Models for the Reduce Task Phases

The reduce task is divided into four phases:

1. Shuffle: copying the map output from the mapper nodes to a reducer's node and decompressing, if needed. Partial merging may also occur during this phase.
2. Merge: merging the sorted fragments from the different mappers to form the input to the reduce function.
3. Reduce: executing the user-provided reduce function.
4. Write: writing the (compressed) output to HDFS.

3.1 Modeling the Shuffle Phase

The following discussion refers to the execution of a single reduce task. In the Shuffle phase, the framework fetches the relevant map output partition from each mapper (called a segment) and copies it to the reducer's node. If the map output is compressed, it will be uncompressed.
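The three merge-bookkeeping methods of Eqs. (20)-(22) above are easy to mis-read in prose, so here is a direct transcription (valid under the stated assumption N ≤ F²; function names follow the report):

```python
def calcNumSpillsFirstPass(N, F):
    """Eq. (20): spills merged by the first merge pass."""
    if N <= F:
        return N
    if (N - 1) % (F - 1) == 0:
        return F
    return (N - 1) % (F - 1) + 1

def calcNumSpillsIntermMerge(N, F):
    """Eq. (21): total spills read during intermediate merge passes."""
    if N <= F:
        return 0
    P = calcNumSpillsFirstPass(N, F)
    return P + ((N - P) // F) * F          # assumes N <= F*F

def calcNumSpillsFinalMerge(N, F):
    """Eq. (22): files merged by the final merge pass."""
    if N <= F:
        return N
    P = calcNumSpillsFirstPass(N, F)
    S = calcNumSpillsIntermMerge(N, F)
    return 1 + (N - P) // F + (N - S)      # assumes N <= F*F
```

For numSpills = 30 and pSortFactor = 10, Hadoop's first-pass optimization merges 3 spills first (so every later pass merges exactly 10 files), the intermediate passes read 23 spills in total, and the final merge combines 10 files.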
For each map segment that reaches the reduce side we have:

    segmentComprSize = intermDataSize / pNumReducers    (35)

    segmentUncomprSize = segmentComprSize / sIntermCompressRatio    (36)

    segmentPairs = intermDataPairs / pNumReducers    (37)

where intermDataSize and intermDataPairs are the size and number of pairs produced as intermediate output by a single mapper (see Section 2.3). The data fetched to a single reducer will be:

    totalShuffleSize = pNumMappers × segmentComprSize    (38)

    totalShufflePairs = pNumMappers × segmentPairs    (39)

As the data is copied to the reducer, it is placed in the in-memory shuffle buffer, with size:

    shuffleBufferSize = pShuffleInBufPerc × pTaskMem    (40)

When the in-memory buffer reaches a threshold size, or the number of segments becomes greater than pInMemMergeThr, the segments are merged and spilled to disk, creating a new local file (called a shuffle file). The merge size threshold is:

    mergeSizeThr = pShuffleMergePerc × shuffleBufferSize    (41)

However, when the segment size is greater than 25% of shuffleBufferSize, the segment goes straight to disk instead of passing through memory (hence, no in-memory merging will occur).

Case 1: segmentUncomprSize < 0.25 × shuffleBufferSize

    numSegInShuffleFile = mergeSizeThr / segmentUncomprSize    (42)

    If ( ⌈numSegInShuffleFile⌉ × segmentUncomprSize ≤ shuffleBufferSize )
        numSegInShuffleFile = ⌈numSegInShuffleFile⌉
    else
        numSegInShuffleFile = ⌊numSegInShuffleFile⌋

    If ( numSegInShuffleFile > pInMemMergeThr )
        numSegInShuffleFile = pInMemMergeThr    (43)
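The Case 1 rounding logic of Eqs. (42)-(43) above is subtle: the segment count is rounded up only if the rounded-up batch still fits in the shuffle buffer, and is then capped by pInMemMergeThr. A minimal sketch (the function name is ours):

```python
import math

def segments_per_shuffle_file(mergeSizeThr, segmentUncomprSize,
                              shuffleBufferSize, pInMemMergeThr):
    """Eqs. (42)-(43): segments merged into one shuffle file (Case 1)."""
    n = mergeSizeThr / segmentUncomprSize                 # Eq. (42)
    if math.ceil(n) * segmentUncomprSize <= shuffleBufferSize:
        n = math.ceil(n)                                  # round up: still fits
    else:
        n = math.floor(n)                                 # round down: would overflow
    return min(n, pInMemMergeThr)                         # Eq. (43)
```

For instance, with a 1000-unit shuffle buffer, a 660-unit merge threshold, and 100-unit segments, seven segments are merged per shuffle file (⌈6.6⌉ = 7 segments occupy 700 units, which still fits in the buffer).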
A shuffle file is the merging of numSegInShuffleFile segments. If a combine function is specified, it is applied during this merging. Note that if numSegInShuffleFile > pNumMappers, then merging will not happen.

    shuffleFileSize = numSegInShuffleFile × segmentComprSize × sCombineSizeSel    (44)

    shuffleFilePairs = numSegInShuffleFile × segmentPairs × sCombinePairsSel    (45)

    numShuffleFiles = ⌊ pNumMappers / numSegInShuffleFile ⌋    (46)

At the end of the merging, some segments might remain in memory:

    numSegmentsInMem = pNumMappers MOD numSegInShuffleFile    (47)

Case 2: segmentUncomprSize ≥ 0.25 × shuffleBufferSize

    numSegInShuffleFile = 1    (48)

    shuffleFileSize = segmentComprSize    (49)

    shuffleFilePairs = segmentPairs    (50)

    numShuffleFiles = pNumMappers    (51)

    numSegmentsInMem = 0    (52)

Either case will create a set of shuffle files on disk. When the number of shuffle files on disk increases above a certain threshold (2 × pSortFactor − 1), a new merge thread is triggered and pSortFactor shuffle files are merged into a new, larger, sorted one. The combiner is not used during this disk merging. The total number of such merges is:

    numShuffleMerges =
        0,  if numShuffleFiles < 2 × pSortFactor − 1
        ⌊ (numShuffleFiles − 2 × pSortFactor + 1) / pSortFactor ⌋ + 1,  otherwise    (53)

At the end of the Shuffle phase, a set of merged and unmerged shuffle files will exist on disk.
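The disk-merge trigger of Eq. (53) above can be sketched as follows (the function name is ours):

```python
def num_shuffle_merges(numShuffleFiles, pSortFactor):
    """Eq. (53): background disk merges triggered during the Shuffle phase."""
    if numShuffleFiles < 2 * pSortFactor - 1:
        return 0
    return (numShuffleFiles - 2 * pSortFactor + 1) // pSortFactor + 1
```

With pSortFactor = 10, a disk merge is first triggered once 19 shuffle files accumulate (2 × 10 − 1); with 40 shuffle files, three background merges run.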
    numMergShufFiles = numShuffleMerges    (54)

    mergShufFileSize = pSortFactor × shuffleFileSize    (55)

    mergShufFilePairs = pSortFactor × shuffleFilePairs    (56)

    numUnmergShufFiles = numShuffleFiles − pSortFactor × numShuffleMerges    (57)

    unmergShufFileSize = shuffleFileSize    (58)

    unmergShufFilePairs = shuffleFilePairs    (59)

The cost of the Shuffle phase is:

    IOCost_Shuffle = numShuffleFiles × shuffleFileSize × cLocalIOCost
                   + numMergShufFiles × mergShufFileSize × 2 × cLocalIOCost    (60)

    CPUCost_Shuffle = [ totalShuffleSize × cIntermUncomprCPUCost
                    + numShuffleFiles × shuffleFilePairs × cMergeCPUCost
                    + numShuffleFiles × shuffleFilePairs × cCombineCPUCost
                    + numShuffleFiles × ( shuffleFileSize / sIntermCompressRatio ) × cIntermComprCPUCost ]
                      × I( segmentUncomprSize < 0.25 × shuffleBufferSize )
                    + numMergShufFiles × mergShufFileSize × cIntermUncomprCPUCost
                    + numMergShufFiles × mergShufFilePairs × cMergeCPUCost
                    + numMergShufFiles × ( mergShufFileSize / sIntermCompressRatio ) × cIntermComprCPUCost    (61)

3.2 Modeling the Merge Phase

After all the map outputs have been successfully copied into memory and/or onto disk, the sorting/merging phase begins. This phase merges all data into a single stream that is fed to the reducer. Similar to the map-side Merge phase (see Section 2.3), this phase may occur in multiple rounds; but instead of creating a single output file during the final merging, it sends the data directly to the reducer.

The Shuffle phase produced a set of merged and unmerged shuffle files on disk, and perhaps a set of segments in memory. The merging is done in three steps.

Step 1: Some segments might be evicted from memory and merged into a single shuffle file to satisfy the memory constraint enforced by pReducerInBufPerc. (This parameter specifies the amount of memory allowed to be occupied by segments before the reducer begins.)
    maxSegmentBuffer = pReducerInBufPerc × pTaskMem    (62)

    currSegmentBuffer = numSegmentsInMem × segmentUncomprSize    (63)

    If ( currSegmentBuffer > maxSegmentBuffer )
        numSegmentsEvicted = ⌈ ( currSegmentBuffer − maxSegmentBuffer ) / segmentUncomprSize ⌉
    else
        numSegmentsEvicted = 0    (64)

    numSegmentsRemainMem = numSegmentsInMem − numSegmentsEvicted    (65)

The above merging will only occur if the number of existing shuffle files on disk is less than pSortFactor. If not, the shuffle files would have to be merged, and the in-memory segments that were supposed to be evicted are left to be merged with the shuffle files on disk.

    numFilesOnDisk = numMergShufFiles + numUnmergShufFiles    (66)

    If ( numFilesOnDisk < pSortFactor )
        numFilesFromMem = 1
        filesFromMemSize = numSegmentsEvicted × segmentComprSize
        filesFromMemPairs = numSegmentsEvicted × segmentPairs
        step1MergingSize = filesFromMemSize
        step1MergingPairs = filesFromMemPairs
    else
        numFilesFromMem = numSegmentsEvicted
        filesFromMemSize = segmentComprSize
        filesFromMemPairs = segmentPairs
        step1MergingSize = 0
        step1MergingPairs = 0    (67)

    filesToMergeStep2 = numFilesOnDisk + numFilesFromMem    (68)

Step 2: Any files on disk will go through a merging phase in multiple rounds (similar to the process in Section 2.3). This step will happen only if numFilesOnDisk > 0 (which implies filesToMergeStep2 > 0). The number of intermediate reads (and writes) is:

    intermMergeReads = calcNumSpillsIntermMerge(filesToMergeStep2, pSortFactor)    (69)

The main difference from Section 2.3 is that the merged files have different sizes. We account for this by attributing merging costs proportionally.
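Step 1's eviction arithmetic (Eqs. (62)-(65) above) can be sketched as follows (the function name is ours):

```python
import math

def step1_evictions(numSegmentsInMem, segmentUncomprSize,
                    pReducerInBufPerc, pTaskMem):
    """Eqs. (62)-(65): segments evicted from memory before the reduce."""
    maxSegmentBuffer = pReducerInBufPerc * pTaskMem            # Eq. (62)
    currSegmentBuffer = numSegmentsInMem * segmentUncomprSize  # Eq. (63)
    if currSegmentBuffer > maxSegmentBuffer:                   # Eq. (64)
        numSegmentsEvicted = math.ceil(
            (currSegmentBuffer - maxSegmentBuffer) / segmentUncomprSize)
    else:
        numSegmentsEvicted = 0
    # Eq. (65): segments that stay in memory
    return numSegmentsEvicted, numSegmentsInMem - numSegmentsEvicted
```

Note that with the default pReducerInBufPerc = 0, every in-memory segment is evicted before the reducer begins.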
    step2MergingSize = ( intermMergeReads / filesToMergeStep2 )
                       × [ numMergShufFiles × mergShufFileSize
                       + numUnmergShufFiles × unmergShufFileSize
                       + numFilesFromMem × filesFromMemSize ]    (70)

    step2MergingPairs = ( intermMergeReads / filesToMergeStep2 )
                        × [ numMergShufFiles × mergShufFilePairs
                        + numUnmergShufFiles × unmergShufFilePairs
                        + numFilesFromMem × filesFromMemPairs ]    (71)

    filesRemainFromStep2 = calcNumSpillsFinalMerge(filesToMergeStep2, pSortFactor)    (72)

Step 3: All files on disk and in memory will go through merging.

    filesToMergeStep3 = filesRemainFromStep2 + numSegmentsRemainMem    (73)

The process is identical to Step 2 above.

    intermMergeReads = calcNumSpillsIntermMerge(filesToMergeStep3, pSortFactor)    (74)

    step3MergingSize = ( intermMergeReads / filesToMergeStep3 ) × totalShuffleSize    (75)

    step3MergingPairs = ( intermMergeReads / filesToMergeStep3 ) × totalShufflePairs    (76)

    filesRemainFromStep3 = calcNumSpillsFinalMerge(filesToMergeStep3, pSortFactor)    (77)

    totalMergingSize = step1MergingSize + step2MergingSize + step3MergingSize    (78)

The cost of the Sorting phase is:

    IOCost_Sort = totalMergingSize × cLocalIOCost    (79)

    CPUCost_Sort = totalMergingSize × cMergeCPUCost
                 + ( totalMergingSize / sIntermCompressRatio ) × cIntermComprCPUCost
                 + ( step2MergingSize + step3MergingSize ) × cIntermUncomprCPUCost    (80)

3.3 Modeling the Reduce and Write Phases

Finally, the user-provided reduce function will be executed and the output will be written to HDFS.
    inReduceSize = numShuffleFiles × ( shuffleFileSize / sIntermCompressRatio )
                 + numSegmentsInMem × ( segmentComprSize / sIntermCompressRatio )    (81)

    inReducePairs = numShuffleFiles × shuffleFilePairs + numSegmentsInMem × segmentPairs    (82)

    outReduceSize = inReduceSize × sReduceSizeSel    (83)

    outReducePairs = inReducePairs × sReducePairsSel    (84)

The input to the reduce function resides in memory and/or in the shuffle files produced by the Shuffling and Sorting phases.

    inRedSizeDiskSize = numMergShufFiles × mergShufFileSize
                      + numUnmergShufFiles × unmergShufFileSize
                      + numFilesFromMem × filesFromMemSize    (85)

The cost of the Write phase is:

    IOCost_Write = inRedSizeDiskSize × cLocalIOCost
                 + outReduceSize × sOutCompressRatio × cHdfsWriteCost    (86)

    CPUCost_Write = inReducePairs × cReduceCPUCost
                  + inRedSizeDiskSize × cIntermUncomprCPUCost
                  + outReduceSize × cOutComprCPUCost    (87)

3.4 Modeling the Overall Reduce Task

The above models correspond to the execution of a single reduce task. The overall costs for a single reduce task, excluding network transfers, are:

    IOCost_Reduce = IOCost_Shuffle + IOCost_Sort + IOCost_Write    (88)

    CPUCost_Reduce = CPUCost_Shuffle + CPUCost_Sort + CPUCost_Write    (89)

4 Performance Models for the Network Transfer

During the Shuffle phase, all the data produced by the map tasks is copied over to the nodes running the reduce tasks (except for the data that is local). The overall data transferred over the network is:

    netTransferSize = finalOutMapSize × pNumMappers × ( pNumNodes − 1 ) / pNumNodes    (90)

where finalOutMapSize is the size of the data produced by a single map task. The overall cost for transferring data over the network is:

    NETCost_Job = netTransferSize × cNetworkCost    (91)

5 Performance Models for the MapReduce Job

The MapReduce job consists of several map and reduce tasks executing in parallel and in waves.
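The (pNumNodes − 1)/pNumNodes factor in Eq. (90) above reflects that, for uniformly placed reducers, that fraction of each map output must leave its node. A minimal sketch (the function name is ours):

```python
def network_cost(finalOutMapSize, pNumMappers, pNumNodes, cNetworkCost):
    """Eqs. (90)-(91): shuffle bytes crossing the network, and their cost."""
    netTransferSize = (finalOutMapSize * pNumMappers
                       * (pNumNodes - 1) / pNumNodes)   # Eq. (90)
    return netTransferSize * cNetworkCost               # Eq. (91)
```

For example, 10 mappers each producing 100 bytes on a 4-node cluster transfer 750 bytes over the network; on a single node nothing crosses the network.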
There are two primary ways to estimate the total cost of the job: (i) simulate the task execution using a Task Scheduler Simulator, and (ii) calculate the expected total costs analytically. Simulation involves scheduling and simulating the execution of individual tasks on a virtual cluster; the cost of each task is calculated using the proposed performance models. The second approach involves using the following analytical costs:

    IOCost_AllMaps = ( pNumMappers × IOCost_Map ) / ( pNumNodes × pMaxMapsPerNode )    (92)

    CPUCost_AllMaps = ( pNumMappers × CPUCost_Map ) / ( pNumNodes × pMaxMapsPerNode )    (93)

    IOCost_AllReducers = ( pNumReducers × IOCost_Reduce ) / ( pNumNodes × pMaxRedPerNode )    (94)

    CPUCost_AllReducers = ( pNumReducers × CPUCost_Reduce ) / ( pNumNodes × pMaxRedPerNode )    (95)

The overall job cost is simply the sum of the costs from all the map and reduce tasks:

    IOCost_Job = IOCost_AllMaps                        if pNumReducers = 0
                 IOCost_AllMaps + IOCost_AllReducers   if pNumReducers > 0    (96)

    CPUCost_Job = CPUCost_AllMaps                         if pNumReducers = 0
                  CPUCost_AllMaps + CPUCost_AllReducers   if pNumReducers > 0    (97)

With appropriate system parameters that allow for fair comparisons among the I/O, CPU, and network costs, the overall cost is:

    Cost_Job = IOCost_Job + CPUCost_Job + NETCost_Job    (98)
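The analytical aggregation of Eqs. (92)-(98) divides per-task costs by the number of task slots, since that many tasks run concurrently in each wave. A minimal sketch of the second (analytical) approach, with per-task costs supplied as inputs (the function name is ours):

```python
def job_cost(pNumMappers, pNumReducers, pNumNodes,
             pMaxMapsPerNode, pMaxRedPerNode,
             ioCostMap, cpuCostMap, ioCostReduce, cpuCostReduce,
             netCostJob):
    """Eqs. (92)-(98): analytical total job cost."""
    mapSlots = pNumNodes * pMaxMapsPerNode
    redSlots = pNumNodes * pMaxRedPerNode
    ioAllMaps = pNumMappers * ioCostMap / mapSlots       # Eq. (92)
    cpuAllMaps = pNumMappers * cpuCostMap / mapSlots     # Eq. (93)
    if pNumReducers == 0:                                # Eqs. (96)-(97)
        ioJob, cpuJob = ioAllMaps, cpuAllMaps
    else:
        ioJob = ioAllMaps + pNumReducers * ioCostReduce / redSlots     # Eq. (94)
        cpuJob = cpuAllMaps + pNumReducers * cpuCostReduce / redSlots  # Eq. (95)
    return ioJob + cpuJob + netCostJob                   # Eq. (98)
```

For example, 100 map tasks of unit I/O cost on 10 nodes with 2 map slots each contribute 100 × 1 / 20 = 5 cost units of map I/O to the total.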
