Hadoop Performance Models

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate the performance of MapReduce jobs as well as to find the optimal configuration settings to use when running the jobs.

Author: Herodotos Herodotou

Hadoop Performance Models
Herodotos Herodotou (hero@cs.duke.edu)
Technical Report CS-2011-05
Computer Science Department, Duke University

Abstract

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate the performance of MapReduce jobs as well as to find the optimal configuration settings to use when running the jobs.

The execution of a MapReduce job is broken down into map tasks and reduce tasks. Subsequently, map task execution is divided into the phases: Read (reading map inputs), Map (map function processing), Collect (serializing to buffer and partitioning), Spill (sorting, combining, compressing, and writing map outputs to local disk), and Merge (merging sorted spill files). Reduce task execution is divided into the phases: Shuffle (transferring map outputs to reduce tasks, with decompression if needed), Merge (merging sorted map outputs), Reduce (reduce function processing), and Write (writing reduce outputs to the distributed file system). Each phase represents an important part of the job's overall execution in Hadoop. We have developed performance models for each task phase, which are then combined to form the overall MapReduce job model.

1 Model Parameters

The performance models rely on a set of parameters to estimate the cost of a MapReduce job. We separate the parameters into three categories:

1. Hadoop Parameters: a set of Hadoop-defined configuration parameters that affect the execution of a job.
2. Profile Statistics: a set of statistics specifying properties of the input data and the user-defined functions (Map, Reduce, Combine).
3. Profile Cost Factors: a set of parameters that define the I/O, CPU, and network costs of a job execution.

| Variable | Hadoop Parameter | Default Value | Effect |
|---|---|---|---|
| pNumNodes | Number of Nodes | | System |
| pTaskMem | mapred.child.java.opts | -Xmx200m | System |
| pMaxMapsPerNode | mapred.tasktracker.map.tasks.max | 2 | System |
| pMaxRedPerNode | mapred.tasktracker.reduce.tasks.max | 2 | System |
| pNumMappers | mapred.map.tasks | | Job |
| pSortMB | io.sort.mb | 100 MB | Job |
| pSpillPerc | io.sort.spill.percent | 0.8 | Job |
| pSortRecPerc | io.sort.record.percent | 0.05 | Job |
| pSortFactor | io.sort.factor | 10 | Job |
| pNumSpillsForComb | min.num.spills.for.combine | 3 | Job |
| pNumReducers | mapred.reduce.tasks | | Job |
| pInMemMergeThr | mapred.inmem.merge.threshold | 1000 | Job |
| pShuffleInBufPerc | mapred.job.shuffle.input.buffer.percent | 0.7 | Job |
| pShuffleMergePerc | mapred.job.shuffle.merge.percent | 0.66 | Job |
| pReducerInBufPerc | mapred.job.reduce.input.buffer.percent | 0 | Job |
| pUseCombine | mapred.combine.class or mapreduce.combine.class | null | Job |
| pIsIntermCompressed | mapred.compress.map.output | false | Job |
| pIsOutCompressed | mapred.output.compress | false | Job |
| pReduceSlowstart | mapred.reduce.slowstart.completed.maps | 0.05 | Job |
| pIsInCompressed | Whether the input is compressed or not | | Input |
| pSplitSize | The size of the input split | | Input |

Table 1: Variables for Hadoop Parameters

Table 1 defines the variables that are associated with Hadoop parameters. Table 2 defines the necessary profile statistics specific to a job and the data it is processing.
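For experimenting with the formulas that follow, the defaults from Table 1 can be collected into a plain dictionary keyed by the report's variable names. This is an illustrative sketch (the dictionary name and the choice of plain Python are ours); cluster- and job-specific parameters with no defaults must be supplied by the user.

```python
# Default Hadoop parameter values from Table 1, keyed by the report's
# variable names. pNumNodes, pNumMappers, pNumReducers, pIsInCompressed,
# and pSplitSize have no defaults and are omitted here.
HADOOP_DEFAULTS = {
    "pTaskMem": 200 * 2**20,       # -Xmx200m, in bytes
    "pMaxMapsPerNode": 2,          # mapred.tasktracker.map.tasks.max
    "pMaxRedPerNode": 2,           # mapred.tasktracker.reduce.tasks.max
    "pSortMB": 100,                # io.sort.mb, in MB
    "pSpillPerc": 0.8,             # io.sort.spill.percent
    "pSortRecPerc": 0.05,          # io.sort.record.percent
    "pSortFactor": 10,             # io.sort.factor
    "pNumSpillsForComb": 3,        # min.num.spills.for.combine
    "pInMemMergeThr": 1000,        # mapred.inmem.merge.threshold
    "pShuffleInBufPerc": 0.7,      # mapred.job.shuffle.input.buffer.percent
    "pShuffleMergePerc": 0.66,     # mapred.job.shuffle.merge.percent
    "pReducerInBufPerc": 0.0,      # mapred.job.reduce.input.buffer.percent
    "pUseCombine": False,          # mapred.combine.class is null by default
    "pIsIntermCompressed": False,  # mapred.compress.map.output
    "pIsOutCompressed": False,     # mapred.output.compress
    "pReduceSlowstart": 0.05,      # mapred.reduce.slowstart.completed.maps
}
```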
| Variable | Description |
|---|---|
| sInputPairWidth | The average width of the input K-V pairs |
| sMapSizeSel | The selectivity of the mapper in terms of size |
| sMapPairsSel | The selectivity of the mapper in terms of number of K-V pairs |
| sReduceSizeSel | The selectivity of the reducer in terms of size |
| sReducePairsSel | The selectivity of the reducer in terms of number of K-V pairs |
| sCombineSizeSel | The selectivity of the combine function in terms of size |
| sCombinePairsSel | The selectivity of the combine function in terms of number of K-V pairs |
| sInputCompressRatio | The compression ratio for the input data |
| sIntermCompressRatio | The compression ratio for the intermediate map output |
| sOutCompressRatio | The compression ratio for the final output of the job |

Table 2: Variables for Profile Statistics

Table 3 defines the system-specific parameters needed for calculating I/O, CPU, and network costs. The I/O costs and the CPU costs related to compression are defined in terms of time per byte. The remaining CPU costs are defined in terms of time per K-V pair. The network cost is defined in terms of transfer time per byte.
| Variable | Description |
|---|---|
| cHdfsReadCost | The cost of reading from HDFS |
| cHdfsWriteCost | The cost of writing to HDFS |
| cLocalIOCost | The cost of performing I/O on the local disk |
| cNetworkCost | The network transfer cost |
| cMapCPUCost | The CPU cost of executing the map function |
| cReduceCPUCost | The CPU cost of executing the reduce function |
| cCombineCPUCost | The CPU cost of executing the combine function |
| cPartitionCPUCost | The CPU cost of partitioning |
| cSerdeCPUCost | The CPU cost of serialization |
| cSortCPUCost | The CPU cost of sorting on keys |
| cMergeCPUCost | The CPU cost of merging |
| cInUncomprCPUCost | The CPU cost of uncompressing the input data |
| cIntermUncomprCPUCost | The CPU cost of uncompressing the intermediate data |
| cIntermComprCPUCost | The CPU cost of compressing the intermediate data |
| cOutComprCPUCost | The CPU cost of compressing the output data |

Table 3: Variables for Profile Cost Factors

Let us define the identity function I as:

    I(x) = 1 if x exists or equals true, 0 otherwise    (1)

Initializations: In an effort to present concise formulas and avoid the use of conditionals as much as possible, we make the following initializations:

    If (pUseCombine == FALSE)
        sCombineSizeSel = 1
        sCombinePairsSel = 1
        cCombineCPUCost = 0
    If (pIsInCompressed == FALSE)
        sInputCompressRatio = 1
        cInUncomprCPUCost = 0
    If (pIsIntermCompressed == FALSE)
        sIntermCompressRatio = 1
        cIntermUncomprCPUCost = 0
        cIntermComprCPUCost = 0
    If (pIsOutCompressed == FALSE)
        sOutCompressRatio = 1
        cOutComprCPUCost = 0

2 Performance Models for the Map Task Phases

Map task execution is divided into five phases:

1. Read: reading the input split and creating the key-value pairs.
2. Map: executing the user-provided map function.
3. Collect: collecting the map output into a buffer and partitioning.
4. Spill: sorting, applying the combine function if any, performing compression if requested, and finally spilling to disk, creating spill files.
5. Merge: merging the spill files into a single map output file. Merging might be performed in multiple rounds.

2.1 Modeling the Read and Map Phases

During this phase, the input split is read (and uncompressed if necessary), the key-value pairs are created and passed as input to the user-defined map function.

    inputMapSize = pSplitSize / sInputCompressRatio    (2)

    inputMapPairs = inputMapSize / sInputPairWidth    (3)

The costs of this phase are:

    IOCost_Read = pSplitSize × cHdfsReadCost
    CPUCost_Read = pSplitSize × cInUncomprCPUCost + inputMapPairs × cMapCPUCost    (4)

If the MapReduce job consists only of mappers (i.e., pNumReducers = 0), then the Spill and Merge phases will not be executed and the map output will be written directly to HDFS.

    outMapSize = inputMapSize × sMapSizeSel    (5)

    IOCost_MapWrite = outMapSize × sOutCompressRatio × cHdfsWriteCost    (6)

    CPUCost_MapWrite = outMapSize × cOutComprCPUCost    (7)

2.2 Modeling the Collect and Spill Phases

The map function generates output key-value (K-V) pairs that are placed in the map-side memory buffer. The formulas regarding the map output are:

    outMapSize = inputMapSize × sMapSizeSel    (8)

    outMapPairs = inputMapPairs × sMapPairsSel    (9)

    outPairWidth = outMapSize / outMapPairs    (10)

The memory buffer is split into two parts: the serialization part, which stores the key-value pairs, and the accounting part, which stores metadata per pair. When either of these two parts fills up (based on the threshold value pSpillPerc), the pairs are partitioned, sorted, and spilled to disk.
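The dataflow through the Read, Map, and Collect phases (Eqs. (2)-(3) and (8)-(10) above) can be sketched directly in code; the function name is ours, not the report's:

```python
def map_dataflow(pSplitSize, sInputCompressRatio, sInputPairWidth,
                 sMapSizeSel, sMapPairsSel):
    """Dataflow through the Read, Map, and Collect phases,
    per Eqs. (2)-(3) and (8)-(10)."""
    inputMapSize = pSplitSize / sInputCompressRatio   # Eq. (2)
    inputMapPairs = inputMapSize / sInputPairWidth    # Eq. (3)
    outMapSize = inputMapSize * sMapSizeSel           # Eq. (8)
    outMapPairs = inputMapPairs * sMapPairsSel        # Eq. (9)
    outPairWidth = outMapSize / outMapPairs           # Eq. (10)
    return inputMapPairs, outMapSize, outMapPairs, outPairWidth
```

For example, a 1000-byte uncompressed split of 10-byte pairs, processed by a mapper that halves the bytes and keeps a quarter of the pairs, yields 25 output pairs of width 20.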
The maximum number of pairs for the serialization buffer is:

    maxSerPairs = ⌊ pSortMB × 2^20 × (1 − pSortRecPerc) × pSpillPerc / outPairWidth ⌋    (11)

The maximum number of pairs for the accounting buffer is:

    maxAccPairs = ⌊ pSortMB × 2^20 × pSortRecPerc × pSpillPerc / 16 ⌋    (12)

Hence, the number of pairs in, and the size of, the buffer before a spill will be:

    spillBufferPairs = Min{ maxSerPairs, maxAccPairs, outMapPairs }    (13)

    spillBufferSize = spillBufferPairs × outPairWidth    (14)

The overall number of spills will be:

    numSpills = ⌈ outMapPairs / spillBufferPairs ⌉    (15)

The number of pairs in, and the size of, each spill depend on the width of each K-V pair, the use of the combine function, and the use of intermediate data compression. Note that sIntermCompressRatio is set to 1 by default if intermediate compression is disabled, and that sCombineSizeSel and sCombinePairsSel are set to 1 by default if no combine function is used.

    spillFilePairs = spillBufferPairs × sCombinePairsSel    (16)

    spillFileSize = spillBufferSize × sCombineSizeSel × sIntermCompressRatio    (17)

The costs of this phase are:

    IOCost_Spill = numSpills × spillFileSize × cLocalIOCost    (18)

    CPUCost_Spill = numSpills × [ spillBufferPairs × cPartitionCPUCost
                  + spillBufferPairs × cSerdeCPUCost
                  + spillBufferPairs × log2( spillBufferPairs / pNumReducers ) × cSortCPUCost
                  + spillBufferPairs × cCombineCPUCost
                  + spillBufferSize × sCombineSizeSel × cIntermComprCPUCost ]    (19)

2.3 Modeling the Merge Phase

The goal of the Merge phase is to merge all the spill files into a single output file, which is written to local disk. The Merge phase will occur only if more than one spill file is created. Multiple merge passes might occur, depending on the pSortFactor parameter. We define a merge pass to be the merging of at most pSortFactor spill files.
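The spill arithmetic of Eqs. (11)-(15) above is easy to check numerically. A minimal sketch, assuming the default settings (pSortMB = 100, pSortRecPerc = 0.05, pSpillPerc = 0.8); the function name is ours:

```python
import math

def spill_model(pSortMB, pSortRecPerc, pSpillPerc, outPairWidth, outMapPairs):
    """Spill-buffer capacity and spill count, per Eqs. (11)-(15)."""
    bufBytes = pSortMB * 2**20
    # Eq. (11): capacity of the serialization part of the buffer
    maxSerPairs = math.floor(bufBytes * (1 - pSortRecPerc) * pSpillPerc
                             / outPairWidth)
    # Eq. (12): capacity of the accounting part (16 bytes of metadata/pair)
    maxAccPairs = math.floor(bufBytes * pSortRecPerc * pSpillPerc / 16)
    # Eqs. (13) and (15)
    spillBufferPairs = min(maxSerPairs, maxAccPairs, outMapPairs)
    numSpills = math.ceil(outMapPairs / spillBufferPairs)
    return maxSerPairs, maxAccPairs, spillBufferPairs, numSpills
```

With 100-byte pairs and 2,000,000 map output pairs, the accounting part is the bottleneck (262,144 pairs per spill), giving 8 spills.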
We define a merge round to be one or more merge passes that merge only spills produced by the Spill phase or by a previous merge round. For example, suppose numSpills = 30 and pSortFactor = 10. Then, 3 merge passes will be performed to create 3 new files. This is the first merge round. Then, the 3 new files will be merged together, forming the second and final merge round.

The final merge pass is unique in the sense that if the number of spills to be merged is greater than or equal to pNumSpillsForComb, the combiner will be used again. Hence, we treat the intermediate merge rounds and the final merge separately. For the intermediate merge passes, we calculate how many times (on average) a single spill will be read. Note that the remainder of this section assumes numSpills ≤ pSortFactor². In the opposite case, we must use a simulation-based approach in order to calculate the number of spills merged during the intermediate merge rounds as well as the total number of merge passes. The first merge pass is also unique because Hadoop will calculate the optimal number of spill files to merge first, so that all other merge passes will merge exactly pSortFactor files.
Since the reduce task also contains a similar Merge phase, we define the following three methods for later reuse:

    calcNumSpillsFirstPass(N, F) =
        N,                        if N ≤ F
        F,                        if (N − 1) MOD (F − 1) = 0
        (N − 1) MOD (F − 1) + 1,  otherwise    (20)

    calcNumSpillsIntermMerge(N, F) =
        0,                          if N ≤ F
        P + ⌊ (N − P) / F ⌋ × F,    if N ≤ F²
        where P = calcNumSpillsFirstPass(N, F)    (21)

    calcNumSpillsFinalMerge(N, F) =
        N,                                  if N ≤ F
        1 + ⌊ (N − P) / F ⌋ + (N − S),      if N ≤ F²
        where P = calcNumSpillsFirstPass(N, F), S = calcNumSpillsIntermMerge(N, F)    (22)

The number of spills read during the first merge pass is:

    numSpillsFirstPass = calcNumSpillsFirstPass(numSpills, pSortFactor)    (23)

The number of spills read during the intermediate merging is:

    numSpillsIntermMerge = calcNumSpillsIntermMerge(numSpills, pSortFactor)    (24)

The total number of merge passes will be:

    numMergePasses =
        0,  if numSpills = 1
        1,  if numSpills ≤ pSortFactor
        2 + ⌊ (numSpills − numSpillsFirstPass) / pSortFactor ⌋,  if numSpills ≤ pSortFactor²    (25)

The number of spill files for the final merge round is (first pass + intermediate passes + remaining spill files):

    numSpillsFinalMerge = calcNumSpillsFinalMerge(numSpills, pSortFactor)    (26)

The total number of records spilled is:

    numRecSpilled = spillFilePairs × [ numSpills + numSpillsIntermMerge + numSpills × sCombinePairsSel ]    (27)

The final map output size and number of K-V pairs are:

    useCombInMerge = (numSpills > 1) AND (pUseCombine) AND (numSpillsFinalMerge ≥ pNumSpillsForComb)    (28)

    intermDataSize = numSpills × spillFileSize × ( sCombineSizeSel if useCombInMerge, 1 otherwise )    (29)

    intermDataPairs = numSpills × spillFilePairs × ( sCombinePairsSel if useCombInMerge, 1 otherwise )    (30)

The costs of this phase are:

    IOCost_Merge = 2 × numSpillsIntermMerge × spillFileSize × cLocalIOCost    // intermediate merges
                 + numSpills × spillFileSize × cLocalIOCost                   // read for final merge
                 + intermDataSize × cLocalIOCost                              // write of final merge    (31)

    CPUCost_Merge = numSpillsIntermMerge × [ spillFileSize × cIntermUncomprCPUCost
                  + spillFilePairs × cMergeCPUCost
                  + ( spillFileSize / sIntermCompressRatio ) × cIntermComprCPUCost ]
                  + numSpills × [ spillFileSize × cIntermUncomprCPUCost
                  + spillFilePairs × cMergeCPUCost
                  + spillFilePairs × cCombineCPUCost ]
                  + ( intermDataSize / sIntermCompressRatio ) × cIntermComprCPUCost    (32)

2.4 Modeling the Overall Map Task

The above models correspond to the execution of a single map task. The overall costs for a single map task are:

    IOCost_Map = IOCost_Read + IOCost_MapWrite                  if pNumReducers = 0
                 IOCost_Read + IOCost_Spill + IOCost_Merge      if pNumReducers > 0    (33)

    CPUCost_Map = CPUCost_Read + CPUCost_MapWrite               if pNumReducers = 0
                  CPUCost_Read + CPUCost_Spill + CPUCost_Merge  if pNumReducers > 0    (34)

3 Performance Models for the Reduce Task Phases

The reduce task is divided into four phases:

1. Shuffle: copying the map output from the mapper nodes to a reducer's node and decompressing, if needed. Partial merging may also occur during this phase.
2. Merge: merging the sorted fragments from the different mappers to form the input to the reduce function.
3. Reduce: executing the user-provided reduce function.
4. Write: writing the (compressed) output to HDFS.

3.1 Modeling the Shuffle Phase

The following discussion refers to the execution of a single reduce task. In the Shuffle phase, the framework fetches the relevant map output partition from each mapper (called a segment) and copies it to the reducer's node. If the map output is compressed, it will be uncompressed.
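The three merge-bookkeeping methods of Eqs. (20)-(22) above are easy to mis-read in prose, so here is a direct transcription (valid under the stated assumption N ≤ F²; function names follow the report):

```python
def calcNumSpillsFirstPass(N, F):
    """Eq. (20): spills merged by the first merge pass."""
    if N <= F:
        return N
    if (N - 1) % (F - 1) == 0:
        return F
    return (N - 1) % (F - 1) + 1

def calcNumSpillsIntermMerge(N, F):
    """Eq. (21): total spills read during intermediate merge passes."""
    if N <= F:
        return 0
    P = calcNumSpillsFirstPass(N, F)
    return P + ((N - P) // F) * F          # assumes N <= F*F

def calcNumSpillsFinalMerge(N, F):
    """Eq. (22): files merged by the final merge pass."""
    if N <= F:
        return N
    P = calcNumSpillsFirstPass(N, F)
    S = calcNumSpillsIntermMerge(N, F)
    return 1 + (N - P) // F + (N - S)      # assumes N <= F*F
```

For numSpills = 30 and pSortFactor = 10, Hadoop's first-pass optimization merges 3 spills first (so every later pass merges exactly 10 files), the intermediate passes read 23 spills in total, and the final merge combines 10 files.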
For each map segment that reaches the reduce side we have:

    segmentComprSize = intermDataSize / pNumReducers    (35)

    segmentUncomprSize = segmentComprSize / sIntermCompressRatio    (36)

    segmentPairs = intermDataPairs / pNumReducers    (37)

where intermDataSize and intermDataPairs are the size and number of pairs produced as intermediate output by a single mapper (see Section 2.3). The data fetched to a single reducer will be:

    totalShuffleSize = pNumMappers × segmentComprSize    (38)

    totalShufflePairs = pNumMappers × segmentPairs    (39)

As the data is copied to the reducer, it is placed in the in-memory shuffle buffer, with size:

    shuffleBufferSize = pShuffleInBufPerc × pTaskMem    (40)

When the in-memory buffer reaches a threshold size, or the number of segments becomes greater than pInMemMergeThr, the segments are merged and spilled to disk, creating a new local file (called a shuffle file). The merge size threshold is:

    mergeSizeThr = pShuffleMergePerc × shuffleBufferSize    (41)

However, when the segment size is greater than 25% of shuffleBufferSize, the segment goes straight to disk instead of passing through memory (hence, no in-memory merging will occur).

Case 1: segmentUncomprSize < 0.25 × shuffleBufferSize

    numSegInShuffleFile = mergeSizeThr / segmentUncomprSize    (42)

    If ( ⌈numSegInShuffleFile⌉ × segmentUncomprSize ≤ shuffleBufferSize )
        numSegInShuffleFile = ⌈numSegInShuffleFile⌉
    else
        numSegInShuffleFile = ⌊numSegInShuffleFile⌋

    If ( numSegInShuffleFile > pInMemMergeThr )
        numSegInShuffleFile = pInMemMergeThr    (43)
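The Case 1 rounding logic of Eqs. (42)-(43) above is subtle: the segment count is rounded up only if the rounded-up batch still fits in the shuffle buffer, and is then capped by pInMemMergeThr. A minimal sketch (the function name is ours):

```python
import math

def segments_per_shuffle_file(mergeSizeThr, segmentUncomprSize,
                              shuffleBufferSize, pInMemMergeThr):
    """Eqs. (42)-(43): segments merged into one shuffle file (Case 1)."""
    n = mergeSizeThr / segmentUncomprSize                 # Eq. (42)
    if math.ceil(n) * segmentUncomprSize <= shuffleBufferSize:
        n = math.ceil(n)                                  # round up: still fits
    else:
        n = math.floor(n)                                 # round down: would overflow
    return min(n, pInMemMergeThr)                         # Eq. (43)
```

For instance, with a 1000-unit shuffle buffer, a 660-unit merge threshold, and 100-unit segments, seven segments are merged per shuffle file (⌈6.6⌉ = 7 segments occupy 700 units, which still fits in the buffer).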
A shuffle file is the merging of numSegInShuffleFile segments. If a combine function is specified, it is applied during this merging. Note that if numSegInShuffleFile > pNumMappers, then merging will not happen.

    shuffleFileSize = numSegInShuffleFile × segmentComprSize × sCombineSizeSel    (44)

    shuffleFilePairs = numSegInShuffleFile × segmentPairs × sCombinePairsSel    (45)

    numShuffleFiles = ⌊ pNumMappers / numSegInShuffleFile ⌋    (46)

At the end of the merging, some segments might remain in memory:

    numSegmentsInMem = pNumMappers MOD numSegInShuffleFile    (47)

Case 2: segmentUncomprSize ≥ 0.25 × shuffleBufferSize

    numSegInShuffleFile = 1    (48)

    shuffleFileSize = segmentComprSize    (49)

    shuffleFilePairs = segmentPairs    (50)

    numShuffleFiles = pNumMappers    (51)

    numSegmentsInMem = 0    (52)

Either case will create a set of shuffle files on disk. When the number of shuffle files on disk increases above a certain threshold (2 × pSortFactor − 1), a new merge thread is triggered and pSortFactor shuffle files are merged into a new, larger, sorted one. The combiner is not used during this disk merging. The total number of such merges is:

    numShuffleMerges =
        0,  if numShuffleFiles < 2 × pSortFactor − 1
        ⌊ (numShuffleFiles − 2 × pSortFactor + 1) / pSortFactor ⌋ + 1,  otherwise    (53)

At the end of the Shuffle phase, a set of merged and unmerged shuffle files will exist on disk.
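The disk-merge trigger of Eq. (53) above can be sketched as follows (the function name is ours):

```python
def num_shuffle_merges(numShuffleFiles, pSortFactor):
    """Eq. (53): background disk merges triggered during the Shuffle phase."""
    if numShuffleFiles < 2 * pSortFactor - 1:
        return 0
    return (numShuffleFiles - 2 * pSortFactor + 1) // pSortFactor + 1
```

With pSortFactor = 10, a disk merge is first triggered once 19 shuffle files accumulate (2 × 10 − 1); with 40 shuffle files, three background merges run.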
    numMergShufFiles = numShuffleMerges    (54)

    mergShufFileSize = pSortFactor × shuffleFileSize    (55)

    mergShufFilePairs = pSortFactor × shuffleFilePairs    (56)

    numUnmergShufFiles = numShuffleFiles − pSortFactor × numShuffleMerges    (57)

    unmergShufFileSize = shuffleFileSize    (58)

    unmergShufFilePairs = shuffleFilePairs    (59)

The cost of the Shuffle phase is:

    IOCost_Shuffle = numShuffleFiles × shuffleFileSize × cLocalIOCost
                   + numMergShufFiles × mergShufFileSize × 2 × cLocalIOCost    (60)

    CPUCost_Shuffle = [ totalShuffleSize × cIntermUncomprCPUCost
                    + numShuffleFiles × shuffleFilePairs × cMergeCPUCost
                    + numShuffleFiles × shuffleFilePairs × cCombineCPUCost
                    + numShuffleFiles × ( shuffleFileSize / sIntermCompressRatio ) × cIntermComprCPUCost ]
                      × I( segmentUncomprSize < 0.25 × shuffleBufferSize )
                    + numMergShufFiles × mergShufFileSize × cIntermUncomprCPUCost
                    + numMergShufFiles × mergShufFilePairs × cMergeCPUCost
                    + numMergShufFiles × ( mergShufFileSize / sIntermCompressRatio ) × cIntermComprCPUCost    (61)

3.2 Modeling the Merge Phase

After all the map outputs have been successfully copied into memory and/or onto disk, the sorting/merging phase begins. This phase merges all data into a single stream that is fed to the reducer. Similar to the map-side Merge phase (see Section 2.3), this phase may occur in multiple rounds; but instead of creating a single output file during the final merging, it sends the data directly to the reducer.

The Shuffle phase produced a set of merged and unmerged shuffle files on disk, and perhaps a set of segments in memory. The merging is done in three steps.

Step 1: Some segments might be evicted from memory and merged into a single shuffle file to satisfy the memory constraint enforced by pReducerInBufPerc. (This parameter specifies the amount of memory allowed to be occupied by segments before the reducer begins.)
    maxSegmentBuffer = pReducerInBufPerc × pTaskMem    (62)

    currSegmentBuffer = numSegmentsInMem × segmentUncomprSize    (63)

    If ( currSegmentBuffer > maxSegmentBuffer )
        numSegmentsEvicted = ⌈ ( currSegmentBuffer − maxSegmentBuffer ) / segmentUncomprSize ⌉
    else
        numSegmentsEvicted = 0    (64)

    numSegmentsRemainMem = numSegmentsInMem − numSegmentsEvicted    (65)

The above merging will only occur if the number of existing shuffle files on disk is less than pSortFactor. If not, the shuffle files would have to be merged, and the in-memory segments that were supposed to be evicted are left to be merged with the shuffle files on disk.

    numFilesOnDisk = numMergShufFiles + numUnmergShufFiles    (66)

    If ( numFilesOnDisk < pSortFactor )
        numFilesFromMem = 1
        filesFromMemSize = numSegmentsEvicted × segmentComprSize
        filesFromMemPairs = numSegmentsEvicted × segmentPairs
        step1MergingSize = filesFromMemSize
        step1MergingPairs = filesFromMemPairs
    else
        numFilesFromMem = numSegmentsEvicted
        filesFromMemSize = segmentComprSize
        filesFromMemPairs = segmentPairs
        step1MergingSize = 0
        step1MergingPairs = 0    (67)

    filesToMergeStep2 = numFilesOnDisk + numFilesFromMem    (68)

Step 2: Any files on disk will go through a merging phase in multiple rounds (similar to the process in Section 2.3). This step will happen only if numFilesOnDisk > 0 (which implies filesToMergeStep2 > 0). The number of intermediate reads (and writes) is:

    intermMergeReads = calcNumSpillsIntermMerge(filesToMergeStep2, pSortFactor)    (69)

The main difference from Section 2.3 is that the merged files have different sizes. We account for this by attributing merging costs proportionally.
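Step 1's eviction arithmetic (Eqs. (62)-(65) above) can be sketched as follows (the function name is ours):

```python
import math

def step1_evictions(numSegmentsInMem, segmentUncomprSize,
                    pReducerInBufPerc, pTaskMem):
    """Eqs. (62)-(65): segments evicted from memory before the reduce."""
    maxSegmentBuffer = pReducerInBufPerc * pTaskMem            # Eq. (62)
    currSegmentBuffer = numSegmentsInMem * segmentUncomprSize  # Eq. (63)
    if currSegmentBuffer > maxSegmentBuffer:                   # Eq. (64)
        numSegmentsEvicted = math.ceil(
            (currSegmentBuffer - maxSegmentBuffer) / segmentUncomprSize)
    else:
        numSegmentsEvicted = 0
    # Eq. (65): segments that stay in memory
    return numSegmentsEvicted, numSegmentsInMem - numSegmentsEvicted
```

Note that with the default pReducerInBufPerc = 0, every in-memory segment is evicted before the reducer begins.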
    step2MergingSize = ( intermMergeReads / filesToMergeStep2 )
                       × [ numMergShufFiles × mergShufFileSize
                       + numUnmergShufFiles × unmergShufFileSize
                       + numFilesFromMem × filesFromMemSize ]    (70)

    step2MergingPairs = ( intermMergeReads / filesToMergeStep2 )
                        × [ numMergShufFiles × mergShufFilePairs
                        + numUnmergShufFiles × unmergShufFilePairs
                        + numFilesFromMem × filesFromMemPairs ]    (71)

    filesRemainFromStep2 = calcNumSpillsFinalMerge(filesToMergeStep2, pSortFactor)    (72)

Step 3: All files on disk and in memory will go through merging.

    filesToMergeStep3 = filesRemainFromStep2 + numSegmentsRemainMem    (73)

The process is identical to Step 2 above.

    intermMergeReads = calcNumSpillsIntermMerge(filesToMergeStep3, pSortFactor)    (74)

    step3MergingSize = ( intermMergeReads / filesToMergeStep3 ) × totalShuffleSize    (75)

    step3MergingPairs = ( intermMergeReads / filesToMergeStep3 ) × totalShufflePairs    (76)

    filesRemainFromStep3 = calcNumSpillsFinalMerge(filesToMergeStep3, pSortFactor)    (77)

    totalMergingSize = step1MergingSize + step2MergingSize + step3MergingSize    (78)

The cost of the Sorting phase is:

    IOCost_Sort = totalMergingSize × cLocalIOCost    (79)

    CPUCost_Sort = totalMergingSize × cMergeCPUCost
                 + ( totalMergingSize / sIntermCompressRatio ) × cIntermComprCPUCost
                 + ( step2MergingSize + step3MergingSize ) × cIntermUncomprCPUCost    (80)

3.3 Modeling the Reduce and Write Phases

Finally, the user-provided reduce function will be executed and the output will be written to HDFS.
    inReduceSize = numShuffleFiles × ( shuffleFileSize / sIntermCompressRatio )
                 + numSegmentsInMem × ( segmentComprSize / sIntermCompressRatio )    (81)

    inReducePairs = numShuffleFiles × shuffleFilePairs + numSegmentsInMem × segmentPairs    (82)

    outReduceSize = inReduceSize × sReduceSizeSel    (83)

    outReducePairs = inReducePairs × sReducePairsSel    (84)

The input to the reduce function resides in memory and/or in the shuffle files produced by the Shuffling and Sorting phases.

    inRedSizeDiskSize = numMergShufFiles × mergShufFileSize
                      + numUnmergShufFiles × unmergShufFileSize
                      + numFilesFromMem × filesFromMemSize    (85)

The cost of the Write phase is:

    IOCost_Write = inRedSizeDiskSize × cLocalIOCost
                 + outReduceSize × sOutCompressRatio × cHdfsWriteCost    (86)

    CPUCost_Write = inReducePairs × cReduceCPUCost
                  + inRedSizeDiskSize × cIntermUncomprCPUCost
                  + outReduceSize × cOutComprCPUCost    (87)

3.4 Modeling the Overall Reduce Task

The above models correspond to the execution of a single reduce task. The overall costs for a single reduce task, excluding network transfers, are:

    IOCost_Reduce = IOCost_Shuffle + IOCost_Sort + IOCost_Write    (88)

    CPUCost_Reduce = CPUCost_Shuffle + CPUCost_Sort + CPUCost_Write    (89)

4 Performance Models for the Network Transfer

During the Shuffle phase, all the data produced by the map tasks is copied over to the nodes running the reduce tasks (except for the data that is local). The overall data transferred over the network is:

    netTransferSize = finalOutMapSize × pNumMappers × ( pNumNodes − 1 ) / pNumNodes    (90)

where finalOutMapSize is the size of the data produced by a single map task. The overall cost for transferring data over the network is:

    NETCost_Job = netTransferSize × cNetworkCost    (91)

5 Performance Models for the MapReduce Job

The MapReduce job consists of several map and reduce tasks executing in parallel and in waves.
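The (pNumNodes − 1)/pNumNodes factor in Eq. (90) above reflects that, for uniformly placed reducers, that fraction of each map output must leave its node. A minimal sketch (the function name is ours):

```python
def network_cost(finalOutMapSize, pNumMappers, pNumNodes, cNetworkCost):
    """Eqs. (90)-(91): shuffle bytes crossing the network, and their cost."""
    netTransferSize = (finalOutMapSize * pNumMappers
                       * (pNumNodes - 1) / pNumNodes)   # Eq. (90)
    return netTransferSize * cNetworkCost               # Eq. (91)
```

For example, 10 mappers each producing 100 bytes on a 4-node cluster transfer 750 bytes over the network; on a single node nothing crosses the network.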
There are two primary ways to estimate the total cost of the job: (i) simulate the task execution using a Task Scheduler Simulator, and (ii) calculate the expected total costs analytically. Simulation involves scheduling and simulating the execution of individual tasks on a virtual cluster; the cost of each task is calculated using the proposed performance models. The second approach involves using the following analytical costs:

    IOCost_AllMaps = ( pNumMappers × IOCost_Map ) / ( pNumNodes × pMaxMapsPerNode )    (92)

    CPUCost_AllMaps = ( pNumMappers × CPUCost_Map ) / ( pNumNodes × pMaxMapsPerNode )    (93)

    IOCost_AllReducers = ( pNumReducers × IOCost_Reduce ) / ( pNumNodes × pMaxRedPerNode )    (94)

    CPUCost_AllReducers = ( pNumReducers × CPUCost_Reduce ) / ( pNumNodes × pMaxRedPerNode )    (95)

The overall job cost is simply the sum of the costs from all the map and reduce tasks:

    IOCost_Job = IOCost_AllMaps                        if pNumReducers = 0
                 IOCost_AllMaps + IOCost_AllReducers   if pNumReducers > 0    (96)

    CPUCost_Job = CPUCost_AllMaps                         if pNumReducers = 0
                  CPUCost_AllMaps + CPUCost_AllReducers   if pNumReducers > 0    (97)

With appropriate system parameters that allow for fair comparisons among the I/O, CPU, and network costs, the overall cost is:

    Cost_Job = IOCost_Job + CPUCost_Job + NETCost_Job    (98)
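The analytical aggregation of Eqs. (92)-(98) divides per-task costs by the number of task slots, since that many tasks run concurrently in each wave. A minimal sketch of the second (analytical) approach, with per-task costs supplied as inputs (the function name is ours):

```python
def job_cost(pNumMappers, pNumReducers, pNumNodes,
             pMaxMapsPerNode, pMaxRedPerNode,
             ioCostMap, cpuCostMap, ioCostReduce, cpuCostReduce,
             netCostJob):
    """Eqs. (92)-(98): analytical total job cost."""
    mapSlots = pNumNodes * pMaxMapsPerNode
    redSlots = pNumNodes * pMaxRedPerNode
    ioAllMaps = pNumMappers * ioCostMap / mapSlots       # Eq. (92)
    cpuAllMaps = pNumMappers * cpuCostMap / mapSlots     # Eq. (93)
    if pNumReducers == 0:                                # Eqs. (96)-(97)
        ioJob, cpuJob = ioAllMaps, cpuAllMaps
    else:
        ioJob = ioAllMaps + pNumReducers * ioCostReduce / redSlots     # Eq. (94)
        cpuJob = cpuAllMaps + pNumReducers * cpuCostReduce / redSlots  # Eq. (95)
    return ioJob + cpuJob + netCostJob                   # Eq. (98)
```

For example, 100 map tasks of unit I/O cost on 10 nodes with 2 map slots each contribute 100 × 1 / 20 = 5 cost units of map I/O to the total.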
