A Framework for QoS-aware Execution of Workflows over the Cloud


Authors: Moreno Marzolla, Raffaela Mirandola

Moreno Marzolla, Università di Bologna, Dipartimento di Scienze dell'Informazione, Mura A. Zamboni 7, I-40127 Bologna, Italy. Email: marzolla@cs.unibo.it
Raffaela Mirandola, Politecnico di Milano, Dipartimento di Elettronica e Informazione, Piazza Leonardo da Vinci, I-20133 Milano, Italy. Email: mirandola@elet.polimi.it

Abstract—The Cloud Computing paradigm is providing system architects with a new powerful tool for building scalable applications. Clouds allow allocation of resources on a "pay-as-you-go" model, so that additional resources can be requested during peak loads and released after that. However, this flexibility calls for appropriate dynamic reconfiguration strategies. In this paper we describe SAVER (qoS-Aware workflows oVER the Cloud), a QoS-aware algorithm for executing workflows involving Web Services hosted in a Cloud environment. SAVER allows execution of arbitrary workflows subject to response time constraints. SAVER uses a passive monitor to identify workload fluctuations based on the observed system response time. The information collected by the monitor is used by a planner component to identify the minimum number of instances of each Web Service which should be allocated in order to satisfy the response time constraint. SAVER uses a simple Queueing Network (QN) model to identify the optimal resource allocation. Specifically, the QN model is used to identify bottlenecks and to predict the system performance as Cloud resources are allocated or released. The parameters used to evaluate the model are those collected by the monitor, which means that SAVER does not require any particular knowledge of the Web Services and workflows being executed. Our approach has been validated through numerical simulations, whose results are reported in this paper.
I. INTRODUCTION

The emerging Cloud computing paradigm is rapidly gaining consensus as an alternative to traditional IT systems, as exemplified by Amazon EC2 [1], Xen [2], IBM Cloud [3], and Microsoft Cloud [4]. Informally, the Cloud computing paradigm allows computing resources to be seen as a utility, available on demand. The term "resource" may denote infrastructure, platforms, software, services, or storage. In this vision, the Cloud provider is responsible for making the resources available to users as they request them. Cloud services can be grouped into three categories [5]: Infrastructure as a Service (IaaS), providing low-level resources such as Virtual Machines (VMs) (e.g., Amazon EC2 [1]); Platform as a Service (PaaS), providing software development frameworks (e.g., Microsoft Azure [4]); and Software as a Service (SaaS), providing applications (e.g., Salesforce.com [6]). The Cloud provider has the responsibility to manage the resources it provides (be they VM instances, programming frameworks, or applications) so that the user requirements and the desired Quality of Service (QoS) are satisfied. Cloud users are usually charged according to the amount of resources they consume (e.g., some amount of money per hour of CPU usage). In this way, customers can avoid capital expenditures by using Cloud resources on a "pay-as-you-go" model.

Users' QoS requirements (e.g., timeliness, availability, security) are usually the result of a negotiation process between the resource provider and the user, which culminates in the definition of a Service Level Agreement (SLA) concerning their respective obligations and expectations. Guaranteeing SLAs under variable workloads for different application and service models is extremely challenging: Clouds are characterized by high load variance, and users have heterogeneous and competing QoS requirements.
In this paper we present SAVER (qoS-Aware workflows oVER the Cloud), a workflow engine provided as a SaaS. The engine allows different types of workflows to be executed over a set of Web Services (WSs). Workflows are described using an appropriate notation (e.g., the WS-BPEL [7] workflow description language). The workflow engine takes care of interacting with the appropriate WSs as described in the workflow. In our scenario, users can negotiate QoS requirements with the service provider; specifically, for each type c of workflow, the user may request that the average execution time of the whole workflow not exceed a threshold R^+_c. Once the QoS requirements have been negotiated, the user can submit any number of workflows of the different types. Both the submission rate and the time spent by the workflows on each WS can fluctuate over time.

Traditionally, when deciding the amount of resources to dedicate to applications, service providers considered worst-case scenarios, resulting in resource over-provisioning. Since the worst-case scenario rarely happens, a static system deployment results in a processing infrastructure which is largely under-utilized. To increase the utilization of resources while meeting the requested SLA, SAVER uses an underlying IaaS Cloud to provide computational power on demand. The Cloud hosts multiple instances of each WS, so that the workload can be balanced across the instances. If a WS is heavily used, SAVER will increase the number of instances by requesting new resources from the Cloud. In this way, the response time of that WS can be reduced, reducing the total execution time of workflows as well. SAVER monitors the workflow engine and detects when some constraints are being violated. System reconfigurations are triggered periodically, and instances are added or removed where necessary.

Fig. 1. Illustration of the bottleneck shift issue.
Despite its conceptual simplicity, the idea above is quite challenging to implement in practice. To better illustrate the problem, let us consider the situation shown in Fig. 1, which is modeled upon a similar example from [8]. We have three Web Services W_1, W_2, W_3 which are used by two types of workflows. Instances of the first type arrive at a rate of 2 req/s, and execute operations on W_1, W_2 and W_3. Instances of the second workflow type arrive at a rate of 1 req/s and only use W_1 and W_3. Each WS has a maximum capacity, which corresponds to the maximum request rate it can handle. Web Services 1 and 3 have a maximum capacity of 2 req/s, while WS 2 has a capacity of 3 req/s. In Fig. 1(a) the capacity of W_1 is exceeded, because the aggregate arrival rate (3 req/s) is greater than its processing capacity. Thus, a queue of unprocessed invocations of W_1 builds up, until requests start to time out and are dropped at a rate of 1 req/s. To eliminate the bottleneck, a possible solution is to create multiple instances of the bottleneck WS on different servers, and balance the load across all instances. If we apply this strategy and create two instances of W_1, we get the situation shown in Fig. 1(b): the aggregate processing capacity of W_1 is now 4 req/s, and thus Web Service 1 is no longer the bottleneck. However, the bottleneck shifts to W_3, which now sees an aggregate arrival rate of 3 req/s and has a capacity of 2 req/s.

The situation above demonstrates the bottleneck shift phenomenon: fixing a bottleneck may create another bottleneck at a different place. Thus, satisfying QoS constraints on systems subject to variable workloads is challenging, because identifying the system configuration which satisfies all constraints might involve multiple reconfigurations of individual components (in our scenario, adding WS instances).
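The bottleneck-shift effect of Fig. 1 can be reproduced numerically. The following sketch uses the arrival rates and capacities from the example; the serial "W_1 feeds the downstream services" drop model and all helper names are our own assumptions, not part of SAVER:

```python
CAP = [2.0, 3.0, 2.0]   # per-instance capacity of W1, W2, W3 (req/s)

def overloaded(n1, n2, n3):
    """Return the 1-based indices of saturated Web Services.

    Type-1 workflows (2 req/s) visit W1, W2 and W3; type-2 workflows
    (1 req/s) visit only W1 and W3. Requests dropped at W1 never
    reach the downstream services.
    """
    into_w1 = 2.0 + 1.0                            # both classes hit W1 first
    surviving = min(1.0, CAP[0] * n1 / into_w1)    # fraction not dropped at W1
    arrivals = [into_w1, 2.0 * surviving, 3.0 * surviving]
    caps = [CAP[0] * n1, CAP[1] * n2, CAP[2] * n3]
    return [k + 1 for k in range(3) if arrivals[k] > caps[k]]

print(overloaded(1, 1, 1))  # [1]: W1 saturated, drops 1 req/s
print(overloaded(2, 1, 1))  # [3]: the bottleneck shifts to W3
```

With one instance each, only W_1 is saturated (it drops 1 req/s, shielding W_3); doubling W_1 lets the full 3 req/s through, and W_3 becomes the new bottleneck, exactly as in Fig. 1(b).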
If the reconfiguration is implemented in a purely reactive manner, each step must be applied sequentially in order to monitor its impact and plan for the next step. This is clearly inefficient, because adaptation would be exceedingly slow. In general, the response time at a specific WS depends both on the number of instances of that Web Service and on the intensity of the other workload classes (workflow types). Thus, a suitable system performance model must be used in order to predict the response time of a given configuration. The performance model can be used to drive the reconfiguration process proactively: different system configurations can be evaluated quickly, and multiple reconfiguration steps can be planned in advance. SAVER uses an open, multiclass Queueing Network (QN) model to represent resource contention by multiple independent request flows, which is crucial in our scenario. The parameters needed to evaluate the QN model can be easily obtained by passively monitoring the running system. The performance model is used within a greedy strategy which identifies an approximate solution to the optimization problem of minimizing the number of WS instances while respecting the SLA.

Structure of this paper: The remainder of this paper is organized as follows. In Section II we review the scientific literature and compare SAVER with related works. In Section III we give a precise formulation of the problem we are addressing. In Section IV we describe the Queueing Network performance model of the Cloud-based workflow engine. SAVER is fully described in Section V, including the high-level architecture and the details of the reconfiguration algorithms. The effectiveness of SAVER has been evaluated by means of simulation experiments, whose results are discussed in Section VI. Finally, conclusions and future works are presented in Section VII.
In order to make this paper self-contained without sacrificing clarity, we relegate the mathematical details of the analysis of the performance model to a separate Appendix.

II. RELATED WORKS

Several research contributions have previously addressed the issue of optimizing resource allocation in cluster-based service centers. Recently, with the emergence of virtualization approaches and Cloud computing, additional research on automatic resource management has been conducted. In this section we briefly review some recent results; some of them take advantage of control theory-based feedback loops [9], [10], machine learning techniques [11], [12], or utility-based optimization techniques [13], [14]. When moving to virtualized environments the resource allocation problem becomes even more complex because of the introduction of virtual resources [14]. Several approaches have been proposed for QoS and resource management at run-time [9], [15]–[19].

The approach presented in [15] describes a method for achieving optimization in Clouds by using performance models throughout the development and operation of the applications running in the Cloud. The proposed optimization aims at maximizing profits in the Cloud by guaranteeing the QoS agreed in the SLAs, taking into account a large variety of workloads. A layered Cloud architecture taking into account different stakeholders is presented in [9]. The architecture supports self-management based on adaptive feedback control loops, present at each layer, and on a coordination activity between the different loops. Mistral [16] is a resource management framework with a multi-level resource allocation algorithm considering reallocation actions based mainly on adding, removing and/or migrating virtual machines, and shutting down or restarting hosts. This approach is based on the use of a Layered Queueing Network (LQN) performance model.
It tries to maximize the overall utility, taking into account several aspects like power consumption, performance and transient costs in its reconfiguration process. In [18] the authors present an approach to self-adaptive resource allocation in virtualized environments based on online architecture-level performance models. The online performance prediction allows estimation of the effects of changes in user workloads and of possible reconfiguration actions. Yazir et al. [19] introduce a distributed approach for dynamic autonomous resource management in computing Clouds, performing resource configuration using Multiple Criteria Decision Analysis. With respect to these works, SAVER lies in the same research line, fostering the use of models at runtime to drive QoS-based system adaptation. SAVER uses an efficient modeling and analysis technique that can be used at runtime without undermining the system behavior and its overall performance.

Ferretti et al. propose in [17] a middleware architecture enabling SLA-driven dynamic configuration, management and optimization of Cloud resources and services. The approach makes use of a load balancer that distributes the workload among the available resources. When the perceived QoS deviates from the SLA, the platform is dynamically reconfigured by acquiring new resources from the Cloud. On the other hand, if resource under-utilization is detected, the system triggers a reconfiguration to release the unused resources. This approach is purely reactive and considers a single-tier application, while SAVER works for an arbitrary number of WSs and uses a performance model to plan complex reconfigurations in a single step. Canfora et al. [20] describe a QoS-aware service discovery and late-binding mechanism which is able to automatically adapt to changes of QoS attributes in order to meet the SLA.
The authors consider the execution of workflows over a set of WSs, such that each WS has multiple functionally equivalent implementations. Genetic Algorithms are used to bind each WS to one of the available implementations, so that a fitness function is maximized. The binding is done at run-time, and depends on the values of QoS attributes which are monitored by the system. It should be observed that in SAVER we consider a different scenario, in which each WS has just one implementation which however can be instantiated multiple times. The goal of SAVER is to satisfy a specific QoS requirement (mean execution time of workflows below a given threshold) with the minimum number of instances.

III. PROBLEM FORMULATION

SAVER is a workflow engine whose general structure is depicted in Fig. 2: it receives workflows from external clients, and executes them over a set of K Web Services W_1, ..., W_K. Workflows can be of C different types (or classes); for each class c = 1, ..., C, clients define a maximum allowed completion time R^+_c. This means that an instance of a class c workflow must be completed, on average, in time less than R^+_c. New workflow classes can be created at any time; when a new class is created, its maximum response time is negotiated with the workflow service provider. We denote with λ_c the average arrival rate of class c workflows. Arrival rates can change over time (to simplify the notation, we write λ_c instead of λ_c(t); in general, we omit explicit reference to t for all time-dependent parameters). Since all WSs are shared between the workflows, the completion time of a workflow depends both on the arrival rates λ = (λ_1, ..., λ_C) and on the utilization of each WS. In order to satisfy the response time constraints, the system must adapt to cope with fluctuations of the workload. To do so, SAVER relies on an IaaS Cloud which maintains multiple instances of each WS. Run-time monitoring information is sent by all WSs back to the workflow engine to drive the adaptation process.
We denote with N_k the number of instances of WS W_k; a system configuration N = (N_1, ..., N_K) is an integer vector representing the number of allocated instances of each WS. When a workflow interacts with W_k, it is bound to one of the N_k instances so that the requests are evenly distributed. When the workload intensity increases, additional instances are created to eliminate the bottlenecks; when the workload decreases, surplus instances are shut down and released. The goal of SAVER is to minimize the total number of WS instances while maintaining the mean execution time of type c workflows below the threshold R^+_c, c = 1, ..., C. Formally, we want to solve the following optimization problem:

  minimize    f(N) = Σ_{k=1}^{K} N_k                             (1)
  subject to  R_c(N) ≤ R^+_c   for all c = 1, 2, ..., C
              N_k ∈ {1, 2, 3, ...}

where R_c(N) is the mean execution time of type c workflows when the system configuration is N = (N_1, ..., N_K). If the IaaS Cloud which hosts the WS instances is managed by some third-party organization, then reducing the number of active instances reduces the cost of the workflow engine.

Fig. 2. System model.

IV. SYSTEM PERFORMANCE MODEL

Before illustrating the details of SAVER, it is important to describe the QN performance model which is used to plan a system reconfiguration. We model the system of Fig. 2 using the open, multiclass QN model [21] shown in Fig. 3. A QN model is a set of queueing centers, which in our case are FIFO queues attached to a single server. Each server represents a single WS instance; thus, W_k is represented by N_k queueing centers, for each k = 1, ..., K. N_k can change over time, as resources are added or removed from the system.
In our QN model there are C different classes of requests, which are generated outside the system. Each request represents a workflow; thus, workflow types are directly mapped to QN request classes. In order to simplify the analysis of the model, we make the simplifying assumption that the inter-arrival time of class c requests is exponentially distributed with arrival rate λ_c. This means that a new workflow of type c is submitted, on average, every 1/λ_c time units.

The interaction of a type c workflow with WS W_k is modeled as a visit of a class c request to one of the N_k queueing centers representing W_k. We denote with R_ck(N) the total time (residence time) spent by type c workflows on one of the N_k instances of W_k for a given configuration N. The residence time is the sum of two terms: the service demand D_ck(N) (the average time spent by a WS instance executing the request) and the queueing delay (the time spent by a request in the waiting queue). The QN model allows multiple visits to the same queueing center, because the same WS can be executed multiple times by the same workflow; in this case, the residence time and service demand are the sums of the residence and service times of all invocations of the same WS instance. The utilization U_k(N) of an instance of W_k is the fraction of time the instance is busy processing requests. If the workload is evenly balanced, then both the residence time R_ck(N) and the utilization U_k(N) are almost the same for all N_k instances of W_k.
Fig. 3. Performance model based on an open, multiclass Queueing Network.

TABLE I. SYMBOLS USED IN THIS PAPER

  C          Number of workflow types
  K          Number of Web Services
  λ          Vector of per-class arrival rates
  M          Current system configuration
  N, N'      Arbitrary system configurations
  R_ck(N)    Residence time of type c workflows on an instance of W_k
  D_ck(N)    Service demand of type c workflows on an instance of W_k
  R_c(N)     Response time of type c workflows
  U_k(N)     Utilization of an instance of W_k
  R^+_c      Maximum allowed response time for type c workflows

Table I summarizes the symbols used in this paper.

V. ARCHITECTURAL OVERVIEW OF SAVER

SAVER is a reactive system based on the Monitor-Analyze-Plan-Execute (MAPE) control loop shown in Fig. 4. During the Monitor step, SAVER collects operational parameters by observing the running system. The parameters are evaluated during the Analyze step; if the system needs to be reconfigured (e.g., because the observed response time of class c workflows exceeds the threshold R^+_c for some c), a new configuration is identified in the Plan step. We use the QN model described in Section IV to evaluate different configurations and identify an optimal server allocation such that all QoS constraints are satisfied. Finally, during the Execute step, the new configuration is applied to the system: WS instances are created or destroyed as needed by leveraging the IaaS Cloud. Unlike other reactive systems, SAVER can plan complex reconfigurations, involving multiple additions/removals of resources, in a single step.

A. Monitoring System Parameters

The QN model is used to estimate the execution time of workflow types for different system configurations. To analyze the QN it is necessary to know two parameters: (i) the arrival rate λ_c of type c workflows, and (ii) the service demand D_ck(M) of type c workflows on an instance of WS W_k, for the current configuration M.
The parameters above can be computed by monitoring the system over a suitable period of time. The arrival rate λ_c can be estimated by counting the number A_c of arrivals of type c workflows which are submitted over an observation period of length T; then λ_c can be defined as λ_c = A_c / T.

Fig. 4. SAVER Control Loop.

TABLE II. EQUATIONS FOR THE QN MODEL OF FIG. 3

  U_k(N)  = Σ_{c=1}^{C} λ_c D_ck(N)                               (2)
  R_ck(N) = D_ck(N) / (1 − U_k(N))                                (3)
  R_c(N)  = Σ_{k=1}^{K} N_k R_ck(N)                               (4)

Measuring the service demands D_ck(M) is a bit more difficult, because they must not include the time spent by a request waiting to start service. If the WSs do not provide detailed timing information (e.g., via their execution logs), it is possible to estimate D_ck(M) from parameters which can be easily observed by the workflow engine, namely the measured residence time R_ck(M) and utilization U_k(M). We use the equations shown in Table II, which hold for the open multiclass QN model in Fig. 3. These equations describe well-known properties of open QN models, so they are given here without proof; the interested reader is referred to [21] for details. The residence time is the total time spent by a type c workflow with one instance of WS W_k, including waiting time and service time. The workflow engine can measure R_ck(M) as the time elapsed from the instant a type c workflow sends a request to one of the instances of W_k, to the time the request is completed. The utilization U_k(M) of an instance of W_k can be obtained from the Cloud service dashboard (or measured on the computing nodes themselves). Using (3), the service demands can be expressed as

  D_ck(M) = R_ck(M) (1 − U_k(M))                                  (5)

B. Finding a new configuration

In order to find an approximate solution to the optimization problem (1), SAVER starts from the current configuration M, which may violate some response time constraints, and executes Algorithm 1.
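The relations in Table II and the demand estimate (5) translate directly into code. A minimal sketch, where the list-based layout (lam[c] for arrival rates, D[c][k] for demands) and the function names are our own choices:

```python
def utilization(lam, D, k):
    """Eq. (2): U_k = sum over classes of lambda_c * D_ck."""
    return sum(lam[c] * D[c][k] for c in range(len(lam)))

def residence_time(lam, D, c, k):
    """Eq. (3): R_ck = D_ck / (1 - U_k); valid only while U_k < 1."""
    return D[c][k] / (1.0 - utilization(lam, D, k))

def response_time(lam, D, N, c):
    """Eq. (4): R_c = sum over Web Services of N_k * R_ck."""
    return sum(N[k] * residence_time(lam, D, c, k) for k in range(len(N)))

def demand_from_measurements(R_meas, U_meas):
    """Eq. (5): recover a service demand from the measured residence
    time and utilization, with no WS-side instrumentation."""
    return R_meas * (1.0 - U_meas)

# One class, one WS: lambda = 0.5, D = 1.0 gives U = 0.5 and R = 2.0;
# Eq. (5) recovers the original demand from those two observables.
print(response_time([0.5], [[1.0]], [1], 0))    # 2.0
print(demand_from_measurements(2.0, 0.5))       # 1.0
```

Note how (5) is just (3) solved for the demand, which is what lets SAVER work from passive measurements only.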
After collecting device utilizations, response times and arrival rates, SAVER estimates the service demands D_ck using Eq. (5). Then, SAVER identifies a new configuration N ≥ M (where N ≥ M iff N_k ≥ M_k for all k = 1, ..., K) by calling the function ACQUIRE(). The new configuration N is computed by greedily adding new instances to bottleneck WSs.

Algorithm 1 The SAVER Algorithm
Require: R^+_c: maximum response time of type c workflows
 1: Let M be the initial configuration
 2: loop
 3:   Monitor R_ck(M), U_k(M), λ_c
 4:   for all c := 1, ..., C; k := 1, ..., K do
 5:     Compute D_ck(M) using Eq. (5)
 6:   N := Acquire(M, λ, D(M), U(M))
 7:   for all c := 1, ..., C; k := 1, ..., K do
 8:     Compute D_ck(N) and U_k(N) using Eq. (7) and (8)
 9:   N' := Release(N, λ, D(N), U(N))
10:   Apply the new configuration N' to the system
11:   M := N'   {Set N' as the current configuration M}

The QN model is used to estimate response times as instances are added: no actual resources are instantiated from the Cloud service at this time. The configuration N returned by the function ACQUIRE() does not violate any constraint, but might contain too many WS instances. Thus, SAVER invokes the function RELEASE(), which computes another configuration N' ≤ N by removing redundant instances, ensuring that no constraint is violated. To call procedure RELEASE() we need to estimate the service demands D_ck(N) and utilizations U_k(N) with configuration N; these can be easily computed from the values measured for the current configuration M. After both steps above, N' becomes the new current configuration: WS instances are created or terminated where necessary by acquiring or releasing hosts from the Cloud infrastructure. Let us illustrate the functions ACQUIRE() and RELEASE() in detail.

a) Adding instances: Function ACQUIRE() is described by Algorithm 2.
Given the system parameters and a configuration N which might violate some or all response time constraints, the function returns a new configuration which is estimated not to violate any constraint. At each iteration, we identify the class b whose workflows have the maximum relative violation of the response time limit (line 2); response times are estimated using Eq. (9) in the Appendix. Then, we identify the WS W_j such that adding one more instance to it produces the maximum reduction in the class b response time (line 3). The configuration N is then updated by adding one instance to W_j (line 4); the updated configuration is N + 1_j, where 1_j is a vector with K elements whose j-th element is one and all others are zero. The loop terminates when no workload type is estimated to violate its response time constraint. Termination of Algorithm 2 is guaranteed by the fact that the function R_c(N) is monotonically decreasing (Lemma 1 in the Appendix); thus, R_c(N + 1_j) < R_c(N) for all c.

Algorithm 2 Acquire(N, λ, D(N), U(N)) → N'
Require: N: system configuration
Require: λ: current arrival rates of workflows
Require: D(N): service demands at configuration N
Require: U(N): utilizations at configuration N
Ensure: N': new system configuration
 1: while R_c(N) > R^+_c for any c do
 2:   b := argmax_c { (R_c(N) − R^+_c) / R^+_c | c = 1, ..., C }
 3:   j := argmax_k { R_b(N) − R_b(N + 1_k) | k = 1, ..., K }
 4:   N := N + 1_j
 5: Return N

b) Removing instances: The function RELEASE(), described by Algorithm 3, is used to deallocate (release) WS instances, starting from an initial configuration N which does not violate any response time constraint. The function implements a greedy strategy, in which a WS W_j is selected at each step and its number of instances is reduced by one.
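The greedy growth step of Algorithm 2, driven by the response-time estimate of Eq. (9), can be sketched as follows. This is a simplified rendering: the data layout and names are ours, and we assume measured per-instance utilizations are below one.

```python
def predicted_response(c, N, M, D_M, U_M):
    """Eq. (9): estimate R_c(N) from the demands D_M[c][k] and
    utilizations U_M[k] measured under the current configuration M."""
    return sum(D_M[c][k] * M[k] * N[k] / (N[k] - U_M[k] * M[k])
               for k in range(len(N)))

def acquire(M, D_M, U_M, R_max):
    """Algorithm 2 (sketch): add one instance at a time to the WS
    that most reduces the worst-violating class's response time,
    until every class meets its threshold R_max[c]."""
    N = list(M)
    C, K = len(D_M), len(M)

    def R(c, conf):
        return predicted_response(c, conf, M, D_M, U_M)

    while any(R(c, N) > R_max[c] for c in range(C)):
        # Class b with the largest relative violation (line 2).
        b = max(range(C), key=lambda c: (R(c, N) - R_max[c]) / R_max[c])
        # WS j whose extra instance shrinks R_b the most (line 3).
        j = max(range(K),
                key=lambda k: R(b, N) - R(b, N[:k] + [N[k] + 1] + N[k + 1:]))
        N[j] += 1                                          # line 4
    return N

# One WS at 80% utilization, demand 1.0, limit R+ = 2.0: one extra
# instance lowers the predicted response time from 5.0 to about 1.67.
print(acquire([1], [[1.0]], [0.8], [2.0]))  # [2]
```

Note that, exactly as in the paper, no Cloud resources are touched inside the loop; only the model is evaluated.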
Reducing the number of instances N_j of W_j is not possible if either (i) the reduction would violate some constraint, or (ii) the reduction would cause the utilization of some WS instances to become greater than one (see Eq. (11) in the Appendix). We start by defining the set S containing the indices of WSs whose number of instances can be reduced without exceeding the processing capacity (line 3). Then, we identify the workflow class d with the maximum (relative) response time (line 5). Finally, we identify the value j ∈ S such that removing one instance of W_j produces the minimum increase in the response time of class d workflows (line 6). The rationale is the following. Type d workflows are the most likely to be affected by the removal of one WS instance, because their relative response time (before the removal) is the highest among all workflow types. Once the "critical" class d has been identified, we try to remove an instance from the WS W_j which causes the smallest increase of the class d response time. Since response time increments are additive (see Appendix), if the removal of an instance of W_j violates some constraint, no further attempt should be made to consider W_j, and we remove j from the candidate set S. From the discussion above, we observe that function RELEASE() computes a Pareto-optimal solution N: there exists no feasible solution N' ≤ N, N' ≠ N, satisfying R_c(N') ≤ R^+_c for all c.

VI. NUMERICAL RESULTS

We performed a set of numerical simulation experiments to assess the effectiveness of SAVER; the results are described in this section. We implemented Algorithms 1, 2 and 3 using GNU Octave [22], an interpreted language for numerical computations. In the first experiment we considered K = 10 Web Services and C = 5 workflow types. Service demands D_ck have been randomly generated, in such a way that class c workflows have service demands which are uniformly distributed in [0, c/C].
Thus, class 1 workflows have the lowest average service demands, while type C workflows have the highest demands. The system has been simulated for T = 200 discrete steps t = 1, ..., T; each step corresponds to a time interval of length W, long enough to amortize the reconfiguration costs.

Algorithm 3 Release(N, λ, D(N), U(N)) → N'
Require: N: system configuration
Require: λ: current arrival rates of workflows
Require: D(N): service demands at configuration N
Require: U(N): utilizations at configuration N
Ensure: N': new system configuration
 1: for all k := 1, ..., K do
 2:   Nmin_k := ⌈ N_k Σ_{c=1}^{C} λ_c D_ck(N) ⌉
 3: S := { k | N_k > Nmin_k }
 4: while S ≠ ∅ do
 5:   d := argmin_c { (R^+_c − R_c(N)) / R^+_c | c = 1, ..., C }
 6:   j := argmin_k { R_d(N − 1_k) − R^+_d | k ∈ S }
 7:   if R_c(N − 1_j) > R^+_c for any c then
 8:     S := S \ { j }   {No instance of W_j can be removed}
 9:   else
10:     N := N − 1_j
11:     if N_j = Nmin_j then
12:       S := S \ { j }
13: Return N

Fig. 5. Simulation results.

The arrival rates λ(t) at step t have been generated according to a fractal model, starting from a randomly perturbed sinusoidal pattern to mimic periodic fluctuations; each workflow type has a different period. Figure 5 shows the results of the simulation. The top part of the figure shows the estimated response time R_c(N) (thick lines) and the upper limit R^+_c (thin horizontal lines) for each class c = 1, ..., C. The middle part of the figure shows the arrival rates λ_c(t) for each class c = 1, ..., C; note that the arrival rates have been stacked for clarity, such that the height of each individual band corresponds to the value λ_c(t), from c = 1 (bottom) to c = 5 (top). The total height of the middle graph is the total arrival rate of all workflow types.
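A synthetic workload in the spirit of this experiment (randomly perturbed sinusoids with a distinct period per workflow class) could be generated as follows. This is our own simplified stand-in, not the paper's fractal model; all parameter names are assumptions:

```python
import math
import random

def arrival_rates(C, T, base=1.0, amplitude=0.5, noise=0.1, seed=42):
    """Per-class arrival-rate traces: a sinusoid with a class-specific
    period plus Gaussian perturbation, clipped at zero."""
    rng = random.Random(seed)
    lam = [[0.0] * T for _ in range(C)]
    for c in range(C):
        period = T / (c + 2)   # each workflow type has its own period
        for t in range(T):
            value = base + amplitude * math.sin(2 * math.pi * t / period)
            lam[c][t] = max(0.0, value + rng.gauss(0.0, noise))
    return lam

traces = arrival_rates(C=5, T=200)
print(len(traces), len(traces[0]))  # 5 200
```

Stacking these traces, as in the middle plot of Fig. 5, gives the total offered load at each step.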
Finally, each band of the bottom part of Figure 5 shows the number N_k of instances of WS W_k, from k = 1 (bottom) to k = 10 (top); again, the total height of all areas represents the total number of resources allocated at each simulation step. As can be seen, the number of allocated resources closely follows the workload pattern.

We performed additional experiments in order to assess the efficiency of the allocations produced by SAVER. In particular, we are interested in estimating the reduction in the number of allocated instances produced by SAVER. To do so, we considered different scenarios for all combinations of C ∈ {10, 15, 20} workflow types and K ∈ {20, 40, 60} Web Services. Each simulation has been executed for T = 200 steps; everything else (request arrival rates, service demands) has been generated as described above.

Results are reported in Table III. Columns labeled C and K show the number of workflow types and Web Services, respectively. Columns labeled Iter. ACQUIRE() contain the maximum and average number of iterations performed by procedure ACQUIRE() (Algorithm 2); columns labeled Iter. RELEASE() contain the same information for procedure RELEASE() (Algorithm 3). Then, we report the minimum, maximum and total number of resources allocated by SAVER during the simulation run. Formally, let S_t denote the total number of WS instances allocated at simulation step t; then

  Min. instances   = min_t { S_t }
  Max. instances   = max_t { S_t }
  Total instances  = Σ_t S_t

The column labeled WS Instances (static) shows the number of instances that would have been allocated by provisioning for the worst-case scenario; this value is simply T × max_t { S_t }. The last column shows the ratio between the total number of WS instances allocated by SAVER and the number of instances that would have been allocated by a static algorithm to satisfy the worst-case scenario; lower values are better.
The results show that SAVER allocates between 64% and 72% of the instances required by the worst-case scenario. As previously observed, if the IaaS provider charges a fixed price for each instance allocated at each simulation step, then SAVER allows a substantial reduction of the total cost, while still maintaining the negotiated SLA.

VII. CONCLUSIONS AND FUTURE WORK

In this paper we presented SAVER, a QoS-aware algorithm for executing workflows involving Web Services hosted in a Cloud environment. The idea underlying SAVER is to selectively allocate and deallocate Cloud resources to guarantee that the response time of each class of workflows stays below a negotiated threshold. Dynamic resource allocation is driven by a feedback control loop: a passive monitor collects information that is used to identify the minimum number of instances of each WS which should be allocated to satisfy the response time constraints. The system performance at different configurations is estimated using a QN model; the estimates feed a greedy optimization strategy which produces the new configuration, which is finally applied to the system. Simulation experiments show that SAVER can effectively react to workload fluctuations by acquiring and releasing resources as needed.

The methodology proposed in this paper can be improved along several directions. In particular, we assumed that requests of all classes are evenly distributed across the WS instances. While this assumption makes the system easier to analyze and implement, more effective allocations could be produced if individual workflow classes could be routed to specific WS instances. This extension would add another level of complexity to SAVER, and is currently under investigation.
We are also exploring the use of forecasting techniques as a means to trigger resource allocation and deallocation proactively. Finally, we are working on an implementation of our methodology on a real testbed, to assess its effectiveness through a more comprehensive set of real experiments.

APPENDIX

Let M be the current system configuration, and let us assume that, under configuration M, the observed arrival rates are λ = (λ_1, ..., λ_C) and the service demands are D_ck(M). Then, for an arbitrary configuration N, we can combine Equations (3) and (4) to get:

  R_c(N) = Σ_{k=1}^{K} N_k D_ck(N) / (1 − U_k(N))    (6)

The current total class-c service demand on all instances of W_k is M_k D_ck(M); hence we can express the service demands and utilizations of individual instances for an arbitrary configuration N as:

  D_ck(N) = (M_k / N_k) D_ck(M)    (7)
  U_k(N) = (M_k / N_k) U_k(M)    (8)

Thus, we can rewrite (6) as

  R_c(N) = Σ_{k=1}^{K} M_k N_k D_ck(M) / (N_k − U_k(M) M_k)    (9)

which allows us to estimate the response time R_c(N) of class-c workflows given the information collected by the monitor for the current configuration M. From (2) and (7) we get:

  U_k(N) = (M_k / N_k) Σ_{c=1}^{C} λ_c D_ck(M)    (10)

Since by definition the utilization of any WS instance must be less than one, we can use (10) to derive a lower bound on the number N_k of instances of W_k:

  N_k ≥ M_k Σ_{c=1}^{C} λ_c D_ck(M)    (11)

The following lemma can be easily proved:

Lemma 1: The response time function R_c(N) is monotonically decreasing: for any two configurations N′ and N″ such that N′_k ≤ N″_k for all k = 1, ..., K, we have R_c(N′) ≥ R_c(N″).

Proof: If we extend R_c(N) to a continuous function, its partial derivative is

  ∂R_c/∂N_k = − M_k² U_k(M) D_ck(M) / (N_k − U_k(M) M_k)²    (12)

which is negative for every k for which the utilization U_k(M) and the service demand D_ck(M) are nonzero. Hence R_c(N) is decreasing.

Note that, according to Eq. (9), response time increments are additive: R_c(N) − R_c(N + 1_j) = Δ_j and R_c(N) − R_c(N + 1_i) = Δ_i imply R_c(N) − R_c(N + 1_i + 1_j) = Δ_i + Δ_j.

TABLE III
SIMULATION RESULTS FOR DIFFERENT SCENARIOS

         Iter. ACQUIRE()   Iter. RELEASE()   WS Instances (dynamic)   WS Instances
 C   K    max     avg       max     avg       min    max      tot      (static)     Dynamic/Static
10  20     14    1.30        15    2.53        36    127    16589       25400           0.65
10  40     22    2.43        19    3.81        76    257    33103       51400           0.64
10  60     35    3.54        35    5.12       122    378    50211       75600           0.66
15  20     10    1.27        13    2.56        78    178    23536       35600           0.66
15  40     23    2.20        26    3.68       138    340    44843       68000           0.66
15  60     34    3.20        44    5.04       239    526    68253      105200           0.65
20  20      9    1.19        13    2.50       114    206    28792       41200           0.70
20  40     24    2.33        29    4.00       215    408    57723       81600           0.71
20  60     21    3.00        30    4.89       347    602    86684      120400           0.72
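Equations (9)–(11) and Lemma 1 are easy to exercise numerically. The sketch below (all names and the numeric values are illustrative, not from the paper) estimates the class response time and the per-instance utilization at a hypothetical configuration N from the quantities observed at the current configuration M, and checks monotonicity:

```python
def response_time(N, M, D_M, U_M, c):
    """Eq. (9): estimated response time of class-c workflows at a
    hypothetical configuration N, using demands D_M[c][k] and
    utilizations U_M[k] observed at the current configuration M."""
    return sum(M[k] * N[k] * D_M[c][k] / (N[k] - U_M[k] * M[k])
               for k in range(len(N)))

def utilization(N, M, lam, D_M, k):
    """Eq. (10): per-instance utilization of W_k at configuration N."""
    return (M[k] / N[k]) * sum(lam[c] * D_M[c][k] for c in range(len(lam)))

# A small worked example: K = 2 Web Services, C = 1 class, observed at M
M = [4, 4]
D_M = [[0.5, 0.25]]            # D_ck(M)
U_M = [0.5, 0.25]              # U_k(M) = sum_c lambda_c D_ck(M), cf. Eq. (2)
lam = [1.0]

r_now = response_time(M, M, D_M, U_M, 0)         # R_c(M) = 4 + 4/3
r_more = response_time([5, 4], M, D_M, U_M, 0)   # one extra instance of W_1
assert r_more < r_now          # Lemma 1: adding instances never hurts
# Eq. (11) here gives the lower bounds N_1 >= 2 and N_2 >= 1
```

Note that each term of Eq. (9) depends only on its own N_k, which is exactly why the response time increments are additive as observed above.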
