A Framework for QoS-aware Execution of Workflows over the Cloud


Authors: Moreno Marzolla, Raffaela Mirandola

Moreno Marzolla, Università di Bologna, Dipartimento di Scienze dell'Informazione, Mura A. Zamboni 7, I-40127 Bologna, Italy. Email: marzolla@cs.unibo.it
Raffaela Mirandola, Politecnico di Milano, Dipartimento di Elettronica e Informazione, Piazza Leonardo da Vinci, I-20133 Milano, Italy. Email: mirandola@elet.polimi.it

Abstract—The Cloud Computing paradigm is providing system architects with a new powerful tool for building scalable applications. Clouds allow allocation of resources on a "pay-as-you-go" model, so that additional resources can be requested during peak loads and released after that. However, this flexibility calls for appropriate dynamic reconfiguration strategies. In this paper we describe SAVER (qoS-Aware workflows oVER the Cloud), a QoS-aware algorithm for executing workflows involving Web Services hosted in a Cloud environment. SAVER allows execution of arbitrary workflows subject to response time constraints. SAVER uses a passive monitor to identify workload fluctuations based on the observed system response time. The information collected by the monitor is used by a planner component to identify the minimum number of instances of each Web Service which should be allocated in order to satisfy the response time constraint. SAVER uses a simple Queueing Network (QN) model to identify the optimal resource allocation. Specifically, the QN model is used to identify bottlenecks and to predict the system performance as Cloud resources are allocated or released. The parameters used to evaluate the model are those collected by the monitor, which means that SAVER does not require any particular knowledge of the Web Services and workflows being executed. Our approach has been validated through numerical simulations, whose results are reported in this paper.
I. INTRODUCTION

The emerging Cloud computing paradigm is rapidly gaining consensus as an alternative to traditional IT systems, as exemplified by Amazon EC2 [1], Xen [2], IBM Cloud [3], and Microsoft Cloud [4]. Informally, the Cloud computing paradigm allows computing resources to be seen as a utility, available on demand. The term "resource" may denote infrastructure, platforms, software, services, or storage. In this vision, the Cloud provider is responsible for making the resources available to users as they request them. Cloud services can be grouped into three categories [5]: Infrastructure as a Service (IaaS), providing low-level resources such as Virtual Machines (VMs) (e.g., Amazon EC2 [1]); Platform as a Service (PaaS), providing software development frameworks (e.g., Microsoft Azure [4]); and Software as a Service (SaaS), providing applications (e.g., Salesforce.com [6]). The Cloud provider has the responsibility to manage the resources it provides (be they VM instances, programming frameworks, or applications) so that the user requirements and the desired Quality of Service (QoS) are satisfied. Cloud users are usually charged according to the amount of resources they consume (e.g., some amount of money per hour of CPU usage). In this way, customers can avoid capital expenditures by using Cloud resources on a "pay-as-you-go" model.

Users' QoS requirements (e.g., timeliness, availability, security) are usually the result of a negotiation process between the resource provider and the user, which culminates in the definition of a Service Level Agreement (SLA) concerning their respective obligations and expectations. Guaranteeing SLAs under variable workloads for different application and service models is extremely challenging: Clouds are characterized by high load variance, and users have heterogeneous and competing QoS requirements.
In this paper we present SAVER (qoS-Aware workflows oVER the Cloud), a workflow engine provided as a SaaS. The engine allows different types of workflows to be executed over a set of Web Services (WSs). Workflows are described using an appropriate notation (e.g., the WS-BPEL [7] workflow description language). The workflow engine takes care of interacting with the appropriate WSs as described in the workflow. In our scenario, users can negotiate QoS requirements with the service provider; specifically, for each type c of workflow, the user may request that the average execution time of the whole workflow not exceed a threshold R^+_c. Once the QoS requirements have been negotiated, the user can submit any number of workflows of the different types. Both the submission rate and the time spent by the workflows on each WS can fluctuate over time.

Traditionally, when deciding the amount of resources to dedicate to applications, service providers considered worst-case scenarios, resulting in resource over-provisioning. Since the worst-case scenario rarely happens, a static system deployment results in a processing infrastructure which is largely under-utilized. To increase the utilization of resources while meeting the requested SLA, SAVER uses an underlying IaaS Cloud to provide computational power on demand. The Cloud hosts multiple instances of each WS, so that the workload can be balanced across the instances. If a WS is heavily used, SAVER will increase the number of instances by requesting new resources from the Cloud. In this way, the response time of that WS can be reduced, reducing the total execution time of workflows as well. SAVER monitors the workflow engine and detects when some constraints are being violated. System reconfigurations are triggered periodically, and instances are added or removed where necessary.

Fig. 1. Illustration of the bottleneck shift issue.
Despite its conceptual simplicity, the idea above is quite challenging to implement in practice. To better illustrate the problem, let us consider the situation shown in Fig. 1, which is modeled upon a similar example from [8]. We have three Web Services W_1, W_2, W_3 which are used by two types of workflows. Instances of the first type arrive at a rate of 2 req/s, and execute operations on W_1, W_2 and W_3. Instances of the second workflow type arrive at a rate of 1 req/s and only use W_1 and W_3. Each WS has a maximum capacity, which corresponds to the maximum request rate it can handle. Web Services 1 and 3 have a maximum capacity of 2 req/s, while WS 2 has a capacity of 3 req/s. In Fig. 1(a) the capacity of W_1 is exceeded, because the aggregate arrival rate (3 req/s) is greater than its processing capacity. Thus, a queue of unprocessed invocations of W_1 builds up, until requests start to time out and are dropped at a rate of 1 req/s. To eliminate the bottleneck, a possible solution is to create multiple instances of the bottleneck WS on different servers, and balance the load across all instances. If we apply this strategy and create two instances of W_1, we get the situation shown in Fig. 1(b): the aggregate processing capacity of W_1 is now 4 req/s, and thus Web Service 1 is no longer the bottleneck. However, the bottleneck shifts to W_3, which now sees an aggregate arrival rate of 3 req/s and has a capacity of 2 req/s.

The situation above demonstrates the bottleneck shift phenomenon: fixing a bottleneck may create another bottleneck at a different place. Thus, satisfying QoS constraints on systems subject to variable workloads is challenging, because identifying the system configuration which satisfies all constraints might involve multiple reconfigurations of individual components (in our scenario, adding WS instances).
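The bottleneck-shift effect of Fig. 1 can be reproduced numerically. The following sketch uses the arrival rates and capacities from the example; the serial "W_1 feeds the downstream services" drop model and all helper names are our own assumptions, not part of SAVER:

```python
CAP = [2.0, 3.0, 2.0]   # per-instance capacity of W1, W2, W3 (req/s)

def overloaded(n1, n2, n3):
    """Return the 1-based indices of saturated Web Services.

    Type-1 workflows (2 req/s) visit W1, W2 and W3; type-2 workflows
    (1 req/s) visit only W1 and W3. Requests dropped at W1 never
    reach the downstream services.
    """
    into_w1 = 2.0 + 1.0                            # both classes hit W1 first
    surviving = min(1.0, CAP[0] * n1 / into_w1)    # fraction not dropped at W1
    arrivals = [into_w1, 2.0 * surviving, 3.0 * surviving]
    caps = [CAP[0] * n1, CAP[1] * n2, CAP[2] * n3]
    return [k + 1 for k in range(3) if arrivals[k] > caps[k]]

print(overloaded(1, 1, 1))  # [1]: W1 saturated, drops 1 req/s
print(overloaded(2, 1, 1))  # [3]: the bottleneck shifts to W3
```

With one instance each, only W_1 is saturated (it drops 1 req/s, shielding W_3); doubling W_1 lets the full 3 req/s through, and W_3 becomes the new bottleneck, exactly as in Fig. 1(b).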
If the reconfiguration is implemented in a purely reactive manner, each step must be applied sequentially in order to monitor its impact and plan for the next step. This is clearly inefficient, because adaptation would be exceedingly slow. In general, the response time at a specific WS depends both on the number of instances of that Web Service and on the intensity of the other workload classes (workflow types). Thus, a suitable system performance model must be used in order to predict the response time of a given configuration. The performance model can be used to drive the reconfiguration process proactively: different system configurations can be evaluated quickly, and multiple reconfiguration steps can be planned in advance. SAVER uses an open, multiclass Queueing Network (QN) model to represent resource contention by multiple independent request flows, which is crucial in our scenario. The parameters needed to evaluate the QN model can be easily obtained by passively monitoring the running system. The performance model is used within a greedy strategy which identifies an approximate solution to the optimization problem of minimizing the number of WS instances while respecting the SLA.

Structure of this paper: The remainder of this paper is organized as follows. In Section II we review the scientific literature and compare SAVER with related works. In Section III we give a precise formulation of the problem we are addressing. In Section IV we describe the Queueing Network performance model of the Cloud-based workflow engine. SAVER is fully described in Section V, including the high-level architecture and the details of the reconfiguration algorithms. The effectiveness of SAVER has been evaluated by means of simulation experiments, whose results are discussed in Section VI. Finally, conclusions and future works are presented in Section VII.
In order to make this paper self-contained without sacrificing clarity, we relegate the mathematical details of the analysis of the performance model to a separate Appendix.

II. RELATED WORKS

Several research contributions have previously addressed the issue of optimizing resource allocation in cluster-based service centers. Recently, with the emergence of virtualization approaches and Cloud computing, additional research on automatic resource management has been conducted. In this section we briefly review some recent results; some of them take advantage of control theory-based feedback loops [9], [10], machine learning techniques [11], [12], or utility-based optimization techniques [13], [14]. When moving to virtualized environments the resource allocation problem becomes even more complex because of the introduction of virtual resources [14]. Several approaches have been proposed for QoS and resource management at run-time [9], [15]–[19].

The approach presented in [15] describes a method for achieving optimization in Clouds by using performance models throughout the development and operation of the applications running in the Cloud. The proposed optimization aims at maximizing profits in the Cloud by guaranteeing the QoS agreed in the SLAs, taking into account a large variety of workloads. A layered Cloud architecture taking into account different stakeholders is presented in [9]. The architecture supports self-management based on adaptive feedback control loops, present at each layer, and on a coordination activity between the different loops. Mistral [16] is a resource management framework with a multi-level resource allocation algorithm considering reallocation actions based mainly on adding, removing and/or migrating virtual machines, and shutting down or restarting hosts. This approach is based on the use of a Layered Queueing Network (LQN) performance model.
It tries to maximize the overall utility, taking into account several aspects like power consumption, performance and transient costs in its reconfiguration process. In [18] the authors present an approach to self-adaptive resource allocation in virtualized environments based on online architecture-level performance models. The online performance prediction allows estimation of the effects of changes in user workloads and of possible reconfiguration actions. Yazir et al. [19] introduce a distributed approach for dynamic autonomous resource management in computing Clouds, performing resource configuration using Multiple Criteria Decision Analysis. With respect to these works, SAVER lies in the same research line, fostering the use of models at runtime to drive QoS-based system adaptation. SAVER uses an efficient modeling and analysis technique that can be used at runtime without undermining the system behavior and its overall performance.

Ferretti et al. propose in [17] a middleware architecture enabling SLA-driven dynamic configuration, management and optimization of Cloud resources and services. The approach makes use of a load balancer that distributes the workload among the available resources. When the perceived QoS deviates from the SLA, the platform is dynamically reconfigured by acquiring new resources from the Cloud. On the other hand, if resource under-utilization is detected, the system triggers a reconfiguration to release the unused resources. This approach is purely reactive and considers a single-tier application, while SAVER works for an arbitrary number of WSs and uses a performance model to plan complex reconfigurations in a single step. Canfora et al. [20] describe a QoS-aware service discovery and late-binding mechanism which is able to automatically adapt to changes of QoS attributes in order to meet the SLA.
The authors consider the execution of workflows over a set of WSs, such that each WS has multiple functionally equivalent implementations. Genetic Algorithms are used to bind each WS to one of the available implementations, so that a fitness function is maximized. The binding is done at run-time, and depends on the values of QoS attributes which are monitored by the system. It should be observed that in SAVER we consider a different scenario, in which each WS has just one implementation which however can be instantiated multiple times. The goal of SAVER is to satisfy a specific QoS requirement (mean execution time of workflows below a given threshold) with the minimum number of instances.

III. PROBLEM FORMULATION

SAVER is a workflow engine whose general structure is depicted in Fig. 2: it receives workflows from external clients, and executes them over a set of K Web Services W_1, ..., W_K. Workflows can be of C different types (or classes); for each class c = 1, ..., C, clients define a maximum allowed completion time R^+_c. This means that an instance of a class c workflow must be completed, on average, in time less than R^+_c. New workflow classes can be created at any time; when a new class is created, its maximum response time is negotiated with the workflow service provider. We denote with λ_c the average arrival rate of class c workflows. Arrival rates can change over time (to simplify the notation, we write λ_c instead of λ_c(t); in general, we omit explicit reference to t for all time-dependent parameters). Since all WSs are shared between the workflows, the completion time of a workflow depends both on the arrival rates λ = (λ_1, ..., λ_C) and on the utilization of each WS. In order to satisfy the response time constraints, the system must adapt to cope with fluctuations of the workload. To do so, SAVER relies on an IaaS Cloud which maintains multiple instances of each WS. Run-time monitoring information is sent by all WSs back to the workflow engine to drive the adaptation process.
We denote with N_k the number of instances of WS W_k; a system configuration N = (N_1, ..., N_K) is an integer vector representing the number of allocated instances of each WS. When a workflow interacts with W_k, it is bound to one of the N_k instances so that the requests are evenly distributed. When the workload intensity increases, additional instances are created to eliminate the bottlenecks; when the workload decreases, surplus instances are shut down and released. The goal of SAVER is to minimize the total number of WS instances while maintaining the mean execution time of type c workflows below the threshold R^+_c, c = 1, ..., C. Formally, we want to solve the following optimization problem:

  minimize    f(N) = Σ_{k=1}^{K} N_k                             (1)
  subject to  R_c(N) ≤ R^+_c   for all c = 1, 2, ..., C
              N_k ∈ {1, 2, 3, ...}

where R_c(N) is the mean execution time of type c workflows when the system configuration is N = (N_1, ..., N_K). If the IaaS Cloud which hosts the WS instances is managed by some third-party organization, then reducing the number of active instances reduces the cost of the workflow engine.

Fig. 2. System model.

IV. SYSTEM PERFORMANCE MODEL

Before illustrating the details of SAVER, it is important to describe the QN performance model which is used to plan a system reconfiguration. We model the system of Fig. 2 using the open, multiclass QN model [21] shown in Fig. 3. A QN model is a set of queueing centers, which in our case are FIFO queues attached to a single server. Each server represents a single WS instance; thus, W_k is represented by N_k queueing centers, for each k = 1, ..., K. N_k can change over time, as resources are added or removed from the system.
In our QN model there are C different classes of requests, which are generated outside the system. Each request represents a workflow; thus, workflow types are directly mapped to QN request classes. In order to simplify the analysis of the model, we make the simplifying assumption that the inter-arrival time of class c requests is exponentially distributed with arrival rate λ_c. This means that a new workflow of type c is submitted, on average, every 1/λ_c time units.

The interaction of a type c workflow with WS W_k is modeled as a visit of a class c request to one of the N_k queueing centers representing W_k. We denote with R_ck(N) the total time (residence time) spent by type c workflows on one of the N_k instances of W_k for a given configuration N. The residence time is the sum of two terms: the service demand D_ck(N) (the average time spent by a WS instance executing the request) and the queueing delay (the time spent by a request in the waiting queue). The QN model allows multiple visits to the same queueing center, because the same WS can be executed multiple times by the same workflow; in this case, the residence time and service demand are the sums of the residence and service times of all invocations of the same WS instance. The utilization U_k(N) of an instance of W_k is the fraction of time the instance is busy processing requests. If the workload is evenly balanced, then both the residence time R_ck(N) and the utilization U_k(N) are almost the same for all N_k instances of W_k.
Fig. 3. Performance model based on an open, multiclass Queueing Network.

TABLE I. SYMBOLS USED IN THIS PAPER

  C          Number of workflow types
  K          Number of Web Services
  λ          Vector of per-class arrival rates
  M          Current system configuration
  N, N'      Arbitrary system configurations
  R_ck(N)    Residence time of type c workflows on an instance of W_k
  D_ck(N)    Service demand of type c workflows on an instance of W_k
  R_c(N)     Response time of type c workflows
  U_k(N)     Utilization of an instance of W_k
  R^+_c      Maximum allowed response time for type c workflows

Table I summarizes the symbols used in this paper.

V. ARCHITECTURAL OVERVIEW OF SAVER

SAVER is a reactive system based on the Monitor-Analyze-Plan-Execute (MAPE) control loop shown in Fig. 4. During the Monitor step, SAVER collects operational parameters by observing the running system. The parameters are evaluated during the Analyze step; if the system needs to be reconfigured (e.g., because the observed response time of class c workflows exceeds the threshold R^+_c for some c), a new configuration is identified in the Plan step. We use the QN model described in Section IV to evaluate different configurations and identify an optimal server allocation such that all QoS constraints are satisfied. Finally, during the Execute step, the new configuration is applied to the system: WS instances are created or destroyed as needed by leveraging the IaaS Cloud. Unlike other reactive systems, SAVER can plan complex reconfigurations, involving multiple additions/removals of resources, in a single step.

A. Monitoring System Parameters

The QN model is used to estimate the execution time of workflow types for different system configurations. To analyze the QN it is necessary to know two parameters: (i) the arrival rate λ_c of type c workflows, and (ii) the service demand D_ck(M) of type c workflows on an instance of WS W_k, for the current configuration M.
The parameters above can be computed by monitoring the system over a suitable period of time. The arrival rate λ_c can be estimated by counting the number A_c of arrivals of type c workflows which are submitted over an observation period of length T; then λ_c can be defined as λ_c = A_c / T.

Fig. 4. SAVER Control Loop.

TABLE II. EQUATIONS FOR THE QN MODEL OF FIG. 3

  U_k(N)  = Σ_{c=1}^{C} λ_c D_ck(N)                               (2)
  R_ck(N) = D_ck(N) / (1 − U_k(N))                                (3)
  R_c(N)  = Σ_{k=1}^{K} N_k R_ck(N)                               (4)

Measuring the service demands D_ck(M) is a bit more difficult, because they must not include the time spent by a request waiting to start service. If the WSs do not provide detailed timing information (e.g., via their execution logs), it is possible to estimate D_ck(M) from parameters which can be easily observed by the workflow engine, namely the measured residence time R_ck(M) and utilization U_k(M). We use the equations shown in Table II, which hold for the open multiclass QN model in Fig. 3. These equations describe well-known properties of open QN models, so they are given here without proof; the interested reader is referred to [21] for details. The residence time is the total time spent by a type c workflow with one instance of WS W_k, including waiting time and service time. The workflow engine can measure R_ck(M) as the time elapsed from the instant a type c workflow sends a request to one of the instances of W_k, to the time the request is completed. The utilization U_k(M) of an instance of W_k can be obtained from the Cloud service dashboard (or measured on the computing nodes themselves). Using (3), the service demands can be expressed as

  D_ck(M) = R_ck(M) (1 − U_k(M))                                  (5)

B. Finding a new configuration

In order to find an approximate solution to the optimization problem (1), SAVER starts from the current configuration M, which may violate some response time constraints, and executes Algorithm 1.
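The relations in Table II and the demand estimate (5) translate directly into code. A minimal sketch, where the list-based layout (lam[c] for arrival rates, D[c][k] for demands) and the function names are our own choices:

```python
def utilization(lam, D, k):
    """Eq. (2): U_k = sum over classes of lambda_c * D_ck."""
    return sum(lam[c] * D[c][k] for c in range(len(lam)))

def residence_time(lam, D, c, k):
    """Eq. (3): R_ck = D_ck / (1 - U_k); valid only while U_k < 1."""
    return D[c][k] / (1.0 - utilization(lam, D, k))

def response_time(lam, D, N, c):
    """Eq. (4): R_c = sum over Web Services of N_k * R_ck."""
    return sum(N[k] * residence_time(lam, D, c, k) for k in range(len(N)))

def demand_from_measurements(R_meas, U_meas):
    """Eq. (5): recover a service demand from the measured residence
    time and utilization, with no WS-side instrumentation."""
    return R_meas * (1.0 - U_meas)

# One class, one WS: lambda = 0.5, D = 1.0 gives U = 0.5 and R = 2.0;
# Eq. (5) recovers the original demand from those two observables.
print(response_time([0.5], [[1.0]], [1], 0))    # 2.0
print(demand_from_measurements(2.0, 0.5))       # 1.0
```

Note how (5) is just (3) solved for the demand, which is what lets SAVER work from passive measurements only.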
After collecting device utilizations, response times and arrival rates, SAVER estimates the service demands D_ck using Eq. (5). Then, SAVER identifies a new configuration N ≥ M (where N ≥ M iff N_k ≥ M_k for all k = 1, ..., K) by calling the function ACQUIRE(). The new configuration N is computed by greedily adding new instances to bottleneck WSs.

Algorithm 1 The SAVER Algorithm
Require: R^+_c: maximum response time of type c workflows
 1: Let M be the initial configuration
 2: loop
 3:   Monitor R_ck(M), U_k(M), λ_c
 4:   for all c := 1, ..., C; k := 1, ..., K do
 5:     Compute D_ck(M) using Eq. (5)
 6:   N := Acquire(M, λ, D(M), U(M))
 7:   for all c := 1, ..., C; k := 1, ..., K do
 8:     Compute D_ck(N) and U_k(N) using Eq. (7) and (8)
 9:   N' := Release(N, λ, D(N), U(N))
10:   Apply the new configuration N' to the system
11:   M := N'   {Set N' as the current configuration M}

The QN model is used to estimate response times as instances are added: no actual resources are instantiated from the Cloud service at this time. The configuration N returned by the function ACQUIRE() does not violate any constraint, but might contain too many WS instances. Thus, SAVER invokes the function RELEASE(), which computes another configuration N' ≤ N by removing redundant instances, ensuring that no constraint is violated. To call procedure RELEASE() we need to estimate the service demands D_ck(N) and utilizations U_k(N) with configuration N; these can be easily computed from the values measured for the current configuration M. After both steps above, N' becomes the new current configuration: WS instances are created or terminated where necessary by acquiring or releasing hosts from the Cloud infrastructure. Let us illustrate the functions ACQUIRE() and RELEASE() in detail.

a) Adding instances: Function ACQUIRE() is described by Algorithm 2.
Given the system parameters and a configuration N which might violate some or all response time constraints, the function returns a new configuration which is estimated not to violate any constraint. At each iteration, we identify the class b whose workflows have the maximum relative violation of the response time limit (line 2); response times are estimated using Eq. (9) in the Appendix. Then, we identify the WS W_j such that adding one more instance to it produces the maximum reduction in the class b response time (line 3). The configuration N is then updated by adding one instance to W_j (line 4); the updated configuration is N + 1_j, where 1_j is a vector with K elements whose j-th element is one and all others are zero. The loop terminates when no workload type is estimated to violate its response time constraint. Termination of Algorithm 2 is guaranteed by the fact that the function R_c(N) is monotonically decreasing (Lemma 1 in the Appendix); thus, R_c(N + 1_j) < R_c(N) for all c.

Algorithm 2 Acquire(N, λ, D(N), U(N)) → N'
Require: N: system configuration
Require: λ: current arrival rates of workflows
Require: D(N): service demands at configuration N
Require: U(N): utilizations at configuration N
Ensure: N': new system configuration
 1: while R_c(N) > R^+_c for any c do
 2:   b := argmax_c { (R_c(N) − R^+_c) / R^+_c | c = 1, ..., C }
 3:   j := argmax_k { R_b(N) − R_b(N + 1_k) | k = 1, ..., K }
 4:   N := N + 1_j
 5: Return N

b) Removing instances: The function RELEASE(), described by Algorithm 3, is used to deallocate (release) WS instances, starting from an initial configuration N which does not violate any response time constraint. The function implements a greedy strategy, in which a WS W_j is selected at each step and its number of instances is reduced by one.
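The greedy growth step of Algorithm 2, driven by the response-time estimate of Eq. (9), can be sketched as follows. This is a simplified rendering: the data layout and names are ours, and we assume measured per-instance utilizations are below one.

```python
def predicted_response(c, N, M, D_M, U_M):
    """Eq. (9): estimate R_c(N) from the demands D_M[c][k] and
    utilizations U_M[k] measured under the current configuration M."""
    return sum(D_M[c][k] * M[k] * N[k] / (N[k] - U_M[k] * M[k])
               for k in range(len(N)))

def acquire(M, D_M, U_M, R_max):
    """Algorithm 2 (sketch): add one instance at a time to the WS
    that most reduces the worst-violating class's response time,
    until every class meets its threshold R_max[c]."""
    N = list(M)
    C, K = len(D_M), len(M)

    def R(c, conf):
        return predicted_response(c, conf, M, D_M, U_M)

    while any(R(c, N) > R_max[c] for c in range(C)):
        # Class b with the largest relative violation (line 2).
        b = max(range(C), key=lambda c: (R(c, N) - R_max[c]) / R_max[c])
        # WS j whose extra instance shrinks R_b the most (line 3).
        j = max(range(K),
                key=lambda k: R(b, N) - R(b, N[:k] + [N[k] + 1] + N[k + 1:]))
        N[j] += 1                                          # line 4
    return N

# One WS at 80% utilization, demand 1.0, limit R+ = 2.0: one extra
# instance lowers the predicted response time from 5.0 to about 1.67.
print(acquire([1], [[1.0]], [0.8], [2.0]))  # [2]
```

Note that, exactly as in the paper, no Cloud resources are touched inside the loop; only the model is evaluated.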
Reducing the number of instances N_j of W_j is not possible if either (i) the reduction would violate some constraint, or (ii) the reduction would cause the utilization of some WS instances to become greater than one (see Eq. (11) in the Appendix). We start by defining the set S containing the indices of WSs whose number of instances can be reduced without exceeding the processing capacity (line 3). Then, we identify the workflow class d with the maximum (relative) response time (line 5). Finally, we identify the value j ∈ S such that removing one instance of W_j produces the minimum increase in the response time of class d workflows (line 6). The rationale is the following. Type d workflows are the most likely to be affected by the removal of one WS instance, because their relative response time (before the removal) is the highest among all workflow types. Once the "critical" class d has been identified, we try to remove an instance from the WS W_j which causes the smallest increase of the class d response time. Since response time increments are additive (see Appendix), if the removal of an instance of W_j violates some constraint, no further attempt should be made to consider W_j, and we remove j from the candidate set S. From the discussion above, we observe that function RELEASE() computes a Pareto-optimal solution N: there exists no feasible solution N' ≤ N, N' ≠ N, satisfying R_c(N') ≤ R^+_c for all c.

VI. NUMERICAL RESULTS

We performed a set of numerical simulation experiments to assess the effectiveness of SAVER; the results are described in this section. We implemented Algorithms 1, 2 and 3 using GNU Octave [22], an interpreted language for numerical computations. In the first experiment we considered K = 10 Web Services and C = 5 workflow types. Service demands D_ck have been randomly generated, in such a way that class c workflows have service demands which are uniformly distributed in [0, c/C].
Thus, class 1 workflows have the lowest average service demands, while type C workflows have the highest demands. The system has been simulated for T = 200 discrete steps t = 1, ..., T; each step corresponds to a time interval of length W, long enough to amortize the reconfiguration costs.

Algorithm 3 Release(N, λ, D(N), U(N)) → N'
Require: N: system configuration
Require: λ: current arrival rates of workflows
Require: D(N): service demands at configuration N
Require: U(N): utilizations at configuration N
Ensure: N': new system configuration
 1: for all k := 1, ..., K do
 2:   Nmin_k := ⌈ N_k Σ_{c=1}^{C} λ_c D_ck(N) ⌉
 3: S := { k | N_k > Nmin_k }
 4: while S ≠ ∅ do
 5:   d := argmin_c { (R^+_c − R_c(N)) / R^+_c | c = 1, ..., C }
 6:   j := argmin_k { R_d(N − 1_k) − R^+_d | k ∈ S }
 7:   if R_c(N − 1_j) > R^+_c for any c then
 8:     S := S \ { j }   {No instance of W_j can be removed}
 9:   else
10:     N := N − 1_j
11:     if N_j = Nmin_j then
12:       S := S \ { j }
13: Return N

Fig. 5. Simulation results.

The arrival rates λ(t) at step t have been generated according to a fractal model, starting from a randomly perturbed sinusoidal pattern to mimic periodic fluctuations; each workflow type has a different period. Figure 5 shows the results of the simulation. The top part of the figure shows the estimated response time R_c(N) (thick lines) and the upper limit R^+_c (thin horizontal lines) for each class c = 1, ..., C. The middle part of the figure shows the arrival rates λ_c(t) for each class c = 1, ..., C; note that the arrival rates have been stacked for clarity, such that the height of each individual band corresponds to the value λ_c(t), from c = 1 (bottom) to c = 5 (top). The total height of the middle graph is the total arrival rate of all workflow types.
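A synthetic workload in the spirit of this experiment (randomly perturbed sinusoids with a distinct period per workflow class) could be generated as follows. This is our own simplified stand-in, not the paper's fractal model; all parameter names are assumptions:

```python
import math
import random

def arrival_rates(C, T, base=1.0, amplitude=0.5, noise=0.1, seed=42):
    """Per-class arrival-rate traces: a sinusoid with a class-specific
    period plus Gaussian perturbation, clipped at zero."""
    rng = random.Random(seed)
    lam = [[0.0] * T for _ in range(C)]
    for c in range(C):
        period = T / (c + 2)   # each workflow type has its own period
        for t in range(T):
            value = base + amplitude * math.sin(2 * math.pi * t / period)
            lam[c][t] = max(0.0, value + rng.gauss(0.0, noise))
    return lam

traces = arrival_rates(C=5, T=200)
print(len(traces), len(traces[0]))  # 5 200
```

Stacking these traces, as in the middle plot of Fig. 5, gives the total offered load at each step.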
Finally, each band of the bottom part of Figure 5 shows the number N_k of instances of WS W_k, from k = 1 (bottom) to k = 10 (top); again, the total height of all areas represents the total number of resources allocated at each simulation step. As can be seen, the number of allocated resources closely follows the workload pattern.

We performed additional experiments in order to assess the efficiency of the allocations produced by SAVER. In particular, we are interested in estimating the reduction in the number of allocated instances produced by SAVER. To do so, we considered different scenarios for all combinations of C ∈ {10, 15, 20} workflow types and K ∈ {20, 40, 60} Web Services. Each simulation has been executed for T = 200 steps; everything else (request arrival rates, service demands) has been generated as described above.

Results are reported in Table III. Columns labeled C and K show the number of workflow types and Web Services, respectively. Columns labeled Iter. ACQUIRE() contain the maximum and average number of iterations performed by procedure ACQUIRE() (Algorithm 2); columns labeled Iter. RELEASE() contain the same information for procedure RELEASE() (Algorithm 3). Then, we report the minimum, maximum and total number of resources allocated by SAVER during the simulation run. Formally, let S_t denote the total number of WS instances allocated at simulation step t; then

  Min. instances   = min_t { S_t }
  Max. instances   = max_t { S_t }
  Total instances  = Σ_t S_t

The column labeled WS Instances (static) shows the number of instances that would have been allocated by provisioning for the worst-case scenario; this value is simply T × max_t { S_t }. The last column shows the ratio between the total number of WS instances allocated by SAVER and the number of instances that would have been allocated by a static algorithm to satisfy the worst-case scenario; lower values are better.
The results show that SAVER allocates between 64% and 72% of the instances required by the worst-case scenario. As previously observed, if the IaaS provider charges a fixed price for each instance allocated at each simulation step, then SAVER allows a substantial reduction of the total cost, while still maintaining the negotiated SLA.

VII. CONCLUSIONS AND FUTURE WORK

In this paper we presented SAVER, a QoS-aware algorithm for executing workflows involving Web Services hosted in a Cloud environment. The idea underlying SAVER is to selectively allocate and deallocate Cloud resources to guarantee that the response time of each class of workflows stays below a negotiated threshold. Dynamic resource allocation is driven by a feedback control loop: a passive monitor collects information that is used to identify the minimum number of instances of each WS which should be allocated to satisfy the response time constraints. The system performance at different configurations is estimated using a QN model; the estimates feed a greedy optimization strategy which produces the new configuration, which is finally applied to the system. Simulation experiments show that SAVER can effectively react to workload fluctuations by acquiring and releasing resources as needed.

The methodology proposed in this paper can be improved along several directions. In particular, we assumed that requests of all classes are evenly distributed across the WS instances. While this assumption makes the system easier to analyze and implement, more effective allocations could be produced if individual workflow classes could be routed to specific WS instances. This extension would add another level of complexity to SAVER, and is currently under investigation.
We are also exploring the use of forecasting techniques as a means to trigger resource allocation and deallocation proactively. Finally, we are working on an implementation of our methodology on a real testbed, to assess its effectiveness through a more comprehensive set of real experiments.

APPENDIX

Let M be the current system configuration, and let us assume that, under configuration M, the observed arrival rates are λ = (λ_1, ..., λ_C) and the service demands are D_ck(M). Then, for an arbitrary configuration N, we can combine Equations (3) and (4) to get:

  R_c(N) = Σ_{k=1}^{K} N_k D_ck(N) / (1 − U_k(N))    (6)

The current total class-c service demand on all instances of W_k is M_k D_ck(M); hence we can express the service demands and utilizations of individual instances for an arbitrary configuration N as:

  D_ck(N) = (M_k / N_k) D_ck(M)    (7)
  U_k(N) = (M_k / N_k) U_k(M)    (8)

Thus, we can rewrite (6) as

  R_c(N) = Σ_{k=1}^{K} M_k N_k D_ck(M) / (N_k − U_k(M) M_k)    (9)

which allows us to estimate the response time R_c(N) of class-c workflows given the information collected by the monitor for the current configuration M. From (2) and (7) we get:

  U_k(N) = (M_k / N_k) Σ_{c=1}^{C} λ_c D_ck(M)    (10)

Since by definition the utilization of any WS instance must be less than one, we can use (10) to derive a lower bound on the number N_k of instances of W_k:

  N_k ≥ M_k Σ_{c=1}^{C} λ_c D_ck(M)    (11)

The following lemma can be easily proved:

Lemma 1: The response time function R_c(N) is monotonically decreasing: for any two configurations N′ and N″ such that N′_k ≤ N″_k for all k = 1, ..., K, we have R_c(N′) ≥ R_c(N″).

Proof: If we extend R_c(N) to a continuous function, its partial derivative is

  ∂R_c/∂N_k = − M_k² U_k(M) D_ck(M) / (N_k − U_k(M) M_k)²    (12)

which is negative for every k for which the utilization U_k(M) and the service demand D_ck(M) are nonzero. Hence R_c(N) is decreasing.

Note that, according to Eq. (9), response time increments are additive: R_c(N) − R_c(N + 1_j) = Δ_j and R_c(N) − R_c(N + 1_i) = Δ_i imply R_c(N) − R_c(N + 1_i + 1_j) = Δ_i + Δ_j.

TABLE III
SIMULATION RESULTS FOR DIFFERENT SCENARIOS

         Iter. ACQUIRE()   Iter. RELEASE()   WS Instances (dynamic)   WS Instances
 C   K    max     avg       max     avg       min    max      tot      (static)     Dynamic/Static
10  20     14    1.30        15    2.53        36    127    16589       25400           0.65
10  40     22    2.43        19    3.81        76    257    33103       51400           0.64
10  60     35    3.54        35    5.12       122    378    50211       75600           0.66
15  20     10    1.27        13    2.56        78    178    23536       35600           0.66
15  40     23    2.20        26    3.68       138    340    44843       68000           0.66
15  60     34    3.20        44    5.04       239    526    68253      105200           0.65
20  20      9    1.19        13    2.50       114    206    28792       41200           0.70
20  40     24    2.33        29    4.00       215    408    57723       81600           0.71
20  60     21    3.00        30    4.89       347    602    86684      120400           0.72
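Equations (9)–(11) and Lemma 1 are easy to exercise numerically. The sketch below (all names and the numeric values are illustrative, not from the paper) estimates the class response time and the per-instance utilization at a hypothetical configuration N from the quantities observed at the current configuration M, and checks monotonicity:

```python
def response_time(N, M, D_M, U_M, c):
    """Eq. (9): estimated response time of class-c workflows at a
    hypothetical configuration N, using demands D_M[c][k] and
    utilizations U_M[k] observed at the current configuration M."""
    return sum(M[k] * N[k] * D_M[c][k] / (N[k] - U_M[k] * M[k])
               for k in range(len(N)))

def utilization(N, M, lam, D_M, k):
    """Eq. (10): per-instance utilization of W_k at configuration N."""
    return (M[k] / N[k]) * sum(lam[c] * D_M[c][k] for c in range(len(lam)))

# A small worked example: K = 2 Web Services, C = 1 class, observed at M
M = [4, 4]
D_M = [[0.5, 0.25]]            # D_ck(M)
U_M = [0.5, 0.25]              # U_k(M) = sum_c lambda_c D_ck(M), cf. Eq. (2)
lam = [1.0]

r_now = response_time(M, M, D_M, U_M, 0)         # R_c(M) = 4 + 4/3
r_more = response_time([5, 4], M, D_M, U_M, 0)   # one extra instance of W_1
assert r_more < r_now          # Lemma 1: adding instances never hurts
# Eq. (11) here gives the lower bounds N_1 >= 2 and N_2 >= 1
```

Note that each term of Eq. (9) depends only on its own N_k, which is exactly why the response time increments are additive as observed above.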
