Much of today’s chatter in the cloud has been about Infrastructure as a Service and, more recently, Platform as a Service. We love the two cloud models because they are dynamic and efficient solutions for the needs of application workloads. However, we can get so enamored with the answer (IaaS and PaaS) that we sometimes lose sight of the original question: What are the Service Levels required by the Application workloads? Steve Todd, an EMC fellow, writes about this in his recent blog, describing it as the biggest problem VMware is addressing. I will go further than Steve and say that this is the largest issue the EMC Federation is addressing, at all the different levels of the platform. In this blog, I will elaborate on the need to characterize Applications by service levels and how EMC is addressing this need.
Problem statement
Enterprises and providers run many workloads in their data centers, and these workloads have different requirements for the infrastructure they consume. However, there is a lack of semantics to define and characterize an application workload. End users are often left guessing what infrastructure should be provisioned. Expensive benchmarks are needed to optimize the infrastructure, yet most users have to decide on the infrastructure a priori. Sometimes a costly re-architecture is needed to align with the required service levels. We see this specifically with OpenStack, where users start off with commodity hardware only to revert to reliable storage through a costly reimplementation. Another facet of this problem appears when users move to the cloud with no clear way of describing the application workload to the provider.
This problem is more severe today than ever before. New kinds of application workloads are emerging with mobile computing (MBaaS), scale-out and big data applications, and so on. The platform or infrastructure itself is going through an unprecedented evolution with the advent of what IDC describes as the third platform. Storage, for example, can be cached in flash attached to PCIe, or ephemeral at the compute, or at the hot edge of shared storage with all-flash arrays, or on hybrid arrays or scale-out arrays, or at the cold edge of the array, or at the glacier edge in the cloud.
We see this as an NxN problem. If there are N application workloads in a data center and N types of infrastructure to provision them on, an IT administrator may have NxN possible combinations to evaluate for provisioning. N is increasing every day, so the number of combinations quickly becomes unmanageable. There is no common semantic to describe the problem, let alone solve it.
Service Level Objectives
The path EMC has chosen to resolve the above NxN issue is to characterize and manage the applications with Service Level objectives. Each workload can be assessed on the dials of the service level objectives, like the ones shown in the picture. Now, rather than determining and optimizing the exact infrastructure, the end user focuses only on the rating of the service level dials, bringing the NxN problem down to a manageable number of service level dials. Solution implementation also becomes easier, as there are now discrete design targets to shoot for. Let us take the spider chart visualization of a few workloads to illustrate the point. The examples are derived from the EMC Product Positioning guide and are meant to be representative, not exact.
The ERP OLTP workload tends to have critical requirements for Performance and Availability. For this reason, the Service Levels for IOPS, Latency, QoS, and RAS are rated as Platinum+ (4-5 on the spider chart). Integrating the application and database layers into the overall IT stack is gaining momentum and is deemed critical for this case. Cost is not a primary concern, hence the service levels for $/GB and $/IOPS are rated as Silver (2 on the spider chart).
I will take Big Data Hadoop as the next example, to contrast it with a workload representing the newer 3rd platform. Typically, Hadoop workloads place a high value on Cost ($/GB) and Elasticity (Scale out/Scale down) and assign a lower priority to performance (IOPS) and availability (QoS, RAS). Of course, this is just an approximate depiction; I have seen some Hadoop implementations requiring higher performance and availability. We now have two distinct spider charts, and each leads to a different storage infrastructure as its closest match. This was a simple example to prove a point; in reality, you may have thousands of workloads, making such manual selection virtually impossible.
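To make the idea of a "closest match" concrete, here is a minimal sketch in Python. It is only an illustration, not how any EMC product works: the dial names, the 1-5 ratings, the two storage tiers (performance_tier, capacity_tier) and the use of a plain Euclidean distance are all my assumptions, loosely following the two spider charts described above.

```python
from math import sqrt

# Service level dials, each rated 1 (lowest) to 5 (highest).
# The dial names and all ratings below are illustrative assumptions.
DIALS = ("iops", "latency", "qos", "ras", "cost_per_gb", "cost_per_iops", "elasticity")

# Workload profiles, loosely following the two spider charts in the text.
ERP_OLTP = {"iops": 5, "latency": 5, "qos": 4, "ras": 5,
            "cost_per_gb": 2, "cost_per_iops": 2, "elasticity": 2}
HADOOP = {"iops": 2, "latency": 2, "qos": 2, "ras": 2,
          "cost_per_gb": 5, "cost_per_iops": 3, "elasticity": 5}

# Hypothetical storage tiers rated on the same dials.
POOLS = {
    "performance_tier": {"iops": 5, "latency": 5, "qos": 5, "ras": 5,
                         "cost_per_gb": 1, "cost_per_iops": 3, "elasticity": 2},
    "capacity_tier": {"iops": 2, "latency": 2, "qos": 2, "ras": 3,
                      "cost_per_gb": 5, "cost_per_iops": 2, "elasticity": 5},
}


def distance(workload, pool):
    """Euclidean distance between two profiles across all dials."""
    return sqrt(sum((workload[d] - pool[d]) ** 2 for d in DIALS))


def closest_match(workload, pools):
    """Return the name of the pool whose profile is nearest to the workload."""
    return min(pools, key=lambda name: distance(workload, pools[name]))


if __name__ == "__main__":
    for label, profile in (("ERP OLTP", ERP_OLTP), ("Hadoop", HADOOP)):
        print(label, "->", closest_match(profile, POOLS))
```

With these made-up ratings, the ERP OLTP profile lands on the performance tier and the Hadoop profile on the capacity tier. The point is that once every workload and every storage option is rated on the same small set of dials, the matching can be done by a program rather than by hand.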
How will the solution work?
Management by Service Level objectives is elegant, but unless it can be automated, it is not a solution. We need an abstraction layer and an open interface for automation. Software defined storage, with ViPR, is a perfect fit to be the arbiter between the service levels required by the workloads and the service levels provisioned by the storage. ViPR already provides the capability of policy based provisioning. In the future, it will incorporate the interface for service level objectives, and will provision based on those objectives from a virtual pool of heterogeneous arrays. If you are wondering how you can ease your infrastructure decision making before ViPR automation comes through, you may be able to organize your plans based on the recommendations in our EMC Product Positioning guide at http://www.emc.com/ppg/index.htm. EMC solutions aside, coming up with an industry accepted definition of service levels is also critical for end users to fairly assess the various cloud services offered by the industry. To that end, the Open Data Center Alliance, a global association of Enterprises and providers, has made recommendations for standard definitions of service attributes for Infrastructure as a Service. The alliance definitely has the broad representation and muscle to make such an adoption successful, but only time will tell.
Much has been said about the EMC Federation’s cloud offerings, from storage (EMC II) to infrastructure (VMW) to platform (Pivotal). However, the key to its success lies in the fundamentals of understanding the workload and provisioning accordingly. You will hear more announcements along these lines in the months and years to come.