Multi-Cloud: Unveiling the Mystery

My last post came from Seattle airport, but this one was written over a craft ale in a Las Vegas hotel, following a spectacular re:Invent conference where AWS, once again, demonstrated their truly insane capability to innovate at a rapid pace. While in Vegas, I attended a few really interesting sessions relating to security in AWS, so I thought a quick follow-up to the recent DevSecOps post was in order.

Those of you working with, or in, larger enterprises will inevitably have heard of multi-cloud. Perhaps you’re thinking about implementing it, or maybe you already are. For anyone else, what I really mean by 'multi-cloud' is a conscious, strategic choice by a company to deploy computing workloads into more than one cloud platform - typically with more than one public cloud and at least one fully-automated private cloud.

I’m not considering hybrid cloud in this debate, which I generally describe as just one public cloud and at least one on-premises data centre deployment, connected by technologies such as AWS Direct Connect.

Possible approaches

There is a quite incredible amount of marketing noise around multi-cloud at the moment, so I’ll briefly try to cut through some of it. Fundamentally, I think there are a handful of approaches to multi-cloud commonly highlighted. I’ll list them below, starting with the ones I like the least…

Multi-cloud per workload

In theory, this is the holy grail of deployment, i.e. being able to take a specific application and deploy it to AWS, Azure, GCE or any other platform, and be totally cloud-agnostic.

My view is that the overwhelming majority of people should not do this, as you will end up using the lowest common denominator of technologies across the platforms. Whilst all of the platforms have, to a degree, comparable features around at least compute and storage, truly vast variations exist beyond that - and if a given feature isn’t available on all platforms, you can’t take advantage of it if you want consistent deployments.

Will this change in the future? Possibly. As the major providers race to provide new features, and containerisation technologies become mainstream, this may become easier, but I would steer clear for 2015.

Split-tier deployments

The idea here is that the individual tiers of highly specialised applications are run within different platforms, typically with a backend database running on dedicated hardware and the front tiers being deployed to public clouds.

Again, this is not a recommended deployment pattern for almost all use cases. Where we have seen it prove useful is for what could be termed ‘application modernisation’, where the core application data is highly sensitive and running on infrastructure so old it gives the word "legacy" a bad name. However, even this is only useful if there is a determined plan to keep the data tier “as is” due to the substantial risk associated with moving it to a more modern platform.

Split-role deployments

I spoke with a vendor in the multi-cloud tooling space recently and they were excitedly espousing the benefits of running Dev/Test within public cloud, and then hosting the final Production application(s) within a private cloud. I’ve also heard the flipside of this discussed, with Prod in public cloud and Dev/Test in private. While there are occasionally valid justifications for this, such as legislative requirements around data or particular scalability needs, I would argue it is typically significantly more efficient, consistent and reliable to run your end-to-end deployment pipeline on a single platform. So again, not recommended.

Disaster-recovery-based architectures

This is founded on the "what if" scenario, in which a particular provider could "go down" and cripple your services. In reality, with proper architecture planning, I see this as a risk that can be mitigated, given that the major public cloud providers all have widely dispersed and highly redundant geographic infrastructure. Tooling even exists to help you prove this by simulating many different types of failure. For this architecture, it’s not so much that I don’t recommend it, but more that I don’t believe it’s typically needed.
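
To make the "simulating failure" point a little more concrete, here is a minimal sketch of the idea in Python using boto3: pick a running instance from a tagged group at random, terminate it, and watch whether your architecture recovers without anyone noticing. The region and the 'Role' tag are hypothetical, and dedicated chaos tooling does far more than this - treat it purely as an illustration.

```python
# A rough, illustrative failure simulation: terminate one running instance
# at random from a (hypothetical) tagged group, then observe recovery.
# Assumes boto3 is installed and AWS credentials are already configured.
import random

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region is illustrative


def simulate_instance_failure(role_tag="web-tier"):
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Role", "Values": [role_tag]},  # hypothetical tag
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instance_ids = [
        instance["InstanceId"]
        for reservation in reservations
        for instance in reservation["Instances"]
    ]
    if not instance_ids:
        print("No running instances matched the tag - nothing to terminate")
        return
    victim = random.choice(instance_ids)
    print(f"Terminating {victim} to simulate an instance failure")
    ec2.terminate_instances(InstanceIds=[victim])


if __name__ == "__main__":
    simulate_instance_failure()
```

Run this against a well-architected, auto-scaled deployment within a single provider and the service should carry on regardless - which is rather the point.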

Workload per cloud

You might be wondering by now if I’m planning on saying anything positive. Well, good news: far more interesting is running specific workloads within specific cloud platforms. For me, this is the only use case most people should consider. Which leads us to the key question:

"What problem are you trying to solve?"

It’s important to ask this because, by the very nature of implementing multiple platforms, you will increase your effort: the decision process is longer, there are more providers to interact with, and designs become more complex with higher levels of abstraction. Even the assessment of which workload goes where is non-trivial.

One common reason cited for this "workload per cloud" pattern is compliance, where an organisation has specific needs which dictate that one supplier cannot supply all services in a given area. This clearly doesn’t apply to everyone - see News Corp moving 80% of their estate to AWS, including all SAP infrastructure, as just one example. Equally, it is sometimes as simple as some workloads being perceived as better suited to a given platform due to the featureset available. As quick examples, AWS have by far the broadest set of services available, Azure offers some excellent PaaS offerings for Microsoft solutions, and GCE features sub-hourly billing and blisteringly quick infrastructure.

For some people, the reason is even simpler: a lack of appetite to commit due to "vendor lock-in" worries. My view here is that you have to commit at some point, to something, so you may as well 'pick your horse and back it'. There will be a clear option for you if you evaluate the main platforms. Otherwise, you’ll likely end up locked in to a middle-tier vendor or solution anyway, and you won’t have actually reached your original goal.

Should you do it?

For the purposes of discussion, let’s assume you are going to opt for multi-cloud deployment, perhaps for one of the first two reasons immediately above. How will you go about it? Well, there’s potential for a whole separate blog post (or sizable consulting report!) to cover that, but tooling from the likes of Rightscale, or perhaps more customised offerings using ServiceNow, ServiceMesh, Cliqr and Chef will likely be involved. All of these vendors are evolving their offerings at a rapid rate.

Even if you were to pick only one cloud provider, you would almost certainly want to implement a degree of self-service, whereby teams and individuals with appropriate permissions can request, and thereby programmatically create, infrastructure via a portal of one form or another. If you have multiple cloud providers, you will definitely want to do this to provide a layer of abstraction: your end users should only care about workloads, not the underlying infrastructure or which platform is being used. You’ll need to make decisions around the design of the self-service process, how you’ll create templates for the infrastructure, who will manage them and keep them up to date, and which governance rules and design patterns are needed for different levels of data classification, service criticality, and so on. You will probably engage a cloud service broker (CSB) to make this all happen, or create and train a similar internal function of your own.
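
To illustrate the kind of abstraction I mean, below is a deliberately simplified sketch in Python of how a self-service layer might route a workload request to a platform and template based on data classification and criticality. Every rule, name and platform choice here is hypothetical - a real implementation sits behind a portal, with a far richer governance model.

```python
# A toy placement layer: the requester describes the workload, the rules decide
# which platform and template get provisioned. All names and rules are hypothetical.
from dataclasses import dataclass


@dataclass
class WorkloadRequest:
    name: str
    data_classification: str  # e.g. "public", "internal", "restricted"
    criticality: str          # e.g. "low", "high"


# (classification, criticality) -> (target platform, infrastructure template)
PLACEMENT_RULES = {
    ("restricted", "high"): ("private-cloud", "hardened-three-tier"),
    ("internal", "high"): ("aws", "standard-three-tier"),
    ("internal", "low"): ("aws", "single-instance"),
    ("public", "low"): ("azure", "web-app-paas"),
}


def place_workload(request):
    """Return (platform, template) for a request, or flag it for manual review."""
    key = (request.data_classification, request.criticality)
    if key not in PLACEMENT_RULES:
        raise ValueError(f"No placement rule for {key}; needs manual review")
    return PLACEMENT_RULES[key]


# The end user only describes the workload; the platform decision stays hidden.
platform, template = place_workload(
    WorkloadRequest(name="billing-api", data_classification="internal", criticality="high")
)
print(f"Provision template '{template}' on {platform}")
```

The point isn’t the code itself but its shape: governance rules live in one place, the requester never chooses a provider, and adding or changing a platform means updating the rules rather than every team’s deployment scripts.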

So, back to the question: should you do it? Perhaps.

However, I encourage you to think hard about the decision. If you don’t need to implement a multi-cloud deployment, then, in the interests of simplicity - don’t. Conversely, if you do decide to embrace multi-cloud, you won’t be alone. Many other large enterprises will be making this journey with you; the market, tooling and expertise in this space are evolving, and well-architected solutions are being built and used as I write this.