Aniket Braganza explains the benefits of a cloud landing zone, especially when building a secure and well-organized cloud platform.
What is a cloud landing zone?
Let’s start with what a Landing Zone, also known as a Cloud Landing Zone (CLZ), actually is, because understanding the concept helps establish why it is so important to set one up at the beginning of our cloud journey.
If we think of our cloud platform like a city, then we can think of the Cloud Landing Zone as the blueprint and essential infrastructure to make sure our city grows up and develops in a well-organized way.
Having the blueprint ensures that as we add new sections and services in our city, we know where they go and how we can set them up in relation to everything that is already there. It may seem frustrating to do some of this work upfront, but establishing that baseline and scaffolding makes it easier to solve more complex problems later, as we no longer have to worry about our foundations.
When searching the internet for information on CLZs, one typically finds reasons such as the following:
- Security & Compliance
- Standardized tenancy
- Identity and access management
While there is nothing wrong with looking at a CLZ this way, I think that giving development teams a way to correlate practical SDLC steps to building your CLZ is more appropriate. I believe that there are some simple, common-sense reasons to incorporate a CLZ into the buildout of your cloud platform.
For the sake of this discussion, let us assume that your current Cloud Solution Provider (CSP) Organization is defined as follows:
- 1 Master Billing account
- 1 Security account
- 1 Log Archival account
- 1 Shared Services account
- 2 Upper accounts (PROD and UAT)
- 2 Lower accounts (DEV and TEST)
- n Sandbox accounts (the number to be decided by your team)
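To make the assumed layout concrete, here is a minimal sketch that models the organization above as plain data. The account names (such as `log-archive`) and the `Tier` grouping are my own illustrative labels, not identifiers from any particular CSP.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    """Illustrative grouping of the account types listed above."""
    BILLING = "billing"
    SECURITY = "security"
    LOGGING = "logging"
    SHARED = "shared-services"
    UPPER = "upper"
    LOWER = "lower"
    SANDBOX = "sandbox"


@dataclass(frozen=True)
class Account:
    name: str  # hypothetical account name, chosen for this example
    tier: Tier


def build_organization(sandbox_count: int) -> List[Account]:
    """Return the example organization with `sandbox_count` sandbox accounts."""
    fixed = [
        Account("master-billing", Tier.BILLING),
        Account("security", Tier.SECURITY),
        Account("log-archive", Tier.LOGGING),
        Account("shared-services", Tier.SHARED),
        Account("prod", Tier.UPPER),
        Account("uat", Tier.UPPER),
        Account("dev", Tier.LOWER),
        Account("test", Tier.LOWER),
    ]
    sandboxes = [
        Account(f"sandbox-{i}", Tier.SANDBOX)
        for i in range(1, sandbox_count + 1)
    ]
    return fixed + sandboxes


# Example: a team that decides on three sandbox accounts.
organization = build_organization(3)
```

In practice this layout would live in your IaC tool of choice; the point here is only that the eight fixed accounts stay constant while the sandbox count is a team decision.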
Why build a cloud landing zone?
Here are 5 common-sense reasons to incorporate a CLZ into the buildout of your cloud platform:
In order to successfully build a cloud platform, development teams need to be able to experiment. This means that there needs to be a sandboxed area where they can test new features and functionality.
Initially, teams may not know what access to the platform they will need and so they will need more administrator-style control to build something before they refine the minimum set of rights needed. To be able to do this, there has to be an area in the platform where they can experiment, and where, if anything goes wrong, it can be wiped and rebuilt without compromising the stability of the platform as a whole.
This can be accomplished by establishing a Security account.
This account is where the developers and users of the platform can be defined. Nothing else is built in this account. For example, on your CSP, developers could be defined in IAM and be assigned to a specific Role and granted a specific Policy. Users of the system could be registered directly in your Identity Provider or granted access via federated and/or social login.
Now you can create a “Sandbox” account where developers can deploy feature branches and experiment with building new functionality. This environment would have no users of its own but simply establish a federated trust relationship which would grant developers administrator level access. Everything in this sandbox account is essentially throwaway and can be redeployed via Infrastructure as Code (IaC), if necessary.
This security posture would ensure that no developer or user would ever be allowed access to your Logging, Shared Services, Upper or Lower accounts. Additionally, no manual changes would ever be allowed in any of these accounts and all changes would be deployed via pipeline. The pipeline tools would utilize a service principal to deploy functionality to the accounts.
Finally, it would mean that any resource access that was needed by any code or platform services would be specified in IaC (e.g. Terraform) and that access would be audited during code review before it was deployed. This would also provide the necessary reporting for compliance in case of an audit.
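The access rules described in this section can be summarized as a small decision function. This is a sketch under my own assumptions: the account names, the `Principal` categories and the access labels ("admin", "deploy", "none") are hypothetical, and any combination the text does not mention defaults to no access.

```python
from enum import Enum


class Principal(Enum):
    DEVELOPER = "developer"
    END_USER = "end-user"
    PIPELINE = "pipeline-service-principal"


# Accounts that only the deployment pipeline may change (hypothetical names).
PIPELINE_ONLY_ACCOUNTS = {
    "log-archive", "shared-services", "prod", "uat", "dev", "test",
}


def access_level(principal: Principal, account: str) -> str:
    """Return 'admin', 'deploy' or 'none' for a principal in an account."""
    if account.startswith("sandbox"):
        # Developers get administrator-level access through federated trust.
        return "admin" if principal is Principal.DEVELOPER else "none"
    if account in PIPELINE_ONLY_ACCOUNTS:
        # No manual changes: only the pipeline's service principal deploys.
        return "deploy" if principal is Principal.PIPELINE else "none"
    # Anything not covered by the text is denied by default in this sketch.
    return "none"


developer_in_sandbox = access_level(Principal.DEVELOPER, "sandbox-1")  # "admin"
developer_in_prod = access_level(Principal.DEVELOPER, "prod")          # "none"
pipeline_in_prod = access_level(Principal.PIPELINE, "prod")            # "deploy"
```

A real CSP would express this through roles, policies and trust relationships rather than a function, but a deny-by-default table like this is a useful thing to agree on before writing any of them.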
The ability to develop new features or to connect to the cloud platform to troubleshoot existing features can sometimes be impeded by network connectivity issues. If the team adopts an ad hoc approach to connecting to resources or opens up secured resources to the public internet, it can severely compromise the team and the platform.
However, if the resources are locked down or connectivity is misconfigured, it may result in teams spending hours trying to track down the root cause. Adopting a network posture that is too loose can allow malicious actors to tamper with the platform, while a posture that is too strict can potentially impede the progress of legitimate work.
A cloud landing zone can help the team avoid wasting time: it simplifies how cloud resources are accessed by granting the right access levels to the different actors. Documenting the topology of the application and the interactions that need to be established provides guidance during development and informs future decisions without requiring substantial rework to the platform.
Managing the platform
The best part of building a cloud native platform is a robust CI/CD infrastructure to continually test and deploy your platform to the cloud. This is an attainable goal, as long as no one is allowed to manually tinker with the end result of that delivery process.
Having a high-quality CI/CD process is the key to success because it allows your engineering team to focus on solving complex problems and not on menial, repetitive tasks. It allows your testers to write or execute tests to validate the functionality, not troubleshoot why they can’t use the deployed platform. Finally, it allows you to recover from an outage or a problem, without your teams having to scramble to find and fix mission-critical problems.
I would argue that your CI/CD pipelines are only as good as the security and infrastructure that you build them upon. Therefore, to establish a high-quality delivery process, it is imperative to continually secure, audit, and enhance delivery pipelines.
I find that the most comprehensive way to manage your platform is by managing the processes and procedures that are used to build and deploy your platform. If focus is given to ensuring that these processes and procedures are instituted before or close to the beginning of feature rollout, then the patterns and practices become second nature to the team and the CLZ infrastructure that is established supports the rapid, relatively hands-off buildout of the platform.
Building a cloud native product can be quite a difficult task. There are a couple of things that make it hard to achieve success.
First, if a team adopts an ad hoc method of building and deploying enhancements for their platform, it can lead to developers overwriting one another’s features, the introduction of bugs and potential regression issues. These problems are compounded if developers manually make changes to the platform that are not tracked or reversible.
Second, for convenience, developers may deploy multiple environments, such as “Sandbox”, “DEV”, “UAT” and “PROD”, to the same account. While it may be easier to take this approach at first, it is more than likely that we will impact user experience and performance for our customer-facing environments. For example, if testers are doing load testing in the “TEST” environment at the same time as “PROD” users are using the system and developers are deploying changes to “DEV”, there may be so much contention for resources that multiple groups are impacted by it.
Your CSP can scale up to handle the load, but by default, accounts typically have preset soft limits that block tenants from exceeding their service quotas. While most quotas can be increased, some cannot. Therefore, hosting multiple environments in the same account can potentially impact the experience for the team and all users in those environments. For example, if developers push too many feature branches and do not clean up resources, “PROD” deployments may start to fail as service quotas are exceeded.
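The quota problem is easy to see with some toy arithmetic. The sketch below is purely illustrative: the quota of 100 resources and the deployment sizes are made-up numbers, and `first_blocked` is a hypothetical helper, not a CSP API.

```python
from typing import List, Optional, Tuple


def first_blocked(quota: int, deployments: List[Tuple[str, int]]) -> Optional[str]:
    """Return the first deployment that would exceed the account quota, if any."""
    used = 0
    for name, resources in deployments:
        if used + resources > quota:
            return name  # this deployment is rejected by the quota
        used += resources
    return None  # everything fit within the quota


ACCOUNT_QUOTA = 100  # made-up per-account limit

# One shared account: leftover feature branches consume most of the quota.
shared_account = [
    ("DEV feature-a", 40),
    ("DEV feature-b", 40),
    ("PROD release", 30),
]
blocked_when_shared = first_blocked(ACCOUNT_QUOTA, shared_account)
# blocked_when_shared == "PROD release"

# Separate accounts: PROD has the whole quota to itself.
blocked_when_isolated = first_blocked(ACCOUNT_QUOTA, [("PROD release", 30)])
# blocked_when_isolated is None
```

The same release that fails in the shared account succeeds when “PROD” has its own account and its own quota.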
These issues can be rectified easily if we have a CLZ in place to clearly demarcate each environment. Unstable code that is being deployed for new features is kept far away from your production deployment and multiple teams of engineers and testers can work in parallel without affecting the performance of mission-critical, customer-facing systems. By establishing a CLZ early, you have essentially set up the guardrails for your platform before you begin building it. This can save both time and money in the long run, but more importantly it insulates your revenue stream by ensuring that customers have a good experience while your team continues to enhance the product.
Logging is often considered to be an auxiliary part of building a cloud platform, but it is one of the most crucial facets. When teams start building a platform, logging is typically utilized to troubleshoot or debug issues. However, as time progresses and the platform grows and features become more complex, it can be hard to understand how a platform is functioning. Offloading all logs to a separate account when establishing a CLZ is important for 2 reasons.
First and foremost, it captures execution and audit data, while separating it from the execution of logic in the platform. This is important because, if, for any reason, there were flaws in the code base or malicious actors that tried to delete or modify logs, those operations would be implicitly blocked. Ensuring that the CLZ is set up so that log transfer is forward only, with no modification of log data allowed, protects the platform from malicious actors or buggy code.
Second, at some point in the future, you will want to either build or integrate into an application performance management and IT operations analytics solution like Datadog, Dynatrace, AppDynamics or SumoLogic. These solutions typically want to connect to your accounts to analyze the log data being generated by your platform. These tools provide a lot of valuable insights about how your platform is running, but I am wary of giving such a tool direct access to my platform. By establishing a Log Archival account in our CLZ, we can avoid this potential security vulnerability because we only need to grant access to 1 isolated account rather than the entirety of our platform.
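The forward-only idea from the first reason can be sketched as an append-only store: records can be added and read, but never changed or removed. This is an in-memory illustration with hypothetical names (`LogArchive`, `append`, `read`); on a real platform the same guarantee would come from the permissions and retention settings on the Log Archival account, not from application code.

```python
from typing import Tuple


class LogArchive:
    """Append-only log store: no update or delete operations are offered."""

    def __init__(self) -> None:
        # An immutable tuple, replaced on each append, never edited in place.
        self._records: Tuple[str, ...] = ()

    def append(self, record: str) -> int:
        """Add a record and return its position in the archive."""
        self._records = self._records + (record,)
        return len(self._records) - 1

    def read(self, index: int) -> str:
        """Read-only access, the kind an analytics tool would be granted."""
        return self._records[index]

    def delete(self, index: int) -> None:
        """Deletion is always refused, whoever asks."""
        raise PermissionError("log archive is forward only")

    def __len__(self) -> int:
        return len(self._records)


archive = LogArchive()
position = archive.append("deploy started")
archive.append("deploy finished")

try:
    archive.delete(position)  # buggy code or a malicious actor
    delete_succeeded = True
except PermissionError:
    delete_succeeded = False
```

An analytics tool that is only ever handed `read` on this one archive can see everything it needs without touching the accounts where the platform actually runs.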
One Size Fits All?
While it sounds like a Cloud Landing Zone is some sort of cookie-cutter blueprint that can be used everywhere, nothing could be further from the truth. As I stated at the beginning, if we think of our platform as a city, then the CLZ is the blueprint and essential infrastructure to make sure our city grows up and develops in a well-organized way.
Each product is different and may leverage different services. The key is to establish the scaffolding so that we can focus on making sure our platform is well designed, well organized and well orchestrated, while trusting in the framework that we used to establish the foundation.