
Without a doubt, Infrastructure as Code, together with DevOps and migrations, has been one of the hottest topics in the cloud industry over the last 5+ years.
A few words about cloud migrations
Cloudreach has been in the cloud migrations business since day 0 in 2009 (yes, we recently turned 13!). At the very beginning, we were focussed on Google Apps migrations, then Salesforce, and slightly later we started with AWS.
Back in the day, the only services that AWS provided were EC2, S3 and SQS. In 13 years AWS developed and released more than 200 additional services, more than 15 services per year. It was a booming industry with virtually no limits. Migrating in 2009 was far more painful than migrating now. Tools like AWS Application Migration Service (AWS MGN, formerly CloudEndure) are making our lives easier, especially at scale.
TL;DR: Less migration effort, happier customers. Cloudreach knows how to migrate.
A few words about Infrastructure as Code
As a cloud native company, we love promoting DevOps and Infrastructure as Code (IaC). I am not going through all the benefits of CI/CD and Cloud Native Infrastructure, as we have covered them thoroughly in the past.
HashiCorp Terraform has been the de facto tool for IaC in the cloud (and Pizza Delivery) almost since its first release on 28 July 2014. Its adoption rate has steadily increased over the years, and nowadays it’s difficult to find customers that are not using Terraform to manage their cloud infrastructure.
TL;DR: More IaC, happier customers. Cloudreach loves IaC.
A few words about migrations and IaC
Several treatment plans can be considered when migrating workloads to the Cloud. A rehosting migration (otherwise known as “lift-and-shift”) consists of a block-level replication of the content of the on-premises server. It’s the most common migration strategy when customers have a strict deadline to decommission a data center.
Some may ask if it’s even worth it to use IaC in the context of a rehosting migration as there are multiple aspects not managed as code (e.g. file system configuration, software installation, user configuration, etc.).
We enforce Infrastructure as Code across the board, and migrations are not an exception.
A common belief is that rehosting migrations don’t go well together with IaC deployments, mainly because the resource creation (e.g. EC2, EBS, IAM role etc.) is managed by the migration tool (e.g. AWS MGN) and supposedly can’t be integrated in the code.
Thankfully, that’s a misconception.
While this sounds obvious to most readers, my personal experience is that it’s not as obvious to others.
Moreover, migrating at scale requires IaC to minimize the manual “ClickOps” effort required to create servers in the Cloud.
In fact, IaC coupled with CI/CD pipelines brings multiple benefits, such as:
- Standardization – Common elements such as EC2/EBS tags, shared security group rules and IAM policies are managed via code, removing the risk of drift between environments or spelling mistakes. For example, tagging server 1 with backup=daily and server 2 with Backup=daily will likely cause one of the two servers to not be backed up, because tags are case sensitive. Using a common Terraform module with mandatory tags will help to mitigate this risk.
- Repeatability – Once a workload has been migrated, if the environment is defined as IaC, it is simple to create a secondary environment with minimal effort via code. For example, on-premises workloads with no lower environments can be migrated and then cloned in the cloud to have a pre-production environment formally identical to production.
- Compliance Testing – As a best practice, IaC is coupled with CD pipelines to allow fast and secure deployments. Changes to the infrastructure are only allowed by specific teams and only via pipelines to control compliance aspects. A well-designed pipeline would include security and compliance testing, for example to enforce encryption at rest, locked down firewall rules, mandatory tags, etc.
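To make the tagging example concrete, here is a minimal Python sketch of the kind of check a pipeline could run before a deployment. The set of mandatory tag keys and the rule that keys must be lowercase are assumptions made for illustration; they are not part of any AWS or Terraform API.

```python
# Hypothetical tag policy: these keys must exist, spelled exactly like this.
MANDATORY_TAGS = {"backup", "owner", "environment"}


def check_tags(tags, mandatory=MANDATORY_TAGS):
    """Return a list of problems found in a resource's tag dictionary."""
    problems = []
    for key in tags:
        # Tag keys are case sensitive, so "Backup" and "backup" are two different tags.
        if key.lower() in mandatory and key not in mandatory:
            problems.append(f"tag '{key}' should be spelled '{key.lower()}'")
    for key in sorted(mandatory - set(tags)):
        problems.append(f"missing mandatory tag '{key}'")
    return problems


server_1 = {"backup": "daily", "owner": "team-a", "environment": "prod"}
server_2 = {"Backup": "daily", "owner": "team-a", "environment": "prod"}

print(check_tags(server_1))
print(check_tags(server_2))
```

In practice the same rule would more likely live in a shared Terraform module or a policy-as-code step in the pipeline; the sketch only shows why the two spellings are not interchangeable.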
In the next sections I’ll explain two different methods to integrate your IaC tools (and code!) with the infrastructure migrated to the cloud.
For the sake of simplicity, I will limit the examples to Terraform and AWS MGN, but rest assured that the same principles are valid for other IaC tools, like AWS CloudFormation, and other cloud providers.
Prerequisites to integrate IaC with migration tooling
This blog post assumes that you are proficient with AWS MGN (or CloudEndure) and HashiCorp Terraform.
The initial conditions are as follows:
- The development environment (IDE) is ready and configured to work with Terraform commands (e.g. terraform apply/plan, terraform state list etc.) and developers have access to the target AWS Accounts and services.
- The Terraform code has been developed and includes the following resources, defined to reflect the target configuration (we will call this group of resources the mock deployment):
  - EC2 instance(s)
  - EBS volumes and attachments
  - IAM Roles
  - Security Groups
  - KMS Keys
- The infrastructure has been successfully deployed into your target AWS Account through Terraform (successful plan and apply)
- The test cutover has been executed successfully (not mandatory but highly recommended)
Once the above conditions are satisfied, if we use Method 1 described below, the EC2 instance and the EBS volumes for the mock infrastructure can be terminated to save on the monthly bill. Security Groups and IAM roles need to be preserved as we’ll need them later. If we use Method 2, all the mock infrastructure can be destroyed as we’ll redeploy it later.
Here comes the core of this blog post.
The methods described below can be used to couple Infrastructure as Code and rehosting tools.
Method One – Import
The first method consists of importing the resources created by AWS MGN into the Terraform state after removing the mock resources, so that it reflects the target state.
What does Terraform import mean? As the linked article suggests:
Terraform is able to import existing infrastructure. This allows you to take resources you’ve created by some other means and bring it under Terraform management.
This is a great way to slowly transition infrastructure to Terraform, or to be confident that you can use Terraform in the future if it potentially doesn’t support every feature you need today.
Let’s see how to leverage this feature. We will be working with the Terraform state to replace the mock resources with the actual resources created by AWS MGN.
While defining the target configuration in the AWS MGN settings (Launch Template), the IAM Role and Security Group will correspond to the same resources created previously via Terraform and preserved in the mock deployment.
This is the sequence of events:
- Terminate the EC2 instance and the EBS volumes created previously (you should have done it as part of the prerequisites)
- Using the command terraform state list, retrieve the resource addresses used for the mock EC2 instances, the EBS volumes and their attachments
- Perform the test cutover to validate the server health status after the launch
- Terminate the test cutover instance once its successful launch has been verified
- Perform the cutover
- Use a Bash or Python script to:
  - Remove the mock resources from the Terraform state
  - Import the new resources corresponding to the infrastructure created by AWS MGN (EC2 instance, EBS attachments, EBS volumes)
  This script can also be extended to retrieve the IDs of the EBS volumes and EC2 instances, as well as the device names, automatically.
- Execute a Terraform plan to make sure that there are no changes to the infrastructure deployed
- Your Infrastructure as Code is now aligned with the deployment and you can continue to manage the infrastructure via Terraform
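As a rough illustration of how the IDs could be retrieved automatically, the Python sketch below extracts the device-to-volume mapping from a response shaped like the output of the EC2 DescribeInstances API (for example from boto3's `describe_instances` or `aws ec2 describe-instances --output json`). The sample response is hand-written and trimmed to the fields used here, and the IDs are placeholders.

```python
def device_volume_map(response):
    """Map instance ID -> {device name: EBS volume ID} from a DescribeInstances response."""
    result = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            devices = {}
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs")
                if ebs:  # entries without an "Ebs" section are skipped
                    devices[mapping["DeviceName"]] = ebs["VolumeId"]
            result[instance["InstanceId"]] = devices
    return result


# Hand-written sample, reduced to the fields the function reads.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-1234567890abcdef0",
                    "BlockDeviceMappings": [
                        {"DeviceName": "/dev/sdb", "Ebs": {"VolumeId": "vol-1234567890abcdef1"}},
                        {"DeviceName": "/dev/sdc", "Ebs": {"VolumeId": "vol-1234567890abcdef0"}},
                    ],
                }
            ]
        }
    ]
}

print(device_volume_map(sample_response))
```

The resulting dictionary contains everything needed to build the import IDs for the instance, its volumes and their attachments.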
Sample Bash script covering one EC2 instance import with two data volumes, with no automation (values must be entered manually in this case):
#!/bin/bash
# Replace the mock resources in the Terraform state with the ones created by AWS MGN.
# Instance and volume IDs below are placeholders and must be entered manually.

# remove/import EC2 instance
echo "starting EC2 imports"
terraform state rm 'module.ec2["instance_name"].aws_instance.this[0]'
terraform import -var-file="terraform.tfvars" 'module.ec2["instance_name"].aws_instance.this[0]' i-1234567890abcdef0
# remove/import first EBS volume (/dev/sdb)
terraform state rm 'module.ec2["instance_name"].aws_ebs_volume.this["sdb"]'
terraform import -var-file="terraform.tfvars" 'module.ec2["instance_name"].aws_ebs_volume.this["sdb"]' vol-1234567890abcdef1
# ...and its attachment (import ID format: DEVICE_NAME:VOLUME_ID:INSTANCE_ID)
terraform state rm 'module.ec2["instance_name"].aws_volume_attachment.this["sdb"]'
terraform import -var-file="terraform.tfvars" 'module.ec2["instance_name"].aws_volume_attachment.this["sdb"]' /dev/sdb:vol-1234567890abcdef1:i-1234567890abcdef0
# remove/import second EBS volume (/dev/sdc)
terraform state rm 'module.ec2["instance_name"].aws_ebs_volume.this["sdc"]'
terraform import -var-file="terraform.tfvars" 'module.ec2["instance_name"].aws_ebs_volume.this["sdc"]' vol-1234567890abcdef0
# ...and its attachment
terraform state rm 'module.ec2["instance_name"].aws_volume_attachment.this["sdc"]'
terraform import -var-file="terraform.tfvars" 'module.ec2["instance_name"].aws_volume_attachment.this["sdc"]' /dev/sdc:vol-1234567890abcdef0:i-1234567890abcdef0
echo "instance_name imports executed"
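The same sequence can be generated rather than typed by hand. The Python sketch below builds the terraform state rm and terraform import command pairs for one instance, assuming the same module layout as the script above (`module.ec2[...]` with `aws_instance.this[0]`, `aws_ebs_volume.this[...]` and `aws_volume_attachment.this[...]`). The function name and its parameters are my own and purely illustrative; it only prints the commands and does not run them.

```python
import shlex


def import_commands(module_key, instance_id, data_volumes, var_file="terraform.tfvars"):
    """Build the state rm / import command pairs for one migrated instance.

    data_volumes maps a short device key such as "sdb" to an EBS volume ID.
    """
    base = f'module.ec2["{module_key}"]'
    targets = [(f"{base}.aws_instance.this[0]", instance_id)]
    for key, volume_id in data_volumes.items():
        targets.append((f'{base}.aws_ebs_volume.this["{key}"]', volume_id))
        # Attachment import ID format: DEVICE_NAME:VOLUME_ID:INSTANCE_ID
        attachment_id = f"/dev/{key}:{volume_id}:{instance_id}"
        targets.append((f'{base}.aws_volume_attachment.this["{key}"]', attachment_id))

    commands = []
    for address, resource_id in targets:
        commands.append(["terraform", "state", "rm", address])
        commands.append(["terraform", "import", f"-var-file={var_file}", address, resource_id])
    return commands


for command in import_commands(
    "instance_name",
    "i-1234567890abcdef0",
    {"sdb": "vol-1234567890abcdef1", "sdc": "vol-1234567890abcdef0"},
):
    print(shlex.join(command))  # review the output before executing anything
```

Keeping each command as a list of arguments avoids the quoting problems that resource addresses with brackets and double quotes tend to cause in a shell.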
Terraform code design considerations:
- The EC2 instance AMI can be any AMI available from the marketplace as the mock instance will be deleted
- In the EBS volume configuration, the snapshot_id parameter is not defined/used
- The Terraform code for the mock deployment and the AWS MGN Launch Template need to match, so that no changes are applied after the migration
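One way to verify that the code and the imported resources really match is to run terraform plan with the -detailed-exitcode flag, which exits with 0 when there are no changes, 1 on error and 2 when changes are pending. The Python sketch below wraps that convention; the function names and the returned labels are illustrative choices, and `check_plan` assumes Terraform is installed and the working directory is already initialized.

```python
import subprocess


def plan_status(exit_code):
    """Translate the exit code of `terraform plan -detailed-exitcode`."""
    if exit_code == 0:
        return "in-sync"  # plan succeeded and found no changes
    if exit_code == 2:
        return "drift"    # plan succeeded and found pending changes
    return "error"        # plan itself failed


def check_plan(var_file="terraform.tfvars", workdir="."):
    """Run the plan in workdir and report whether code and infrastructure match."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", f"-var-file={var_file}"],
        cwd=workdir,
    )
    return plan_status(result.returncode)
```

A pipeline could call `check_plan()` right after the import step and fail the job on anything other than "in-sync".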
Method Two – Redeploy
The second method consists of creating AMI and EBS snapshots of the EC2 instance created by AWS MGN. These will then be used to create the target infrastructure through Terraform.
Sequence of events:
- Destroy the environment created previously for testing (you should have done it as part of the prerequisites)
- Perform the test cutover to validate the server health status after the launch
- Terminate the test cutover instance once its successful launch has been verified
- Set “Start instance upon launch” to “No” in the General Launch Settings
- Perform the cutover. At this stage, AWS MGN will create an EC2 instance in the same AWS Account where AWS MGN is configured
- Manually or through automation, create an AMI of the EC2 instance
- If the AWS MGN Account is not the same as the target AWS Account:
  - Share the AMI with the target AWS Account where the EC2 instance is supposed to run
  - Perform a snapshot copy operation so that the snapshots are owned by the target AWS Account. Verify that the KMS customer managed key (CMK) configured in AWS MGN can be accessed by the IAM Role used in the target Account to perform the copy.
- Using the Terraform AMI resource, create a new AMI using the root volume snapshot as snapshot_id
- In the Terraform section for the EC2 instance (either using a module or a resource), specify the AMI created in the previous step as the AMI ID, and use the EBS volume snapshot IDs created previously as input to the ebs_block_device section
- Execute a terraform plan to verify that the code is valid and that the changes are the expected ones, then proceed with a terraform apply
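Gathering the snapshot IDs is the fiddly part of this sequence, so here is a small Python sketch of how it could be done from an AMI description shaped like one entry of the EC2 DescribeImages response. The sample image is hand-written with placeholder IDs, and the variable names `root_snapshot_id` and `data_volume_snapshots` are hypothetical; your Terraform code would need matching variable declarations.

```python
import json


def split_snapshots(image):
    """Return (root snapshot ID, {data device name: snapshot ID}) for one AMI description."""
    root_device = image["RootDeviceName"]
    root_snapshot = None
    data_snapshots = {}
    for mapping in image.get("BlockDeviceMappings", []):
        ebs = mapping.get("Ebs")
        if not ebs or "SnapshotId" not in ebs:
            continue  # entries without a snapshot (e.g. instance store) are ignored
        if mapping["DeviceName"] == root_device:
            root_snapshot = ebs["SnapshotId"]
        else:
            data_snapshots[mapping["DeviceName"]] = ebs["SnapshotId"]
    return root_snapshot, data_snapshots


def to_tfvars(image):
    """Render the snapshot IDs as JSON suitable for a *.tfvars.json file."""
    root, data = split_snapshots(image)
    return json.dumps(
        {"root_snapshot_id": root, "data_volume_snapshots": data},
        indent=2,
        sort_keys=True,
    )


# Hand-written sample with placeholder IDs, reduced to the fields used above.
sample_image = {
    "ImageId": "ami-0123456789abcdef0",
    "RootDeviceName": "/dev/sda1",
    "BlockDeviceMappings": [
        {"DeviceName": "/dev/sda1", "Ebs": {"SnapshotId": "snap-0aaaaaaaaaaaaaaa1"}},
        {"DeviceName": "/dev/sdb", "Ebs": {"SnapshotId": "snap-0bbbbbbbbbbbbbbb2"}},
        {"DeviceName": "/dev/sdc", "Ebs": {"SnapshotId": "snap-0ccccccccccccccc3"}},
    ],
}

print(to_tfvars(sample_image))
```

The root snapshot would feed the Terraform AMI resource, while the data volume snapshots would feed the individual disk definitions.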
Why not simply take the AMI of the EC2 instance, use it in the ami parameter of the aws_instance resource, and avoid gathering all the snapshot IDs for the data volumes? While this approach will work on paper, if an AMI with multiple volumes defined is used, it won’t be possible to manage individual disks as code. For example, it won’t be possible to extend a volume or change its IOPS settings via code.
Most of the steps mentioned above can and should be automated to reduce errors and speed up the migration process. For example, AWS Step Functions could be used to orchestrate the AMI creation, sharing and copy whenever an EC2 instance is created with certain tags identifying the target AWS Account, KMS key and Region for the migrated instance.
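As a sketch of what a single step in such an orchestration might do, the Python functions below read the target settings from the instance tags and prepare the request parameters for the EC2 ModifyImageAttribute and CopySnapshot API calls (for example via boto3's `modify_image_attribute` and `copy_snapshot`). The tag keys with the `migration:` prefix are an invented convention, and nothing here calls AWS; the functions only build dictionaries.

```python
REQUIRED = ("target-account", "target-region", "target-kms-key")  # hypothetical tag names


def migration_target(tags, prefix="migration:"):
    """Extract the target settings from an EC2 tag list ([{"Key": ..., "Value": ...}])."""
    values = {tag["Key"]: tag["Value"] for tag in tags}
    missing = [prefix + name for name in REQUIRED if prefix + name not in values]
    if missing:
        raise ValueError(f"missing tags: {', '.join(missing)}")
    return {name: values[prefix + name] for name in REQUIRED}


def share_ami_request(image_id, target):
    """Parameters for ModifyImageAttribute: grant launch permission to the target account."""
    return {
        "ImageId": image_id,
        "LaunchPermission": {"Add": [{"UserId": target["target-account"]}]},
    }


def copy_snapshot_request(snapshot_id, source_region, target):
    """Parameters for CopySnapshot, to be issued from the target account and Region."""
    return {
        "SourceSnapshotId": snapshot_id,
        "SourceRegion": source_region,
        "Encrypted": True,
        "KmsKeyId": target["target-kms-key"],
    }


sample_tags = [
    {"Key": "Name", "Value": "instance_name"},
    {"Key": "migration:target-account", "Value": "111122223333"},
    {"Key": "migration:target-region", "Value": "eu-west-1"},
    {"Key": "migration:target-kms-key", "Value": "alias/migration-target"},
]
target = migration_target(sample_tags)
print(share_ami_request("ami-0123456789abcdef0", target))
print(copy_snapshot_request("snap-0aaaaaaaaaaaaaaa1", "eu-west-1", target))
```

Depending on the setup, the underlying snapshots and the source KMS key may also need to be shared with the target account before the copy can succeed; that part is left out of this sketch.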
Which method to choose?
While both methods can be considered equivalent in most cases, there are pros and cons for each one.
The main drivers are:
Cutover maintenance window and Cross Account requirements
There are mainly two patterns for deploying AWS MGN in a Migration Factory design.
The first pattern makes use of a centralized Staging Area dedicated to AWS MGN, and then AMIs are distributed cross-account.
The main benefits are isolation (all the migration data and infrastructure are stored in a single AWS Account that can be nuked at the end) and easier planning (the target AWS Account and firewall rules can be provisioned after the data replication has started).
The second pattern (used in the Cloud Migration Factory solution) consists of decentralized AWS MGN deployed in each AWS Account where the migrated workloads are supposed to run.
The main advantage is that the data does not need to be copied in the target AWS Account after the cutover.
In fact, if the size of the volumes is bigger than 1 TB, the snapshot copy operation might take more than one hour to complete. In this case, a decentralized AWS MGN pattern could become a hard requirement, depending on the cutover window allowed.
This implies the following:
- Method 1 is generally quicker when coupled with a decentralized AWS MGN pattern, and it is the recommended choice for applications with a short downtime window and/or considerable volumes of data.
- Method 2 is instead recommended when using a centralized pattern, as the starting point to deploy the infrastructure would be EBS snapshots and not an EC2 instance.
Complexity in automation
Both methods require some form of automation to be used at scale.
Method 1 would require scripting to replace the resources in the Terraform state. Instance IDs and EBS volume IDs could be retrieved programmatically by the script using the AWS CLI, or entered manually. The choice depends on the scale and the level of tolerance for human error.
If a centralized AWS MGN pattern is being used with Method 2, AWS Step Functions (coupled with AWS Lambda) would be a must to share the EBS snapshots and copy them cross-account. Method 2 could also require values such as EBS snapshot IDs, with the associated EC2 instance names and device names (e.g. /dev/sdc), to be stored in AWS SSM Parameter Store once generated, so that Terraform can read them as data sources and the code does not have to be committed with updated EBS snapshot IDs during the cutover.
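A possible shape for those parameters is sketched below in Python. The function prepares the arguments for the SSM PutParameter API call (for example via boto3's `put_parameter`); the `/migration/<instance>/snapshots/<device>` naming scheme and the instance name are assumptions made for this example, not an AWS convention.

```python
def snapshot_parameter(instance_name, device_name, snapshot_id, root="/migration"):
    """Parameters for SSM PutParameter describing one migrated volume snapshot."""
    device_key = device_name.rsplit("/", 1)[-1]  # "/dev/sdc" -> "sdc"
    return {
        "Name": f"{root}/{instance_name}/snapshots/{device_key}",
        "Value": snapshot_id,
        "Type": "String",
        "Overwrite": True,  # a repeated cutover replaces the previous value
    }


print(snapshot_parameter("app-db-01", "/dev/sdc", "snap-0ccccccccccccccc3"))
```

On the Terraform side, each value could then be read with the aws_ssm_parameter data source using the same name, so the snapshot IDs never have to appear in the committed code.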
Terraform state
Another consideration is around Terraform state. While Terraform allows us to inspect and modify the Terraform state, this needs to be done extremely carefully to avoid drift and mismatches that would result in inconsistencies while performing plan/apply operations. If you are not confident in working with Terraform state, Method 2 is highly recommended as it doesn’t require you to execute any operations on the state.
It is worthwhile to highlight that there is no right or wrong choice. Sometimes it’s difficult to choose one method and stick to it, as the conditions might be different from workload to workload. It’s good to have at least two options to consider, depending on the requirements.
Learn more about using IaC in rehosting migrations
In this blog post I’ve shown two different ways of integrating resources created by a migration service such as AWS MGN with an Infrastructure as Code tool like Terraform.
Similar principles are applicable to other cloud providers, migration services and Infrastructure as Code software tools.
Regardless of the combination chosen, customers should always manage as much as possible “as Code” to allow migration execution at scale, minimize human errors, standardize deployments and lock down access to infrastructure changes by using CI/CD pipelines.
If you want to know more about it, get in touch with us!