The Cloudreach team assessed Sovrn’s existing AWS environment. As part of this assessment, the team reviewed the environment’s overall architecture and produced remediation plans based on the five pillars of the AWS Well-Architected Framework: operational excellence, security, reliability, performance efficiency, and cost optimization.
The assessment determined that Amazon EC2 instances, Elastic Load Balancing (ELB), and Auto Scaling groups were needed across Availability Zones to add resiliency and balance the load for all application services. Cloudreach assisted in testing and provided input on instance sizing and scaling policies. Following an “Automation First” strategy, the Cloudreach team used Terraform to automate the client’s infrastructure in AWS. This allowed Sovrn to quickly spin up environments in different domains and spin them down again to save on costs.
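The case study does not say which scaling policy type was chosen. As a minimal sketch, assuming a target-tracking policy on average CPU utilization, the request below is the shape accepted by boto3's `autoscaling.put_scaling_policy`. The group name, policy name, and 50% target are hypothetical; the same settings map onto a Terraform `aws_autoscaling_policy` resource.

```python
def target_tracking_policy(asg_name, target_cpu=50.0):
    """Build keyword arguments for boto3's autoscaling.put_scaling_policy call.

    Assumption: a target-tracking policy on average CPU; the target value is
    illustrative, not taken from the engagement.
    """
    if not 0 < target_cpu <= 100:
        raise ValueError("target_cpu must be a percentage in (0, 100]")
    return {
        "AutoScalingGroupName": asg_name,
        # Hypothetical naming convention for the policy.
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Auto Scaling adds or removes instances to keep the group's
            # average CPU close to TargetValue.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu),
        },
    }
```

The dictionary can be passed as `boto3.client("autoscaling").put_scaling_policy(**target_tracking_policy("web-asg"))`; testing under load, as described above, is what would settle the actual target.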
In addition to bringing the infrastructure up to AWS best practices and standards, the Cloudreach team implemented a data lake strategy for Sovrn’s data platform. This allowed the client to ingest billions of records from all of its on-premises data centers into Amazon S3. Cloudreach built a data pipeline that processes the ingested data into rollups and summarized data sets for revenue reporting. As part of this pipeline, the team deployed a 67-node Amazon EMR cluster that processes more than nine billion records per day ingested into an S3 bucket. The team also set up the AWS Glue Data Catalog as a universal metadata store, used AWS Lambda functions for file transfers, and deployed Apache Airflow for job scheduling and monitoring. While setting up the data pipeline, the Cloudreach team also helped migrate more than 250 Hive jobs to Sovrn’s new AWS environment.
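The Lambda-based file transfers mentioned above are not described in detail. The following is a minimal sketch, assuming an S3 event notification triggers a function that copies each new object into a data lake bucket; the destination bucket and prefix are hypothetical. The S3 client is passed in so the logic can be exercised without AWS credentials.

```python
import urllib.parse

# Hypothetical destination; the case study does not name the buckets involved.
DEST_BUCKET = "example-data-lake-raw"
DEST_PREFIX = "ingest/"


def copy_requests(event, dest_bucket=DEST_BUCKET, dest_prefix=DEST_PREFIX):
    """Turn an S3 event notification into copy_object keyword arguments."""
    requests = []
    for record in event.get("Records", []):
        src_bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 events (spaces become '+'), so decode.
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        requests.append(
            {
                "Bucket": dest_bucket,
                "Key": dest_prefix + src_key,
                "CopySource": {"Bucket": src_bucket, "Key": src_key},
            }
        )
    return requests


def handle_event(event, s3_client, dest_bucket=DEST_BUCKET, dest_prefix=DEST_PREFIX):
    """Copy every object referenced in the event; return how many were copied."""
    requests = copy_requests(event, dest_bucket, dest_prefix)
    for kwargs in requests:
        s3_client.copy_object(**kwargs)
    return len(requests)


def lambda_handler(event, context):
    # boto3 is bundled with the AWS Lambda Python runtime; it is imported here
    # so the helpers above can run without it installed.
    import boto3

    return {"copied": handle_event(event, boto3.client("s3"))}
```

A single `copy_object` call handles objects up to 5 GB; larger files would need a multipart copy, for example boto3's managed `copy` transfer method.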
AWS Glue was central to the overall solution, serving as the universal metadata store for EMR. Cloudreach implemented scheduled jobs that crawl S3 buckets for metadata discovery and automatic schema inference. As a result, Sovrn no longer had to worry about downstream services being affected by slowly changing dimension (SCD) schemas. Cloudreach also implemented a bi-directional approach: an external table created within the EMR cluster is automatically cataloged in AWS Glue, and any schema changes detected by the scheduled AWS Glue crawlers can in turn be queried from Hive.
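Both directions of the Glue integration described above can be expressed as configuration. This sketch assumes the standard mechanisms: an EMR `hive-site` classification that points Hive's metastore client at the Glue Data Catalog, and a scheduled crawler whose schema change policy updates tables in place. The crawler name, role ARN, database, paths, and hourly schedule are hypothetical.

```python
# EMR configuration that makes Hive use the AWS Glue Data Catalog as its
# metastore, so tables created in Hive on EMR appear in Glue and vice versa.
# Passed as Configurations=... to boto3's emr.run_job_flow.
EMR_HIVE_SITE = [
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": (
                "com.amazonaws.glue.catalog.metastore."
                "AWSGlueDataCatalogHiveClientFactory"
            )
        },
    }
]


def crawler_request(name, role_arn, database, s3_paths, schedule="cron(0 * * * ? *)"):
    """Build keyword arguments for boto3's glue.create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
        # Glue uses six-field cron syntax; this default runs at the top of every hour.
        "Schedule": schedule,
        # Update catalog tables in place when the crawler infers a changed
        # schema, and only log (rather than delete) tables whose data vanished.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }
```

Because Hive on EMR reads table definitions from the same catalog the crawler updates, a schema change picked up by the crawler becomes visible to Hive queries without a separate metastore sync.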
Finally, Cloudreach configured AWS CloudTrail and Amazon CloudWatch to enable logging and resource monitoring across the environment. With this in place, Sovrn is notified of unauthorized access attempts and of any impact on environment performance.
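One common way to get the access-attempt notifications described above is sketched here, assuming CloudTrail delivers to a CloudWatch Logs log group. A metric filter counts denied API calls, and an alarm publishes to an SNS topic. The filter pattern is the one popularized by the CIS AWS Foundations Benchmark; the namespace, names, and five-minute period are hypothetical. The dictionaries match boto3's `logs.put_metric_filter` and `cloudwatch.put_metric_alarm` parameters.

```python
# CloudWatch Logs filter pattern matching denied or unauthorized API calls.
UNAUTHORIZED_PATTERN = (
    '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }'
)
NAMESPACE = "CloudTrailMetrics"  # hypothetical metric namespace


def metric_filter_request(log_group):
    """Keyword arguments for logs.put_metric_filter on the CloudTrail log group."""
    return {
        "logGroupName": log_group,
        "filterName": "UnauthorizedAPICalls",
        "filterPattern": UNAUTHORIZED_PATTERN,
        # Emit a value of 1 for every matching CloudTrail event.
        "metricTransformations": [
            {"metricName": "UnauthorizedAPICalls", "metricNamespace": NAMESPACE, "metricValue": "1"}
        ],
    }


def alarm_request(sns_topic_arn):
    """Keyword arguments for cloudwatch.put_metric_alarm: alert on any match in 5 minutes."""
    return {
        "AlarmName": "unauthorized-api-calls",
        "Namespace": NAMESPACE,
        "MetricName": "UnauthorizedAPICalls",
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def is_unauthorized(record):
    """Local approximation of UNAUTHORIZED_PATTERN for checking sample CloudTrail records."""
    code = record.get("errorCode", "")
    return code.endswith("UnauthorizedOperation") or code.startswith("AccessDenied")
```

Performance alerts would follow the same `put_metric_alarm` shape, pointed at resource metrics such as EC2 CPU utilization instead of a log-derived metric.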
AWS services used:
- Amazon VPC & VPC Peering
- Amazon EC2
- Amazon S3
- Amazon SQS
- AWS CloudTrail
- Amazon CloudWatch
- Amazon Route 53
- AWS Glue Data Catalog
- AWS Lambda
- Amazon RDS