• Media
  • AWS

The Challenge

Sovrn sought Cloudreach’s help for a massive data and analytics migration project. Previously, Sovrn’s team ran its data platform in a mixed environment of on-premises data centers and SoftLayer cloud services. This environment cost the company close to $7 million a year to maintain, and it offered little elasticity, flexibility, or resiliency. Sovrn engaged Cloudreach not only to build the new platform but also to teach its technical staff how to support the new AWS environment. To stay competitive and maintain its status as an industry leader, Sovrn needed to alter its existing testing environment to support its plans for moving all of its massive workloads into AWS.

The Opportunity

Sovrn chose Cloudreach as the partner whose skills and experience were best suited to provide insight and resources for a successful AWS and data and analytics implementation. Cloudreach supported the engagement with a comprehensive team of data architects, data engineers, and project managers who kept the project on track and helped deliver it on time and under budget.

The Solution

The Cloudreach team assessed Sovrn’s existing AWS environment. Part of this assessment was to review the environment’s overall architecture and provide remediation plans based on the five pillars of the AWS Well-Architected Framework (operational excellence, security, reliability, performance efficiency, and cost optimization).

The assessment determined that Amazon EC2, Elastic Load Balancing (ELB), and Auto Scaling groups were needed across Availability Zones to add resiliency and help balance the load for all application services. Cloudreach assisted in testing and provided input on instance sizing and scaling policies. The team followed an “Automation First” strategy and used Terraform to automate the client’s infrastructure in AWS. This allowed Sovrn to quickly spin up environments in different domains and spin them down again to save on costs.
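To make the idea of a scaling policy concrete, here is a minimal sketch of a proportional sizing rule. It is a simplification for illustration only: it is not the algorithm AWS Auto Scaling uses and not the policy Cloudreach configured for Sovrn. The function name, the CPU metric, and all of the numbers are hypothetical.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Return an instance count that moves the average metric toward the target.

    Scales the current count by metric/target, rounds up so the group is
    never undersized, and clamps the result to the group's size limits.
    """
    if current < 1 or target_value <= 0:
        raise ValueError("current must be >= 1 and target_value must be > 0")
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, proposed))


# Hypothetical group: 4 instances, 60% CPU target, limits of 2 to 10 instances.
scale_out = desired_capacity(4, 90.0, 60.0, min_size=2, max_size=10)
scale_in = desired_capacity(4, 15.0, 60.0, min_size=2, max_size=10)
capped = desired_capacity(8, 120.0, 60.0, min_size=2, max_size=10)
print(scale_out, scale_in, capped)
```

The minimum keeps capacity available in more than one Availability Zone during quiet periods, while the maximum bounds cost during spikes.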

In addition to bringing the infrastructure up to AWS best practices and standards, the Cloudreach team implemented a data lake strategy for Sovrn’s data platform. This allowed the client to ingest billions of data sets from all of its on-premises data centers into Amazon S3. Cloudreach implemented a data pipeline that processes the ingested data to create rollups and summarized data sets for revenue information. As part of the pipeline, the team implemented a 67-node Amazon EMR cluster to process the more than nine billion records per day ingested into an S3 bucket. The team also implemented an AWS Glue Data Catalog as a universal metadata store, used AWS Lambda functions for file transfer, and deployed Apache Airflow for job scheduling and monitoring. While setting up the pipeline, the Cloudreach team assisted in migrating over 250 Hive jobs to Sovrn’s new AWS environment.
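As a small-scale illustration of what a revenue rollup does, the sketch below aggregates raw records into one summary row per day and publisher. The production pipeline ran as Hive jobs on EMR; this in-memory version only shows the shape of the computation. The field names (`day`, `publisher_id`, `revenue`) and the sample values are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def rollup_revenue(records):
    """Sum revenue and count events per (day, publisher_id) key."""
    totals = defaultdict(lambda: {"events": 0, "revenue": Decimal("0")})
    for rec in records:
        key = (rec["day"], rec["publisher_id"])
        totals[key]["events"] += 1
        # Decimal avoids the rounding drift of binary floats on money values.
        totals[key]["revenue"] += Decimal(rec["revenue"])
    return dict(totals)


# Hypothetical sample records.
sample = [
    {"day": "day-1", "publisher_id": "pub-a", "revenue": "0.25"},
    {"day": "day-1", "publisher_id": "pub-a", "revenue": "0.50"},
    {"day": "day-1", "publisher_id": "pub-b", "revenue": "1.10"},
    {"day": "day-2", "publisher_id": "pub-a", "revenue": "0.05"},
]
summary = rollup_revenue(sample)
for (day, publisher), row in sorted(summary.items()):
    print(day, publisher, row["events"], row["revenue"])
```

In Hive the same rollup would typically be a `GROUP BY` over the day and publisher columns, written to a summary table.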

AWS Glue was central to the overall solution, serving as the universal metadata store for EMR. Cloudreach implemented scheduled jobs that crawl S3 buckets for metadata discovery and automatic schema inference. This meant Sovrn did not have to worry about downstream services being affected by schemas that change over time in the manner of slowly changing dimensions (SCD). Cloudreach implemented a bi-directional approach: an external table created within the EMR cluster is automatically discovered and cataloged in AWS Glue, and any schema changes identified by the scheduled AWS Glue crawlers can in turn be queried using Hive.
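The sketch below illustrates the general idea of folding a newly discovered schema into a cataloged one without breaking downstream readers. It does not reproduce how AWS Glue crawlers work internally; the merge rule (keep existing columns, append new ones, report type conflicts), the function name, and the column names are assumptions made for the example.

```python
def merge_schema(cataloged, discovered):
    """Merge a newly discovered column list into the cataloged one.

    Both arguments are lists of (column_name, type_name) tuples.
    Returns (merged, conflicts): existing columns keep their position and
    type, new columns are appended, and conflicts lists every column whose
    discovered type differs from the cataloged type.
    """
    known = dict(cataloged)
    merged = list(cataloged)
    conflicts = []
    for name, type_name in discovered:
        if name not in known:
            merged.append((name, type_name))
            known[name] = type_name
        elif known[name] != type_name:
            conflicts.append((name, known[name], type_name))
    return merged, conflicts


# Hypothetical table: the latest crawl finds a new column and a changed type.
cataloged = [("publisher_id", "string"), ("impressions", "bigint")]
discovered = [
    ("publisher_id", "string"),
    ("impressions", "string"),
    ("region", "string"),
]
merged, conflicts = merge_schema(cataloged, discovered)
print(merged)
print(conflicts)
```

Appending columns instead of reordering them keeps existing queries valid, and surfacing type conflicts separately lets someone decide how to handle them instead of silently changing a column's type.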

Finally, Cloudreach configured AWS CloudTrail and Amazon CloudWatch to enable logging and resource monitoring across the environment. As a result, Sovrn is notified of unwanted access attempts and of any impact on the environment’s performance.

Services leveraged:

  • Amazon VPC & VPC Peering
  • VPN
  • Amazon EC2
  • Amazon S3
  • Amazon SQS
  • AWS CloudTrail
  • Amazon CloudWatch
  • Amazon Route 53
  • Amazon EMR
    • Hive
    • Presto
    • Zeppelin
    • Hue
  • AWS Glue Data Catalog
  • AWS Lambda
  • Amazon RDS

The Benefit

Sovrn realized several benefits following the successful implementation of Cloudreach’s AWS and data and analytics strategies. Using AWS, the Cloudreach team created a modern data platform for Sovrn that can scale as the business grows. As a result of the engagement, the company has a fully scalable data platform on AWS that can handle over 20 billion transactions per day. With Cloudreach’s help, Sovrn can continue to innovate without being constrained by its infrastructure. Sovrn estimates that it saves $5 million per year in infrastructure costs because of its move to AWS and the guidance of the Cloudreach team.
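To put the figures quoted in this case study in perspective, the short calculation below converts the daily volumes into average per-second rates and expresses the estimated savings as a share of the previous annual cost. The inputs come from the text above; the results are averages only and say nothing about peak load, and the previous cost was stated only as "close to" $7 million, so the percentage is approximate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events_per_day):
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


emr_rate = per_second(9_000_000_000)        # records processed by the EMR cluster
platform_rate = per_second(20_000_000_000)  # transactions the platform can handle
savings_share = 5_000_000 / 7_000_000       # savings vs. approximate prior cost

print(f"EMR pipeline: about {emr_rate:,.0f} records per second on average")
print(f"Platform capacity: about {platform_rate:,.0f} transactions per second on average")
print(f"Estimated savings: roughly {savings_share:.0%} of the previous annual cost")
```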

About Sovrn

Sovrn is an advertising technology company for publishers that offers greater insights through enhanced audience access and on-demand advertising solutions. It does this through a platform that leverages open ad space on websites and connects them with publishers seeking to promote their content based on their targeted audiences. Sovrn is the world’s third-largest advertising technology company.