As the world’s most recognized consumer car media brand, the company saw an opportunity to integrate multiple data sources into a data workbench. AWS enabled the company to load multiple data sources into EMR and Spark, as well as read five files from S3. The data would be pre-processed in Spark, then validated, transformed, and written to Redshift. The process was automated using Data Pipeline, with metrics reported through CloudWatch and SNS.
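The validate-then-transform stage of a flow like this can be illustrated with a minimal sketch. It is written in plain Python rather than Spark so it runs anywhere, and it is not the implementation used in the engagement: the field names (`vehicle_id`, `source`, `price`), the `CleanRow` type, and the validation rules are all hypothetical, since the case study does not describe the actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical required fields; the real input schema is not documented here.
REQUIRED_FIELDS = ("vehicle_id", "source", "price")


@dataclass(frozen=True)
class CleanRow:
    """Hypothetical shape of a row ready to be loaded into the warehouse."""
    vehicle_id: str
    source: str
    price_usd: float


def validate(record: dict) -> bool:
    """Accept a record only if every required field is present and non-empty
    and the price parses as a non-negative number."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            return False
    try:
        return float(record["price"]) >= 0
    except (TypeError, ValueError):
        return False


def transform(record: dict) -> CleanRow:
    """Normalize a validated record: trim strings, lowercase the source name,
    and round the price to two decimal places."""
    return CleanRow(
        vehicle_id=str(record["vehicle_id"]).strip(),
        source=str(record["source"]).strip().lower(),
        price_usd=round(float(record["price"]), 2),
    )


def run_stage(records: Iterable[dict]) -> Tuple[List[CleanRow], List[dict]]:
    """Split records into transformed clean rows and rejected raw records."""
    clean: List[CleanRow] = []
    rejected: List[dict] = []
    for record in records:
        if validate(record):
            clean.append(transform(record))
        else:
            rejected.append(record)
    return clean, rejected
```

Keeping the rejected records separate, rather than dropping them silently, makes it straightforward to count them and publish that count as a pipeline metric.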
Cloudreach performed a full platform assessment to inform the reference architecture design. The deliverables included AWS environment diagrams, a documented process for data inputs, a Spark process diagram, documentation for data outputs, Redshift schema and process documentation, and Data Pipeline documentation. Cloudreach leveraged several AWS services, including EMR, Redshift, Data Pipeline, CloudWatch, SNS, CloudFormation, SQS, Direct Connect, CloudTrail, VPC, and S3.
The data analytics solution on AWS allowed the company to leverage new data sources, introduced the power and value of EMR and Redshift, delivered a production-ready workbench that can be extended to additional data sources, and increased the sophistication of pre- and post-process analytics.
About World's Leading Consumer Car Media Brand
The Director of Technology at the company required a data workbench prototype for numerous data sources. To build it, he required the expertise of a cloud and big data partner experienced with AWS big data architecture and best practices. As an AWS Big Data Competency holder, Relus Cloud was selected to design and configure a big data analytics workbench on AWS.