A Data-First Strategy: Which CSP Should You Choose?

Cloud providers' focus on advancing and maturing their data capabilities has produced a very complex ecosystem of cloud services. In this blog post, Bimal Tandel, Business Line Leader for Data & Analytics, helps you navigate this ecosystem by comparing the main data services offered by the three major cloud providers (AWS, Azure, and GCP).

"By 2022, 50% of cloud buying decisions will be based on the data assets provided by cloud service providers." - Gartner, Top Data and Analytics Predicts for 2019


Data is a strategic asset, and organizations strongly favor cloud providers whose data management capabilities meet their goals and objectives, often before they evaluate any other criteria. Gartner's prediction reflects this trend. The cloud providers clearly understand that data has gravity: applications and services tend to move to where the data lives. As a result, organizations will transform on whichever cloud platform they choose for their most important data assets.

So what are the data assets provided by the three major public cloud providers?

Google Cloud, Amazon Web Services, and Microsoft Azure each offer an array of services to help customers manage data, and the capabilities of these services often overlap. Understanding the database services each provider offers is only one part of the decision-making process. Organizations should also evaluate the database technology they currently use, the in-house capabilities built to support it, and their existing expertise, and let these factors inform the choice of cloud provider.

Cloudreach is a trusted partner that enterprise customers often engage to help with cloud adoption, transformation & innovation. Talk to an expert to define or validate your cloud adoption strategy, and make sure that capabilities for managing data assets are a key driver in the decision.

Focus on the applications and use cases to classify and manage data assets

Data classification based on applications & use cases:

  • Operational Data - Data assets, typically relational (RDBMS) databases, that support OLTP and operational workloads such as systems of record and e-commerce applications.
  • Accelerated Data - Data assets, typically cache and in-memory databases, that provide the acceleration needed by highly transactional applications requiring extreme scalability.
  • Analytics Data - Data assets, typically data warehouses and data lakes, that support SQL- and NoSQL-based analytics workloads and enable advanced analytics capabilities, including AI & ML.

Cloud services for operational data assets

Key Considerations:

  • Evaluate existing skills and the technology currently in use.
  • Focus on the reusability of existing processes and operational plans.
  • Identify applications whose databases need modernization because they lack the required scalability or functionality, and consider NoSQL databases for them.
  • Consider cloud-native, fully managed databases for a few mission-critical applications.
  • Plan compatibility testing, and migrate in waves, starting with a small number of databases in the initial wave.
  • Use the lessons learned to drive adoption of cloud-native database services and lower the cost of ownership and management.
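The wave-based migration described in the considerations above can be sketched in Python. This is a minimal illustration built on assumptions: the `Database` record, its fields, and the `plan_waves` helper are all hypothetical. The ordering rule (non-critical and smaller databases first) is one reasonable heuristic, not a prescribed method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Database:
    """Hypothetical inventory record for a database to migrate."""
    name: str
    size_gb: float
    mission_critical: bool


def plan_waves(dbs: List[Database], pilot_size: int = 2,
               wave_size: int = 5) -> List[List[Database]]:
    """Split databases into migration waves, with a small pilot wave first.

    Assumed heuristic: non-critical databases go before mission-critical
    ones, and smaller databases before larger ones, so the pilot wave
    surfaces compatibility issues at low risk.
    """
    if pilot_size < 1 or wave_size < 1:
        raise ValueError("wave sizes must be positive")
    # False sorts before True, so non-critical databases come first.
    ordered = sorted(dbs, key=lambda d: (d.mission_critical, d.size_gb))
    waves = [ordered[:pilot_size]] if ordered else []
    rest = ordered[pilot_size:]
    for start in range(0, len(rest), wave_size):
        waves.append(rest[start:start + wave_size])
    return waves


inventory = [
    Database("orders", 500, True),
    Database("reporting", 120, False),
    Database("wiki", 5, False),
    Database("billing", 300, True),
    Database("catalog", 40, False),
]
for number, wave in enumerate(plan_waves(inventory, pilot_size=2, wave_size=2), start=1):
    print(number, [db.name for db in wave])
# 1 ['wiki', 'catalog']
# 2 ['reporting', 'billing']
# 3 ['orders']
```

Before the mission-critical databases move, you can feed lessons from the pilot wave (for example, compatibility issues) back into the ordering or the wave size.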

Google Cloud Platform | Amazon Web Services | Microsoft Azure
Cloud SQL | Amazon Aurora | Azure Database
Cloud Spanner | Amazon RDS | Azure SQL Database
Cloud Datastore | Amazon DynamoDB | Azure Cosmos DB
Cloud Bigtable | Amazon DocumentDB | Table Storage

Google Cloud Platform

  • Cloud SQL is a fully managed Postgres and MySQL database service.
  • Cloud Spanner is a globally scalable, strongly consistent relational database service. At the time of writing, it was the only database among the major cloud providers' offerings to combine the benefits of a relational structure with non-relational horizontal scale.
  • Cloud Bigtable is a petabyte-scale, fully managed NoSQL database service that can support both analytical and operational workloads.
  • Cloud Datastore is a highly scalable NoSQL database for applications that need sharding support. GCP manages sharding automatically based on application load, which makes it well suited to web, mobile, and IoT applications.
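The following Python sketch makes the idea of sharding concrete with generic hash-based placement. It illustrates the concept only. Cloud Datastore manages sharding automatically, and its internal scheme is not exposed this way. The `shard_for` and `shard_load` helpers are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    SHA-256 digest is used to keep the mapping identical across runs.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(keys: Iterable[str], num_shards: int) -> Dict[int, int]:
    """Count how many keys land on each shard."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return {shard: counts.get(shard, 0) for shard in range(num_shards)}


user_keys = [f"user:{i}" for i in range(10_000)]
print(shard_load(user_keys, 4))
```

The per-shard counts should come out roughly even. However, simple modulo placement remaps most keys whenever the shard count changes. That cost is one reason a managed service that splits and rebalances shards automatically reduces operational effort.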

Amazon Web Services

  • Amazon Aurora is an AWS-native managed database service that offers MySQL and PostgreSQL compatibility. Compared with the other Amazon RDS engines, Aurora offers higher performance and durability, along with tightly integrated security and management features.
  • Amazon RDS is a managed database service for third-party database engines. RDS supports MariaDB, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.
  • Amazon DynamoDB is a fully managed, multi-region, multi-master key-value and document database that delivers single-digit millisecond performance. Its in-memory caching and scalability make it highly efficient for internet-scale applications.

Microsoft Azure

  • Azure Database is a managed database-as-a-service offering for PostgreSQL, MariaDB, and MySQL.
  • Azure SQL Database is a fully managed database service based on SQL Server. It offers high compatibility with on-premises SQL Server workloads, along with optimizations built for the Azure cloud.
  • Azure Cosmos DB is a multi-model NoSQL database service for building fast, planet-scale applications, with native support for multiple NoSQL APIs.
  • Table Storage is a highly available NoSQL key-value store for semi-structured data, suited to applications that need high scalability and a flexible schema.

Cloud services for accelerated data assets

Key Considerations:

  • Leverage a cloud-native service if it is compatible with your on-premises solution.
  • Experiment with Spot Instances and container services before a lift & shift to EC2 instances.
  • Include cloud-native solutions for cost optimization and ease of management.

Google Cloud Platform | Amazon Web Services | Microsoft Azure
Cloud Memorystore | Amazon ElastiCache | Azure Cache for Redis

Google Cloud Platform

  • Cloud Memorystore is a managed in-memory data store, compatible with the Redis protocol, for building highly scalable applications.

Amazon Web Services

  • Amazon ElastiCache is a fully managed Redis or Memcached service that deploys and scales in-memory data stores to support data-intensive apps or to improve the performance of existing applications.

Microsoft Azure

  • Azure Cache for Redis is a fully managed, open source-compatible in-memory data store based on Redis that supports fast, scalable applications.
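Managed caches like these are commonly used in a cache-aside pattern: the application reads from the cache first and falls back to the database on a miss. Below is a minimal Python sketch of that pattern. `TTLCache` is a hypothetical in-process stand-in for Redis, so the example runs without a server. `get_product` and the `product:` key prefix are illustrative names. With a real Redis client such as redis-py, the cache calls would roughly correspond to `get(key)` and `set(key, value, ex=ttl)`, with values serialized to strings or bytes.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for a Redis-style cache with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are evicted lazily and treated as a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_product(product_id: str, cache: TTLCache,
                load_from_db: Callable[[str], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then populate.

    This sketch treats None as a miss, so None values are never cached.
    """
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(product_id)  # slow path
    cache.set(key, value, ttl)
    return value


db_calls = []

def load_product(product_id):
    # Hypothetical slow database lookup that records each call.
    db_calls.append(product_id)
    return {"id": product_id, "name": "widget"}

cache = TTLCache()
get_product("42", cache, load_product)
get_product("42", cache, load_product)
print(len(db_calls))  # 1: the second read is served from the cache
```

The TTL bounds how stale a cached value can get. Shorter TTLs keep the cache closer to the database at the cost of more database reads.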

Cloud services for analytics data assets

Key Considerations:

  • Driving analytics maturity should be the primary focus when adopting the cloud for data and analytics (D&A) technology.
  • Success depends on a well-defined, business-aligned analytics strategy.
  • Focus on innovation while migrating to the cloud, not just on lift & shift.
  • Customers running complex workloads on-premises should assess all of their workloads and leverage multiple cloud services as applicable.

Google Cloud Platform | Amazon Web Services | Microsoft Azure
BigQuery | Amazon Redshift | SQL Data Warehouse
Cloud Storage | Amazon Athena | Azure Databricks
Cloud Dataproc | Amazon EMR | HDInsight
- | Amazon S3 | Data Lake Storage

Google Cloud Platform

  • BigQuery is a fast, cost-effective, fully managed cloud data warehouse that scales seamlessly for any analytics workload, with native support for machine learning through BigQuery ML.
  • Cloud Storage is multi-tiered, redundant object storage for any type of data. It provides strong consistency and a single API across all storage classes.
  • Cloud Dataproc is a fast, easy-to-use, fully managed, cloud-native service for Apache Spark and Apache Hadoop workloads. Customers can move existing Spark and Hadoop workloads or ETL pipelines to it without redeveloping them.

Amazon Web Services

  • Amazon Redshift is a fully managed, petabyte-scale data warehouse service. This AWS-native service scales well for BI workloads. Through Redshift Spectrum, Redshift can also query data directly in S3 (object storage). This supports a hot and warm data strategy for BI: frequently queried data stays in Redshift, while less frequently queried data remains in S3.
  • Amazon Athena is an interactive query service that makes it easy to analyze data stored in S3 using standard SQL. Athena integrates out of the box with the AWS Glue Data Catalog. The catalog lets you create a unified metadata repository across services, crawl data sources to discover schemas, populate the catalog with new and modified table and partition definitions, and maintain schema versioning.
  • Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks in EMR, such as Apache Spark, HBase, Presto, and Flink, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB. EMR Notebooks, based on the popular Jupyter Notebook, provide a development and collaboration environment for ad hoc querying and exploratory analysis.
  • Amazon S3 is durable object storage that customers of any size can use to store and protect data for a wide range of use cases. Its variety of storage classes lets you categorize data by access frequency, and choosing the right storage class can reduce storage costs significantly. Amazon also provides out-of-the-box security and compliance capabilities, such as PCI-DSS and EU data protection support, for data stored in S3.
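The advice about choosing an S3 storage class by access frequency can be expressed as a simple decision rule. This Python sketch is hypothetical. `choose_storage_class` and its thresholds are assumptions for illustration, not AWS guidance. The strings it returns are real S3 storage class identifiers (`STANDARD`, `STANDARD_IA`, `GLACIER`, `DEEP_ARCHIVE`). In practice, S3 Lifecycle rules or the `INTELLIGENT_TIERING` class can apply such transitions automatically.

```python
def choose_storage_class(accesses_per_month: float,
                         days_since_last_access: int) -> str:
    """Suggest an S3 storage class from simple access statistics.

    The thresholds below are illustrative assumptions; tune them against
    your own retrieval costs, latency needs, and minimum storage durations.
    """
    if accesses_per_month < 0 or days_since_last_access < 0:
        raise ValueError("access statistics must be non-negative")
    if accesses_per_month >= 1:
        return "STANDARD"        # hot: read at least about once a month
    if days_since_last_access < 180:
        return "STANDARD_IA"     # warm: infrequent, still millisecond access
    if days_since_last_access < 365:
        return "GLACIER"         # cold: archival, retrieval is not immediate
    return "DEEP_ARCHIVE"        # coldest and cheapest, slowest to restore


for stats in [(30, 0), (0.2, 45), (0, 200), (0, 500)]:
    print(stats, choose_storage_class(*stats))
```

With boto3, the chosen value could be passed as the `StorageClass` argument of `put_object`. For data that is already stored, lifecycle rules are usually the simpler way to move objects between classes.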

Microsoft Azure

  • SQL Data Warehouse is a fully managed, cloud-based enterprise data warehouse (EDW) with elastic scalability. It uses massively parallel processing (MPP) to run complex queries quickly across petabytes of data. It also offers unlimited storage, automated administration, built-in auditing, and threat detection, with guaranteed 99.9% availability in Azure regions worldwide.
  • Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics service. It offers quick-to-launch Spark clusters, a collaborative workspace for developers, autoscaling, multi-language support, and prebuilt libraries. In addition, native integration with Azure Active Directory and other Azure services enables developers to build complex applications for a variety of business use cases.
  • HDInsight is a fully managed, full-spectrum, open-source analytics service for enterprises. HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. It supports a broad range of scenarios, such as ETL, data warehousing, machine learning, and IoT.
  • Data Lake Storage is a highly scalable, cost-effective data lake solution for big data analytics. It combines a high-performance file system with massive scale and low cost to speed up your time to insight. Data Lake Storage Gen2 extends Azure Blob Storage and is optimized for analytics workloads, and Microsoft describes it as the most comprehensive data lake available.
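To put the availability figure quoted for SQL Data Warehouse into perspective, the arithmetic below converts an availability percentage into the maximum downtime it allows. The `allowed_downtime_minutes` helper is hypothetical, and the 30-day month is an assumption. An actual SLA defines its own measurement period and how downtime is counted.

```python
def allowed_downtime_minutes(availability_pct: float,
                             period_days: float = 30) -> float:
    """Maximum downtime (minutes) permitted by an availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


for pct in (99.9, 99.99):
    print(pct, round(allowed_downtime_minutes(pct), 2))
# 99.9 43.2
# 99.99 4.32
```

Each extra "nine" cuts the permitted downtime by a factor of ten, from about 43 minutes to about 4 minutes per 30-day month.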

A data-first strategy requires an understanding of the data assets available on the various public clouds. Hopefully, this blog will help you start wading through the available tools and services and choose the right cloud partner to take your data capabilities to the next level.

For more information on how an effective data strategy can enhance your business, read our No-Hype Guide To Data Analytics.
