The Path to Data Analytics: Building a Strong Foundation with GCP
While the prospect of deriving reliable insight with machine learning (ML) and AI-driven data analytics may seem overwhelming, it doesn't have to be, says Cloudreach Chief Technologist John Loughlin.
In a June 2020 webinar, we discussed how to approach data analytics with pragmatism and confidence using Google Cloud Data (GCD) Services.
Here we'll recap that advice, providing an overview of how businesses can implement the building blocks of becoming a data-driven organization using GCD Services.
Start with the Core Capabilities of Data Management: Security and Governance
Keep in mind that the overarching goal – to enable the analysis that opens the door to better business decisions – rests on the core values of security and governance, which together provide the context for making effective use of technology.
Data Management Security
Safeguarding data is fundamental to your program and to doing right by your stakeholders. Important notes to keep in mind include:
- Use identity and access management practices consistent with the principle of least privilege, meaning that you grant users just enough access to accomplish the task at hand. This reduces potential impact if credentials are compromised.
- Encrypt data at rest and in motion. (This is standard practice in the cloud.)
- Using Google Cloud Key Management Service (KMS), you can manage encryption keys much as you would on-premises.
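Here is a minimal sketch of applying least privilege in practice. The hypothetical helper below scans a policy shaped like an IAM policy document (a list of `bindings`, each with a `role` and a list of `members`) and flags grants of the broad basic roles `roles/owner`, `roles/editor` and `roles/viewer`. It is plain Python working on a dictionary, not a Google Cloud client library call. Fetching a real policy (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) is out of scope, and the example members are made up.

```python
"""Flag broad grants in an IAM-style policy (hypothetical helper, not a Google API)."""

# Basic roles grant wide access across a project; least privilege favors
# narrower predefined or custom roles instead.
BROAD_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def broad_grants(policy: dict) -> list[tuple[str, str]]:
    """Return (member, role) pairs that use a broad basic role.

    `policy` is assumed to follow the IAM policy JSON shape:
    {"bindings": [{"role": ..., "members": [...]}, ...]}.
    """
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        if role in BROAD_ROLES:
            for member in binding.get("members", []):
                findings.append((member, role))
    return findings


if __name__ == "__main__":
    example_policy = {
        "bindings": [
            {"role": "roles/editor", "members": ["user:analyst@example.com"]},
            {"role": "roles/bigquery.dataViewer", "members": ["group:reporting@example.com"]},
        ]
    }
    for member, role in broad_grants(example_policy):
        print(f"{member} has broad role {role}; consider a narrower role")
```

A check like this does not decide what the right narrower role is. It only surfaces grants that deserve a second look during an access review.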
Data Governance
Recent well-publicized instances of exposed data have drawn attention to data governance and led to increasingly stringent requirements.
Several tools and practices can help you meet these requirements and keep your data governance current. They include Data Catalog, Google Cloud's fully managed metadata management service, as well as third-party audits, certifications and attestations against global standards.
Database Options: Transactional v. Analytic
To achieve performance at scale when moving from on-premises systems to cloud-based database systems, consider your data and how it’s used.
Will you be working with data that's schema-on-write or schema-on-read? In other words, is the data structured and relational in nature (think of a spreadsheet), or is it unstructured or semi-structured, such as data streaming in from devices or sensors?
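To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal sketch that uses only the Python standard library. An in-memory SQLite table stands in for a relational system that enforces its schema when rows are written. A list of raw JSON lines (hypothetical sensor records) stands in for semi-structured data that is stored as-is and interpreted only when it is read. Neither part uses a Google Cloud API.

```python
import json
import sqlite3

# Schema-on-write: structure is declared up front and enforced when data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance REAL NOT NULL)"
)
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 100.0))
try:
    # Rejected at write time: the declared schema requires an owner.
    conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", (None, 5.0))
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True

# Schema-on-read: raw records are kept as-is and interpreted only at query time.
raw_sensor_lines = [
    '{"sensor": "s1", "speed_kph": 88}',
    '{"sensor": "s2", "speed_kph": 61, "lane": 2}',  # an extra field is accepted
    '{"sensor": "s3"}',                               # so is a missing field
]


def read_speeds(lines):
    """Apply a schema while reading: keep only records that carry a speed."""
    for line in lines:
        record = json.loads(line)
        if "speed_kph" in record:
            yield record["sensor"], float(record["speed_kph"])


speeds = dict(read_speeds(raw_sensor_lines))
print(write_rejected, speeds)
```

The trade-off is visible in the sketch. Schema-on-write catches bad rows early. Schema-on-read accepts whatever arrives and pushes interpretation, and any data-quality decisions, to the reader.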
Will you be performing transaction processing? Functions such as purchasing or transferring money involve a small, discrete piece of work, or several pieces of work that must succeed or fail together as a unit. The answer helps you determine whether you'll mostly be querying individual items of data or sets of items and, in turn, whether you need a transactional processing system or an analytical system.
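The transactional/analytical distinction can be sketched with SQLite from the Python standard library. The hypothetical `transfer` function performs a money transfer as one unit of work. If the debit would violate the table's `CHECK` constraint, the earlier credit is rolled back as well. The hypothetical `total_and_average` function is the analytical counterpart: a single query over the whole set of rows. SQLite is only a local stand-in here, not Cloud SQL or Cloud Spanner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint keeps balances non-negative, so an overdraft raises IntegrityError.
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50), ("c", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Transactional work: the credit and the debit commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def total_and_average(conn):
    """Analytical work: one query that aggregates over the whole set of rows."""
    return conn.execute("SELECT SUM(balance), AVG(balance) FROM accounts").fetchone()


ok = transfer(conn, "a", "b", 30)   # succeeds, leaving a=70 and b=80
bad = transfer(conn, "c", "a", 10)  # c would go negative, so the credit to a is undone too
print(ok, bad, total_and_average(conn))
```

The transactional function touches a couple of individual rows and depends on all-or-nothing semantics. The analytical query reads every row but changes none, which is the access pattern analytical stores are built for.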
Once you’ve determined what type of database management system will best suit your needs, GCP provides a range of fully managed, scalable tools to help you dive in and get started. Some options include:
- Cloud SQL/Cloud Spanner: Cloud SQL is a fully managed relational database service, while Cloud Spanner, built to scale horizontally, is a good option for very large operational systems.
- BigQuery: Ideal for analytic workloads.
- Cloud Bigtable: Designed to house less-structured data that doesn't belong in a relational database.
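The options above can be condensed into a rough rule of thumb. The `Workload` fields and the `suggest_service` function below are hypothetical. They simply encode the guidance in this section and are not an official Google decision tree. A real choice would also weigh cost, latency, consistency requirements and existing tooling.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    transactional: bool           # small units of work that must succeed or fail together
    relational: bool              # structured, schema-on-write data
    needs_horizontal_scale: bool  # very large operational system


def suggest_service(w: Workload) -> str:
    """Rough mapping of workload traits to the GCP options listed above (an assumption)."""
    if not w.relational:
        return "Cloud Bigtable"   # less-structured data outside a relational model
    if w.transactional:
        return "Cloud Spanner" if w.needs_horizontal_scale else "Cloud SQL"
    return "BigQuery"             # analytic queries over sets of rows


print(suggest_service(Workload(transactional=True, relational=True, needs_horizontal_scale=False)))
```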
Analytics: A Progression of Data Capabilities, Complexity and Speed
Industry experts have overwhelmingly observed that data analytics is a progression – a cumulative ability that flows from basic to sophisticated in terms of capabilities, the complexity of the data on which you’re basing decisions, and the speed at which you’re operating.
The Analytics Progression infographic illustrates the maturation process, moving from left to right through the levels of data capability: descriptive (relational data), diagnostic (more rules-based), predictive (multi-sourced data, including machine learning techniques) and prescriptive (near real-time, streaming data).
Cloudreach has seen this progression play out repeatedly throughout its experience in helping clients transform data practices. Comfort level and the ability to produce reliable analytic applications come from that maturation from left to right (see graphic “Analytics progression”).
In other words, wait until you’re comfortable with data feeds, data transformation, collection from multiple sources, and data contextualization before you take on prescriptive analysis.
How GCP Eases the Journey to Sophisticated Data Analytics Capabilities
That said, the journey doesn't have to be lengthy. GCP shortens it by making infrastructure available in a simplified, easily consumable form that supports rapid experimentation with data storage and management.
This eliminates the need to seek out spare capacity from licensed servers, or purchase temporary versions of software. Getting the tools you need to run these experiments is simplified, allowing you to explore possibilities and get the hands-on experience you need to build confidence – and more sophisticated analytics capabilities.
Google Cloud also publishes best practices for building various components and accomplishing specific tasks. Following them helps minimize the risk of accumulating untested practices and unwieldy production systems.
GCD Tools that Enable ML- and AI-Driven Insights
GCP provides several key tools that help you build a strong foundation supporting ML and AI analytic capabilities, such as:
- Data ingestion at terabyte scale (needed for transforming data in real time)
- Reliable streaming and data pipelines
- Faster data warehousing and predictive insights
- Advanced data visualization/event visualization
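Streaming pipelines commonly group events into time windows before aggregating them. The plain-Python sketch below shows this with fixed (tumbling) windows keyed by event time. It is a conceptual illustration of the kind of work a pipeline service such as Dataflow performs at scale, not the Apache Beam or Pub/Sub API. The `(timestamp, sensor_id, value)` event format is an assumption.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_averages(
    events: Iterable[tuple[float, str, float]], window_s: float
) -> dict[tuple[float, str], float]:
    """Average each sensor's readings within fixed, non-overlapping windows.

    Each event is (timestamp_seconds, sensor_id, value). Windows are assigned
    by the event's own timestamp, not by the order in which events arrive.
    """
    sums: dict[tuple[float, str], float] = defaultdict(float)
    counts: dict[tuple[float, str], int] = defaultdict(int)
    for ts, sensor, value in events:
        window_start = (ts // window_s) * window_s
        key = (window_start, sensor)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical speed readings; the event at t=65 arrives before the one at t=30.
events = [(0, "s1", 80.0), (65, "s1", 40.0), (30, "s1", 60.0), (10, "s2", 90.0)]
print(tumbling_window_averages(events, window_s=60))
```

A managed pipeline adds what this sketch omits: unbounded input, watermarks for deciding when a window is complete, and handling of late data.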
Tying Data Analytics Practices to Reality: Client Success
Cloudreach has extensive experience helping clients take full advantage of GCD Services. The customer win below shows how these tools have made a difference for our clients.
State DOTs Use Prescriptive Analytics to Develop Traffic Management Platform
In a case that clearly illustrates how becoming data-driven is a progressive process, Cloudreach has worked with the Department of Transportation (DOT) in multiple states to create traffic management platforms that use predictive and prescriptive data analysis techniques.
These projects involved migrating from a relational database management system (RDBMS) to BigQuery and integrating additional data (e.g., sensor data). The DOTs started with descriptive reporting and analysis, generating information on their highway systems such as traffic loads, construction project locations, and accident rates in specific areas.
By progressing to the integration of sensor data, the DOTs have enabled the development of sophisticated traffic management platforms. These platforms increase situational awareness of traffic patterns, improve response coordination, and support prediction for road and device management, including potential weather impacts.
Key technologies used: BigQuery, Bigtable, Cloud Spanner, Dataflow, BigQuery BIS, Cloud ML, Pub/Sub and Cloud IoT Core