3 Strategic Questions To Ask During A Data Warehouse Migration
In this post, Solutions Architect Rob Whelan suggests 3 simple questions to ask your team when considering a data warehouse migration from on-premise to the cloud.
As a business or IT leader, when you hear about a potential data warehouse migration, it may be tempting to jump to the tactical steps necessary to pull it off: who you need, what departments need to be on board, how long it will take. However, just a few well-placed strategic questions, asked before you get to the tactical steps, will reveal significant information about your organization and where it is going.
We will first list the questions, then look at what data warehousing is about, and then work through the questions one at a time.
When someone comes to you and says, “We need to move our data warehouse out of the data center,” ask these three questions, in this order:
- Why do you want to move?
- Who will use it, and for what?
- What do you want to do that you can’t do today?
It doesn’t matter whom you ask these questions. In fact, the more people you ask, the better picture of the situation you’ll have.
A data warehouse is traditionally any large store of business data, organized centrally so that many functions, such as Operations, Accounting, and Sales, can access it. Well-known products such as Microsoft SQL Server Data Warehouse, Netezza, Teradata, and Oracle Exadata are all examples of enterprise-grade data warehouses. They are built to aggregate and report on large amounts of data. Think, “what are my sales by region over the past five years?”
The data is usually stored in a special layout called “columnar”: instead of storing each record’s fields together as a row, the warehouse stores each column’s values together. That makes it faster to sum up a large number of records when all you want is “revenue” and not every other attribute of the record (such as SKU, date, location, and so on). A typical row-oriented relational database, like Postgres, would be much slower on that kind of query.
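To make the row-versus-column distinction concrete, here is a minimal sketch in Python. The records and field names are made up for illustration, and real warehouses add compression and parallelism on top of this idea. It only shows why a “revenue” total needs to touch one column in a columnar layout, while a row layout has to walk through every full record.

```python
# Hypothetical sales records; the field names and values are illustrative only.
rows = [
    {"sku": "A1", "date": "2019-01-05", "location": "East", "revenue": 120.0},
    {"sku": "B2", "date": "2019-01-06", "location": "West", "revenue": 80.0},
    {"sku": "A1", "date": "2019-01-07", "location": "East", "revenue": 200.0},
]


def total_revenue_row_layout(records):
    """Row layout: every full record is visited just to read one field."""
    return sum(record["revenue"] for record in records)


# Columnar layout: one list per attribute, so all values of a column sit together.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_revenue_column_layout(cols):
    """Columnar layout: only the 'revenue' column is read; the rest is untouched."""
    return sum(cols["revenue"])


print(total_revenue_row_layout(rows))
print(total_revenue_column_layout(columns))
```

Both functions return the same total. The difference is how much data has to be read to get there, which is what matters once the table holds billions of records.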
This technology has become an essential tool of enterprise data management.
However, as the quantity of data your company collects grows, you might be running out of room in your on-premise warehouse. Analysts might also complain about the slowness of reports, or the inability to easily explore data. You are faced with the decision either to buy more expensive storage space, or to make do with less data. Neither is a great option in today’s dynamic business climate, where you need agility to respond to opportunities.
With cloud computing, however, you largely don’t have to make that trade-off. Cloud computing offers cheap and effectively unlimited storage, scalable compute resources, and multiple methods of storing data for analysis.
Now, back to the original question. The move to the cloud seems promising enough, but what else is driving it? Let’s look at the questions one at a time.
Why do you want to move?
Generally, there is a “forcing function” that drives departments to consider moving. You might be running out of room and don’t want to purchase more storage. The licensing on your data warehouse could be running out, and instead of renewing, you would rather look at pay-as-you-go options.
If you stop questioning here, your approach will be very technical. Your IT team will want to lift and shift because it is the most straightforward (a “lift and shift” migration is just running the same servers you have on premise in the AWS cloud, without any optimizations). Your business team will be excited about the cost savings because Amazon Redshift, for example, is usually 90% cheaper than competing enterprise-grade offerings. However, you don’t want to let a good forcing function go to waste.
Unless you are a particularly innovative culture, rarely will someone, out of the blue, make a confident argument that moving to the cloud will offer unmatched agility and responsiveness. But that is the number one benefit of the cloud, not reduced cost. One major reason for this is the flexibility of pulling together many different data sources for richer insights in a fraction of the time it would take on premise. Think of the “unstructured” data you own, such as images and scanned PDFs. Then think of the “semi-structured” data you own, such as customer service transcripts, product reviews, and meeting notes. According to Google Cloud Platform, at least 95% of data like this is never used for analysis, so there is a lot of untapped potential. However, none of this data belongs in a traditional data warehouse, because data warehouses are designed to crunch numbers. So, you need a wider array of storage and analysis options, which only the cloud offers.
So, finding out why the team wants to move will give you a picture of where their priorities and awareness are, and will tell you where to insert yourself into the modernization conversation. For example, if most of what you are hearing is just “we just need to get out of the data center” then you have an opportunity to cast a vision for the increased agility. If some people are saying, “we need more agility,” then you have a visionary in your midst. See if you can give that person a platform to share that vision in the organization.
Who will use it, and for what?
There are two parts here. If the list of constituents is long, as in you already have a fairly advanced data infrastructure, then you know you have many interests that will need to be satisfied on the journey. Many of our customers, faced with this, prioritize use cases so that the transition can happen in a focused way with a limited risk of backfiring. For example, you might pick a department with only internal users, with simple but valuable reports.
If you find that very few people will use it, that doesn’t mean you have a simple task before you. In fact, you have a broader, possibly cultural problem: why in today’s world would data access be limited to just a few? Find out if there are attitudes that get in the way of democratizing the data you have. Perhaps there are legitimate fears around data security. Maybe people are being protective of their data. These are common situations that we have helped to resolve with customers. Ironically, the solution tends to be technical at first: we build a small, but cross-functional, proof of concept that displays the power of flexibly combining data from multiple sources. For example, you can join product sales data (operations) with product review data (customer service) to get a rich angle on how customer perception is driving revenue. A picture’s worth a thousand words: seeing how quickly these insights can be had is motivating for protective teams.
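The sales-and-reviews join described above can be sketched in a few lines. This is only an illustration under stated assumptions: the table names, columns, and values are hypothetical, and an in-memory SQLite database from Python’s standard library stands in for a cloud warehouse. Each table is aggregated per product before joining, so that multiple reviews do not multiply the revenue figures.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_sales (product TEXT, revenue REAL);    -- operations data
    CREATE TABLE product_reviews (product TEXT, rating INTEGER); -- customer service data
    INSERT INTO product_sales VALUES ('widget', 1000.0), ('widget', 500.0), ('gadget', 300.0);
    INSERT INTO product_reviews VALUES ('widget', 5), ('widget', 4), ('gadget', 2);
    """
)

# Aggregate each source per product first, then join the two summaries.
query = """
SELECT s.product, s.total_revenue, r.avg_rating
FROM (SELECT product, SUM(revenue) AS total_revenue
      FROM product_sales GROUP BY product) AS s
JOIN (SELECT product, AVG(rating) AS avg_rating
      FROM product_reviews GROUP BY product) AS r
  ON s.product = r.product
ORDER BY s.product
"""
results = conn.execute(query).fetchall()

for product, total_revenue, avg_rating in results:
    print(product, total_revenue, avg_rating)
```

The output is one row per product with its total revenue next to its average rating, which is the kind of side-by-side view that tends to win over protective teams.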
Now that you know who will be using the data warehouse, ask what they will be using it for. If people mostly say “reporting,” then you would be doing basically the same thing in the cloud as on-premise, and you have an opportunity to challenge your teams to think more deeply. Have them explore the other data storage options available. Ask them if there is anything else beyond reporting that can be done, and whether the considerable effort of moving to the cloud is even worth it just to do the same thing as before, but on different servers.
If you already have a culture of ad-hoc data exploration, then you are starting from a very rich foundation. Data warehouses are well known for data exploration, but discovering datasets can be difficult, and the queries themselves can be quite slow. For data teams, one of the biggest productivity benefits of the cloud is rich, ad hoc data exploration that is at once secure and rapid. Your teams that perform ad hoc exploration on-premise can use existing skills (writing SQL) in the cloud, but on slightly different tools. This will shorten your learning curve when coming to the cloud, and excite your teams about the new possibilities.
What do you want to do that you can’t do today?
This question is best delivered to your creative types and your team leads; or perhaps only you can answer this. Common responses we hear are:
- Machine Learning (ML) and Artificial Intelligence (AI). Given the huge hype cycle around AI, this is not surprising to hear. But, peeling back a layer or two, we usually find this means teams are interested in optimization and automation of some kind. ML is simply using statistical algorithms to model data in ways that we haven’t tried yet with traditional models. So, if a team feels they have reached the limit of their effectiveness with forecasting, then an ML model performing a regression “lookahead” into the future might work very well. Similarly, if a company runs thousands of price incentives around the country, they will want to ensure each incentive is priced optimally without having to expend massive resources analyzing each deal. This is a great spot for an advanced statistical analysis. The point is, if people ask for ML and AI, press for details and get more specific about the business problem, because after all, ML and AI are often just tools in a toolbox.
- Better data access across the company. If you hear this, it is a sign that your teams want to access data more rapidly and without many hurdles. Ask a little further what they will do with that access and how they will control it.
- A “sandbox” for data exploration. This is a great sign of pent-up demand for creative data analysis. You might have talented data engineers and data scientists who cannot easily access data to make new insights; or they have to follow a cumbersome request process to import data into a workstation. Sandbox environments are simple in AWS if you follow our best-practices method of organizing all your raw data in S3.
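To show what a regression “lookahead” from the ML item above can mean in its simplest form, here is a minimal sketch. It fits a straight line to past values with ordinary least squares and extends it forward. The sales figures and function names are hypothetical, and a real forecasting model would usually account for seasonality, more features, and uncertainty. The point is only that the request for “ML” often reduces to a concrete, testable statistical task like this one.

```python
def fit_line(values):
    """Fit y = intercept + slope * x by ordinary least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead):
    """Extend the fitted line past the last observed period."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


# Hypothetical monthly sales figures, purely for illustration.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
next_two_months = forecast(monthly_sales, 2)
print(next_two_months)
```

With perfectly linear input like this, the forecast simply continues the trend. Real data will not be this tidy, which is exactly why pressing for the specific business question matters before choosing a model.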
Whatever your team responds with, keep asking questions because these conversations, if you peel back enough layers, will reveal very concrete, feasible initial projects that can unite your teams during a significant change such as this. Imagine that in the first few weeks of a data warehouse migration project, you have a working dashboard, in the cloud, showing two or three highly valuable analytics that were impossible on premise. Other partners in the ecosystem will create slide decks and plans for months before getting to this, whereas a working POC is a first-class citizen in our workstreams.
Cloudreach works with enterprises in all stages of the cloud data warehousing journey, from beginners who have a few departments tinkering with the cloud to experts who need to stabilize an operational machine learning pipeline. Regardless of the stage, our approach starts with a series of business-centric questions that put the organization’s goals front and center, so we have a productive stream of wins that add value instead of adding technical debt.