
Data Vault: prioritizing business change and history over technology

So you actually bought into that new technology stack that was really going to improve your analytics and help you make better decisions? The truth is that, while it may well help, reconciling flexible enterprise change with accurate historical reporting will always be a matter of discipline rather than technology when supporting a successful modern business. Sadly, no technology is the holy grail. After all, none of them has yet managed to fully put you at ease, has it?

Just as you cannot expect a project to run itself simply because you have the latest project management software, you cannot expect a data warehouse to manage itself: it is the methodology that counts. And just like agile project management methodologies, Data Vault compartmentalizes change to encourage flexibility.

Pitfalls and inadequacies of steady state

In the best of cases, it probably takes months to organise changes within your data warehouse. And the crazy thing is that even for simple changes, it can seem incomprehensible why ‘just adding a few fields’ (or updating one, heaven forbid!) takes such a long time.

The main problem is the methodology of storing, not the technology. The fact is that old Star Schemas, Third Normal Form (3NF) systems and Snowflake schemas just aren’t cut out for change: they were (and still are) designed for the analysis of consistent data rather than for data capture. So while Star Schemas and Snowflakes are especially good at certain analytical tasks, and 3NF is great at enforcing cascading point-in-time structure, none of these methods copes well with change. And none is made to accommodate (and reconcile) the data structure from five years ago with the data you have today and the requirements of tomorrow … they are simply too prescriptive, designed for a single, current way of doing things. As a consequence, when something changes you essentially have to (carefully) throw out the old and start again, and that is where all the (expensive) time goes.

The best answer is to accept change, and embrace it.

Enter Data Vaulting. (Applause.) It can capture data from anywhere and extract virtualised views as moving snapshots whenever required.

‘But our data is always changing.’

It always seems to be about ‘new’ and ‘the next big thing’, while glancing backward at the ‘history’ or considering ‘change’ is often an afterthought, best left to ‘others’ to reconcile. How are we to understand how well the business is doing over time without some consistency amidst the technology transformation? Of course it is important to adapt so that you can be interoperable with the best standards, but how can we understand (or allow!) real growth and change while still accurately tracking history?

The older methods of Kimball and Inmon (Star Schema, 3NF and Snowflake, anyone?) were created a long time ago, when change wasn’t so rapid and we didn’t have such volumes of data. Back then, you were designing a data model for a specific, single solution and could afford the time to throw out the old data model and start again. Their continued appeal is that lots of people use them, but they are being used for the wrong purpose. They are effectively buckets used to grab rivers of information, reading snapshots of a data stream as if it were a fixed data model rather than a continuum of change. Stitch those snapshots together into some sort of continuous historical view of business information, and it all begins to look a bit odd! Data Vault changes the game from old-school disciplines: it bends to the flow of the river, making the technology irrelevant.

Distinct purpose: capture raw history, flexibly

Emerging from these older disciplines, Data Vault is a hybrid evolution, designed from the ground up for change management and for organizing diverse sources. Where older methods attempted to shoehorn data streams into their fixed models to make the data analytically usable, Data Vault is all about embracing change across the entire enterprise. The old methodologies are still very useful for transforming data into insights, but they make poor raw systems of record. Data Vault’s sole purpose is to structure heavy workloads of changing, disparate data sources, handle them in an agile way, and support onward data quality and processing. Its purpose is exactly as its name suggests: to take raw ‘Data’ (rather than information) and put it in a ‘Vault’ (captured and safe) that stores everything in its purest original state, an immovable, yet flexibly structured, system of record.
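
To make the ‘vault’ idea concrete, here is a minimal sketch in Python (the names history, capture and snapshot are invented for illustration; Data Vault is a modelling discipline, not a library). Raw records are only ever appended, each stamped with its load time, and a point-in-time view is simply a read over that history:

```python
from datetime import datetime, timezone

# Append-only store: each entry is (load timestamp, raw source record).
# Nothing is ever updated or deleted, so history is always preserved.
history: list[tuple[datetime, dict]] = []

def capture(record: dict) -> None:
    """Vault a raw source record exactly as received."""
    history.append((datetime.now(timezone.utc), dict(record)))

def snapshot(as_of: datetime) -> dict | None:
    """Virtualised view: the latest version known at a given moment."""
    versions = [rec for ts, rec in history if ts <= as_of]
    return versions[-1] if versions else None

# The same customer captured twice as the source system changes;
# snapshot(t) reproduces exactly what was known at time t.
capture({"customer_id": "C-42", "segment": "retail"})
capture({"customer_id": "C-42", "segment": "corporate"})
```

Because nothing is ever overwritten, the snapshot for any past date keeps returning the same answer, no matter how the sources have changed since.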

By borrowing modern social-data concepts like ‘relationships’ and ‘entities’ and combining them with the older methodologies, Data Vault separates the structure of your business concepts from the sources, forming the skeleton onto which the data is linked. It is possible, and recommended, for a knowledgeable business user to design the intuitive core data concepts behind the Data Vault on a whiteboard before involving the data professionals to fill in the details. Data Vault is a ‘business first’ methodology, organizing data around the business ideas rather than conforming the business ideas to the data model.
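
As a hypothetical illustration of that whiteboard exercise (Data Vault practitioners know these structures as hubs, links and satellites; the Python below is an invented sketch, not a standard implementation), the entities and relationships form a stable skeleton, while descriptive source data hangs off it with its own load history:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Hub:
    """A core business entity, identified only by its business key."""
    business_key: str              # e.g. a customer number or product code

@dataclass(frozen=True)
class Link:
    """A relationship between business entities."""
    hubs: tuple[Hub, ...]

@dataclass
class Satellite:
    """Descriptive attributes from one source, stamped with load time."""
    parent: Hub | Link             # what these attributes describe
    load_date: datetime
    attributes: dict               # raw source data

# The whiteboard model: stable business concepts first...
customer = Hub("C-42")
product = Hub("P-7")
purchase = Link((customer, product))

# ...then each source hangs its (changing) details off that skeleton.
details = Satellite(purchase, datetime.now(timezone.utc),
                    {"quantity": 3, "channel": "web"})
```

A new source or a new attribute only ever adds satellites; the hubs and links, the business concepts themselves, stay put, which is what keeps change cheap.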

Business may change superficially, but the concepts that underpin it, such as customer and product, do not. It is these concepts that form the backbone of the Data Vault.

Want to learn more about Data Vault?

Find out more from our friendly team of business and technical experts at: info@dfakto.com or +32(0)2.290.63.90.
