You have reviewed the latest reports and the numbers are close, but something has changed. You are tempted to dig into the data model to try to correct the perceived error, but you know that it is going to take time to trace, and you have no guarantee of finding an answer. Perhaps someone has changed something in the model somewhere and it has rippled out? Maybe it's just a small change? Who knows?
This is not an uncommon scenario. The fragility of old data modeling approaches being treated as data stores has often resulted in pesky errors creeping into systems.
This is usually not the fault of the people managing the data but an inherent problem with the way that the old methods were designed.
They were not intended for the rapid change and transformation that we experience in today's business environments. Change something as innocuous as a field name, or add a new field to a dimension, and anomalies may well appear. A change that one department thinks is small can, for another department, mean the kind of impact on their figures that they really don't need to be worrying about this quarter.
It’s a moving feast and sometimes, you can’t please anyone!
Times are changing; data modeling, too.
In yesteryear, very clever computer scientists designed methodologies for data capture that were the leading technology of their day. Those days were typically slower-paced and less prone to change. Expectations were that change would take time, and time wasn't as commoditised as it is now. If a system took a few months to get the data modeling right before it was useful, well then, they had better take that extra time.
This is unthinkable these days, yet the same models are being used!
These old methods are still useful, but underneath we now need something more. They are repurposed as 'Data Marts' in our new world, doing what they are good at: capturing a snapshot view of data. The data itself, however, needs to be held in structures that are suited to, and built for, change!
Yet you will still see many attempts to persist with this old way of thinking. Star Schemas and 3NF architectures are at breaking point in modern business environments: they are pushed to analyse history while being constrained by regulations like GDPR. They have their place, but it is not at the core of a high-performing data warehouse. These older architectures are suited to stability and consistency rather than evolving history and change capture!
Decompose structure to master change in data modeling
- Data Vault separates the structure of data from the myriad of data sources that are attached to it.
- The model is fixed and never changes, and neither does the data that you attach to it.
- Once added, data cannot be removed. Initially this concept sounds restrictive; however, the intention of the Data Vault is to 'capture' data and hold it in a fixed state, and the trade-offs are profound.
- It pulls data from multiple sources around a single reconcilable set of identifiers called a ‘Hub’ (e.g., a business entity, like a customer or product).
- You can attach as many as you like, because the ‘Hub’ is a central point of management.
- This makes it ideal if you are looking to understand discrepancies in your data while keeping a system of record. Master Data also becomes possible: further on, you can compare and contrast each source into a derived 'golden record'.
- ‘Links’ form the second part of the core structure of a Data Vault, and these are where the flexibility and agility come into play.
- You can have different teams working on different ‘Hubs’ that are unaware of each other if need be.
- They may be working on data cleansing, master data, or anything else.
- You can keep teams separate by design or by schedule, and still hold everything together by building the Links separately.
- The Links are effectively 'many-to-many' tables, so the relationship scenarios are whatever you choose to make them. There are no constraints, as long as the business entities are well thought out.
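To make the Hub-and-Link structure concrete, here is a minimal in-memory sketch. The class and field names (`Hub`, `Satellite`, `Link`, `ensure_key`) are illustrative assumptions, not a prescribed implementation; descriptive attributes attached to a Hub are held in append-only 'satellite' rows, and a Link is simply a many-to-many set of key pairs between two Hubs.

```python
from datetime import datetime, timezone

class Hub:
    """Central, reconcilable set of business keys for one entity (e.g. 'customer')."""
    def __init__(self, name):
        self.name = name
        self.keys = set()

    def ensure_key(self, business_key):
        # Insert-only: a key is added once and never removed.
        self.keys.add(business_key)

class Satellite:
    """Descriptive attributes from one source system, attached to a Hub."""
    def __init__(self, hub, source):
        self.hub, self.source = hub, source
        self.rows = []  # append-only history

    def load(self, business_key, attributes):
        self.hub.ensure_key(business_key)
        self.rows.append({
            "key": business_key,
            "source": self.source,
            "loaded_at": datetime.now(timezone.utc),
            **attributes,
        })

class Link:
    """Many-to-many association between the business keys of two Hubs."""
    def __init__(self, hub_a, hub_b):
        self.hub_a, self.hub_b = hub_a, hub_b
        self.pairs = set()

    def connect(self, key_a, key_b):
        self.hub_a.ensure_key(key_a)
        self.hub_b.ensure_key(key_b)
        self.pairs.add((key_a, key_b))

# Two sources feed the same customer Hub independently; discrepancies
# between them are captured side by side, never overwritten.
customers = Hub("customer")
orders = Hub("order")
crm = Satellite(customers, "crm")
billing = Satellite(customers, "billing")
crm.load("C42", {"name": "Acme"})
billing.load("C42", {"name": "ACME Ltd"})
placed = Link(customers, orders)
placed.connect("C42", "O-1001")
```

Note how the two teams loading `crm` and `billing` never touch each other's rows; only the shared business key in the Hub ties their work together.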
Perhaps you want to clean and enrich data, derive data, build business rules and data quality, or even build out your 'golden record' Master Data. That happens in a separate stage called the 'Business Vault'; the Data Vault itself remains a single source of unchanging truth, warts and all.
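One way to picture the Business Vault step is deriving a 'golden record' from the competing source rows that the Data Vault has kept intact. The sketch below is an assumption about one possible rule (pick the winner by a source-priority list); real implementations use whatever business rules you define.

```python
def golden_record(rows, source_priority):
    """Pick one winning row per business key, preferring sources that
    appear earlier in source_priority. The raw rows are never changed."""
    best = {}
    for row in rows:
        key, rank = row["key"], source_priority.index(row["source"])
        if key not in best or rank < source_priority.index(best[key]["source"]):
            best[key] = row
    return best

# Two sources disagree about customer C42; the Data Vault keeps both.
rows = [
    {"key": "C42", "source": "crm", "name": "Acme"},
    {"key": "C42", "source": "billing", "name": "ACME Ltd"},
]
golden = golden_record(rows, source_priority=["billing", "crm"])
# billing outranks crm here, so its version becomes the golden record
```

Because the derivation reads from unchanging raw data, you can revise the rule later and re-derive a new golden record without losing anything.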
There are benefits to this approach:
- You know that what is in your data warehouse is truly a historical record;
- It is an auditable trail of consistency in your business;
- You can derive an unlimited number of Data Marts from it that will be absolutely consistent over time;
- If you build sympathetic business rules, they will also be consistent with each other;
- The reports and analyses that you conduct on this data will remain consistent over time, even as you add more data, because nothing is EVER deleted from a Data Vault unless it is specifically designed to do so under regulatory constraint.
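The consistency claim above follows directly from the insert-only rule, and can be shown in a few lines. This is a toy sketch with integer load timestamps (an assumption for brevity): a report scoped to an as-of point returns the same answer before and after new data arrives.

```python
def report(rows, as_of):
    """Count rows loaded on or before as_of. Because the vault is
    insert-only, re-running with the same as_of always matches."""
    return sum(1 for r in rows if r["loaded_at"] <= as_of)

vault = [{"loaded_at": 1}, {"loaded_at": 2}]
first_run = report(vault, as_of=2)

vault.append({"loaded_at": 3})  # new data arrives; nothing updated or deleted
second_run = report(vault, as_of=2)
# both runs see exactly the same history up to as_of=2
```

If rows could be updated or deleted, `second_run` could silently diverge from `first_run`, which is precisely the report drift described at the start of this article.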
In conclusion, the Data Vault is built from the ground up to manage growth, while maintaining consistency.
The magic happens because of ‘separation of concerns’.
Or find out more from our friendly team of business and technical experts at firstname.lastname@example.org or +32(0)2.290.63.90.