You have reviewed the latest reports and the numbers are close, but something has changed. You are tempted to dig into the data model to correct the perceived error, but you know it is going to take time to trace, with no guarantee of finding an answer. Perhaps someone has changed something in the model, somewhere, and it has rippled out. Maybe it’s just a small change. Who knows?

This is not an uncommon scenario. The fragility of older data modelling approaches, pressed into service as data stores, has often resulted in pesky errors creeping into the systems.

This is usually not the fault of the people managing the data but an inherent problem with the way that the old methods were designed.

They were not intended for the rapid change and transformation that we experience in today’s business environments. Change something as innocuous as a field name, or add a new field to a dimension, and you may have to cope with anomalies appearing elsewhere. A change that one department requests and considers small can, for another department, mean the kind of impact on their figures that they really don’t need to be worrying about this quarter.

It’s a moving feast and sometimes, you can’t please anyone!

Times are changing; data modeling, too.

In yesteryear, very clever computer scientists designed methodologies for data capture that were the leading technology of their day. Those days were typically slower-paced and less prone to change. The expectation was that change would take time, and time was not as commoditised as it is now. If a system needed a few months to get the data modelling right before it became useful, then so be it: that extra time was taken.

This is unthinkable these days, yet the same models are being used!

These old methods are still useful, but underneath them we now need something more. They live on as ‘Data Marts’ in our new world, repurposed for what they are good at: capturing a snapshot view of the data. The data itself, however, needs to be held in structures that are suited to, and built for, change!

Yet you will still see many attempts to persist with this old way of thinking. Star Schemas and 3NF architectures are at breaking point in modern business environments: they struggle when pushed to analyse history and are constrained by regulations like GDPR. They have their place, but that place is not at the heart of a high-performing data warehouse. These older data architectures are suited to stability and consistency rather than evolving history and change capture!

Decompose structure to master change in data modeling

HUB

  • Data Vault separates the structure of the data from the myriad of data sources that are attached to it.
  • The model is fixed and never changes; so is the data that you attach to it.
  • Once added, data cannot be removed. Initially this sounds restrictive; however, the intention of the Data Vault is to ‘capture’ data and hold it in a fixed state, and the trade-offs are profound.
  • It pulls data from multiple sources together around a single reconcilable set of identifiers called a ‘Hub’, one per business entity, such as a customer or product (see the sketch after this list).
  • You can attach as many sources as you like, because the ‘Hub’ is the central point of management.
  • This becomes ideal if you are looking to understand discrepancies in your data while keeping a system of record. Master Data is also a possibility: further downstream, you can compare and contrast each source to derive a ‘golden record’.
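
As a rough, non-authoritative illustration of the idea, here is a minimal sketch of a Hub for a hypothetical customer entity. The column names (business key, surrogate key, load date, record source) and the hashing are illustrative assumptions, not dFakto’s actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Minimal sketch of a Hub: one row per unique business key, insert-only.
# The column names (hub_customer_key, customer_number, load_date, record_source)
# are illustrative assumptions, not a prescribed standard.

@dataclass(frozen=True)
class HubCustomerRow:
    hub_customer_key: str   # surrogate key (here simply a short hash of the business key)
    customer_number: str    # the business key shared by every source system
    load_date: datetime     # when the key was first seen
    record_source: str      # which source system introduced it

def ensure_hub_row(hub: dict, customer_number: str, record_source: str) -> HubCustomerRow:
    """Add a business key to the Hub if it is not already there; never update or delete."""
    if customer_number not in hub:
        hub[customer_number] = HubCustomerRow(
            hub_customer_key=f"h_{hash(customer_number) & 0xFFFFFFFF:08x}",
            customer_number=customer_number,
            load_date=datetime.now(timezone.utc),
            record_source=record_source,
        )
    return hub[customer_number]

# Two different sources referring to the same customer land on the same Hub row.
hub_customer: dict = {}
ensure_hub_row(hub_customer, "CUST-001", record_source="crm")
ensure_hub_row(hub_customer, "CUST-001", record_source="billing")  # no new row added
```

The point to notice is that the Hub only ever gains rows; nothing is updated or deleted.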

LINKS

  • ‘Links’ form the second part of the core structure of a Data Vault, and this is where the flexibility and agility come into play.
  • You can have different teams working on different ‘Hubs’, unaware of each other if need be.
  • They may be working on data cleansing, master data or anything else.
  • You can keep them separate by design or by schedule, and still hold it all together by building the Links separately.
  • The Links are effectively ‘many-to-many’ tables, so the relationship scenarios are whatever you choose to make them. There are no constraints, as long as the business entities are well thought out (a sketch follows this list).
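
Continuing the same illustrative Python sketch (hypothetical names, not a prescribed schema), a Link is little more than a table of key pairs relating two Hubs, again insert-only:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# A Link row relates surrogate keys from two (or more) Hubs; it carries no descriptive data itself.
@dataclass(frozen=True)
class LinkCustomerProductRow:
    link_key: str            # surrogate key for the relationship
    hub_customer_key: str    # points at the customer Hub
    hub_product_key: str     # points at the product Hub
    load_date: datetime
    record_source: str

def ensure_link_row(link: dict, hub_customer_key: str, hub_product_key: str,
                    record_source: str) -> LinkCustomerProductRow:
    """Record a relationship once; seeing it again from another source changes nothing."""
    pair = (hub_customer_key, hub_product_key)
    if pair not in link:
        link[pair] = LinkCustomerProductRow(
            link_key=f"l_{hash(pair) & 0xFFFFFFFF:08x}",
            hub_customer_key=hub_customer_key,
            hub_product_key=hub_product_key,
            load_date=datetime.now(timezone.utc),
            record_source=record_source,
        )
    return link[pair]

# Because a Link is just key pairs, any cardinality (1-1, 1-n, n-n) fits without remodelling.
link_customer_product: dict = {}
ensure_link_row(link_customer_product, "h_0001", "h_0042", record_source="billing")
```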

BUSINESS VAULT

If you want to clean and enrich data, derive new values, build business rules and data quality checks, or even build out your ‘golden record’ Master Data, that happens in a separate stage called the ‘Business Vault’. The Data Vault itself remains a single source of unchanging truth, warts and all.
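
To make the split concrete, here is a minimal sketch of a Business Vault style derivation, using hypothetical satellite rows and a deliberately simple business rule (the most recently loaded value wins). The raw rows are read but never modified.

```python
from datetime import datetime

# Raw vault rows are never changed; the Business Vault only derives on top of them.
# Each hypothetical satellite row: business key, descriptive attributes, load date, source.
raw_satellite_rows = [
    {"customer_number": "CUST-001", "email": "old@example.com",
     "load_date": datetime(2023, 1, 10), "record_source": "crm"},
    {"customer_number": "CUST-001", "email": "new@example.com",
     "load_date": datetime(2024, 6, 2), "record_source": "billing"},
]

def derive_golden_record(rows, customer_number):
    """Example business rule: the most recently loaded value wins for each attribute."""
    history = sorted(
        (r for r in rows if r["customer_number"] == customer_number),
        key=lambda r: r["load_date"],
    )
    golden = {}
    for row in history:  # later loads overwrite earlier ones in the derived view only
        golden.update({k: v for k, v in row.items()
                       if k not in ("load_date", "record_source")})
    return golden

print(derive_golden_record(raw_satellite_rows, "CUST-001"))
# -> {'customer_number': 'CUST-001', 'email': 'new@example.com'}
```

Any number of such derivations can coexist, because they all read from the same unchanging raw history.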

There are benefits to this approach:

  • You know that what is in your data warehouse is truly a historical record;
  • It is an auditable trail of consistency in your business;
  • You can derive an unlimited number of Data Marts from it that will be absolutely consistent over time;
  • If you build your business rules sympathetically, they will also be consistent with each other;
  • The reports and analyses that you conduct on this data will remain consistent over time, even if you add more data, as nothing is EVER deleted from a Data Vault, unless it is specifically designed to do so under regulatory constraint.

In conclusion, the Data Vault is built from the ground up to manage growth, while maintaining consistency.

The magic happens because of ‘separation of concerns’.



Data Vaults are agile, but what does that mean in simple terms?

In simple terms, it means you don’t have to “eat the whole whale” in one sitting; small bites are more effective for achieving data visibility. You can even share the meal among a collection of people all at once, and the pace can increase without introducing errors. Because the Data Vault is highly structured, you can split it into pieces and join them back together again without any difficulty. You can do it all at once, in small increments, or something in between, at your own pace and according to your own resource constraints.

Just like Lego (snap it together, build out)

Your business decides the core structure of business concepts, and the data snaps onto those concepts rather than being forced into more traditional styles of data modelling. You can start right now with as little or as much data as you have, and add to it as you work out your future requirements, instead of designing a full data model first. Because the Data Vault is atomically structured, you can use the tiniest piece in a dashboard from the moment that first small bit of data is available. If your data grows, it won’t affect what you have already measured.

You might want to run a small-scale trial to evaluate the benefits before expanding to the entire enterprise. No problem: Data Vault won’t waste your time. Whatever you build now will not need to change later, allowing you to grow and expand your enterprise data warehouse once, and only once. You may choose to scale quickly by setting up parallel teams, each building a separate agile data store. With Data Vault, these can all be easily synchronised, because the core structure is separated from the data, and merged together afterwards. If teams agree on a simple core structure of business concepts and relationships, they can each develop on top of the shared construct. It isn’t a model, it’s an agreed way of connecting business entities: something you can achieve on a whiteboard!

  • Start anywhere, evolve elsewhere, bring it all together anytime: integrate business domains driven by real business priorities. Relationships between different business domains can be established at any later point in time; they don’t need to be thought out upfront.
  • Start small, grow to any size: it doesn’t matter whether you’re building a small data warehouse for some Master Data or a full enterprise data warehouse. The advantage of Data Vault projects is that they bring results very early in the project, and because they are based on business concepts more than on data, they remain highly flexible as they grow.
  • Automate from the start for fast iterations: the key to Data Vault is deconstructed activities that are rapidly repeatable. Data Vaults and automation go hand in hand; indeed, the simplicity and consistency of the approach encourages automation. Standardise your business concepts (on a whiteboard), generate the structure, automate the integration routines and then start growing your agile data warehouse.

Deconstructed data models improve automation

Once the key components of the business and their relationships are understood (keys or IDs, like the customer reference number on an invoice), you simply hang data off them as you find it. A disjointed approach to data gathering, if that’s all that is possible, doesn’t undermine the final Data Vault, because that’s how the method is meant to work! Fast, slow, big, small: it doesn’t matter. Work on different areas independently and then bring it all together, or start by attaching all of your existing data marts as sources and keep going from there, but with agility. It’s a very intuitive and flexible process for the business, who can follow the data modelling with no requirement other than understanding how the business processes work and being able to operate a whiteboard.

Breaking down and segregating the work in this way makes it a very repetitive process, which scales well with automation. In fact, automation is highly recommended, as loading the core structure of the Data Vault simply means extracting data and attaching ‘Satellites’ onto the lattice-work of ‘Hubs’ and ‘Links’. By segregating the methodology into a network of data tables, the system almost requires automation to fulfil its ultimate goal of near-real-time business data for analytics. You’ll be pleased to hear that dFakto has already developed the key automation features you need to build your first Data Vault in just a few weeks!
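
To show why the load pattern is so repeatable, here is a minimal sketch of a generic satellite loader in the same hypothetical Python style as above. It is an assumption-laden simplification, not dFakto’s tooling, but it captures the insert-only, change-detecting routine that lends itself to code generation.

```python
import hashlib
import json
from datetime import datetime, timezone

def hash_diff(attributes: dict) -> str:
    """Fingerprint of the descriptive attributes, used to detect change between loads."""
    return hashlib.md5(json.dumps(attributes, sort_keys=True, default=str).encode()).hexdigest()

def load_satellite(satellite: list, hub_key: str, attributes: dict, record_source: str) -> None:
    """Generic, insert-only satellite load: append a new row only when something changed."""
    new_digest = hash_diff(attributes)
    latest = max((r for r in satellite if r["hub_key"] == hub_key),
                 key=lambda r: r["load_date"], default=None)
    if latest is None or latest["hash_diff"] != new_digest:
        satellite.append({
            "hub_key": hub_key,
            "load_date": datetime.now(timezone.utc),
            "record_source": record_source,
            "hash_diff": new_digest,
            **attributes,
        })

# The same routine serves any feed, which is why generating these loaders is straightforward.
sat_customer_details: list = []
load_satellite(sat_customer_details, "h_0001", {"email": "a@example.com"}, "crm")
load_satellite(sat_customer_details, "h_0001", {"email": "a@example.com"}, "crm")  # unchanged: no row
load_satellite(sat_customer_details, "h_0001", {"email": "b@example.com"}, "crm")  # changed: new row
```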

You want to learn more about Data Vault?

Find out more from our friendly team of business and technical experts at: info@dfakto.com or +32(0)2.290.63.90.


The goal is to turn data into information, and information into insight.
Carly Fiorina, Former CEO of HP

 

There are many different factors that contribute to dFakto’s ability to deliver insights and results quickly and accurately. The main factor is its proprietary “DataFactory”, which uses a “Data Vaulting” process for inputting data.

Here is what dFakto has put in place:

dFakto’s understanding of the problem

When dFakto looks to solve a client’s data analysis problem, it starts from an understanding of the business problem, not from the amount of data available within the company.

First, dFakto Business Analysts identify the precise answers needed to take a decision.

Second, they look for data that will be able to provide those insights.

Consequently, they source only the data they need to solve the problem at hand.

dFakto’s systematic methodology

The “DataFactory” is conceived around a model of how the client does business. In this way, it mirrors the critical information needed to solve the client’s problem. Furthermore, the incoming data is broken down into its most elemental parts and then archived.

This means that if new data fields or new sources are ever added, there is no need to reconfigure the database architecture.

dFakto’s DataFactory

dFakto creates a “DataFactory” with the data it receives from the client and uses a “Data Vaulting” process for inputting that data. It stores everything, regardless of the system it comes from. This rigorous and systematic way of storing data preserves the history of every change that the data undergoes. Consequently, it enables an auditor to trace values back to their original source, and it lets the Project Manager see who has updated a particular data field, and when.

The objective is easy traceability of every change that appears, which in turn creates insights for the client.
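
As a simple illustration of that traceability, and reusing the hypothetical satellite rows from the loader sketch above, the auditor’s question (“what values has this field held, when, and from where?”) becomes a trivial read over the stored history:

```python
def field_history(satellite_rows, hub_key, field):
    """Every value a field has held for a given key, with its load time and source."""
    return [
        (row["load_date"], row["record_source"], row[field])
        for row in sorted(satellite_rows, key=lambda r: r["load_date"])
        if row["hub_key"] == hub_key and field in row
    ]

# e.g. field_history(sat_customer_details, "h_0001", "email")
# -> [(<first load_date>, 'crm', 'a@example.com'), (<second load_date>, 'crm', 'b@example.com')]
```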

The fundamental principle of the “DataFactory” is that no distinction is made at this stage between good and bad data. That aspect is only considered, and worked on, after the data has first been stored.

There is only “a single version of the facts”.

dFakto’s insights

The advantages of this type of system are multiple. The input data is always raw data, with no prior manipulation. In this way, dFakto can track exactly:

  • The origin of the data;
  • The accuracy of the data;
  • Who is responsible for the data; and
  • The location of the data.

Better still, it doesn’t matter how complex the model becomes or how many new data sources are added.

In this way, there is never any need to go back to the beginning and start the whole process again. It means that if another problem arises and the same data is required, then everybody is able to access the same ‘input’ at any time.

In conclusion, dFakto doesn’t necessarily do “big data”, though it can and does. Instead, it works with “the right data”. The company uses its business analysis expertise and experience to unpack a client’s problem and see precisely what information is needed to answer a specific question. This reduces the time spent on collecting and checking data, and increases the time spent on understanding and interpreting what the results mean. Better still, clients enjoy peace of mind, as they know that the answers and insights generated are based on the most recent available data.

 

Find out more from our friendly team of business and technical experts at info@dfakto.com or +32(0)2.290.63.90.