One of the questions about Information Products I get asked a lot is: what is the relationship between an Information Product and the data it relies on?

One set of data, many Information Products

When we start working in the data domain we should be aware that data is a thing that is created once and used many times.

With our AgileData hats on, we adopt a pattern where we model the data the first time we use it, to make it easier to use the next time. We use our Event Modelling and Data Vault patterns to help us do this in an AgileData way.

As a result, over time we end up with many Information Products accessing the same data.

Which leads to the question: what came first, the data or the Information Product? The answer of course is the data, as without data an Information Product has no value.

However, the first Information Product we create always ends up requiring the effort to model the data for the first time. As a result, the first Information Product always takes the longest.

The benefit of data reuse

The effort to deliver additional Information Products should reduce over time, as new Information Products access data that has already been modelled and is therefore already available. This is the value we unlock by modelling the data. It only holds for Information Products that are based on the same core business events or data domains. When we start to work on Information Products that require data for a different core business event, we will typically need to do some more modelling. When we start to work on Information Products that require data from another data domain, there is a high likelihood that we will need to collect data from a new system of record.
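As a rough illustration of the point above, the remaining modelling work for a new Information Product can be thought of as the data it needs minus the data that has already been modelled. This is a minimal sketch only; the function name and the data set names (customer, order, payment) are hypothetical and not part of any AgileData tooling.

```python
def modelling_gap(required, already_modelled):
    """Return the data sets a new Information Product needs that are not yet modelled."""
    return sorted(set(required) - set(already_modelled))


# Hypothetical data sets modelled while delivering the first Information Product.
already_modelled = {"customer", "order"}

# A second product on the same core business events needs no new modelling.
same_events_gap = modelling_gap({"customer", "order"}, already_modelled)

# A product that reaches into a different business event does.
new_event_gap = modelling_gap({"customer", "payment"}, already_modelled)

print(same_events_gap)
print(new_event_gap)
```

An empty result corresponds to the cheap case, where everything the new Information Product needs is already available.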

The cost of data reuse

As a result of reusing the modelled data, we end up with multiple Information Products being coupled to the same data. When the structure or content of this data changes, there is a larger blast radius, as multiple Information Products are impacted.
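To make the blast radius idea concrete, here is a small sketch that looks up which Information Products read a given modelled data set. The product names, data set names and the shape of the mapping are all assumptions made for illustration; in practice this kind of lineage would come from something like a data map rather than a hand-written dictionary.

```python
# Hypothetical mapping: which modelled data sets each Information Product reads.
PRODUCT_DEPENDENCIES = {
    "sales_dashboard": {"customer", "order"},
    "churn_report": {"customer", "subscription"},
    "finance_extract": {"order", "payment"},
}


def blast_radius(dependencies, changed_data_set):
    """Return the sorted names of Information Products that read the changed data set."""
    return sorted(
        product
        for product, data_sets in dependencies.items()
        if changed_data_set in data_sets
    )


impacted = blast_radius(PRODUCT_DEPENDENCIES, "customer")
print(impacted)  # ['churn_report', 'sales_dashboard']
```

The more Information Products reuse a data set, the longer this list gets when that data set changes.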

The good news is that with multiple Information Products there is a larger group of users to tell you when the data has mutated. But seriously, one of the goals of the AgileData Way of Working is that we should identify when data has mutated before our customers do. Luckily there are a number of AgileData patterns that help us achieve this goal, including automated data observability, data maps and self-healing data mutation techniques, which minimise the downside of data reuse.

The other downside is that it is slightly harder to scale the agile developer squads if they are all working on Information Products that use the same core data, as there is a chance of one squad impacting another squad. Again, we have some AgileData patterns, such as the Information Product brief, that help with this problem.
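One simple way to spot a structural mutation before customers do is to compare the columns we expect from a source with the columns that actually arrive. The sketch below is an assumption-laden illustration, not a description of how AgileData implements data observability: the function name, column names and type labels are hypothetical, and it only covers changes in structure, not changes in content.

```python
def detect_mutations(expected, observed):
    """Compare two {column: type} mappings and describe the structural differences."""
    findings = []
    for column in sorted(expected.keys() - observed.keys()):
        findings.append(f"missing column: {column}")
    for column in sorted(observed.keys() - expected.keys()):
        findings.append(f"new column: {column}")
    for column in sorted(expected.keys() & observed.keys()):
        if expected[column] != observed[column]:
            findings.append(
                f"type changed: {column} {expected[column]} -> {observed[column]}"
            )
    return findings


# Hypothetical schema recorded when the data was first modelled.
expected_schema = {"customer_id": "INTEGER", "email": "STRING", "created_at": "TIMESTAMP"}
# Hypothetical schema seen in the latest load from the system of record.
observed_schema = {"customer_id": "STRING", "email": "STRING", "signup_channel": "STRING"}

findings = detect_mutations(expected_schema, observed_schema)
for finding in findings:
    print(finding)
```

Running a check like this on every load means a mutation shows up as a finding for the data team rather than as a broken Information Product.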

What's the alternative?

The alternative is to create a new set of data for every Information Product, which will solve some of the dependency issues. But it results in minimal reuse, so it will never deliver the benefits that reuse of data brings. Also, when data in a system of record mutates, it will still impact every Information Product that is dependent on data from that system.

What about code?

If many Information Products reuse the same modelled data, what is the relationship between an Information Product and pieces of code?

There are four sets of code that an Information Product is dependent on.

  1. The code that automates the collection of data from the System(s) of Record;
  2. The code that models the data;
  3. The code that produces the final Information Product output;
  4. The DataOps scaffolding code.

As you can see, only the third set of code is unique to the Information Product: the code that produces the final consumable output, be it a dashboard or report, a data feed to another System of Record, or a data extract.

All the other sets of code have multiple Information Products dependent on them.