It’s a problem of boiling the ocean, overlapping boundaries and a lack of a shared language

Data in an organisation is often broad, complex and full of uncertainty.  One of the many data management challenges it brings is how to break large problems down into smaller problems we can work on.

We have complexity in the number of Systems of Capture we need to collect data from.  A 2020 survey by Blissfully found that organisations use a large number of apps, each a potential System of Capture whose data we need to collect, combine and consume.

“The average small business uses 102 different apps, while each mid-market business uses an average of 137 apps. Enterprises have, on average, 288 different SaaS apps in usage across their businesses.”

“Companies are churning through more than 30% of their apps every year”

https://www.blissfully.com/saas-trends/2020-annual-report/

We know that dealing with data from a single System of Capture is much easier than managing data from across multiple systems.

We have complexity in the number of people we have in our data teams to do the data work.  It is not uncommon for data teams to have 20 or 40 people in them.  We know that teams of between 4 and 9 people are the most effective.

We have complexity of language: the technical data language, where we argue about the difference between an Operational Data Store (ODS) and a Data Warehouse (DW), as well as the business language, where we argue about the definition of an “active customer”.

What is a data domain?

A Data Domain is an approach to describing a subset of data, in a way that the business stakeholders and the AgileData team can agree where the boundary for that data is.

It sets a specific data boundary that represents the different parts of an organisation that generate or use data, and is typically based on the business reality of the organisation.  This business reality provides us with a shared language.

Data domains can be thought of as the “realm” of data that is relevant to a particular organisation, industry, or use case.  

Data domains can be defined at different levels of granularity, depending on the context and the specific needs of the stakeholders involved. For example, a data domain may focus on a specific industry, such as healthcare or retail, or it may focus on a specific aspect of an industry, such as customer behavior or supply chain management. 

For example, a healthcare organisation might have data domains related to patient records, clinical trial data, and billing information. An e-commerce company might have data domains related to customer data, product data, and order data.

Data domains can also be defined based on the type of data being analysed, such as structured data, unstructured data, or time-series data.

The concept of data domains is a useful pattern in an AgileData Way of Working, as it helps data teams to ensure that data is managed and used in a way that is meaningful and effective.  It also helps break large data problems into smaller data problems which we can solve faster and therefore gain feedback and deliver value earlier.

The benefits of Data Domains

Data domains help by providing smaller and agreed data boundaries allowing teams to organise data into smaller, more manageable chunks or categories.  This pattern of decomposing large things into smaller things provides the AgileData team with a range of benefits.

Data domains help provide a clear and well-defined scope for the data that is collected and managed. For example, the Customer domain may hold more sensitive data than the Product domain, and therefore the data in each domain may need to be managed differently.
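As a minimal sketch of how that scope might be recorded, each data domain can carry its own handling rules. The `contains_sensitive_data` flag and the rule names below are hypothetical and are not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDomain:
    """A named data boundary plus a hint about how its data is handled (illustrative only)."""
    name: str
    contains_sensitive_data: bool


def handling_rules(domain: DataDomain) -> list:
    """Return the hypothetical handling rules that apply to a data domain."""
    rules = ["catalogue the data"]
    if domain.contains_sensitive_data:
        # Sensitive domains get extra controls; these rule names are assumptions.
        rules += ["mask personal fields", "restrict access"]
    return rules


customer = DataDomain("Customer", contains_sensitive_data=True)
product = DataDomain("Product", contains_sensitive_data=False)

for domain in (customer, product):
    print(domain.name, "->", handling_rules(domain))
```

The point of the sketch is only that the rule is attached to the domain, not to individual tables or files, so the boundary itself tells the team how the data should be treated.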

Data domains make it easier for AgileData teams to focus on specific areas of business and data expertise and become more knowledgeable about those areas. This can be especially useful in large organisations where there is a wide range of data, and subject matter expertise is required across many different data domains.

Data domains help improve communication within an organisation by providing a common language and framework for discussing data. This can make it easier for subject matter experts to share their knowledge and expertise with others, and can help to ensure that everyone is speaking the same “data language” when discussing specific data topics.

By dividing the data into smaller data domains, it becomes easier to understand the relationships between different pieces of data within those domains or across domains.  We know that working with data that crosses multiple data domains is more complex than working with data that is contained within a data domain.

Data domains help prioritisation conversations.  If the organisational strategy is to increase sales and revenue, yet all the data work is currently prioritised around the HR data domain, then there could be a disconnect between the organisational strategy and the prioritisation process.
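One way to make that disconnect visible is to tag each item of data work with its data domain and compare the spread against the strategy. The sketch below assumes a made-up backlog and a made-up set of strategic domains.

```python
from collections import Counter

# Hypothetical backlog: each item of data work is tagged with one data domain.
backlog = [
    ("Headcount report", "HR"),
    ("Leave balances", "HR"),
    ("Attrition dashboard", "HR"),
    ("Sales pipeline", "Sales"),
]

# Data domains the (assumed) organisational strategy cares about most.
strategic_domains = {"Sales"}


def strategic_share(items, strategic):
    """Fraction of backlog items that sit in a strategic data domain."""
    counts = Counter(domain for _, domain in items)
    total = sum(counts.values())
    aligned = sum(n for domain, n in counts.items() if domain in strategic)
    return aligned / total if total else 0.0


share = strategic_share(backlog, strategic_domains)
print(f"{share:.0%} of the backlog supports the strategy")
```

A low share does not prove the prioritisation is wrong, but it is a useful prompt for the conversation.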

Data domains can help to facilitate collaboration within and between AgileData teams, as they provide clear boundaries and definitions for the data and help to ensure that everyone is working with the same understanding of the data.  If two AgileData teams are working on delivering Information Products that touch the same data domain at the same time, we know there will need to be increased collaboration between those two teams, compared to if they were working on Information Products in separate domains.
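A rough sketch of how we might spot where that extra collaboration is needed, assuming each Information Product is tagged with the data domains it touches. The team names, product names and domains are all illustrative.

```python
from collections import defaultdict

# Hypothetical work in progress: team -> Information Product -> data domains it touches.
work_in_progress = {
    "Team A": {"Customer churn report": {"Customer", "Order"}},
    "Team B": {"Order fulfilment dashboard": {"Order", "Logistics"}},
    "Team C": {"Headcount report": {"HR"}},
}


def shared_domains(wip):
    """Map each data domain to the teams touching it, keeping only shared domains."""
    teams_by_domain = defaultdict(set)
    for team, products in wip.items():
        for domains in products.values():
            for domain in domains:
                teams_by_domain[domain].add(team)
    return {d: sorted(t) for d, t in teams_by_domain.items() if len(t) > 1}


print(shared_domains(work_in_progress))
```

In this made-up example only the Order domain is shared, so Team A and Team B are the two teams that would need to coordinate.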

Ultimately, data domains provide a way to organise and categorise data, and to facilitate collaboration and communication between different stakeholders and AgileData teams who are working with the data. By defining the data domain and the concepts and relationships within it, we can develop a shared understanding of the data and of how it relates to the specific problem or opportunity being addressed. This can help ensure that data is collected, combined, analysed and consumed in a way that is meaningful and relevant to the domain, and that it can be used to drive decision-making and action.

Data Domains closely align with other AgileData Patterns

The data domain pattern closely aligns with other AgileData patterns.

The Information Product pattern uses the same boundary pattern as data domains to help break big pieces of work down into smaller, manageable chunks.

The core business process area in the Information Product Canvas can also be used to determine whether an Information Product is spanning data from multiple data domains, and therefore inform the level of complexity and effort it will require to deliver it.

The patterns for data design are also related to data domains.  It is common for a core business concept, for example Customer, to span multiple data domains, but it is unusual for a core business process to do so; processes are typically bound within a single data domain.

Data domains also help with the supporting team roles such as Product Leader and Product Owner.  The Product Owner role is typically bound within a data domain, whereas a Product Leader typically spans multiple domains.

Well-described data domains also help with recruiting Product Owners with a good level of subject matter expertise for the domain.

Data Domain Boundary Patterns

There are a number of patterns we can use to help us define relevant and useful data domain boundaries within an organisation.  These include team topology, organisation structures, System of Capture, type of data, core business concept,  core business process or use cases.

  • Team Topology / Organisation Structure: One way to define the boundary for a data domain is based on the specific business function or area of the organisation that the data is relevant to. For example, a data domain might include all of the data that is relevant to the sales and marketing business unit within an organisation or the Human Resources business unit.
  • Systems Of Capture: Another way to define the boundary for a data domain is based on the specific source or sources of the data. This might include data that is collected from a particular system or process.  For example the Customer Relationship Management (CRM) system or the Inventory system.
  • Type of data: A data domain can also be defined based on the specific type or category of data that it includes. For example, a data domain might include all unstructured data such as video or voice data.
  • Data related to a core business concept: The boundary of a data domain can also be defined based on the relationships between different data elements. For example, a data domain might include all of the data that is related to a specific set of products, including data about the product itself, data about the customers who purchase it, and data about the sales and marketing efforts related to the product.
  • Data related to a core business process:  The boundary of a data domain can be defined based on an end to end business process, for example sale to fulfillment,  farm to table or purchase order to payment.
  • Use case: The boundary for a data domain can be defined based on industry or organisational use cases, for example Life Time Value, or a Data Marketplace.

The pattern you use to define your data domains is highly dependent on the context of your team and your organisation; there is no one data domain pattern to fit them all.  There are, however, common data domains you will see in specific industries, and we describe some of these later.

Pattern Overlaps

When selecting a pattern to define your data domains you will identify a lot of possible overlap across the patterns.

For example, if you used Systems of Capture as your primary pattern and started with your Webstore system, and then compared it to the Core Business Process pattern of “customer orders product”, you would see a lot of potential overlap.  Both contain the concepts of Customer, Product and Order.
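The Webstore comparison can be sketched with simple set operations. The concept lists below are illustrative assumptions and not the schema of a real system.

```python
# Concepts we might find when drawing the boundary two different ways.
# Both sets are made up for illustration.
webstore_system = {"Customer", "Product", "Order", "Shopping Cart"}
customer_orders_product = {"Customer", "Product", "Order", "Payment"}

overlap = webstore_system & customer_orders_product
only_in_system = webstore_system - customer_orders_product
only_in_process = customer_orders_product - webstore_system

print("Shared concepts:", sorted(overlap))
print("Only in the System of Capture view:", sorted(only_in_system))
print("Only in the business process view:", sorted(only_in_process))
```

The concepts that appear on only one side are where the two boundary patterns would disagree about what belongs in the domain.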

You will find Core Business Concepts and Processes often span multiple Systems of Capture, and are used in multiple Use Cases, or by multiple Business Units.

The key is to pick one pattern for defining your data boundaries and use it consistently.  The value is in the clear and shared language we gain from the data domain pattern.

What is important when defining a data domain

The key pattern we want when defining data domains is the pattern of decomposition: we want to break something which is large into a set of smaller things.

There are several factors that are important to consider when defining a data domain, including:

  • Relevance: It is important to ensure that the data included in the data domain is relevant to the specific needs and goals of the organisation. This might involve considering the business function or area of the organisation that the data is relevant to, the specific source or type of data that is included, or the business goals and outcomes the organisation is trying to achieve.
  • Scope: The scope of the data domain should be clearly defined, in order to ensure that the data included in the domain is well-defined and manageable. A data domain with a narrow, well-defined scope is often easier to work with and understand than a domain with a broad or poorly-defined scope.
  • Stakeholder needs: It is important to consider the needs and goals of the stakeholders who will be using the data in the data domain. This might involve identifying the specific questions or problems that the data is intended to help solve, or the specific ways in which the data will be used.  

The Information Product pattern is a great way to validate these.  Can each Information Product that has been defined be placed into a single specific data domain?
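A minimal sketch of that validation check, assuming each Information Product has been tagged with the data domains it draws on. The product and domain names are hypothetical.

```python
# Hypothetical Information Products and the data domains each one draws on.
information_products = {
    "Monthly sales summary": {"Order"},
    "Customer lifetime value": {"Customer", "Order", "Marketing"},
    "Stock on hand": {"Product"},
}


def spans_multiple_domains(products):
    """Return the Information Products that cannot be placed in a single data domain."""
    return {
        name: sorted(domains)
        for name, domains in products.items()
        if len(domains) > 1
    }


for name, domains in spans_multiple_domains(information_products).items():
    print(f"{name} spans {len(domains)} domains: {', '.join(domains)}")
```

If many Information Products show up in this list, it may be a sign that the domain boundaries need another look, or simply that those products will take more effort to deliver.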

Ideally we will be able to articulate a clear demarcation between Data Domain A and Data Domain B, but in reality it is common to see grey areas between each data domain.  This is OK.  When we are working in these grey areas we are aware of that fact, and we can adapt our AgileData Ways of Working to help deal with the uncertainties that come with working on data that potentially spans multiple data domains.

Industry examples of Data Domains

Data domains are commonly defined based on industries, as different industries have different areas of focus and generate and use different types of data.

For example, in the healthcare industry, some common data domains might include patient data, clinical data, financial data, and research data. In the financial services industry, some common data domains might include customer data, financial data, market data, and regulatory data. In the retail industry, some common data domains might include customer data, sales data, inventory data, and supply chain data.

E-Commerce

Some examples of data domains that might be relevant to an e-commerce company:

Customer data: This data domain might include information about the customers of the e-commerce company, such as their contact details, demographic information, and purchase history.

Product data: This data domain might include information about the products sold by the e-commerce company, such as product descriptions, images, and pricing information.

Order data: This data domain might include information about the orders placed by customers, such as the products included in the order, the shipping and billing information, and the payment details.

Marketing data: This data domain might include data about the marketing campaigns and initiatives of the e-commerce company, such as email lists, social media followers, and website traffic data.

Logistics data: This data domain might include data about the logistics and fulfillment processes of the e-commerce company, such as information about the warehouses and distribution centers, as well as data about the shipping and delivery of orders.

Insurance

Some examples of data domains that might be relevant to an insurance company:

Customer data: This data domain might include data about the customers of the insurance company, including demographic information, contact details, and policy information.

Claim data: This data domain might include data about the claims process of the insurance company, including data about the type and severity of claims, as well as data about the resolution of claims.

Underwriting data: This data domain might include data about the underwriting process of the insurance company, including data about the risk assessment of potential policyholders and data about the pricing of insurance policies.

Actuarial data: This data domain might include data about the actuarial process of the insurance company, including data about the analysis of risk and the development of insurance products.

Financial data: This data domain might include data about the financial performance of the insurance company, including data about revenue, expenses, and profitability.

Bank

Some examples of data domains that might be relevant to a bank:

Customer data: This data domain might include data about the customers of the bank, including demographic information, contact details, and account information.

Loan data: This data domain might include data about the loan process of the bank, including data about the approval and disbursement of loans, as well as data about loan repayment.

Investment data: This data domain might include data about the investment products and services offered by the bank, including data about the performance of different investments and data about the risk profile of different products.

Financial data: This data domain might include data about the financial performance of the bank, including data about revenue, expenses, and profitability.

Regulatory data: This data domain might include data about the regulatory compliance of the bank, including data about the reporting of financial information and data about the management of risk.

Government Welfare Agency

Some examples of data domains that might be relevant to a government agency that provides social welfare benefits:

Beneficiary data: This data domain might include data about the individuals or households who are eligible to receive social welfare benefits, including demographic information, income data, and information about the specific benefits that they are eligible for.

Benefit program data: This data domain might include data about the specific social welfare benefit programs offered by the agency, including data about the eligibility requirements, the types of benefits available, and the process for applying for and receiving benefits.

Case management data: This data domain might include data about the case management process of the agency, including data about the progress of individual cases, data about the interactions with clients, and data about any issues or challenges that arise.

Financial data: This data domain might include data about the financial performance of the agency, including data about the budget, expenditures, and revenue.

Regulatory data: This data domain might include data about the regulatory compliance of the agency, including data about the reporting of financial information and data about the management of risk.

Electricity Retailer

Some examples of data domains that might be relevant to an electricity retailer:

Customer data: This data domain might include data about the customers of the retailer, including demographic information, contact details, and account information.

Meter data: This data domain might include data about the electricity meters of the retailer’s customers, including data about the consumption of electricity, the billing cycle, and the management of meter readings.

Tariff data: This data domain might include data about the tariffs and pricing structures offered by the retailer, including data about different rate plans, discounts, and incentives.

Billing data: This data domain might include data about the billing process of the retailer, including data about the calculation of bills, the payment process, and the management of any billing disputes.

Financial data: This data domain might include data about the financial performance of the retailer, including data about revenue, expenses, and profitability.

Airline

Some examples of data domains that might be relevant to an airline:

Customer data: This data domain might include data about the customers of the airline, including demographic information, contact details, and travel history.

Flight data: This data domain might include data about the flights operated by the airline, including data about the routes, the schedules, and the capacity of the flights.

Reservations data: This data domain might include data about the reservations process of the airline, including data about the availability of flights, the booking process, and the management of changes and cancellations.

Loyalty program data: This data domain might include data about the loyalty program of the airline, including data about the rewards earned by customers, the redemption of rewards, and the management of customer accounts.

Financial data: This data domain might include data about the financial performance of the airline, including data about revenue, expenses, and profitability.

Charity

Some examples of data domains that might be relevant to a charity:

Donor data: This data domain might include data about the donors of the charity, including demographic information, contact details, and donation history.

Fundraising data: This data domain might include data about the fundraising efforts of the charity, including data about campaigns, events, and online donations.

Beneficiary data: This data domain might include data about the individuals or groups that benefit from the work of the charity, including demographic information and data about the specific services or assistance provided.

Volunteer data: This data domain might include data about the volunteers of the charity, including demographic information, contact details, and data about the specific tasks and roles that volunteers perform.

Financial data: This data domain might include data about the financial performance of the charity, including data about revenue, expenses, and profitability.

University

Some examples of data domains that might be relevant for a university include:

Student data: This data domain might include data on the university’s students, such as enrollment data, demographic data, and academic performance data.

Faculty data: This data domain might include data on the university’s faculty, such as data on faculty credentials, research interests, and teaching performance.

Financial data: This data domain might include data on the university’s revenue, expenses, and financial performance over time.

Curriculum data: This data domain might include data on the university’s academic programs and courses, such as data on enrollment, graduation rates, and student satisfaction.

Research data: This data domain might include data on the university’s research activities, such as data on grants, publications, and research collaborations.

Examples of Data Domains based on Business Units

Data domains can be defined based on business units within an organisation. Business units are typically defined as distinct areas or departments within an organisation that have their own specific goals and objectives. Data domains can be defined based on these business units in order to better align the data with the needs and goals of each unit.

For example, an organisation might have separate business units for sales, marketing, human resources and finance.

Human Resources

Some examples of sub data domains that might be relevant within the HR department of an organisation:

Employee data: This data domain might include data about the employees of the organisation, including demographic information, contact details, and employment history.

Compensation and benefits data: This data domain might include data about the compensation and benefits provided to employees, including data about salaries, bonuses, and benefits such as health insurance and retirement plans.

Performance data: This data domain might include data about the performance of employees, including data about goals and objectives, performance evaluations, and training and development.

Recruitment and hiring data: This data domain might include data about the recruitment and hiring process of the organisation, including data about job postings, resumes, and interviews.

Compliance data: This data domain might include data about the compliance of the HR department with relevant laws and regulations, including data about equal opportunity and diversity, as well as data about data privacy and security.

Finance

Some examples of sub data domains that might be relevant within the finance department of an organisation:

Financial statements data: This data domain might include data about the financial statements of the organisation, including data about the balance sheet, the income statement, and the statement of cash flows.

Budget data: This data domain might include data about the budgeting process of the organisation, including data about the allocation of resources, the forecasting of revenues and expenses, and the tracking of actual performance against the budget.

Invoice data: This data domain might include data about the invoicing process of the organisation, including data about the issuance of invoices, the payment of invoices, and the management of any disputes or issues.

Accounts payable data: This data domain might include data about the accounts payable process of the organisation, including data about the payment of bills, the management of vendor relationships, and the tracking of expenses.

Accounts receivable data: This data domain might include data about the accounts receivable process of the organisation, including data about the billing of customers, the collection of payments, and the management of any disputes or issues.

Examples of Data Domains based on System of Capture

Data domains can be defined based on the Systems of Capture used by an organisation. 

For example, Customer Relationship Management (CRM) or Inventory Management.

Customer Relationship Management (CRM)

An example of data domains that might be relevant for a System of Capture focused on Customer Relationship Management (CRM):

Contact information: This may include basic information such as name, email address, and phone number, as well as more detailed information such as job title, company name, and mailing address.

Interactions: This may include records of phone calls, emails, meetings, and other interactions with customers and potential customers.

Opportunities: This may include information about potential sales opportunities, including the products or services being considered, the budget, and the expected close date.

Leads: This may include information about potential customers who have expressed interest in the company’s products or services, but have not yet made a purchase.

Accounts: This may include information about customer accounts, including account history, current status, and any outstanding issues or concerns.

Marketing campaigns: This may include information about marketing campaigns that have been run or are currently in progress, including the target audience, the channels used, and the results achieved.

Inventory Management

An example of data domains that might be relevant for a System of Capture focused on Inventory Management:

Product information: This may include detailed information about each product, such as the product name, SKU, description, and images.

Stock levels: This may include the current quantity of each product in stock, as well as the minimum and maximum stock levels that should be maintained.

Orders: This may include information about orders that have been placed, including the products ordered, the quantities, and the expected delivery date.

Suppliers: This may include information about the suppliers of each product, including contact information, delivery times, and pricing.

Purchasing: This may include information about the purchasing process, including the approval process, budgeting, and vendor selection.

Warehousing: This may include information about the storage and movement of products within the warehouse, including location, shelf life, and expiration dates.

Examples of Data Domains based on Use Case

Data domains can be defined based on use case across an organisation. 

For example, life time value or data marketplaces.

Life Time Value

An example of data domains that might be relevant for a use case focused on Customer Life Time Value (LTV):

Customer data: This data domain might include data on customer demographics, preferences, and behavior.

Sales data: This data domain might include data on customer purchases, including data on the products or services that customers have purchased, as well as data on the timing and value of those purchases.

Marketing data: This data domain might include data on marketing campaigns and customer engagement, such as data on email campaigns, social media engagement, and website traffic.

Customer service data: This data domain might include data on customer service interactions, including data on customer complaints and support requests.

Data Marketplaces

An example of data domains for a data marketplace, which is a platform that allows organisations to buy and sell data, might include:

Data provider data: This data domain might include data on the data providers that are participating in the marketplace, including information on their business, data offerings, and pricing.

Data consumer data: This data domain might include data on the data consumers that are participating in the marketplace, including information on their business, data needs, and budget.

Data offerings data: This data domain might include data on the specific data offerings that are available in the marketplace, including information on the data types, sources, and formats of the data.

Data transactions data: This data domain might include data on the data transactions that have taken place in the marketplace, including information on the data that was purchased, the price paid, and the data provider and consumer involved in the transaction.

Domain Hierarchies

Data domain hierarchies are a useful pattern for organising data domains so that they are easier to understand and manage. 

We can compare it to how you might organise different flavors of ice cream in your freezer.  For example, you might have an ice cream domain for “Ice Cream Flavors” at the top level. Under this domain, you would have sub-domains for each flavor of ice cream, such as “Chocolate,” “Vanilla,” “Strawberry,” and so on.

Within each of these sub-domains, you might have even more detailed domains about each flavor. For example, under the “Chocolate” domain, you might have domains based on ingredients, “Chocolate – Dairy”, “Chocolate – Soy” and “Chocolate – Coconut”.

The same goes for your data domains.  

Let’s consider a sales process example. The highest level of the data domain hierarchy might represent the overall sales process, with lower levels representing the specific steps within the process (e.g. lead generation, qualification, proposal, close). Each step of the process might have even lower levels of data, such as the specific actions or tasks that need to be completed within each step.

By organising data domains in this way, it becomes easier to find and understand specific pieces of data, as well as to see how different data domains fit together and relate to one another.
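The sales process hierarchy can be sketched as a nested structure that we walk from the top level down. The step names come from the example above; the task names ("Capture lead", "Score lead") are made up for illustration.

```python
# A nested dict as a stand-in for a data domain hierarchy.
sales_hierarchy = {
    "Sales": {
        "Lead generation": {"Capture lead": {}},
        "Qualification": {"Score lead": {}},
        "Proposal": {},
        "Close": {},
    }
}


def domain_paths(tree, prefix=()):
    """Yield every domain in the hierarchy as a path from the top level down."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield " > ".join(path)
        yield from domain_paths(children, path)


for path in domain_paths(sales_hierarchy):
    print(path)
```

Printing the full path for each domain is one simple way to show where a piece of data sits and which higher-level domain it rolls up to.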

Data Domains vs Subject Areas

Subject area is another term commonly used to define data domain boundaries.

Subject area is a term which has often been used in data modeling and data warehousing to organise and structure data in a logical and meaningful way.

A number of data experts have previously referenced “subject areas” to define a boundary of data:

    • Ralph Kimball, in his book “The Data Warehouse Lifecycle Toolkit,” published in 1998

    • Bill Inmon, in his book “Building the Data Warehouse,” published in 1996

    • Len Silverston, in his book “The Data Model Resource Book,” published in 2001

The use of the term subject area in these books is the same as the use of the term data domain, as it describes a specific area of business or domain knowledge and is defined by a group of related data entities.

Data Domains vs Domain-Driven Design (DDD)

Domain-driven design (DDD) is a software development pattern that focuses on creating a clear and consistent model of the domain (i.e., the area of expertise) that a software system is intended to support. It involves dividing the domain into smaller, more manageable chunks called “bounded contexts,” which can each be worked on independently, and emphasises the importance of understanding and communicating with domain experts in order to ensure that the system accurately reflects the needs and requirements of the domain.

Domain-driven design (DDD) was developed by Eric Evans, a software developer and consultant. In his book “Domain-Driven Design: Tackling Complexity in the Heart of Software,” published in 2003, Evans introduced the concept of DDD and outlined a set of principles and practices for applying DDD to software development.

Since its introduction, DDD has become a widely recognised and influential software development approach, and has been adopted by many organisations and developers around the world. It is often used in complex, mission-critical software systems where a deep understanding of the domain is essential for building a successful and effective solution.

Here is an example of how domain-driven design (DDD) might be applied to an ecommerce system:

Imagine that you are building an ecommerce platform to allow customers to purchase products online. In this case, the domain is the world of online retail, and the system needs to support the needs of both customers and merchants.

Using DDD, you might begin by identifying the different bounded contexts within the domain of online retail. For example, you might identify a bounded context for products, a bounded context for orders and payments, and a bounded context for customer accounts.

Next, you would work with domain experts, such as experienced ecommerce professionals, to understand the needs and requirements of each bounded context. You would then design and build the system around the concepts and language of the domain, using the insights and knowledge gained from working with the domain experts.

For example, in the products bounded context, you might design the system to allow merchants to easily add and manage their products, including descriptions, images, and pricing information. In the orders and payments bounded context, you might design the system to support the process of placing orders, tracking orders, and processing payments. And in the customer accounts bounded context, you might design the system to allow customers to create accounts, manage their personal information, and view their order history.
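As a rough sketch of what separate bounded contexts can look like in code, each context below has its own small model of a product, with an explicit translation step at the boundary. The class names, fields and the `to_order_line` helper are hypothetical and are not taken from Evans’ book or any framework.

```python
from dataclasses import dataclass


# Products bounded context: how a merchant describes what is for sale.
@dataclass
class CatalogueProduct:
    sku: str
    description: str
    price: float


# Orders and payments bounded context: only what is needed to process an order.
@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: float

    def total(self) -> float:
        return self.quantity * self.unit_price


def to_order_line(product: CatalogueProduct, quantity: int) -> OrderLine:
    """Translate between contexts at the boundary, copying the price at order time."""
    return OrderLine(sku=product.sku, quantity=quantity, unit_price=product.price)


mug = CatalogueProduct(sku="MUG-1", description="Ceramic mug", price=12.5)
line = to_order_line(mug, quantity=2)
print(line.total())
```

The orders context never sees the product description; it only carries the fields it needs, which is the kind of independence between contexts that the pattern is aiming for.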

By applying DDD, you can create an ecommerce platform that is closely aligned with the needs and language of the online retail domain, which can help to improve the usability and effectiveness of the system for both customers and merchants.

Just like Subject Areas, the Data Domain pattern is closely related to the Domain-Driven Design pattern.  Data Domains involve dividing the data into smaller, more manageable chunks, each of which provides a specific “bounded context”.