Attribute Splitting Pattern in Data Warehousing
TD;LR
One powerful concept in physical data modeling is the Attribute Splitting Pattern.
This pattern is particularly useful for handling rapidly changing attributes within your data.
In this article we’ll delve into what the Attribute Splitting Pattern is, why it is useful, and how you can implement it in both dimensional and data vault models.
Chat AgileData LLM
Concept = Entity
Detail = Attribute
What is the Attribute Splitting Pattern?
The Attribute Splitting Pattern involves dividing the details of an concept into different tables or structures based on their volatility. Details that change frequently are separated from more stable details, allowing for more efficient management and tracking of changes. This separation can help manage and track changes more efficiently and improves the performance and maintainability of a data platform.
Why Use the Attribute Splitting Pattern?
In any data environment, some details of concepts change often. For example, a customer concept might include details like customer status, loyalty tier, or marketing preferences, which can change frequently. If these details were kept in the main table, every change would require a new record, causing the table to grow rapidly and potentially impacting performance.
By moving these volatile details to separate tables, you can:
1. Reduce the Size of the Core Concept Table:
Since only the stable details remain in the main table, its size remains manageable. This is dependent on the physical modeling pattern you use, i.e Dimensional vs Data Vault.
2. Simplify Tracking Changes:
Changes in the separate tables can be managed independently, making it easier to maintain historical accuracy and perform time-based analyses.
3. Improve Query Performance:
Smaller tables with fewer rows and columns may improve query performance and reduce the load on your database. This is dependent on the data storage and compute engine you use.
Attribute Splitting Use Cases
You can split details based on:
1. Rate of Change
Have seperate tables for slowly changing details vs fast changing details.
2. PII
Keep PII data in a separate table to non PII data.
3. Large Text
Keep fields with large text object in a separate table to smaller numerical and textual data.
4. Course Grained Security
Grant or deny access to data in specific tables.
Implementing the Attribute Splitting Pattern in 3NF Models
Implementing the Attribute Splitting Pattern in Dimensional Models
Implementing the Attribute Splitting Pattern in Dimensional Models
In dimensional modeling, the Attribute Splitting Pattern can be implemented by creating mini-dimensions.
Initial Data in Core Dimension Table
The core dimension table contains stable details, such as:
Initial Data in Mini-Dimension Tables
Volatile details are stored in mini-dimension tables. Here are some examples:
Handling Changes
When the marketing team updates John Doe’s marketing preference from “Email” to “SMS” on 2024-07-10, a new record is added to the CustomerPreferenceMiniDimension table, and the previous record’s EndDate is set.
Implementing the Attribute Splitting Pattern in Data Vault Models
Implementing the Attribute Splitting Pattern in Data Vault Models
In data vault modeling, the Attribute Splitting Pattern can be implemented using satellite tables.
Initial Data in Hub Table
The hub table contains the concept keys such as:
Implementing the Attribute Splitting Pattern in Data Vault Models
In data vault modeling, the Attribute Splitting Pattern can be implemented using satellite tables.
In a data vault model, satellite tables store the descriptive details and track changes over time for the concepts captured in the hub tables.
Initial Data in Satellite Tables
Volatile and Non-Volatile details can be stored in seperate satellite tables, or you can use other patterns for splitting them such as numeric feilds vs large character/text fields.
Below are examples of satellite tables for customer status, loyalty tier, and marketing preferences, using the CustomerKey as the business key.
Handling Changes
On 2024-07-10, the marketing team updates John Doe’s (CustomerKey CUST001) marketing preference from “Email” to “SMS”. Additionally, on the same day, John Doe’s loyalty tier is updated from “Gold” to “Platinum”.
(Assuming you are using an end dating pattern for change records)
CustomerPreferenceSatellite Table:
- The original record for John Doe’s marketing preference (Email) now has an EndDate of 2024-07-10 10:00:00.
- A new record is added for the updated marketing preference (SMS) with an EffectiveDate of 2024-07-10 10:00:00.
CustomerLoyaltySatellite Table:
- The original record for John Doe’s loyalty tier (Gold) now has an EndDate of 2024-07-10 10:00:00.
- A new record is added for the updated loyalty tier (Platinum) with an EffectiveDate of 2024-07-10 10:00:00.
These changes ensure that historical data is preserved, and the latest data is available for analysis. By maintaining the EffectiveDate and EndDate columns, you can track when changes occurred, which is crucial for accurate time-based analyses.
Conclusion
The Attribute Splitting Pattern is a powerful technique for managing rapidly changing details in a data platforms.
By splitting volatile details into separate tables, you may improve query performance, simplify maintenance, and keep your core tables manageable.
Implementing this pattern is straightforward and may have significant benefits in your data work.
As with all patterns, there are anti-patterns where it may make performance worse, depending on your physical data modeling patterns and technologies you are leveraging.
Embrace the Attribute Splitting Pattern to keep your data design agile and efficient!