Slowly Changing Dimensions

Introduction

Slowly Changing Dimensions (SCDs) are a common data modeling technique used to manage historical changes in dimension data over time. This enables more accurate time-based analysis and reporting, such as understanding how KPIs were affected under previous attribute values. SCDs are categorized into different types based on how they handle changes to dimension data:

SCD Type 0 - Fixed Dimensions

No changes allowed. The data remains as it was when first inserted.
Useful when historical accuracy is critical and the value should never change.
Example: A product’s original launch date.

SCD Type 1 - Overwrite

Changes overwrite existing data. No history is preserved.
Simple to implement but loses historical context.
Example: If a customer changes their email, the old one is replaced.

SCD Type 2 - Historical Tracking

Each change creates a new record, often with start/end timestamps or versioning.
Preserves full history of changes.
Example: Tracking changes to a customer’s loyalty tier over time.

SCD Type 3 – Previous Value

Stores only the previous value alongside the current one.
Limited history, useful when only one change needs to be tracked.
Example: Keeping a “current region” and a “previous region” field for a customer.

SCD0 and SCD3 are rarely used.

Modeling SCDs in Honeydew

In Honeydew, the modeling of SCDs of Types 0, 1, and 3 is straightforward. Joins between entities are defined using the standard foreign key relationships. For SCD Type 2, Honeydew supports modeling SCDs using a combination of foreign keys and date ranges. Joins between entities are defined using a custom SQL expression that includes the date range logic.

Example: Fact and a Slowly Changing Dimension (SCD2)

Given two tables:

fact_sales: a fact table tracking order transactions that has a foreign key customer_id
dim_customer: a dimension table tracking customer information over time (SCD Type 2). It has multiple entries per customer_id with validity ranges.

Would want to connect the sales data with the correct historical customer information as it was at the time of the order.

Sample data

fact_sales (Fact Table)

order_id	customer_id	order_date	amount
5001	101	2021-06-10	250
5002	101	2023-01-12	300
5003	102	2022-07-22	450

dim_customer (SCD2 Dimension Table)

customer_id	customer_sk	name	region	valid_from	valid_to
101	1	Alice Smith	East	2021-01-01	2022-03-01
101	2	Alice Smith	West	2022-03-01	9999-12-31
102	3	Bob Johnson	North	2021-05-15	9999-12-31

The key for dim_customer is not customer_id (which is repeating across ranges), but rather a surrogate key (customer_sk) valid_from / valid_to define the row’s effective period.Also note that valid_to here is an infinity date (9999-12-31). In some settings it is used as NULL instead, in which case can adjust the join condition accordingly.

Relations

To associate each order with the correct customer version at that point in time, use a custom SQL expression on valid_from and valid_to:

fact_sales.customer_id = dim_customer.customer_id
AND fact_sales.order_date >= dim_customer.valid_from
AND fact_sales.order_date < dim_customer.valid_to

And set a “many-to-one” relationship from fact_sales to dim_customer

Example query

Result of a query on both:

order_id	customer_id	order_date	amount	name	region
5001	101	2021-06-10	250	Alice Smith	East
5002	101	2023-01-12	300	Alice Smith	West
5003	102	2022-07-22	450	Bob Johnson	North

Advanced: Multiple SCD2 (Fact and Dimension) + Point-in-Time Reference point

Advanced use cases for slowly changing dimensions allow to inspect the state of the world at any point in time (including “now”), while every data table has slowly changing dimension fields. Here, the previous example is extended to support of consistent point-in-time queries on historical data where:

fact_sales: a fact table with changing business logic over time (e.g. updated amount, revised status). It has multiple versions per order_id, each valid over a time range.
dim_customer: a dimension table with customer history over time (e.g. changed region), also with validity ranges.

A central dim_date or dim_point_in_time table is used to filter everything as of a specific point.

Since data is duplicated in multiple versions, users must filter on dim_point_in_time to get correct results (whether for “today” or for any historical point of reference).See Conditional Filtering on how to set an automatic filter in a domain, and an example below.You can also ensure a filter is always applied by configuring it directly within user-facing tools, such as BI dashboards.

This structure is used in auditable data models, financial snapshots, and analytics platforms.

Sample data

fact_sales sample data:

order_sk	order_id	customer_id	order_date	amount	status	valid_from	valid_to
9001	5001	101	2021-06-10	250	Pending	2021-06-10	2021-07-01
9002	5001	101	2021-06-10	300	Shipped	2021-07-01	9999-12-31
9003	5002	102	2022-01-15	300	Pending	2022-01-15	9999-12-31

dim_customer sample data:

customer_sk	customer_id	name	region	valid_from	valid_to
1	101	Alice Smith	East	2020-01-01	2022-03-01
2	101	Alice Smith	West	2022-03-01	9999-12-31
3	102	Bob Johnson	North	2021-05-01	9999-12-31

dim_point_in_time: A joint reference point for all data

snapshot_date
2021-06-15
2022-01-01
2023-03-31

This is used to filter time centrally, so other joins respect that single reference point. This table can cover all possible dates.

Same approach can be extended for any type of data versioning - not only for point in time.

Relations

Fact to customers To associate each order with the corresponding version of the customer that was valid at the time the order version was valid, use the following relation:

Join on customer key and validity ranges
Direction: Many to one (from dim_customer to point in time)
Cross-filtering is as needed (one-to-many or bi-directional)

Relation:

fact_sales.customer_id = dim_customer.customer_id
AND fact_sales.valid_from >= dim_customer.valid_from
AND fact_sales.valid_from <  dim_customer.valid_to

The result of that relation is that customers are resolved to the right appropriate customer to the time of the order, while keeping multiple versions of the order.

If a customer has multiple versions within the validity time of an order, it will not be resolved (i.e. will be resolved to NULL).

Entities to point in time reference

fact_sales to dim_point_in_time:
Many to one (from fact_sales to point in time)
Cross-filtering is one-to-many (dim_point_in_time can filter the fact, but not vice versa)

Relation:

dim_point_in_time.snapshot_date >= fact_sales.valid_from
dim_point_in_time.snapshot_date <  fact_sales.valid_to

dim_customer to dim_point_in_time:
Many to one (from dim_customer to point in time)
Cross-filtering is one-to-many (dim_point_in_time can filter dim_customer, but not vice versa)

Relation:

dim_point_in_time.snapshot_date >= dim_customer.valid_from
dim_point_in_time.snapshot_date <  dim_customer.valid_to

The dim_point_in_time is a shared dimension that can filter all associated entities.Using cross-filtering one-to-many ensures that it will filter the entities, but will not be filtered by them.

Ensuring Filtering for a Point in Time

When using SCD with multiple versions, data is duplicated for each snapshot. The semantic modeler must ensure that only one snapshot is selected to prevent double-counting.

Automatic filtering can be done at the dashboard or BI report level. However, a semantic layer allows to enforce automatic filtering across all tools using the same semantics.

To ensure consistency at the semantic layer:

Create a copy of dim_point_in_time called dim_point_in_time_choice. That would be used by the user to choose snapshots.
Create a domain that enforces a snapshot choice:

type: domain
name: all

entities:
  # All entities accessible by the user
  - fact_sales
  - dim_customer
  # An entity to choose the point in time snapshot
  - dim_point_in_time_choice
  # Note dim_point_in_time is *not* included in the domain

# Ensure a single snapshot from dim_point_in_time_choice is always chosen
# By default choose the NULL snapshot, returning no data
source_filters:
  - dim_point_in_time.snapshot = GET_FIELD_SELECTION(dim_point_in_time_choice.snapshot)

The dim_point_in_time_choice entity can be used in a BI tool as a filter so a user can pick a snapshot from it.When the BI tool sends the filtered value it is applied to dim_point_in_time by the domain source filter.If the user did not filter for a snapshot, then no data would be returned, thanks to the source filter.The reason dim_point_in_time_choice is created as a copy is to preserve all possible values unfiltered and to allow BI tools to list the possible values. If a BI tool would try listing the values of dim_point_in_time, it would only receive that single chosen snapshot.

Example query

Status of all orders given reference point of 2021-06-15

order_id	status	amount	name	region
5001	Pending	250	Alice Smith	East

Only one valid version per order_id and customer_id is active per point-in-time

Any rows not yet valid are excluded (e.g. 5002 is not visible on 2021-06-15)

Status of all orders given reference point of 2022-05-01

order_id	status	customer_id	order_date	amount	name	region
5001	Shipped	101	2021-06-10	300	Alice Smith	East
5002	Pending	102	2022-01-15	300	Bob Johnson	North

Use dim_point_in_time to anchor the reference date
Join facts and dimensions based on SCD2 validity ranges
Works seamlessly for time travel, reproducible snapshots, or data backfills

Quick Start

Migration Guides

Reference

Integrations

Security

Release Notes

Introduction

SCD Type 0 - Fixed Dimensions

SCD Type 1 - Overwrite

SCD Type 2 - Historical Tracking

SCD Type 3 – Previous Value

Modeling SCDs in Honeydew

Example: Fact and a Slowly Changing Dimension (SCD2)

Sample data

Relations

Example query

Advanced: Multiple SCD2 (Fact and Dimension) + Point-in-Time Reference point

Sample data

Relations

Ensuring Filtering for a Point in Time

Example query

Quick Start

Migration Guides

Reference

Integrations

Security

Release Notes

​Introduction

​SCD Type 0 - Fixed Dimensions

​SCD Type 1 - Overwrite

​SCD Type 2 - Historical Tracking

​SCD Type 3 – Previous Value

​Modeling SCDs in Honeydew

​Example: Fact and a Slowly Changing Dimension (SCD2)

​Sample data

​Relations

​Example query

​Advanced: Multiple SCD2 (Fact and Dimension) + Point-in-Time Reference point

​Sample data

​Relations

​Ensuring Filtering for a Point in Time

​Example query

Introduction

SCD Type 0 - Fixed Dimensions

SCD Type 1 - Overwrite

SCD Type 2 - Historical Tracking

SCD Type 3 – Previous Value

Modeling SCDs in Honeydew

Example: Fact and a Slowly Changing Dimension (SCD2)

Sample data

Relations

Example query

Advanced: Multiple SCD2 (Fact and Dimension) + Point-in-Time Reference point

Sample data

Relations

Ensuring Filtering for a Point in Time

Example query