Documentation Index
Fetch the complete documentation index at: https://honeydew.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Introduction
Honeydew entities can model many different data structures. However, it is very common in relational modeling to separate entities to facts and dimensions:- Dimensions provide descriptive information for measurements. Dimensions typically represent business entities (such as a user, a product or a place), and include many properties for each instance. For example, a customer dimension will include per-customer information such as age or geography. In a star or snowflake schemas, dimensions are typically the leafs.
- Facts typically represent measurable events (i.e. transactions, website visits, etc). They typically include a time field and reference related dimensions (i.e. the transacting or visiting user) via foreign keys. In a star schema, the fact is the center of a star.
user entity might appear in different facts describing things the user has done:
loginsfact table describing application login eventsvisitsfact table describing website visits events
Building Shared Dimensions
Honeydew automatically detects shared dimensions when more than one fact has a relationship to them. There is no configuration required. Honeydew will create optimized queries that conform data to the shared dimension.Multiple Paths to Shared Dimensions
In more complex schemas where fact tables refer to other fact tables, more than one relationship path can be defined. For example, can have a schema with two facts and a shared dimension:- A
customershared dimension details information per customer. - A
ordersfact entity has per-order data, and has a relation to the ordering customer. - A
lineitemfact entity has the line items for each order and has a direct relationship to the ordering customer.
lineitem and customer:
- Join directly
lineitemandcustomeronlineitem.customer_id = customer.id - Join
lineitemtoordersonorder_idand them join orders to customers onorders.customer_id = customer.id
Metrics that combine multiple facts
A common use case with shared dimensions is querying one fact based on information from another fact with a connected shared dimension. For example, would want to build a metric that counts the visits of all users that had a login last month. Since there is no direct connection between visits and logins to join on, need to go through the shared user in theusers entity
that takes data from the logins fact, so it can be used later on when joining users with visits.
Connect metrics through a shared dimension
Given the following schema:- Two facts:
loginsandvisits - Connected through a single shared dimension:
users
Break to steps with a calculated attribute in the shared dimension
Use a calculated attribute when the connection between facts goes through a single shared dimension, and there is no dynamic calculation. In that case, can built a simpler single fact metric that uses that calculated attribute. To implement the question above with a calculated attribute approach:-
Build a calculated attribute in the
usersshared entity namedlogins_last_monthdefined asThis counts per user the logins (assuming logins has acountmetric) in the last month (assuming atimeattribute), at the granularity of a user (since the calculated attribute is part of the user entity). See also how metrics behave in calculated attributes. -
Build a metric in
visitscalledcount_visits_of_recently_logged_in_usersdefined asThis is a reusable metric based on data from two facts, counting first at the login fact granularity (logins last month) and then at the visit fact granularity (visits of users that logged in last month).