Relationship
Overview
Relationship
is a key concept in Wren Engine. It represents crucial associations between two models, illustrating a system's intricate connections and interactions. In Modeling Definition Language(MDL), you can construct the connections between models with their relationship. It has some benefits:
- Wren Engine allows user access a model from another model through the relationship in MDL.
- It’s a nice context for the LLM to know the topology of your dataset. It can associate them more accurately.
Define Relationship
To describe a relationship, Wren Engine has some attributes used to describe a relationship. Typically, we can describe a relationship as a JSON object:
{
"name" : "CustomerOrders",
"models" : [ "Customer", "Orders" ],
"joinType" : "ONE_TO_MANY",
"condition" : "Customer.custkey = Orders.custkey"
}
A relationship is made up of:
name
: The name of the relationship.models
: The models associated in this relationship. Wren Engine only associates 2 model in a relationship.joinType
: The type of a relationship. Typically, we have 4 kind of the relationship between 2 models:- ONE_TO_ONE (1-1)
- ONE_TO_MANY (1-M)
- MANY_TO_ONE (M-1)
- MANY_TO_MANY (M-M)
1-1
,1-M
andM-1
have meaning for Wren Engine. Let’s discuss it later.
condition
: The join condition between the two models. The Wren Engine serves as the join condition during SQL generation.
The join type of Relationship
Every relationship is bi-directional. The direction could be TO_ONE
or TO_MANY
. Here are some definitions for them:
TO_ONE
relationship: It means that the left table record only has a corresponding record in the right table according to the join condition.TO_MANY
relationship: It means that the left table record could have multiple records in the right table according to the join condition.
With these compositions, we can have four types to describe the relationship between two models:ONE_TO_ONE
, ONE_TO_MANY
, MANY_TO_ONE
, and MANY_TO_MANY
.
How to Use Relationship
In traditional databases, if a user wants to associate two tables to obtain some results, they must construct the join condition between tables. With the defined relationship, Wren Engine allows users to access a model from another model. It’s very useful for building common metrics or accessing related attributes from another model within a model.
Let’s talk about the special columns: Relationship Field
and Calculated Field
Relationship Field
Assume we have define CustomerOrders
as above. Given a sample definition of two models with the relationship:
{
"name": "Customer",
"refSql": "select * from tpch.customer",
"columns": [
{
"name": "custkey",
"type": "integer",
"expression": "c_orderkey"
},
{
"name": "name",
"type": "varchar",
"expression": "c_name"
}
],
"primaryKey": "custkey"
},
{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "totalprice",
"type": "integer",
"expression": "o_totalprice"
},
{
"name": "customer",
"type": "Customer",
"relationship": "CustomerOrders"
}
],
"primaryKey": "orderkey"
}
A relationship field is a column, and the required attributes are basically the same as a Column. However, there are two differences between them:
type
: The type is the name of an existing model. It specifies which reference will be returned to this column.relationship
: It specifies which relationship this relationship field will be based on to construct the join condition.
Let’s check the sample definition. The model Orders
has a column called customer
which will return the reference of the model Customer
according to the relationship CustomerOrders
.
How is the returned reference of a model used? Let’s continue to learn about calculated fields.
Calculated Field
As mentioned above, there are two types of relationship directions: TO_ONE
or TO_MANY
. They have different behaviors when accessing data. Let’s take a look at the simpler one, TO_ONE
, first. See the sample definition below:
{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "orderdate",
"type": "date",
"expression": "o_orderdate",
},
{
"name": "totalprice",
"type": "integer",
"expression": "o_totalprice"
},
{
"name": "customer",
"type": "Customer",
"relationship": "CustomerOrders"
},
{
"name": "order_year",
"type": "date",
"isCalculated": true,
"expression": "date_trunc('year', orderdate)"
},
{
"name": "customer_name",
"type": "varhcar",
"isCalculated": true,
"expression": "customer.name"
}
],
"primaryKey": "orderkey"
}
The common definition of the calculated field
The required attribute of a calculated field is same as a normal column. The only difference is isCalculated
. We should mark the field is calculated by set this attribute as true. The main difference between normal and calculated field is their scope of expression.
The scope of a normal column is the columns of the data source table. However, the scope of a calculated field is the normal columns of the model. You can check the column order_year
to know this concept. We won't delve into this issue here. Let’s focus on the relationship first.
Use TO_ONE Relationship
As mentioned above, we can access the columns of this model in the expression of the calculated field. The relationship field also belongs to them. See the column customer_name
. Its expression is customer.name
, which means getting the name
from the reference customer
. How can we access customer
? Wren Engine will obtain this information from the relationship, CustomerOrders
.
Once defined, we can query the model Orders
like:
SELECT customer_name FROM Orders
Use TO_MANY Relationship
Every Relationship can be used from any direction. In the previous part, we has tried to access data from Orders
to Customer
. According to OrdersCustomer
, the Orders-to-Customer is a MANY_TO_ONE
relationship. Every record in Orders can map to only one record in Customer. How about use it reversely? Let’s see an example:
{
"name": "Customer",
"refSql": "select * from tpch.customer",
"columns": [
{
"name": "custkey",
"type": "integer",
"expression": "c_orderkey"
},
{
"name": "name",
"type": "varchar",
"expression": "c_name"
},
{
"name": "orders",
"type": "Orders",
"relationship": "CustomerOrders"
},
{
"name": "consumption",
"type": "integer",
"isCalculated": true,
"expression": "sum(orders.totalprice)"
}
],
"primaryKey": "custkey"
},
In this case, we have defined a calculated field called consumption
, which is used to sum all the related totalprice
values in Orders. You may notice that the expression is an aggregation expression. It’s a limitation of using the TO_MANY
relationship. Since the result of a TO_MANY
mapping will increase the amount of data, the Wren Engine only allows the TO_MANY
relationship to be used with aggregation.
Use TO_MANY without Aggregation
To handle the aggregation for TO_MANY result, Wren Engine will add the primary key of the left model to be the dimension of grouping. In this case, the left model is Customer
. The column custkey
will be the dimension of sum(order_item.totalprice)
.
Once the defined, we can query Customer
like:
SELECT name, consumption FROM Customer
About MANY_TO_MANY relationship
Wren Engine strives to preserve the original characteristics of every model. The number of rows won’t be changed, and the filter condition won’t be altered. To achieve this purpose, we only allow the TO_MANY
relationship to be used with aggregation and grouped by the primary key.
However, we have no way to handle MANY_TO_MANY
. When a MANY_TO_MANY
relationship is used, the row count of its owner model will increase. This is why Wren Engine does not support MANY_TO_MANY
.
How Relationship Works
The model is implemented by CTE. Wren Engine will made up the calculated field and required JOIN condition in the CTE when a MDL is deployed. Let’s see the generated SQL of selecting customer_name
in Orders
wrenai=> select orderkey, customer_name from Orders;
WITH
"Customer" AS (
SELECT
"Customer"."custkey" "custkey"
, "Customer"."name" "name"
FROM
(
SELECT
c_custkey "custkey"
, c_name "name"
FROM
"memory"."tpch"."customer" "Customer"
) "Customer"
)
, "Orders" AS (
SELECT
"Orders"."orderkey" "orderkey"
, "Orders"."custkey" "custkey"
, "Orders"."totalprice" "totalprice"
, "Orders"."orderdate" "orderdate"
, "Orders_relationsub"."customer_name" "customer_name"
FROM
((
SELECT
o_orderkey "orderkey"
, o_custkey "custkey"
, o_totalprice "totalprice"
, o_orderdate "orderdate"
FROM
"memory"."tpch"."orders" "Orders"
) "Orders"
LEFT JOIN (
SELECT
"Orders"."orderkey"
, "Customer"."name" "customer_name"
FROM
((
SELECT
o_orderkey "orderkey"
, o_custkey "custkey"
, o_totalprice "totalprice"
, o_orderdate "orderdate"
FROM
"memory"."tpch"."orders" "Orders"
) "Orders"
LEFT JOIN "Customer" ON ("Orders"."custkey" = "Customer"."custkey"))
) "Orders_relationsub" ON ("Orders"."orderkey" = "Orders_relationsub"."orderkey"))
)
SELECT
orderkey
, customer_name
FROM
Orders
In the Orders
CTE, we can find that it contains a JOIN to access the column name
from Customer
. The user can use customer_name
like a normal column.
How about TO_MANY
? Let’s see the generated SQL of consumption
in Customer
.
WITH
"Orders" AS (
SELECT
"Orders"."orderkey" "orderkey"
, "Orders"."custkey" "custkey"
, "Orders"."orderstatus" "orderstatus"
, "Orders"."totalprice" "totalprice"
, "Orders"."orderdate" "orderdate"
FROM
(
SELECT
o_orderkey "orderkey"
, o_custkey "custkey"
, o_orderstatus "orderstatus"
, o_totalprice "totalprice"
, o_orderdate "orderdate"
FROM
"memory"."tpch"."orders" "Orders"
) "Orders"
)
, "Customer" AS (
SELECT
"Customer"."custkey" "custkey"
, "Customer"."name" "name"
, "consumption"."consumption" "consumption"
FROM
((
SELECT
c_custkey "custkey"
, c_name "name"
FROM
"memory"."tpch"."customer" "Customer"
) "Customer"
LEFT JOIN (
SELECT
"Customer"."custkey"
, sum("Orders"."totalprice") "consumption"
FROM
((
SELECT
c_custkey "custkey"
, c_name "name"
FROM
"memory"."tpch"."customer" "Customer"
) "Customer"
LEFT JOIN "Orders" ON ("Orders"."custkey" = "Customer"."custkey"))
GROUP BY 1
) "consumption" ON ("Customer"."custkey" = "consumption"."custkey"))
)
SELECT
name
, consumption
FROM
Customer
It’s easy to find the JOIN part in the Customer
CTE is different from the part of customer_name
in Orders
. It’s an aggregation query which sums the totalprice
in Orders
and groups by the primary key of Customer
, custkey
.
The implementation is simple; however, it has a limitation: Cycling Relationship.
As we discussed, both the model implementation and the relationship are managed by CTE. The Wren Engine dynamically generates the SQL. However, CTE requires a sequential ordering; it can only access CTEs that appear before it.
Think about this case:
SELECT o.customer_name, c.consumption
FROM Customer c JOIN Orders o ON o.custkey = c.custkey
It uses both customer_name
and consumption
in SQL. How does the Wren Engine sort CTEs? Understanding this will allow us to determine that the topology of the CTEs is a Directed Acyclic Graph (DAG). If you receive the error message regarding cycling relationships, check whether the selected items create a dependency cycle or not.