Relationship

Overview

Relationship is a key concept in Wren Engine. It represents crucial associations between two models, illustrating a system's intricate connections and interactions. In Modeling Definition Language(MDL), you can construct the connections between models with their relationship. It has some benefits:

Wren Engine allows user access a model from another model through the relationship in MDL.
It’s a nice context for the LLM to know the topology of your dataset. It can associate them more accurately.

Define Relationship

To describe a relationship, Wren Engine has some attributes used to describe a relationship. Typically, we can describe a relationship as a JSON object:

{
    "name" : "CustomerOrders",
    "models" : [ "Customer", "Orders" ],
    "joinType" : "ONE_TO_MANY",
    "condition" : "Customer.custkey = Orders.custkey"
}

A relationship is made up of:

name: The name of the relationship.
models: The models associated in this relationship. Wren Engine only associates 2 model in a relationship.
- joinType: The type of a relationship. Typically, we have 4 kind of the relationship between 2 models:
  - ONE_TO_ONE (1-1)
  - ONE_TO_MANY (1-M)
  - MANY_TO_ONE (M-1)
  - MANY_TO_MANY (M-M)
  Only 1-1, 1-M and M-1 have meaning for Wren Engine. Let’s discuss it later.
condition: The join condition between the two models. The Wren Engine serves as the join condition during SQL generation.

The join type of Relationship

Every relationship is bi-directional. The direction could be TO_ONE or TO_MANY. Here are some definitions for them:

TO_ONE relationship: It means that the left table record only has a corresponding record in the right table according to the join condition.
TO_MANY relationship: It means that the left table record could have multiple records in the right table according to the join condition.

With these compositions, we can have four types to describe the relationship between two models:ONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE, and MANY_TO_MANY.

How to Use Relationship

In traditional databases, if a user wants to associate two tables to obtain some results, they must construct the join condition between tables. With the defined relationship, Wren Engine allows users to access a model from another model. It’s very useful for building common metrics or accessing related attributes from another model within a model.

Let’s talk about the special columns: Relationship Field and Calculated Field

Relationship Field

Assume we have define CustomerOrders as above. Given a sample definition of two models with the relationship:

{
  "name": "Customer",
  "refSql": "select * from tpch.customer",
  "columns": [
    {
      "name": "custkey",
      "type": "integer",
      "expression": "c_orderkey"
    },
    {
      "name": "name",
      "type": "varchar",
      "expression": "c_name"
    }
  ],
  "primaryKey": "custkey"
},
{
  "name": "Orders",
  "refSql": "select * from tpch.orders",
  "columns": [
    {
      "name": "orderkey",
      "type": "integer",
      "expression": "o_orderkey"
    },
    {
      "name": "custkey",
      "type": "integer",
      "expression": "o_custkey"
    },
    {
      "name": "totalprice",
      "type": "integer",
      "expression": "o_totalprice"
    },
    {
      "name": "customer",
      "type": "Customer",
      "relationship": "CustomerOrders"
    }
  ],
  "primaryKey": "orderkey"
}

A relationship field is a column, and the required attributes are basically the same as a Column. However, there are two differences between them:

type: The type is the name of an existing model. It specifies which reference will be returned to this column.
relationship: It specifies which relationship this relationship field will be based on to construct the join condition.

Let’s check the sample definition. The model Orders has a column called customer which will return the reference of the model Customer according to the relationship CustomerOrders.

How is the returned reference of a model used? Let’s continue to learn about calculated fields.

Calculated Field

As mentioned above, there are two types of relationship directions: TO_ONE or TO_MANY. They have different behaviors when accessing data. Let’s take a look at the simpler one, TO_ONE, first. See the sample definition below:

{
  "name": "Orders",
  "refSql": "select * from tpch.orders",
  "columns": [
    {
      "name": "orderkey",
      "type": "integer",
      "expression": "o_orderkey"
    },
    {
      "name": "custkey",
      "type": "integer",
      "expression": "o_custkey"
    },
    {
      "name": "orderdate",
      "type": "date",
      "expression": "o_orderdate",
    },
    {
      "name": "totalprice",
      "type": "integer",
      "expression": "o_totalprice"
    },
    {
      "name": "customer",
      "type": "Customer",
      "relationship": "CustomerOrders"
    },
    {
      "name": "order_year",
      "type": "date",
      "isCalculated": true,
      "expression": "date_trunc('year', orderdate)"
    },
    {
      "name": "customer_name",
      "type": "varhcar",
      "isCalculated": true,
      "expression": "customer.name"
    }
  ],
  "primaryKey": "orderkey"
}

The common definition of the calculated field

The required attribute of a calculated field is same as a normal column. The only difference is isCalculated. We should mark the field is calculated by set this attribute as true. The main difference between normal and calculated field is their scope of expression.

The scope of a normal column is the columns of the data source table. However, the scope of a calculated field is the normal columns of the model. You can check the column order_year to know this concept. We won't delve into this issue here. Let’s focus on the relationship first.

Use TO_ONE Relationship

As mentioned above, we can access the columns of this model in the expression of the calculated field. The relationship field also belongs to them. See the column customer_name. Its expression is customer.name, which means getting the name from the reference customer. How can we access customer? Wren Engine will obtain this information from the relationship, CustomerOrders.

Once defined, we can query the model Orders like:

SELECT customer_name FROM Orders

Use TO_MANY Relationship

Every Relationship can be used from any direction. In the previous part, we has tried to access data from Orders to Customer. According to OrdersCustomer , the Orders-to-Customer is a MANY_TO_ONE relationship. Every record in Orders can map to only one record in Customer. How about use it reversely? Let’s see an example:

{
  "name": "Customer",
  "refSql": "select * from tpch.customer",
  "columns": [
    {
      "name": "custkey",
      "type": "integer",
      "expression": "c_orderkey"
    },
    {
      "name": "name",
      "type": "varchar",
      "expression": "c_name"
    },
    {
      "name": "orders",
      "type": "Orders",
      "relationship": "CustomerOrders"
    },
    {
      "name": "consumption",
      "type": "integer",
      "isCalculated": true,
      "expression": "sum(orders.totalprice)"
    }
  ],
  "primaryKey": "custkey"
},

In this case, we have defined a calculated field called consumption, which is used to sum all the related totalprice values in Orders. You may notice that the expression is an aggregation expression. It’s a limitation of using the TO_MANY relationship. Since the result of a TO_MANY mapping will increase the amount of data, the Wren Engine only allows the TO_MANY relationship to be used with aggregation.

Use TO_MANY without Aggregation

To handle the aggregation for TO_MANY result, Wren Engine will add the primary key of the left model to be the dimension of grouping. In this case, the left model is Customer. The column custkey will be the dimension of sum(order_item.totalprice).

Once the defined, we can query Customer like:

SELECT name, consumption FROM Customer

About MANY_TO_MANY relationship

Wren Engine strives to preserve the original characteristics of every model. The number of rows won’t be changed, and the filter condition won’t be altered. To achieve this purpose, we only allow the TO_MANY relationship to be used with aggregation and grouped by the primary key.

However, we have no way to handle MANY_TO_MANY. When a MANY_TO_MANY relationship is used, the row count of its owner model will increase. This is why Wren Engine does not support MANY_TO_MANY.

How Relationship Works

The model is implemented by CTE. Wren Engine will made up the calculated field and required JOIN condition in the CTE when a MDL is deployed. Let’s see the generated SQL of selecting customer_name in Orders

wrenai=> select orderkey, customer_name from Orders;

WITH
  "Customer" AS (
   SELECT
     "Customer"."custkey" "custkey"
   , "Customer"."name" "name"
   FROM
     (
      SELECT
        c_custkey "custkey"
      , c_name "name"
      FROM
        "memory"."tpch"."customer" "Customer"
   )  "Customer"
) 
, "Orders" AS (
   SELECT
     "Orders"."orderkey" "orderkey"
   , "Orders"."custkey" "custkey"
   , "Orders"."totalprice" "totalprice"
   , "Orders"."orderdate" "orderdate"
   , "Orders_relationsub"."customer_name" "customer_name"
   FROM
     ((
      SELECT
        o_orderkey "orderkey"
      , o_custkey "custkey"
      , o_totalprice "totalprice"
      , o_orderdate "orderdate"
      FROM
        "memory"."tpch"."orders" "Orders"
   )  "Orders"
   LEFT JOIN (
      SELECT
        "Orders"."orderkey"
      , "Customer"."name" "customer_name"
      FROM
        ((
         SELECT
           o_orderkey "orderkey"
         , o_custkey "custkey"
         , o_totalprice "totalprice"
         , o_orderdate "orderdate"
         FROM
           "memory"."tpch"."orders" "Orders"
      )  "Orders"
      LEFT JOIN "Customer" ON ("Orders"."custkey" = "Customer"."custkey"))
   )  "Orders_relationsub" ON ("Orders"."orderkey" = "Orders_relationsub"."orderkey"))
) 
SELECT
  orderkey
, customer_name
FROM
  Orders

In the Orders CTE, we can find that it contains a JOIN to access the column name from Customer. The user can use customer_name like a normal column.

How about TO_MANY? Let’s see the generated SQL of consumption in Customer.

WITH
  "Orders" AS (
   SELECT
     "Orders"."orderkey" "orderkey"
   , "Orders"."custkey" "custkey"
   , "Orders"."orderstatus" "orderstatus"
   , "Orders"."totalprice" "totalprice"
   , "Orders"."orderdate" "orderdate"
   FROM
     (
      SELECT
        o_orderkey "orderkey"
      , o_custkey "custkey"
      , o_orderstatus "orderstatus"
      , o_totalprice "totalprice"
      , o_orderdate "orderdate"
      FROM
        "memory"."tpch"."orders" "Orders"
   )  "Orders"
) 
, "Customer" AS (
   SELECT
     "Customer"."custkey" "custkey"
   , "Customer"."name" "name"
   , "consumption"."consumption" "consumption"
   FROM
     ((
      SELECT
        c_custkey "custkey"
      , c_name "name"
      FROM
        "memory"."tpch"."customer" "Customer"
   )  "Customer"
   LEFT JOIN (
      SELECT
        "Customer"."custkey"
      , sum("Orders"."totalprice") "consumption"
      FROM
        ((
         SELECT
           c_custkey "custkey"
         , c_name "name"
         FROM
           "memory"."tpch"."customer" "Customer"
      )  "Customer"
      LEFT JOIN "Orders" ON ("Orders"."custkey" = "Customer"."custkey"))
      GROUP BY 1
   )  "consumption" ON ("Customer"."custkey" = "consumption"."custkey"))
) 
SELECT
  name
, consumption
FROM
  Customer

It’s easy to find the JOIN part in the Customer CTE is different from the part of customer_name in Orders. It’s an aggregation query which sums the totalprice in Orders and groups by the primary key of Customer, custkey.

The implementation is simple; however, it has a limitation: Cycling Relationship.

As we discussed, both the model implementation and the relationship are managed by CTE. The Wren Engine dynamically generates the SQL. However, CTE requires a sequential ordering; it can only access CTEs that appear before it.

Think about this case:

SELECT o.customer_name, c.consumption
FROM Customer c JOIN Orders o ON o.custkey = c.custkey

It uses both customer_name and consumption in SQL. How does the Wren Engine sort CTEs? Understanding this will allow us to determine that the topology of the CTEs is a Directed Acyclic Graph (DAG). If you receive the error message regarding cycling relationships, check whether the selected items create a dependency cycle or not.

Relationship

Overview

Define Relationship

The join type of Relationship​

How to Use Relationship

Relationship Field​

Calculated Field​

The common definition of the calculated field​

Use TO_ONE Relationship​

Use TO_MANY Relationship​

About MANY_TO_MANY relationship​