Skip to main content

Calculated Field

As mentioned in Column Attribute and Relationship , the column of the model has a variant type: Calculated Field. Wren Engine provides the calculated field to define a calculation in the model. A calculation can use a defined column in the same model or a related column in another model through a relationship. Typically, a common metric is related to many different tables. Through calculated fields, it is easy to define a common metric that interacts between different models.

Define Calculated Field

After modeling a model and its columns, we can consider basing on them to create more value. Let’s check an example quickly:

{
"name": "orders",
"tableReference": {
"catalog": "memory",
"schema": "main",
"table": "orders"
},
"columns": [
{
"name": "orderkey",
"type": "INTEGER"
},
{
"name": "custkey",
"type": "INTEGER"
}
{
"name": "price",
"type": "INTEGER"
},
{
"name": "purchasetimestamp",
"type": "TIMESTAMP"
}
]
}

It’s a defined model called orders with 3 columns. To enhance the model, we may want to add a column called customer_last_month_orders_price to know the growth of every customer. We can define a calculated field like

"columns": [
{
"name": "orderkey",
"type": "INTEGER"
},
{
"name": "custkey",
"type": "INTEGER"
}
{
"name": "price",
"type": "INTEGER"
},
{
"name": "purchasetimestamp",
"type": "TIMESTAMP"
},
{
"name": "customer_last_month_orders_price",
"type": "INTEGER",
"isCalculated": "true",
"expression": "lag(price) over (partition by custkey order by date_trunc('YEAR', purchasetimestamp), 0, 0)"
}
]

You can see the new column has an attribute called isCalculated. It means the column is a calculated field which can access other column in this model or use relationship field to access the column in another model. About the relationship field, you can refer to Relationship for more details.

Then, you can query this model like

SELECT customer_last_month_orders_price FROM orders

How Calculated Field work

The concept is very simple: Just wrap another subquery for the model. Let’s see what generated by Wren Engine from the above orders model.

WITH
"orders" AS (
-- finaly projection
SELECT
"orders"."orderkey" "orderkey"
, "orders"."custkey" "custkey"
, "orders"."price" "price"
, "orders"."purchasetimestamp" "purchasetimestamp"
, "orders"."customer_last_month_orders_price" "customer_last_month_orders_price"
FROM
(
-- calculated field scope
SELECT
"orders"."orderkey" "orderkey"
, "orders"."custkey" "custkey"
, "orders"."price" "price"
, "orders"."purchasetimestamp" "purchasetimestamp"
-- build customer_last_month_orders_price field
, lag(price) OVER (PARTITION BY custkey ORDER BY date_trunc('YEAR', purchasetimestamp) ASC, 0 ASC, 0 ASC) "customer_last_month_orders_price"
FROM
(
-- normal column scope
SELECT
"orderkey" "orderkey"
, "custkey" "custkey"
, "price" "price"
, "purchasetimestamp" "purchasetimestamp"
FROM
"memory"."main"."orders" "orders"
) "orders"
) "orders"
)
SELECT customer_last_month_orders_price
FROM
orders

We can see that the structure of this query has multiple subqueries, each used for different purposes. Calculated fields are built in the second layer, the calculated field scope.

It also mentions a limitation of the calculated field: it can only access the normal columns in the same model. It cannot access another calculated field in the same model.

Let’s see a more complex case with relationship.

Calculation with Relationship

Just modify the above mdl:

 "models": [
{
"name": "orders",
"tableReference": {
"catalog": "memory",
"schema": "main",
"table": "orders"
},
"columns": [
{
"name": "orderkey",
"type": "INTEGER"
},
{
"name": "custkey",
"type": "INTEGER"
},
{
"name": "price",
"type": "INTEGER"
},
{
"name": "purchasetimestamp",
"type": "TIMESTAMP"
},
{
"name": "cusotmer",
"type": "customer",
"relationship": "customer_orders"
},
{
"name": "customer_last_month_orders_price",
"type": "INTEGER",
"isCalculated": "true",
"expression": "lag(price) over (partition by custkey order by date_trunc('YEAR', purchasetimestamp), 0, 0)"
},
{
"name": "customer_name_last_month_orders_price",
"type": "INTEGER",
"isCalculated": "true",
"expression": "lag(price) over (partition by cusotmer.name order by date_trunc('YEAR', purchasetimestamp), 0, 0)"
}
],
"primaryKey": "orderkey"
},
{
"name": "customer",
"tableReference": {
"catalog": "memory",
"schema": "main",
"table": "customer"
},
"columns": [
{
"name": "custkey",
"type": "INTEGER"
},
{
"name": "name",
"type": "STRING"
}
]
}
],
"relationships": [
{
"name": "customer_orders",
"models": ["customer", "orders"],
"joinType": "ONE_TO_MANY",
"condition": "customer.custkey = orders.custkey"
}
]

We create another model called customer and establish the relationship customer_orders. We add another similar calculated field, customer_name_last_month_orders_price, using customer.name as the partition key.

The SQL to access customer_last_month_orders_price and customer_name_last_month_orders_price is as follows:

SELECT customer_last_month_orders_price, customer_name_last_month_orders_price FROM orders

The generated SQL will be:

WITH
"customer" AS (
SELECT
"customer"."custkey" "custkey"
, "customer"."name" "name"
FROM
(
SELECT
"customer"."custkey" "custkey"
, "customer"."name" "name"
FROM
(
SELECT
"custkey" "custkey"
, "name" "name"
FROM
"memory"."main"."customer" "customer"
) "customer"
) "customer"
)
, "orders" AS (
SELECT
"orders"."orderkey" "orderkey"
, "orders"."custkey" "custkey"
, "orders"."price" "price"
, "orders"."purchasetimestamp" "purchasetimestamp"
, "orders"."customer_last_month_orders_price" "customer_last_month_orders_price"
, "orders_relationsub"."customer_name_last_month_orders_price" "customer_name_last_month_orders_price"
FROM
((
SELECT
"orders"."orderkey" "orderkey"
, "orders"."custkey" "custkey"
, "orders"."price" "price"
, "orders"."purchasetimestamp" "purchasetimestamp"
-- calcualted field no used relationship
, lag(price) OVER (PARTITION BY custkey ORDER BY date_trunc('YEAR', purchasetimestamp) ASC, 0 ASC, 0 ASC) "customer_last_month_orders_price"
FROM
(
-- normal column scope
SELECT
"orderkey" "orderkey"
, "custkey" "custkey"
, "price" "price"
, "purchasetimestamp" "purchasetimestamp"
FROM
"memory"."main"."orders" "orders"
) "orders"
) "orders"
LEFT JOIN (
SELECT
"orders"."orderkey"
-- calcualted field used relationship
, lag(price) OVER (PARTITION BY "customer"."name" ORDER BY date_trunc('YEAR', purchasetimestamp) ASC, 0 ASC, 0 ASC) "customer_name_last_month_orders_price"
FROM
((
-- normal column scope
SELECT
"orderkey" "orderkey"
, "custkey" "custkey"
, "price" "price"
, "purchasetimestamp" "purchasetimestamp"
FROM
"memory"."main"."orders" "orders"
) "orders"
LEFT JOIN "customer" ON ("customer"."custkey" = "orders"."custkey"))
) "orders_relationsub" ON ("orders"."orderkey" = "orders_relationsub"."orderkey"))
)
SELECT
customer_last_month_orders_price
, customer_name_last_month_orders_price
FROM
orders

It's easy to see that the field invoking the relationship would be built in another scope because it requires additional join tables.

However, they have the same limitations: they can only use the normal columns in the same model or the columns in another related model.