Skip to main content

Advanced Tutorial

What to expect from this tutorial?

In this tutorial, you will use many awesome features in the Modeling Definition Language (MDL).

NOTE: This tutorial assumes you have access to a running Wren Engine. If not, check out the QuickStart and Initialize.

Before we use Wren Engine, let us show you the most important part of Wren Engine - the Modeling Definition Language (MDL). We use the MDL to define references and relationships, it provides many convenient features. Learn more at Modeling Overview

Prepare MDL file

You must create a file name mdl.json into the path etc/mdl.

├── etc
│ ├── mdl
│ │ └── mdl.json
│ ├── config.properties

Set the property wren.directory with etc/mdl in the etc/config.properties.

wren.directory=etc/mdl

All MDL settings will be in the etc/mdl/mdl.json. Let’s modify the mdl.json for you now.

What is Modeling Definition Language (MDL)

What is Modeling Definition Language (MDL)?

The basic elements of MDL must have these fields.

  • catalog: You can define a new catalog name to abstract the original catalog.
  • schema: You can define a new schema name to abstract the original schema.
  • models: Define a new model to reference real data entities.
{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Orders",
"properties": {},
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"expression": "o_orderkey",
"type": "integer"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
}
],
"primaryKey": "orderkey"
},
{
"name": "Customer",
"refSql": "select * from tpch.customer",
"columns": [
{
"name": "custkey",
"expression": "c_custkey",
"type": "integer"
},
{
"name": "name",
"expression": "c_name",
"type": "varchar"
}
],
"primaryKey": "custkey"
}
]
}

What is a Model?

A model serves as a logical layer for your table, allowing you to enrich its meaning without affecting the actual data. You should only expose the specific dataset you intend to share with the end users.

The model is a reference to your physical table in your database. Wren Engine has provided some attributes used to describe a model. Typically, we can define a model as a JSON object.

More details about each fields you can find them in the Model

Query Models

Let’s query data through Wren Engine.

All datasets (model, metric, or view) will be placed into the search path my_catalog.my_schema. Then, we can query the defined model with Wren Engine SQL:

SELECT orderkey FROM my_catalog.my_schema.Orders

You can query data just as you would with traditional RDBMS. The Wren Engine will then translate your SQL into executable code for your specific data source.

Add more different datasets

We provide many types of dataset

  • Model
  • Relationship
  • Metric
  • View

In this tutorial, we will add Relationship in the mdl.json. For other data modeling types you can find them in the Data Modeling - Overview

What is a Relationship?

A relationship describes how two models linked together and enable seamless data integration and analysis across both entities, such as how to calculate specific column or JOIN two tables correctly. We will auto-join two models by joinType and condition when you query the related tables. More detail in the Relationship.

"relationships": [
{
"name": "OrdersCustomer",
"models": [ "Orders", "Customer" ],
"joinType": "MANY_TO_ONE",
"condition": "Orders.custkey = Customer.custkey"
}
]

After adding the relationships, you can add a new field called relationship to a model to establish a relationship between two models.

{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "customer",
"type": "Customer",
// add relationship field here
"relationship": "CustomerOrders"
}
],
"primaryKey": "orderkey"
}

Calculated field

And you can add new field with isCalculated to allow you to get a column value by referencing from other models.

{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "customer",
"type": "Customer",
"relationship": "CustomerOrders"
},
{
"name": "customer_name",
"type": "varhcar",
// add `isCalculated` field
"isCalculated": true,
// reference to `customer.name` column
"expression": "customer.name"
}
],
"primaryKey": "orderkey"
}

Finally, you can build a MDL to query calculated field as below.

{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "customer",
"type": "Customer",
"relationship": "CustomerOrders"
},
{
"name": "customer_name",
"type": "varhcar",
"isCalculated": true,
"expression": "customer.name"
}
],
"primaryKey": "orderkey"
},
{
"name": "Customer",
"refSql": "select * from tpch.customer",
"columns": [
{
"name": "custkey",
"expression": "c_custkey",
"type": "integer"
},
{
"name": "name",
"expression": "c_name",
"type": "varchar"
}
],
"primaryKey": "custkey"
}
],
"relationships": [
{
"name": "OrdersCustomer",
"models": [ "Orders", "Customer" ],
"joinType": "MANY_TO_ONE",
"condition": "Orders.custkey = Customer.custkey"
}
]
}

Now, you can directly query the name of the Customer that related to Orders

SELECT customer_name FROM Orders;
   customer_name
--------------------
Customer#000000370
Customer#000000781
Customer#000001234
Customer#000001369
Customer#000000557
Customer#000000392
...

In the original database, the SQL that you want to get the name of the Customer via o_custkey of Orders need to write query as below:

SELECT
Customer.c_name
FROM
tpch.orders AS Orders
JOIN
tpch.customer AS Customer ON Orders.o_custkey = Customer.c_custkey;

With the calculated field in Wren Engine, you can reduce a lot of replications of JOIN statements and aggregation functions when querying different metrics and KPIs.

What’s Next?