Advanced Tutorial
What to expect from this tutorial?
In this tutorial, you will use many awesome features in the Modeling Definition Language (MDL).
NOTE: This tutorial assumes you have access to a running Wren Engine. If not, check out the QuickStart and Initialize.
Before we use Wren Engine, let us show you the most important part of Wren Engine - the Modeling Definition Language (MDL). We use the MDL to define references and relationships, it provides many convenient features. Learn more at Modeling Overview
Prepare MDL file
You must create a file name mdl.json
into the path etc/mdl
.
├── etc
│ ├── mdl
│ │ └── mdl.json
│ ├── config.properties
Set the property wren.directory
with etc/mdl
in the etc/config.properties
.
wren.directory=etc/mdl
All MDL settings will be in the etc/mdl/mdl.json
. Let’s modify the mdl.json
for you now.
What is Modeling Definition Language (MDL)
What is Modeling Definition Language (MDL)?
The basic elements of MDL must have these fields.
catalog
: You can define a new catalog name to abstract the original catalog.schema
: You can define a new schema name to abstract the original schema.models
: Define a new model to reference real data entities.
{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Orders",
"properties": {},
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"expression": "o_orderkey",
"type": "integer"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
}
],
"primaryKey": "orderkey"
},
{
"name": "Customer",
"refSql": "select * from tpch.customer",
"columns": [
{
"name": "custkey",
"expression": "c_custkey",
"type": "integer"
},
{
"name": "name",
"expression": "c_name",
"type": "varchar"
}
],
"primaryKey": "custkey"
}
]
}
What is a Model?
A model serves as a logical layer for your table, allowing you to enrich its meaning without affecting the actual data. You should only expose the specific dataset you intend to share with the end users.
The model is a reference to your physical table in your database. Wren Engine has provided some attributes used to describe a model. Typically, we can define a model as a JSON object.
More details about each fields you can find them in the Model
Query Models
Let’s query data through Wren Engine.
All datasets (model, metric, or view) will be placed into the search path my_catalog.my_schema
. Then, we can query the defined model with Wren Engine SQL:
SELECT orderkey FROM my_catalog.my_schema.Orders
You can query data just as you would with traditional RDBMS. The Wren Engine will then translate your SQL into executable code for your specific data source.
Add more different datasets
We provide many types of dataset
- Model
- Relationship
- Metric
- View
In this tutorial, we will add Relationship
in the mdl.json
. For other data modeling types you can find them in the Data Modeling - Overview
What is a Relationship?
A relationship describes how two models linked together and enable seamless data integration and analysis across both entities, such as how to calculate specific column or JOIN two tables correctly. We will auto-join two models by joinType
and condition
when you query the related tables. More detail in the Relationship.
"relationships": [
{
"name": "OrdersCustomer",
"models": [ "Orders", "Customer" ],
"joinType": "MANY_TO_ONE",
"condition": "Orders.custkey = Customer.custkey"
}
]
After adding the relationships
, you can add a new field called relationship
to a model to establish a relationship between two models.
{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "customer",
"type": "Customer",
// add relationship field here
"relationship": "CustomerOrders"
}
],
"primaryKey": "orderkey"
}
Calculated field
And you can add new field with isCalculated
to allow you to get a column value by referencing from other models.
{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "customer",
"type": "Customer",
"relationship": "CustomerOrders"
},
{
"name": "customer_name",
"type": "varhcar",
// add `isCalculated` field
"isCalculated": true,
// reference to `customer.name` column
"expression": "customer.name"
}
],
"primaryKey": "orderkey"
}
Finally, you can build a MDL to query calculated field as below.
{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Orders",
"refSql": "select * from tpch.orders",
"columns": [
{
"name": "orderkey",
"type": "integer",
"expression": "o_orderkey"
},
{
"name": "custkey",
"type": "integer",
"expression": "o_custkey"
},
{
"name": "customer",
"type": "Customer",
"relationship": "CustomerOrders"
},
{
"name": "customer_name",
"type": "varhcar",
"isCalculated": true,
"expression": "customer.name"
}
],
"primaryKey": "orderkey"
},
{
"name": "Customer",
"refSql": "select * from tpch.customer",
"columns": [
{
"name": "custkey",
"expression": "c_custkey",
"type": "integer"
},
{
"name": "name",
"expression": "c_name",
"type": "varchar"
}
],
"primaryKey": "custkey"
}
],
"relationships": [
{
"name": "OrdersCustomer",
"models": [ "Orders", "Customer" ],
"joinType": "MANY_TO_ONE",
"condition": "Orders.custkey = Customer.custkey"
}
]
}
Now, you can directly query the name of the Customer that related to Orders
SELECT customer_name FROM Orders;
customer_name
--------------------
Customer#000000370
Customer#000000781
Customer#000001234
Customer#000001369
Customer#000000557
Customer#000000392
...
In the original database, the SQL that you want to get the name of the Customer via o_custkey
of Orders need to write query as below:
SELECT
Customer.c_name
FROM
tpch.orders AS Orders
JOIN
tpch.customer AS Customer ON Orders.o_custkey = Customer.c_custkey;
With the calculated field in Wren Engine, you can reduce a lot of replications of JOIN statements and aggregation functions when querying different metrics and KPIs.
What’s Next?
- If you want to learn more about data model, check out the Data Modeling - Overview
- Want to use different data sources? We provide a complete and detailed tutorial at Connecting to Data Sources - Overview