Model
The model provides a logical layer for your table, allowing you to enhance its semantics without impacting the real data. Through the model, you can expose only the specific data you intend to expose.
Define Model
Model is a reference to your physical table. Wren Engine has some attributes used to describe a model. Typically, we can describe a model as a JSON object:
{
"name": "Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\"",
"columns": [
{
"name": "CID",
"type": "VARCHAR",
"notNull": false,
"isCalculated": false,
"expression": "CID",
"properties": {}
},
{
"name": "CName",
"type": "VARCHAR",
"notNull": false,
"isCalculated": false,
"expression": "CName",
"properties": {}
}
],
"primaryKey": "CID"
}
The model definition is made up of:
- Reference: The pointer to the physical table.
- Columns: The list of columns you intend to expose.
- Attributes: Common attributes to describe the model.
Common Attributes
name
: The name of the model. You will query the model by this name.primaryKey
: The primary key of the model. It’s used by relationship querying. It should be unique.properties
: The additional descriptions of the model. It’s used to enhance the semantics for LLM context or readability. It should be the string-to-string key value pair.
Reference to a table
Wren Engine provides 2 way to reference a physical table. You should choose one of them to present the reference:
-
refSql
: Use an SQL query result as a reference. You can manipulate SQL in this section to obtain the desired result set. The only thing to note is that the SQL syntax here must follow the Wren Engine syntax.{
"name": "CS_Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\" where CName = 'CS'"
} -
tableReference
: The specific reference is used to point to a physical table.{
"name": "CS_Course",
"properties": {},
"tableReference": {
"catalog": "canner-cml",
"schema": "spider",
"table": "college_3-Course"
}
}- Using
tableReference
is more favorable for query optimization in some databases. We will talk about this in How Model Works
- Using
Column
After defining the reference, it is necessary to assign columns to the model. A model must possess at least one column. These columns serve as the components that can be manipulated within the model.
- Use
columns
attribute to build a list of columns.
{
"name": "Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\"",
"columns": [
{
"name": "Course_ID",
"type": "VARCHAR",
"notNull": false,
"expression": "CID",
"properties": {}
},
{
"name": "CName",
"type": "VARCHAR",
"notNull": false,
"expression": "CName",
"properties": {}
}
]
}
- As seen in the example above, each column has numerous attributes that define its properties.
Column Attribute
There’re some attributes you can used to describe a column:
name
: The name of a column. Ifexpression
isn’t set,column
should be a column belong to the referenced physical table.type
: The data type of a column. Wren Engine follows Postgres Type. See more in Fields and Types.notNull
: To mention if the column is nullable.expression
: You can use a SQL expression to represent a column. Sometimes, the original name isn’t semantically meaningful. You can use anexpression
to refer to it and rename the column byname
accordingly. If theexpression
is set, thename
becomes the alias of this column.- About the access scope of an expression, it can only access the columns of the referenced physical table.
properties
: Just same as the properties of the model.
Column has two variants, which can be utilized by setting specific attributes.
- Relationship Field
- Calculated Field
Both of them are related to a key concept in Wren Engine, Relationship
. We will talk about them in Relationship.
Query Model
When define a Modeling Definition Language(MDL), we need to set up the catalog
and schema
for it. For example:
{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\"",
"columns": [
{
"name": "Course_ID",
"type": "VARCHAR",
"notNull": false,
"expression": "CID",
"properties": {}
},
{
"name": "CName",
"type": "VARCHAR",
"notNull": false,
"expression": "CName",
"properties": {}
}
]
}
]
}
All dataset (model, metric or view…) will be placed into the search path my_catalog.my_schema
. Then, we can query the defined model with Wren Engine SQL:
SELECT Course_ID FROM my_catalog.my_schema.Course
You can query them just as you would with traditional RDBMS. The Wren Engine will then translate your SQL into executable SQL for your specific data source.
How Model Works
Wren Engine has a statement analysis mechanism to determine which dataset is required at runtime. Basically, if you queried a SQL like
SELECT Course_ID FROM my_catalog.my_schema.Course
Wren Engine will translate the SQL to
WITH Course as (
SELECT
CID as Course_ID,
CName
FROM
(
select * from "bq-project".spider."college_3-Course"
)
)
SELECT Course_ID FROM Course
You can see that the columns of CTE are what you set in the model.
TableReference and RefSql
As shown in the example, you may notice that the subquery is defined in the refSql
. How the Wren Engine accesses your data source depends on your settings. Mostly, people prefers to select all columns from the physical table. However, this approach is not always favorable for SQL optimization. Ideally, a SQL engine should perform column projection based on your usage at runtime. Unfortunately, we cannot guarantee that all data sources can handle this efficiently. This is why we offer tableReference
as an alternative option for referencing.
Let’s take a look how tableReference
works. Given a MDL like
{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Course",
"tableReference": {
"catalog": "canner-cml",
"schema": "spider",
"table": "college_3-Course"
},
"columns": [
{
"name": "Course_ID",
"type": "VARCHAR",
"expression": "CID",
},
{
"name": "CName",
"type": "VARCHAR",
"expression": "CName",
}
]
}
]
}
If we submit the same SQL:
SELECT Course_ID FROM my_catalog.my_schema.Course
Wren Engine will translate it to
WITH Course as (
SELECT
CID as Course_ID,
CName
FROM
"bq-project".spider."college_3-Course"
)
SELECT Course_ID FROM Course
This CTE serves as an explicit projection for any data source, alleviating concerns about SQL optimization issues on the data source side. If you don't need to perform any special actions in refSql
, it's advisable to consider using tableReference
instead.