Skip to main content

Model

The model provides a logical layer for your table, allowing you to enhance its semantics without impacting the real data. Through the model, you can expose only the specific data you intend to expose.

Define Model

Model is a reference to your physical table. Wren Engine has some attributes used to describe a model. Typically, we can describe a model as a JSON object:

{
"name": "Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\"",
"columns": [
{
"name": "CID",
"type": "VARCHAR",
"notNull": false,
"isCalculated": false,
"expression": "CID",
"properties": {}
},
{
"name": "CName",
"type": "VARCHAR",
"notNull": false,
"isCalculated": false,
"expression": "CName",
"properties": {}
}
],
"primaryKey": "CID"
}

The model definition is made up of:

  • Reference: The pointer to the physical table.
  • Columns: The list of columns you intend to expose.
  • Attributes: Common attributes to describe the model.

Common Attributes

  • name : The name of the model. You will query the model by this name.
  • primaryKey: The primary key of the model. It’s used by relationship querying. It should be unique.
  • properties: The additional descriptions of the model. It’s used to enhance the semantics for LLM context or readability. It should be the string-to-string key value pair.

Reference to a table

Wren Engine provides 2 way to reference a physical table. You should choose one of them to present the reference:

  • refSql: Use an SQL query result as a reference. You can manipulate SQL in this section to obtain the desired result set. The only thing to note is that the SQL syntax here must follow the Wren Engine syntax.

    {
    "name": "CS_Course",
    "properties": {},
    "refSql": "select * from \"bq-project\".spider.\"college_3-Course\" where CName = 'CS'"
    }
  • tableReference: The specific reference is used to point to a physical table.

    {
    "name": "CS_Course",
    "properties": {},
    "tableReference": {
    "catalog": "canner-cml",
    "schema": "spider",
    "table": "college_3-Course"
    }
    }
    • Using tableReference is more favorable for query optimization in some databases. We will talk about this in How Model Works

Column

After defining the reference, it is necessary to assign columns to the model. A model must possess at least one column. These columns serve as the components that can be manipulated within the model.

  • Use columns attribute to build a list of columns.
{
"name": "Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\"",
"columns": [
{
"name": "Course_ID",
"type": "VARCHAR",
"notNull": false,
"expression": "CID",
"properties": {}
},
{
"name": "CName",
"type": "VARCHAR",
"notNull": false,
"expression": "CName",
"properties": {}
}
]
}
  • As seen in the example above, each column has numerous attributes that define its properties.

Column Attribute

There’re some attributes you can used to describe a column:

  • name : The name of a column. If expression isn’t set, column should be a column belong to the referenced physical table.
  • type : The data type of a column. Wren Engine follows Postgres Type. See more in Fields and Types.
  • notNull: To mention if the column is nullable.
  • expression: You can use a SQL expression to represent a column. Sometimes, the original name isn’t semantically meaningful. You can use an expression to refer to it and rename the column by name accordingly. If the expression is set, the name becomes the alias of this column.
    • About the access scope of an expression, it can only access the columns of the referenced physical table.
  • properties: Just same as the properties of the model.

Column has two variants, which can be utilized by setting specific attributes.

  • Relationship Field
  • Calculated Field

Both of them are related to a key concept in Wren Engine, Relationship. We will talk about them in Relationship.

Query Model

When define a Modeling Definition Language(MDL), we need to set up the catalog and schema for it. For example:

{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Course",
"properties": {},
"refSql": "select * from \"bq-project\".spider.\"college_3-Course\"",
"columns": [
{
"name": "Course_ID",
"type": "VARCHAR",
"notNull": false,
"expression": "CID",
"properties": {}
},
{
"name": "CName",
"type": "VARCHAR",
"notNull": false,
"expression": "CName",
"properties": {}
}
]
}
]
}

All dataset (model, metric or view…) will be placed into the search path my_catalog.my_schema. Then, we can query the defined model with Wren Engine SQL:

SELECT Course_ID FROM my_catalog.my_schema.Course

You can query them just as you would with traditional RDBMS. The Wren Engine will then translate your SQL into executable SQL for your specific data source.

How Model Works

Wren Engine has a statement analysis mechanism to determine which dataset is required at runtime. Basically, if you queried a SQL like

SELECT Course_ID FROM my_catalog.my_schema.Course

Wren Engine will translate the SQL to

WITH Course as (
SELECT
CID as Course_ID,
CName
FROM
(
select * from "bq-project".spider."college_3-Course"
)
)
SELECT Course_ID FROM Course

You can see that the columns of CTE are what you set in the model.

TableReference and RefSql

As shown in the example, you may notice that the subquery is defined in the refSql. How the Wren Engine accesses your data source depends on your settings. Mostly, people prefers to select all columns from the physical table. However, this approach is not always favorable for SQL optimization. Ideally, a SQL engine should perform column projection based on your usage at runtime. Unfortunately, we cannot guarantee that all data sources can handle this efficiently. This is why we offer tableReference as an alternative option for referencing.

Let’s take a look how tableReference works. Given a MDL like

{
"catalog": "my_catalog",
"schema": "my_schema",
"models": [
{
"name": "Course",
"tableReference": {
"catalog": "canner-cml",
"schema": "spider",
"table": "college_3-Course"
},
"columns": [
{
"name": "Course_ID",
"type": "VARCHAR",
"expression": "CID",
},
{
"name": "CName",
"type": "VARCHAR",
"expression": "CName",
}
]
}
]
}

If we submit the same SQL:

SELECT Course_ID FROM my_catalog.my_schema.Course

Wren Engine will translate it to

WITH Course as (
SELECT
CID as Course_ID,
CName
FROM
"bq-project".spider."college_3-Course"

)
SELECT Course_ID FROM Course

This CTE serves as an explicit projection for any data source, alleviating concerns about SQL optimization issues on the data source side. If you don't need to perform any special actions in refSql, it's advisable to consider using tableReference instead.