Skip to main content

Model

A Model is the core building block of Wren MDL. It maps a physical table (or a SQL expression) to a named semantic entity that AI agents and SQL clients query by name. Models define which columns are exposed and how columns relate to other models.

Defining a Model

Every model requires three things:

  1. A name — the identifier used in queries (SELECT * FROM customers)
  2. A data source — where the data lives (table_reference, ref_sql, or base_object)
  3. Columns — the fields that are exposed

YAML format (wren project)

name: customers
table_reference:
catalog: jaffle_shop
schema: main
table: customers
primary_key: customer_id
columns:
- name: customer_id
type: INTEGER
is_primary_key: true
- name: first_name
type: VARCHAR
- name: last_name
type: VARCHAR
- name: number_of_orders
type: BIGINT
- name: customer_lifetime_value
type: DOUBLE

JSON format (MDL manifest)

{
"name": "customers",
"tableReference": {
"catalog": "jaffle_shop",
"schema": "main",
"table": "customers"
},
"primaryKey": "customer_id",
"columns": [
{ "name": "customer_id", "type": "INTEGER", "isPrimaryKey": true, "isCalculated": false },
{ "name": "first_name", "type": "VARCHAR", "isCalculated": false },
{ "name": "last_name", "type": "VARCHAR", "isCalculated": false },
{ "name": "number_of_orders", "type": "BIGINT", "isCalculated": false },
{ "name": "customer_lifetime_value", "type": "DOUBLE", "isCalculated": false }
]
}

Model Fields

FieldRequiredDescription
nameYesUnique identifier used in SQL queries
table_referenceOne of threePoints to an existing physical table (catalog.schema.table)
ref_sqlOne of threeA SQL SELECT statement used as the model's data source
base_objectOne of threeReferences another model or view as the base
columnsYesList of columns to expose (see Column Fields)
primary_keyNoColumn name that uniquely identifies a row; required for relationships
propertiesNoArbitrary key-value metadata (description, tags, etc.)

Data Source: Three Ways to Point at Data

1. table_reference — map to a physical table

Used when the underlying table already exists in the database.

jaffle_shop example — the orders model maps directly to jaffle_shop.main.orders:

name: orders
table_reference:
catalog: jaffle_shop
schema: main
table: orders

When a query like SELECT * FROM orders is executed, Wren rewrites it to the fully-qualified physical table.

2. ref_sql — define the model inline with SQL (not yet supported)

Used when the model is derived — for example, a staging transform or a complex join that doesn't exist as a physical table.

name: stg_orders
ref_sql: >
SELECT
id AS order_id,
user_id AS customer_id,
order_date,
status
FROM jaffle_shop.main.raw_orders

3. base_object — inherit from another model (not yet supported)

References an existing model or view by name as the base. Useful for layered modeling (raw → staging → mart).

name: active_orders
base_object: orders
columns:
- name: order_id
type: INTEGER
- name: order_date
type: DATE

Column Fields

FieldRequiredDescription
nameYesColumn name used in SQL
typeYesSQL data type (VARCHAR, INTEGER, DOUBLE, DATE, TIMESTAMP, etc.)
is_calculatedNoIf true, the column value is derived from expression at query time
expressionNoSQL expression for calculated columns
relationshipNoName of a Relationship — makes this column a join handle
not_nullNoConstraint hint; false by default
is_primary_keyNoMarks the column as the model's primary key
is_hiddenNoEngine-internal flag; column is excluded from the symbol table and invisible to all clients
propertiesNoArbitrary metadata

Regular columns

A regular column maps to a field in the underlying table. These are called source columns — the engine registers them in the physical schema so DataFusion can read them directly.

By default, the model column name is used as the physical field name. If the physical column has a different name, use expression to declare a simple rename:

- name: order_date        # model name (exposed to clients)
type: DATE
is_calculated: false

- name: customer_id # renamed from the physical column "usr_id"
type: INTEGER
is_calculated: false
expression: usr_id

The expression on a non-calculated column must be a single column reference — it cannot contain operators or function calls. See Engine Internals for the full resolution rules.

Calculated columns

A calculated column is computed from a SQL expression at query time. Wren inlines the expression into the generated SQL.

- name: is_large_order
type: BOOLEAN
is_calculated: true
expression: "amount > 100"

Calculated columns can reference other columns in the same model or traverse relationships:

- name: customer_name
type: VARCHAR
is_calculated: true
expression: "customers.first_name || ' ' || customers.last_name"
relationship: orders_customers

Relationship columns

A relationship column declares a join path to another model. The relationship field names a Relationship defined elsewhere in the MDL.

# In the orders model
- name: customer
type: customers # the related model name
relationship: orders_customers

This makes orders.customer.first_name valid SQL — Wren resolves the join automatically.

jaffle_shop Example

The jaffle_shop dataset has three layers of models that illustrate the full range of modeling patterns:

raw_orders ──► stg_orders ──► orders
raw_customers ──► stg_customers ──► customers
raw_payments ──► stg_payments

Raw layer — table_reference

Raw models point directly at source tables with minimal transformation:

name: raw_orders
table_reference:
catalog: jaffle_shop
schema: main
table: raw_orders
primary_key: id
columns:
- { name: id, type: INTEGER, is_primary_key: true }
- { name: user_id, type: INTEGER }
- { name: order_date, type: DATE }
- { name: status, type: VARCHAR }

Staging layer — renamed and typed

Staging models clean column names and enforce types. They use ref_sql or point at staging tables:

name: stg_orders
table_reference:
catalog: jaffle_shop
schema: main
table: stg_orders
primary_key: order_id
columns:
- { name: order_id, type: INTEGER, is_primary_key: true }
- { name: customer_id, type: INTEGER }
- { name: order_date, type: DATE }
- { name: status, type: VARCHAR }

Mart layer — enriched with metrics

Mart models expose business-ready fields, including pre-aggregated metrics:

name: customers
table_reference:
catalog: jaffle_shop
schema: main
table: customers
primary_key: customer_id
columns:
- { name: customer_id, type: INTEGER, is_primary_key: true }
- { name: first_name, type: VARCHAR }
- { name: last_name, type: VARCHAR }
- { name: first_order, type: DATE }
- { name: most_recent_order, type: DATE }
- { name: number_of_orders, type: BIGINT }
- { name: customer_lifetime_value, type: DOUBLE }

Cross-model relationships

The orders_customers relationship (defined in relationships.yml) links orders.customer_id → customers.customer_id. With this in place, you can query across models without writing any JOIN:

-- Wren resolves the join automatically
SELECT
order_id,
orders.customer.first_name,
orders.customer.last_name,
amount
FROM orders
WHERE orders.customer.number_of_orders > 3

See Relationship for full details on defining join paths.

Using Models in SQL

Once defined, models are first-class SQL table names:

SELECT * FROM customers;

SELECT o.order_id, o.amount, c.first_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Or let Wren handle the join via a relationship column:
SELECT order_id, customer.first_name, amount
FROM orders;

Wren translates these queries to the appropriate dialect SQL for the connected data source before execution.

Column-Level Access Control via Selective Exposure

A model does not have to expose every column in the underlying table. By explicitly listing only the columns a client should see, you create a hard boundary at the semantic layer — columns that are not declared in the model simply do not exist from the client's perspective.

This is especially valuable in the AI era. When an AI agent (Claude, Cursor, Cline, etc.) connects through Wren MCP, it can only discover and query the columns that are declared in the model. Sensitive fields that are omitted from the model are physically invisible to the agent — no prompt injection or accidental exposure can retrieve them.

Example: hiding PII from AI agents

Suppose the physical customers table contains PII columns that should never reach an AI agent:

Physical columnExpose to AI?
customer_idYes
first_nameYes
last_nameYes
emailNo
phone_numberNo
date_of_birthNo
number_of_ordersYes
customer_lifetime_valueYes

Define the model with only the safe columns:

name: customers
table_reference:
catalog: jaffle_shop
schema: main
table: customers
primary_key: customer_id
columns:
- { name: customer_id, type: INTEGER, is_primary_key: true }
- { name: first_name, type: VARCHAR }
- { name: last_name, type: VARCHAR }
- { name: number_of_orders, type: BIGINT }
- { name: customer_lifetime_value, type: DOUBLE }

The AI agent sees a customers model with five columns. email, phone_number, and date_of_birth do not appear in schema introspection, cannot be referenced in SQL, and are never included in query results — regardless of what the agent asks.

Summary

TechniqueColumn reachable via SQLVisible in schema
Declared columnYesYes
Omitted from modelNoNo

Use omission to enforce hard boundaries for AI agents.

Engine Internals

Physical schema registration (infer_and_register_remote_table)

When the engine initialises a model, it builds an Arrow schema that represents the physical table as DataFusion sees it. Only source columns — columns that map directly to a field in the underlying table — are registered in this schema. The engine uses infer_source_column to decide whether each column qualifies, following these rules in order:

Column configurationSource column?Physical field name
is_calculated: trueNo— computed at query time from expression
has relationshipNo— resolved as a join at query time
no expressionYessame as name
expression is a simple column referenceYesinferred from the expression (supports rename)
expression is a complex SQL expressionNo— cannot be resolved statically
is_hidden: trueexcludedstripped from the symbol table before this step

is_hidden — engine-internal columns

is_hidden: true is an engine-internal flag. The engine strips hidden columns from its symbol table during MDL initialisation (get_visible_columns), so they never appear in schema introspection, lineage analysis, or access-control checks. They are invisible to every client — AI agents, SQL clients, and the metadata API alike.

This is used for columns the engine generates internally (e.g. join keys added automatically for relationship resolution) that should not be addressable by user queries.

expression on a non-calculated column — column rename

When is_calculated is false but an expression is present, the expression must be a simple column reference. The engine uses it to resolve which physical column to read and registers the model column name as an alias.

# Physical table has column "usr_id"; expose it as "customer_id" in the model
- name: customer_id
type: INTEGER
is_calculated: false
expression: usr_id

At query time SELECT customer_id FROM stg_orders becomes SELECT usr_id AS customer_id FROM ... in the generated SQL.

If the expression is compound (table.column), the engine takes the last identifier as the physical column name:

- name: customer_id
type: INTEGER
is_calculated: false
expression: raw_orders.user_id # physical name resolved as "user_id"

If the expression cannot be reduced to a single identifier (e.g. amount * 1.1), the column is not registered as a source column — it must use is_calculated: true instead.

is_calculated + expression — computed column

A calculated column is never registered as a source column. The engine inlines the expression SQL directly into the generated query at plan time:

- name: total_with_tax
type: DOUBLE
is_calculated: true
expression: "amount * 1.1"

Generated SQL: SELECT amount * 1.1 AS total_with_tax FROM orders

Calculated columns can also traverse relationship joins:

- name: customer_name
type: VARCHAR
is_calculated: true
expression: "customers.first_name || ' ' || customers.last_name"
relationship: orders_customers

The engine resolves customers.* references by expanding the orders_customers join automatically.

Summary

Column definition

├── is_hidden: true → stripped from symbol table; invisible to all clients

├── is_calculated: true → inlined as SQL expression at query time
│ └── relationship → join is expanded before inlining

├── no expression → direct physical column (name = physical name)

└── expression (simple) → rename: physical name from expression, model name as alias
expression (complex) → must use is_calculated: true