Skip to main content

MDL schema reference

This page documents every YAML artifact in a Wren project — wren_project.yml, models, relationships, views, cubes, and instructions.md — with the full field surface for each.

For the conceptual framing of MDL, see What does MDL do for the agent?. For the project lifecycle commands, see Manage project. For the canonical YAML compilation flow, run wren context build after editing.

Project structure

my_project/
├── wren_project.yml # project metadata
├── models/
│ ├── orders/
│ │ └── metadata.yml # table_reference mode (physical table)
│ ├── customers/
│ │ └── metadata.yml
│ └── revenue_summary/
│ ├── metadata.yml # ref_sql mode (SQL-defined model)
│ └── ref_sql.sql # SQL in separate file (optional)
├── views/
│ ├── monthly_revenue/
│ │ ├── metadata.yml
│ │ └── sql.yml # statement in separate file (optional)
│ └── top_customers/
│ └── metadata.yml # statement inline
├── cubes/
│ └── revenue/
│ └── metadata.yml
├── relationships.yml # all relationships
├── instructions.md # business and operational guidance for agents
├── queries.yml # curated NL-SQL pairs (optional)
├── .wren/ # runtime state (gitignored)
│ └── memory/ # LanceDB index files
└── target/
└── mdl.json # build output (gitignored)

YAML files use snake_case field names. The compiled target/mdl.json uses camelCase — the wire format expected by the engine.

wren_project.yml

schema_version: 3
name: my_project
version: "1.0"
catalog: wren
schema: public
data_source: postgres
profile: my-pg
FieldTypeRequiredDescription
schema_versionintyesDirectory layout version. 2 = folder-per-entity, 3 = adds dialect field support (current). Owned by the CLI — bump with wren context upgrade.
namestringyesProject identifier.
versionstringnoUser-defined project version (free-form, no parsing effect).
catalogstringnoWren AI namespace — not your database catalog. Defaults to wren.
schemastringnoWren AI namespace — not your database schema. Defaults to public.
data_sourcestringnoData source type (postgres, bigquery, snowflake, ...). Set by wren context set-profile.
profilestringnoThe bound connection profile name. Set by wren context set-profile.

Two levels of catalog and schema. The same field names appear in two places with completely different meanings. The project-level fields are Wren AI's internal namespace; the model-level table_reference.catalog and table_reference.schema point at the underlying database location.

LocationRefers toExample
wren_project.ymlcatalog, schemaWren AI namespacewren, public
models/*/metadata.ymltable_reference.catalog, table_reference.schemaDatabase locationjaffle_shop, main

Models (models/<name>/metadata.yml)

Each model is its own directory under models/. A model defines:

  • where its data comes from — table_reference or ref_sql
  • which columns are exposed
  • relationships and calculated fields

Model fields

FieldRequiredDescription
nameyesUnique model name (used in SQL queries).
table_referenceone of twoMaps to a physical table (catalog, schema, table).
ref_sqlone of twoA SQL SELECT used as the model's data source.
columnsyesList of columns to expose.
primary_keynoColumn name uniquely identifying a row; required for TO_MANY relationship traversals.
cachednoWhether query results should be cached. Defaults to false.
dialectnoSQL dialect of the model's ref_sql. Overrides project-level data_source for this model. Requires schema_version: 3.
propertiesnoArbitrary key-value metadata.

Using both table_reference and ref_sql in the same model is a validation error.

table_reference

FieldTypeRequiredDescription
catalogstringnoSource-side catalog (DuckDB database stem, BigQuery project, Snowflake database). Omit for sources without a catalog layer.
schemastringnoSource-side schema or dataset. Omit for flat sources.
tablestringyesSource-side table or view name.

Example: table_reference

name: customers
table_reference:
catalog: jaffle_shop
schema: main
table: customers
primary_key: customer_id
columns:
- name: customer_id
type: INTEGER
is_primary_key: true
not_null: true
- name: first_name
type: VARCHAR
- name: last_name
type: VARCHAR
- name: number_of_orders
type: BIGINT

Example: ref_sql

name: revenue_summary
ref_sql: |
SELECT DATE_TRUNC('month', order_date) AS month,
SUM(total) AS total_revenue
FROM orders
GROUP BY 1
columns:
- name: month
type: DATE
- name: total_revenue
type: DECIMAL

The SQL can live inline (above) or in a sibling ref_sql.sql file. The .sql file takes precedence.

Columns

FieldRequiredDescription
nameyesColumn name used in SQL.
typeyesSQL type (VARCHAR, INTEGER, DOUBLE, DATE, TIMESTAMP, BOOLEAN, DECIMAL, JSON, ...).
is_calculatednoIf true, the value is derived from expression at query time.
expressionnoSQL expression for calculated columns, or a single-column reference for simple renames.
relationshipnoName of a relationship — makes this a join handle column.
not_nullnoConstraint hint. Defaults to false.
is_primary_keynoMarks the column as the model's primary key.
is_hiddennoEngine-internal flag; column is stripped from the symbol table and invisible to all clients.
propertiesnoArbitrary metadata (e.g. properties.description).

Calculated columns

- name: total_with_tax
type: DOUBLE
is_calculated: true
expression: "amount * 1.1"

Relationship columns

A relationship column declares a join path to another model:

- name: customer
type: customers # the related model name
relationship: orders_customers

Then orders.customer.first_name is valid SQL — the engine resolves the join automatically.

Column rename via expression

When is_calculated is false but an expression is present, the expression must be a simple column reference. The engine resolves the physical column name from the expression and uses the model column name as the alias:

- name: customer_id # exposed name
type: INTEGER
expression: usr_id # physical column name

Selective exposure for column-level access control

A model does not have to expose every column in the underlying table. Omitted columns are physically invisible to clients — no SQL can reference them, and they do not appear in schema introspection.

This matters for AI agents: any column you omit from the model cannot be retrieved through Wren AI, regardless of what the agent asks.

Relationships (relationships.yml)

relationships:
- name: orders_customers
models:
- orders
- customers
join_type: MANY_TO_ONE
condition: orders.customer_id = customers.customer_id
FieldTypeRequiredDescription
namestringyesUnique relationship identifier.
modelsarray[string]yesExactly two model names [from, to].
join_typeenumyesONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE, or MANY_TO_MANY.
conditionstringyesSQL equality condition using model.column references on both sides.

The first model in models should appear on the left side of the condition. Only equality conditions are supported.

For TO_MANY relationships, calculated columns that traverse the relationship must use aggregate functions — the engine wraps the join in an aggregate subquery to prevent row multiplication.

Views (views/<name>/metadata.yml)

name: top_customers
statement: |
SELECT customer_id, SUM(total) AS lifetime_value
FROM wren.public.orders
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
properties:
description: "Top customers by lifetime value"
FieldTypeRequiredDescription
namestringyesUnique view name.
statementstringyesComplete SQL SELECT. May reference other models or views.
dialectstringnoSQL dialect (metadata only — engine parses with its generic parser). Requires schema_version: 3.
propertiesnoArbitrary metadata.

The statement can live inline or in a sibling sql.yml file. The sql.yml takes precedence.

Views inherit no column declarations — schema is inferred from the statement at query time. Views can reference other views; the engine expands them recursively before resolving models.

Cubes (cubes/<name>/metadata.yml)

A cube is a pre-aggregated semantic object: a base model or view, plus declared measures, dimensions, time dimensions, and hierarchies.

name: revenue
base_object: orders
measures:
- name: total
expression: SUM(amount)
type: DOUBLE
- name: order_count
expression: COUNT(*)
type: BIGINT
dimensions:
- name: status
expression: status
type: VARCHAR
time_dimensions:
- name: month
expression: order_date
grain: month
type: DATE
hierarchies:
- name: time
levels: [year, quarter, month]
FieldRequiredDescription
nameyesUnique cube name.
base_objectyesModel or view this cube aggregates over.
measures[]yesAggregated values (expression + type).
dimensions[]noCategorical group-bys.
time_dimensions[]noTime-based group-bys with explicit grain.
hierarchies[]noOrdered levels for drill-down (year → quarter → month).
refresh_timenoCache refresh interval.
propertiesnoArbitrary metadata.

Cubes are queried structurally via wren cube query, not by writing raw GROUP BY SQL. See Pre-aggregate with cubes for the agent-facing recipe.

Instructions (instructions.md)

Free-form markdown with business and operational guidance for AI agents. Organized by topic with ## headings — each heading and its body becomes a retrievable chunk in memory.

## Business rules
- Revenue queries must use `net_revenue`, not `gross_revenue`.
- All active-customer queries exclude rows where `is_internal = true`.

## Canonical tables
- Use `customers` for analytics, not `customers_v3` or `loyalty_v3`.

## Formatting
- Currency is USD; display with thousand separators and 2 decimals.
- Timestamps are stored in UTC.

Instructions are consumed by agents, not by the engine. They are intentionally excluded from target/mdl.json. Agents access them via:

  • wren context instructions — full text, run once at session start
  • wren memory fetch -q "..." — relevant chunks per query

queries.yml (optional)

Curated natural-language-to-SQL pairs that seed memory. Same format as wren memory dump output:

version: 1
pairs:
- nl: "monthly revenue by product category"
sql: |
SELECT category, DATE_TRUNC('month', order_date) AS month, SUM(amount)
FROM orders
GROUP BY 1, 2
source: user
datasource: postgres-prod

wren memory index auto-loads queries.yml after indexing the schema. Pairs added through wren memory store can be exported back to queries.yml with wren memory dump.

Snake_case to camelCase mapping

wren context build converts YAML field names to camelCase in target/mdl.json:

YAMLJSON
table_referencetableReference
ref_sqlrefSql
is_calculatedisCalculated
not_nullnotNull
is_primary_keyisPrimaryKey
primary_keyprimaryKey
join_typejoinType
data_sourcedataSource
layout_versionlayoutVersion
refresh_timerefreshTime
base_objectbaseObject

Generic rule: split on _, capitalize each word after the first, join. All other fields (name, type, catalog, schema, table, condition, models, columns, cached, dialect, properties) are identical in both formats.

layoutVersion is stamped automatically by wren context build based on the project's schema_version — you do not set it manually in YAML.

Dialect override

Models and views support an optional dialect field declaring which SQL dialect their embedded SQL is written in. Requires schema_version: 3.

SettingBehavior
dialect omittedFalls back to project-level data_source. Default.
dialect setEngine uses the named dialect parser for this object's SQL.

Valid dialect values: athena, bigquery, canner, clickhouse, databricks, datafusion, doris, duckdb, gcs_file, local_file, minio_file, mssql, mysql, oracle, postgres, redshift, s3_file, snowflake, spark, trino.

Schema version vs layout version

schema_versionlayoutVersionCapabilities
11Legacy flat-file project format.
21Folder-per-entity project format.
32Adds the dialect field on models and views.

Use wren context upgrade to bump. The upgrade is dry-runnable with --dry-run.

target/
.wren/memory/

Source YAML and instructions.md are committed. target/mdl.json is derived; .wren/memory/ is binary LanceDB data — share confirmed NL-SQL pairs through queries.yml instead.

See also