Skip to main content

MDL schema reference

This page documents every YAML artifact in a Wren project — wren_project.yml, models, relationships, views, cubes, and the knowledge/ files — with the full field surface for each.

For the conceptual framing of MDL, see What does MDL do for the agent?. For the project lifecycle commands, see Manage project. For the canonical YAML compilation flow, run wren context build after editing.

Project structure

my_project/
├── wren_project.yml # project metadata
├── models/
│ ├── orders/
│ │ └── metadata.yml # table_reference mode (physical table)
│ ├── customers/
│ │ └── metadata.yml
│ └── revenue_summary/
│ ├── metadata.yml # ref_sql mode (SQL-defined model)
│ └── ref_sql.sql # SQL in separate file (optional)
├── views/
│ ├── monthly_revenue/
│ │ ├── metadata.yml
│ │ └── sql.yml # statement in separate file (optional)
│ └── top_customers/
│ └── metadata.yml # statement inline
├── cubes/
│ └── revenue/
│ └── metadata.yml
├── relationships.yml # all relationships
├── knowledge/ # business context (schema_version 5+)
│ ├── rules/ # business rules for agents (supersedes instructions.md)
│ ├── glossary/ metrics/ caveats/
│ ├── sql/ # NL→SQL pairs — source of truth for memory
│ └── knowledge.yml # knowledge-axis schema_version (decoupled from MDL)
├── instructions.md # deprecated — move into knowledge/rules/ (still read)
├── queries.yml # legacy NL-SQL pairs — superseded by knowledge/sql/
├── .wren/ # runtime state (gitignored)
│ └── memory/ # derived LanceDB index (optional; rebuilt from knowledge/sql/)
└── target/
└── mdl.json # build output (gitignored)

wren_project.yml carries a schema_version; version 5 is the current layout. To upgrade an older project — and migrate instructions.md / memory into knowledge/ — see Migration.

YAML files use snake_case field names. The compiled target/mdl.json uses camelCase — the wire format expected by the engine.

wren_project.yml

schema_version: 5
name: my_project
version: "1.0"
catalog: wren
schema: public
data_source: postgres
profile: my-pg
FieldTypeRequiredDescription
schema_versionintyesProject layout version (current: 5 — adds first-class knowledge/). 2 = folder-per-entity, 3 = dialect support, 4 = composite primary keys, 5 = knowledge/. Owned by the CLI — bump with wren context upgrade (see Migration).
namestringyesProject identifier.
versionstringnoUser-defined project version (free-form, no parsing effect).
catalogstringnoWren AI namespace — not your database catalog. Defaults to wren.
schemastringnoWren AI namespace — not your database schema. Defaults to public.
data_sourcestringnoData source type (postgres, bigquery, snowflake, ...). Set by wren context set-profile.
profilestringnoThe bound connection profile name. Set by wren context set-profile.

Two levels of catalog and schema. The same field names appear in two places with completely different meanings. The project-level fields are Wren AI's internal namespace; the model-level table_reference.catalog and table_reference.schema point at the underlying database location.

LocationRefers toExample
wren_project.ymlcatalog, schemaWren AI namespacewren, public
models/*/metadata.ymltable_reference.catalog, table_reference.schemaDatabase locationjaffle_shop, main

Models (models/<name>/metadata.yml)

Each model is its own directory under models/. A model defines:

  • where its data comes from — table_reference or ref_sql
  • which columns are exposed
  • relationships and calculated fields

Model fields

FieldRequiredDescription
nameyesUnique model name (used in SQL queries).
table_referenceone of twoMaps to a physical table (catalog, schema, table).
ref_sqlone of twoA SQL SELECT used as the model's data source.
columnsyesList of columns to expose.
primary_keynoColumn name uniquely identifying a row; required for TO_MANY relationship traversals.
cachednoWhether query results should be cached. Defaults to false.
dialectnoSQL dialect of the model's ref_sql. Overrides project-level data_source for this model. Requires schema_version: 3.
propertiesnoArbitrary key-value metadata.

Using both table_reference and ref_sql in the same model is a validation error.

table_reference

FieldTypeRequiredDescription
catalogstringnoSource-side catalog (DuckDB database stem, BigQuery project, Snowflake database). Omit for sources without a catalog layer.
schemastringnoSource-side schema or dataset. Omit for flat sources.
tablestringyesSource-side table or view name.

Example: table_reference

name: customers
table_reference:
catalog: jaffle_shop
schema: main
table: customers
primary_key: customer_id
columns:
- name: customer_id
type: INTEGER
is_primary_key: true
not_null: true
- name: first_name
type: VARCHAR
- name: last_name
type: VARCHAR
- name: number_of_orders
type: BIGINT

Example: ref_sql

name: revenue_summary
ref_sql: |
SELECT DATE_TRUNC('month', order_date) AS month,
SUM(total) AS total_revenue
FROM orders
GROUP BY 1
columns:
- name: month
type: DATE
- name: total_revenue
type: DECIMAL

The SQL can live inline (above) or in a sibling ref_sql.sql file. The .sql file takes precedence.

Columns

FieldRequiredDescription
nameyesColumn name used in SQL.
typeyesSQL type (VARCHAR, INTEGER, DOUBLE, DATE, TIMESTAMP, BOOLEAN, DECIMAL, JSON, ...).
is_calculatednoIf true, the value is derived from expression at query time.
expressionnoSQL expression for calculated columns, or a single-column reference for simple renames.
relationshipnoName of a relationship — makes this a join handle column.
not_nullnoConstraint hint. Defaults to false.
is_primary_keynoMarks the column as the model's primary key.
is_hiddennoEngine-internal flag; column is stripped from the symbol table and invisible to all clients.
propertiesnoArbitrary metadata (e.g. properties.description).

Calculated columns

- name: total_with_tax
type: DOUBLE
is_calculated: true
expression: "amount * 1.1"

Relationship columns

A relationship column declares a join path to another model:

- name: customer
type: customers # the related model name
relationship: orders_customers

Then orders.customer.first_name is valid SQL — the engine resolves the join automatically.

Column rename via expression

When is_calculated is false but an expression is present, the expression must be a simple column reference. The engine resolves the physical column name from the expression and uses the model column name as the alias:

- name: customer_id # exposed name
type: INTEGER
expression: usr_id # physical column name

Selective exposure for column-level access control

A model does not have to expose every column in the underlying table. Omitted columns are physically invisible to clients — no SQL can reference them, and they do not appear in schema introspection.

This matters for AI agents: any column you omit from the model cannot be retrieved through Wren AI, regardless of what the agent asks.

Relationships (relationships.yml)

relationships:
- name: orders_customers
models:
- orders
- customers
join_type: MANY_TO_ONE
condition: orders.customer_id = customers.customer_id
FieldTypeRequiredDescription
namestringyesUnique relationship identifier.
modelsarray[string]yesExactly two model names [from, to].
join_typeenumyesONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE, or MANY_TO_MANY.
conditionstringyesSQL equality condition using model.column references on both sides.

The first model in models should appear on the left side of the condition. Only equality conditions are supported.

For TO_MANY relationships, calculated columns that traverse the relationship must use aggregate functions — the engine wraps the join in an aggregate subquery to prevent row multiplication.

Views (views/<name>/metadata.yml)

name: top_customers
statement: |
SELECT customer_id, SUM(total) AS lifetime_value
FROM wren.public.orders
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
properties:
description: "Top customers by lifetime value"
FieldTypeRequiredDescription
namestringyesUnique view name.
statementstringyesComplete SQL SELECT. May reference other models or views.
dialectstringnoSQL dialect (metadata only — engine parses with its generic parser). Requires schema_version: 3.
propertiesnoArbitrary metadata.

The statement can live inline or in a sibling sql.yml file. The sql.yml takes precedence.

Views inherit no column declarations — schema is inferred from the statement at query time. Views can reference other views; the engine expands them recursively before resolving models.

Cubes (cubes/<name>/metadata.yml)

A cube is a pre-aggregated semantic object: a base model or view, plus declared measures, dimensions, time dimensions, and hierarchies.

name: revenue
base_object: orders
measures:
- name: total
expression: SUM(amount)
type: DOUBLE
- name: order_count
expression: COUNT(*)
type: BIGINT
dimensions:
- name: status
expression: status
type: VARCHAR
time_dimensions:
- name: order_date
expression: order_date
type: DATE
hierarchies:
- name: time
levels: [year, quarter, month]
FieldRequiredDescription
nameyesUnique cube name.
base_objectyesModel or view this cube aggregates over.
measures[]yesAggregated values (expression + type).
dimensions[]noCategorical group-bys.
time_dimensions[]noTime-based group-bys. Granularity is applied at query time via --time-dimension name:granularity (see CLI reference).
hierarchies[]noOrdered levels for drill-down (year → quarter → month).
refresh_timenoCache refresh interval.
propertiesnoArbitrary metadata.

Cubes are queried structurally via wren cube query, not by writing raw GROUP BY SQL. See Pre-aggregate with cubes for the agent-facing recipe.

Business rules (knowledge/rules/)

Free-form markdown with business and operational guidance for AI agents — one file per topic under knowledge/rules/. Each file (and ## heading within it) becomes a retrievable chunk in memory.

## Business rules
- Revenue queries must use `net_revenue`, not `gross_revenue`.
- All active-customer queries exclude rows where `is_internal = true`.

## Canonical tables
- Use `customers` for analytics, not `customers_v3` or `loyalty_v3`.

## Formatting
- Currency is USD; display with thousand separators and 2 decimals.
- Timestamps are stored in UTC.

Rules are consumed by agents, not by the engine — they are excluded from target/mdl.json. Agents access them via:

  • wren context instructions — full text, run once at session start
  • wren memory fetch -q "..." — relevant chunks per query

A top-level instructions.md is still read (alongside knowledge/rules/) but is deprecated — move it into knowledge/rules/. See Migration.

NL→SQL pairs (knowledge/sql/)

Confirmed natural-language-to-SQL pairs — one markdown file per pair under knowledge/sql/, the source of truth for memory recall. YAML frontmatter plus an optional body:

---
nl: monthly revenue by product category
sql: |
SELECT category, DATE_TRUNC('month', order_date) AS month, SUM(amount)
FROM orders
GROUP BY 1, 2
source: user
datasource: postgres-prod
---

wren memory store writes these files; wren memory index (re)builds the index from them. A legacy top-level queries.yml is still auto-loaded on index for the transition, but new pairs land in knowledge/sql/. See Migration.

Snake_case to camelCase mapping

wren context build converts YAML field names to camelCase in target/mdl.json:

YAMLJSON
table_referencetableReference
ref_sqlrefSql
is_calculatedisCalculated
not_nullnotNull
is_primary_keyisPrimaryKey
primary_keyprimaryKey
join_typejoinType
data_sourcedataSource
layout_versionlayoutVersion
refresh_timerefreshTime
base_objectbaseObject

Generic rule: split on _, capitalize each word after the first, join. All other fields (name, type, catalog, schema, table, condition, models, columns, cached, dialect, properties) are identical in both formats.

layoutVersion is stamped automatically by wren context build based on the project's schema_version — you do not set it manually in YAML.

Dialect override

Models and views support an optional dialect field declaring which SQL dialect their embedded SQL is written in. Requires schema_version: 3.

SettingBehavior
dialect omittedFalls back to project-level data_source. Default.
dialect setEngine uses the named dialect parser for this object's SQL.

Valid dialect values: athena, bigquery, canner, clickhouse, databricks, datafusion, doris, duckdb, gcs_file, local_file, minio_file, mssql, mysql, oracle, postgres, redshift, s3_file, snowflake, spark, trino.

Schema version vs layout version

schema_versionlayoutVersionCapabilities
11Legacy flat-file project format.
21Folder-per-entity project format.
32Adds the dialect field on models and views.
43Adds composite (list-form) primary keys.
53Adds the first-class knowledge/ base. Current.

Use wren context upgrade to bump. The upgrade is dry-runnable with --dry-run.

target/
.wren/memory/

Source YAML and the knowledge/ base (rules, glossary, metrics, caveats, and the NL→SQL pairs under knowledge/sql/) are committed. target/mdl.json is derived; .wren/memory/ is a binary LanceDB index rebuilt from knowledge/sql/ — commit the knowledge/sql/*.md pairs, not the index.

See also