
Best Practice

Before You Start Wren AI: Define Your “First 20 Questions.”

Best practice: don’t start by modeling everything. Start by modeling what people actually ask.

  1. Sit with your stakeholders and list:
    • 10–20 questions they ask every week (dashboards, reports, daily checks).
    • The core metrics they care about (MRR, AOV, LTV, churn, CAC, retention, etc.).
  2. Mark each question as:
    • Simple retrieval (counts, sums, simple group-by),
    • Business metric (has a specific formula),
    • Complex analysis (window functions, multi-table joins, intermediate steps).

You’ll use this list to decide where each question’s logic should live: simple retrieval questions are mostly covered by good semantics (Step 1), business metrics usually need Instructions (Step 2), and complex analyses are the best candidates for gold-standard Question-SQL Pairs (Step 3).


Step 1 – Build a Semantic Layer That Speaks Your Business Language

Wren AI’s semantic layer is where you tell the AI what the data represents: entities, metrics, relationships, and business meaning.

1.1 Use business-first descriptions

When defining models and columns (manually or via the Modeling AI Assistant):

  • Describe tables as real-world objects, not technical storage:
    • Instead of: “daily_report – production table”
    • Use: “Daily actual production status per machine per day.”
  • Call out purpose/role in the description:
    • “Target production volume provided by management” vs “daily actual production status” so the AI knows which table is “Goals” vs “Actuals”.

This helps Wren AI choose the right tables when users ask for targets, KPIs, or actuals.

1.2 Mark measures, dimensions, and entities clearly

From your schema:

  • Measures/metrics
    • Numeric fields that should be summed or averaged (e.g., run_minutes, revenue, quantity).
    • In descriptions, say “Total X” / “Amount of Y” / “Metric used for …”.
  • Dimensions
    • Dates, categories, and IDs used for grouping and filtering (e.g., state_date, region, plan_type).
    • Explicitly mention “Date for …” or “Category of …”.
  • Entity identifiers
    • Mark IDs as entity keys with descriptions like “Unique identifier for customer/employee/order”.
    • Wren AI can then infer COUNT(DISTINCT employee_id) for “headcount” instead of counting rows.
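
For example, with IDs marked as entity keys, a headcount question should translate into a distinct count rather than a row count. A minimal sketch of the SQL you'd expect, assuming a hypothetical employees table with an employee_id key and an is_active flag:

```sql
-- Hypothetical schema: employees(employee_id, is_active, ...)
-- "What is our current headcount?"
SELECT COUNT(DISTINCT employee_id) AS headcount
FROM employees
WHERE is_active = TRUE;  -- count people, not rows
```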

1.3 Define relationships, not just joins

Use the modeling UI / AI assistant to:

  • Declare one-to-one, one-to-many, and many-to-one relationships between models (e.g., “One organization → many projects”).
  • Make sure key joins (orders→order_items, customers→subscriptions, etc.) are correctly defined so users can ask “by customer”, “by product”, “by month” without thinking about join logic.

Anti-pattern: only connecting tables at the SQL level and hoping the LLM discovers joins. In Wren AI, you should encode the real relationships explicitly.
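
Once those relationships are declared, a question like “revenue by customer” can resolve to the correct join path without the user spelling it out. A rough sketch of the SQL the relationships should enable, assuming hypothetical customers, orders, and order_items tables:

```sql
-- Hypothetical schema: customers(customer_id, name),
--                      orders(order_id, customer_id),
--                      order_items(order_id, price)
-- "Revenue by customer"
SELECT c.name,
       SUM(oi.price) AS revenue
FROM customers c
JOIN orders o       ON o.customer_id = c.customer_id  -- one customer → many orders
JOIN order_items oi ON oi.order_id   = o.order_id     -- one order → many items
GROUP BY c.name;
```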

1.4 Use semantics to resolve jargon & ambiguity

Add column/table descriptions that map internal jargon to business meaning:

  • If you have multiple LTV definitions (raw_ltv, calculated_ltv), describe each precisely and note in which contexts they’re used.

Rule of thumb: if a junior PM would confuse two columns, add more semantic description.
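
For instance, if “LTV” should map to calculated_ltv while “life time value” maps to raw_ltv (as in the instruction example in Step 2 below), the generated SQL differs only in which column is aggregated. A minimal sketch, assuming a hypothetical customers table that holds both columns:

```sql
-- Hypothetical schema: customers(customer_id, raw_ltv, calculated_ltv)
-- "What is the average LTV?"             -> calculated_ltv
SELECT ROUND(AVG(calculated_ltv), 2) AS avg_ltv FROM customers;

-- "What is the average life time value?" -> raw_ltv
SELECT ROUND(AVG(raw_ltv), 2) AS avg_ltv FROM customers;
```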


Step 2 – Add Instructions as Your “Universal Rules.”

Instructions are reusable rules that shape SQL generation, chart behavior, and summaries. They can be:

  • Global – always applied to all queries.
  • Question-Matching – only applied when questions match certain patterns/keywords.

2.1 Start with these core Global Instructions

For a first project, we recommend you configure at least:

  1. Data quality/exclusion rules
    • Example (from docs):

      “Exclude orders with order_status IN ('canceled', 'unavailable') from any sales or revenue-related calculations.”

    • Similarly define:

      • statuses to ignore (test users, internal orders),
      • flags to exclude (fraud, sandbox).
  2. Date and time defaults
    • For example:
      • “When a question mentions ‘orders last week’, use order_created_at as the default date field.”
      • “If no date range is specified, default to the last 30 days.”
  3. Numeric & percentage formatting
    • “Round all monetary metrics to 2 decimal places.”
    • “Format rates as percentages with 1 decimal place (e.g., 12.5%).”
  4. Chart & visualization rules
    • Examples:
      • “When a question implies a time trend (trend, over time, monthly), default to a line chart with time on the X-axis.”
      • “For ‘top N’ questions, default to a horizontal bar chart, sorted descending.”
  5. Summary style guidelines
    • Example from docs:
      • Start summaries with: “Here is the analysis summary: …”
      • Always include the total first, and if comparing periods, include the % change.

Examples:

| Instruction Type | Configuration Content | Wren AI Behavior |
|---|---|---|
| Default Date Handling | When a user mentions only 'orders', default to using the order_created_at field for time filtering, not shipped_date. | When asked "What is the number of orders last week," the AI automatically uses order_created_at for the filter. |
| Decimal Precision Standard | For all monetary metrics (e.g., Revenue, AOV), always ROUND the result to 2 decimal places. For count metrics (e.g., Orders, Users), cast results as integers. | Ensures clean reporting, displaying $1,250.50 instead of $1,250.4999 and preventing decimal values for headcounts. |
| Percentage Formatting | When calculating rates or ratios (e.g., Conversion Rate), output the result as a percentage with 1 decimal place (e.g., ROUND(value * 100, 1)). | Improves readability by displaying values like 12.5% rather than raw decimals like 0.12543. |
| Chart Palette | Use the following chart palette for all visualizations. Primary colors (in order): Brand Primary #0000FF, Brand Secondary #FF0000, Accent 1 #FFFF00, Accent 2 #FF6600. | Automatically applies the expected chart palette instead of using system-default colors. |
| Ranking Visualization Rule | For ranking or comparison questions (e.g., "Top 5 products", "Sales by region"), default to a bar chart sorted in descending order. | Helps users instantly identify top performers without needing to manually adjust chart settings. |
| Summary Style Guide | When summarizing data, always explicitly mention the total value first. If comparing time periods, include the percentage change in brackets. | Generates concise, executive-style summaries like: "Total revenue is $50k (+12% vs last month)." |
| Universal Exclusion Rule | When querying any product-related data, automatically exclude records where category_id is '999' (internal test products). | Ensures all product-related analyses only include production data. |
| Naming Ambiguity Resolution | If a user mentions 'LTV', use the calculated_ltv column. If they mention 'life time value', use the raw_ltv column. | Resolves situations where business terminology maps to multiple database fields. |
| Naming Ambiguity Resolution (real-world use case) | When the question involves product categories in general (e.g., grouping by product or listing categories), always include product_category_name_english in the SELECT clause. Only use product_category_name if the question specifically asks for the category name in Portuguese. | Addresses situations where the AI needs to follow updated data-transformation results in real-world use cases. |

These immediately make answers feel “on-brand” and consistent across your team.
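
To see how several Global Instructions compose, here is roughly the SQL you would want Wren AI to generate for “What was revenue last week?” once the date-default, exclusion, and rounding rules above are in place. This is a sketch, not the exact output; the total_amount column is a hypothetical stand-in for your revenue field:

```sql
-- "What was revenue last week?"
-- Assumed schema: orders(order_id, order_created_at, order_status, total_amount)
SELECT ROUND(SUM(total_amount), 2) AS revenue               -- monetary metric: 2 decimals
FROM orders
WHERE order_status NOT IN ('canceled', 'unavailable')       -- data-quality exclusion rule
  AND order_created_at >= CURRENT_DATE - INTERVAL '7' DAY   -- default date field + range
  AND order_created_at <  CURRENT_DATE;
```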

2.2 Use Question-Matching Instructions for tricky concepts

Use Question-Matching Instructions when:

  • A specific domain concept has a precise definition (late deliveries, churn, VIP customers, etc.).
  • You want all questions that mention certain phrases to use the same logic.

Example from docs:

For “late delivery” questions, calculate lateness as order_delivered_customer_date > order_estimated_delivery_date in the olist_orders_dataset.
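
A question that matches this instruction (e.g., “How many orders were delivered late, and what is our on-time rate?”) should then produce SQL along these lines; the two date columns come straight from the instruction, and the percentage formatting follows the global rules from Step 2.1:

```sql
-- "How many orders were delivered late, and what is our on-time rate?"
SELECT SUM(CASE WHEN order_delivered_customer_date > order_estimated_delivery_date
                THEN 1 ELSE 0 END) AS late_deliveries,
       ROUND(SUM(CASE WHEN order_delivered_customer_date <= order_estimated_delivery_date
                      THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS on_time_rate_pct
FROM olist_orders_dataset
WHERE order_delivered_customer_date IS NOT NULL;
```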

Patterns you should consider:

  • “late delivery”, “delayed”, “on-time rate”
  • “churned customer”, “active customer”
  • “trial to paid conversion”
  • “retention”, “cohort”

Step 3 – Capture Gold-Standard Queries as Question-SQL Pairs

Question-SQL Pairs are your Key Case Handbook: they encode the exact SQL that should be used for high-stakes or complex questions.

3.1 When to create a Question-SQL Pair

Create a Pair when:

  • The metric is core to the business (MRR, AOV, LTV, churn, CAC, yield rate, utilization).
  • The SQL is non-trivial:
    • Multiple joins across many tables,
    • Window functions,
    • Complex filters/exclusions,
    • Weighted averages over time.
  • Your existing BI tool already has an approved query for it.

Examples:

| Operation Method | Natural Language Question (Q) | Gold-Standard SQL (S), Example Logic | Applicable Scenario (Purpose) |
|---|---|---|---|
| Save to Knowledge, after a successful query | "List the top 5 machines with the lowest efficiency." | RANK() OVER (ORDER BY utilization_rate ASC) ... LIMIT 5 | Stores a standard ranking pattern to consistently identify underperformers without manual sorting configuration. |
| Manual addition for complex business logic | "Show the difference between actual production and target goals by month." | SELECT COALESCE(a.val, 0) - COALESCE(t.val, 0) FROM actual a FULL OUTER JOIN target t ON ... | Handles complex, multi-step logic involving data merging (full outer joins) to ensure dates missing in one dataset (e.g., a day with a target but no production) are still calculated correctly. |
| Manual addition for complex business logic | "Which specific machine parts caused the most downtime based on the logs?" | CASE WHEN POSITION('oven' IN note) > 0 THEN 'Oven' ELSE 'Other' END | Parses unstructured text data into structured categories, allowing the AI to answer analytical questions based on raw log notes. |
| Manual addition for complex business logic | "Identify VIP customers whose revenue exceeded $100K for three consecutive months." | (Complex SQL involving window functions) | Handles complex, multi-step logic the AI cannot easily infer, ensuring the accuracy of advanced analysis. |
| Manual addition / optimization | "What is the AOV (average order value)?" | ROUND(SUM(price + freight_value) / COUNT(DISTINCT order_id), 2) | Ensures the calculation logic for core metrics (e.g., AOV) remains consistent and accurate across the organization. |
| Manual addition / optimization | "What is the overall Yield Rate for the last quarter?" | ROUND(SUM(good_quantity) * 100.0 / SUM(total_quantity), 2) | Ensures the calculation uses weighted averages for aggregated periods instead of averaging daily percentages, which prevents mathematical errors. |
| Manual addition / optimization | "Calculate the Machine Utilization Rate for Plant A." | ROUND(SUM(run_minutes) * 100.0 / SUM(total_minutes), 2) ... WHERE total_minutes > 0 | Standardizes efficiency metrics by aggregating total runtime vs. available time, preventing division-by-zero errors in the analysis. |
| Manual addition / optimization | "What is the material breakage rate?" | ROUND(SUM(abnormal_count) * 100000.0 / SUM(total_meter), 2) | Enforces specific industry coefficients (per 100,000 units) to ensure the metric matches standard reporting formats. |
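
For the ratio-style metrics above, the essential pattern is to aggregate the numerator and denominator first and divide once, rather than averaging per-day percentages. A sketch of the Yield Rate pair, assuming a hypothetical production_daily table (the date filter is illustrative):

```sql
-- Hypothetical schema: production_daily(state_date, good_quantity, total_quantity)
-- "What is the overall Yield Rate for the last quarter?"
-- Correct: aggregate first, divide once (a volume-weighted average)
SELECT ROUND(SUM(good_quantity) * 100.0 / SUM(total_quantity), 2) AS yield_rate_pct
FROM production_daily
WHERE state_date >= DATE '2024-01-01'   -- replace with your last-quarter bounds
  AND state_date <  DATE '2024-04-01';

-- Anti-pattern: averaging daily percentages over-weights low-volume days
-- SELECT AVG(good_quantity * 100.0 / total_quantity) FROM production_daily;
```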

3.2 How to build them efficiently

  1. Run a question in Wren AI, review the SQL, and if correct:
    • Click Save to Knowledge to store it as a Question-SQL Pair.
  2. Or copy existing SQL from your warehouse / BI tool and:
    • Go to Knowledge → Question-SQL Pairs, add a new item,
    • Paste the question and the “gold” SQL.

3.3 Best practices

  • Write questions in natural stakeholder language:
    • “What is our AOV?” instead of “AOV metric definition”.
  • Include variants if necessary:
    • “average order value”
    • “AOV”
    • “avg spend per order.”
  • Keep the SQL fully explicit:
    • Include all filters, joins, and aggregations,
    • Use rounding/formatting logic that matches your reporting.

Over time, this becomes a “training set” for your own data – but without needing to fine-tune models.


Step 4 – Iterate with Real Questions (Human-in-the-Loop)

Once semantics, initial instructions, and a few Question-SQL Pairs are in place:

  1. Use Wren AI’s Ask flow to ask realistic questions.
  2. For each answer:
    • Review the SQL breakdown.
    • If it’s correct and valuable → Save as a new Question-SQL Pair.
    • If it’s almost right but missing a rule → add or refine an Instruction.
  3. Pay attention to Suggested Instructions:
    • Wren AI may propose grouping, filtering, or aggregation rules based on your questions.
    • Decide whether to save, edit, or ignore them.

This feedback loop turns everyday usage into continuous context-building.


Step 5 – Maintain Your Context as the Schema Evolves

As your data changes:

  • Use schema change detection to see when tables/columns are removed, renamed, or type-changed.
  • After resolving schema changes, review:
    • Affected Semantics (descriptions, relationships),
    • Any Instructions that reference changed fields,
    • Question-SQL Pairs that might break.

Lightweight process to adopt:

  • Once per sprint:
    • Check schema change notifications.
    • Run a quick sanity set of your top 10 Questions (smoke test).
    • Update any broken Question-SQL Pairs.