Creating Workflows

Creating a workflow involves setting up queries, choosing data sources, and organizing the execution flow.

Creating a new workflow

Navigate to the Data Workflows section
Click the create button
Give your workflow a name and optional description
Select the data source(s) you want to query

Adding queries

A workflow starts with one query, but you can add more to build multi-step pipelines. Each query needs:

A name - a descriptive identifier (e.g., raw_transactions, aggregated_by_region)
SQL code - the actual query, written in cells
An execution order - determines which queries run first

Query names are important because dashboards reference specific queries by name when binding data to widgets.

Queries execute in the order you define. Earlier queries complete before later ones begin. This lets you structure dependent operations - for example, a summary query that depends on a detail query having already run.

Referencing other queries

A key feature of multi-query workflows is that later queries can reference the output of earlier queries. You do this by using the query name as a table name in your SQL:

-- Query 1: "raw_transactions" (execution order 0)
SELECT customer_id, amount, transaction_date
FROM sales.transactions
WHERE transaction_date >= {{start_date}}

-- Query 2: "customer_totals" (execution order 1)
SELECT customer_id, SUM(amount) as total_spend, COUNT(*) as transaction_count
FROM raw_transactions
GROUP BY customer_id

-- Query 3: "top_customers" (execution order 2)
SELECT t.customer_id, t.total_spend, c.name, c.region
FROM customer_totals t
JOIN customers.dim_customers c ON t.customer_id = c.id
WHERE t.total_spend > {{min_spend}}
ORDER BY t.total_spend DESC

In this example, customer_totals references raw_transactions by name, and top_customers references customer_totals. Each query builds on the results of the previous one.

Rules for cross-query references

Use the query name, not the cell name - if a query is named raw_transactions, reference it as FROM raw_transactions
No forward references - a query can only reference queries with a lower execution order (queries that run before it)
Only SELECT queries can be referenced - stored procedure calls cannot be used as a data source for other queries

Referencing data layers

When writing SQL in your queries, you reference data layers by their virtual name. How you reference a data layer depends on how it is organized:

In a folder - use the folder path and table name: sales.transactions, places.regions.dim_regions
Unfiled - use the source schema and table name: schema.table_name
Unfiled with no name collision - you can use just the table name: table_name

-- Referencing a data layer in the "sales" folder
SELECT customer_id, SUM(amount) as total_spend
FROM sales.transactions
GROUP BY customer_id

-- Referencing an unfiled data layer by just its table name
SELECT * FROM dim_customers

The workflow editor provides autocomplete for virtual names - as you type, it suggests available data layers and their columns. Virtual names are handled automatically when the query runs.

A workflow can reference data layers from multiple data sources, giving you flexibility to combine data across different databases.

Organizing with folders

Workflows can be organized into folders, and queries within a workflow can be grouped into query folders for complex workflows with many queries.

Validating queries

Before publishing, you can validate your queries with a dry run. This checks the SQL syntax and returns the result schema (column names and types) without executing the full query. Use this to catch errors early and verify that your queries produce the expected output structure.