WOQL queries look declarative — you describe what you want and the engine figures out the rest. But underneath, the engine makes choices about how to find answers, and those choices follow specific rules. Understanding these rules is the difference between a query that streams results in milliseconds and one that materializes an entire dataset into memory before returning anything.
This page explains the execution model from the ground up: how solutions are generated, how variables move between inner and outer scopes, and how to structure queries so the engine can do its job efficiently.
Prerequisites: This page assumes you've completed the WOQL Tutorial and understand basic concepts like and, or, triple, and eq. If terms like "binding" and "variable" are unfamiliar, start there.
See also: WOQL Explanation | WOQL Tutorial | Range Queries with triple_slice | Time Handling | WOQL Class Reference
The Execution Model: Solutions and Backtracking
What is a solution?
A WOQL query produces a table of results. Each row is a solution — a complete assignment of values to variables that satisfies every constraint in the query. The engine's job is to find all such solutions.
// This query has exactly 1 solution:
eq("v:x", 42)
// Result: x = 42
// This query has 0 solutions:
and(eq("v:x", 42), eq("v:x", 99))
// No value of x satisfies both constraints simultaneously
How backtracking works
The engine evaluates a query left to right. When it reaches a predicate that can produce multiple results — like triple, or, or member — it picks the first result and continues forward. If a later constraint fails, the engine backtracks to the most recent choice point and tries the next option.
Think of it as exploring a tree of possibilities:
and(
triple("v:person", "city", "v:city"), // choice point: many people
triple("v:person", "age", "v:age"), // for each person, look up age
greater("v:age", 30) // filter: is age > 30?
)
The engine:
1. Picks the first person/city triple.
2. Looks up that person's age.
3. Checks if age > 30. If yes, emits a solution. If no, backtracks to step 1.
4. Picks the next person/city triple. Repeats.
This is backtracking — the engine systematically explores all possibilities by going forward when constraints succeed and going backward when they fail.
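The search order described above can be modelled with nested JavaScript generators. This is only an illustrative sketch, not TerminusDB's implementation: the in-memory `facts` array and the `triple` helper are hypothetical stand-ins for the database.

```javascript
// Hypothetical in-memory facts standing in for the database.
const facts = [
  ["alice", "city", "Oslo"], ["alice", "age", 34],
  ["bob", "city", "Rome"], ["bob", "age", 25],
  ["carol", "city", "Oslo"], ["carol", "age", 41],
];

// Choice point: lazily yields every fact matching the pattern.
// A null subject means "unbound".
function* triple(subject, predicate) {
  for (const [s, p, o] of facts) {
    if ((subject === null || s === subject) && p === predicate) {
      yield { subject: s, object: o };
    }
  }
}

// Model of and(triple(person, city), triple(person, age), greater(age, 30)).
function* peopleOver30() {
  for (const { subject: person, object: city } of triple(null, "city")) {
    for (const { object: age } of triple(person, "age")) {
      // Filter: emit a solution on success; on failure simply fall through,
      // which resumes the innermost loop that still has options left.
      if (age > 30) yield { person, city, age };
    }
  }
}

const solutions = [...peopleOver30()];
console.log(solutions);
```

Each `for...of` loop plays the role of a choice point, and resuming a loop after a failed check corresponds to backtracking.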
Deterministic vs. non-deterministic predicates
Every WOQL predicate falls into one of two categories:
| Category | Behavior | Examples |
|---|---|---|
| Non-deterministic (generators) | Can produce multiple solutions via backtracking | triple, triple_slice, member, or, sequence, path |
| Deterministic (filters/transforms) | Produce exactly one solution or fail | eq, greater, less, not, typecast, length, sum, group_by, limit, distinct |
Non-deterministic predicates are choice points — they are where the engine branches. Deterministic predicates either pass or block a solution, but never multiply the number of solutions.
Understanding this distinction is crucial for reasoning about how many rows a query will return and where the computational work happens.
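To see how the two categories affect row counts, the sketch below models a generator and a filter in plain JavaScript. The `member` and `greater` functions here are simplified stand-ins written for this illustration, not the WOQL predicates themselves.

```javascript
// Models a non-deterministic predicate: one solution per list element.
function* member(list) {
  yield* list;
}

// Models a deterministic filter: it passes or blocks, nothing more.
const greater = (a, b) => a > b;

let candidates = 0;

function* query() {
  for (const x of member([1, 2, 3])) {     // choice point with 3 branches
    for (const y of member([10, 20])) {    // choice point with 2 branches
      candidates++;                        // 3 x 2 = 6 candidates in total
      if (greater(x + y, 12)) yield { x, y }; // can only remove candidates
    }
  }
}

const rows = [...query()];
console.log(candidates, rows.length);
```

The generators multiply the number of candidates; the filter can only reduce it.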
Variable Scope: Inner and Outer Queries
The default: all variables are globally scoped
In a flat WOQL query, all variables live in the same scope. A binding made anywhere is visible everywhere:
and(
eq("v:x", 42),
eq("v:y", "v:x") // v:x is already bound to 42, so v:y = 42
)
This is simple and intuitive for flat queries. It gets more interesting with sub-queries.
select — restricting which variables escape
select runs a sub-query but only exposes the listed variables to the outer query. Variables inside the sub-query that are not selected are invisible outside:
and(
select("v:name",
and(
triple("v:person", "name", "v:name"),
triple("v:person", "age", "v:age") // v:age is computed but not exposed
)
),
eq("v:age", "unknown") // v:age is unbound here — select hid it
)
When to use select: When a sub-query binds helper variables that would pollute the outer scope or cause unintended unification. It is the primary tool for variable hygiene.
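One way to picture the effect of select is as a projection over the inner solutions. The following is a rough plain-JavaScript model; the `inner` rows and this `select` helper are hypothetical and exist only for the illustration.

```javascript
// Hypothetical solutions produced by the inner and(...) of the example.
const inner = [
  { person: "p1", name: "Ada", age: 36 },
  { person: "p2", name: "Bo", age: 29 },
];

// Keep only the listed variables in each solution; everything else is hidden.
function select(vars, solutions) {
  return solutions.map((row) =>
    Object.fromEntries(vars.filter((v) => v in row).map((v) => [v, row[v]]))
  );
}

const outer = select(["name"], inner);
console.log(outer);
```

In this model the outer query never sees `person` or `age`, so a later constraint on `age` would start from an unbound variable.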
group_by — grouping with scoped aggregation
group_by runs a sub-query, groups the results by one or more variables, and collects the grouped values into lists. The sub-query's variables are scoped — only the grouping variables and the collected list are visible outside.
// Count people per city
and(
group_by(
["city"], // group by these variables
["person"], // collect these into lists
"v:city_group", // the collected list
and(
isa("v:person", "Person"),
triple("v:person", "city", "v:city")
)
),
length("v:city_group", "v:count")
)
Key scoping behavior:
- v:city is the grouping variable: it is bound in the outer scope (one value per group).
- v:person is the template variable: its individual values are not visible outside, only the collected list is.
- v:city_group is the result list: visible in the outer scope.
- The inner query runs to exhaustion for each group.
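The scoping rules above can be approximated in plain JavaScript. This is a sketch under simplifying assumptions (a single grouping variable, a single template variable, hypothetical `rows`), not the engine's algorithm.

```javascript
// Hypothetical solutions of the inner query.
const rows = [
  { person: "p1", city: "Oslo" },
  { person: "p2", city: "Rome" },
  { person: "p3", city: "Oslo" },
];

// One output row per distinct key. Template values are only reachable
// through the collected list, never as individual bindings.
function groupBy(keyVar, templateVar, groupVar, solutions) {
  const groups = new Map();
  for (const row of solutions) {
    if (!groups.has(row[keyVar])) groups.set(row[keyVar], []);
    groups.get(row[keyVar]).push(row[templateVar]);
  }
  return [...groups].map(([key, list]) => ({ [keyVar]: key, [groupVar]: list }));
}

// Equivalent of the trailing length(...) step in the example.
const perCity = groupBy("city", "person", "city_group", rows)
  .map((row) => ({ ...row, count: row.city_group.length }));
console.log(perCity);
```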
Common pitfall: Using group_by with an empty grouping key just to collect all values into a list:
// Verbose way to collect all ages into a list
group_by(
[],
["age"],
"v:all_ages",
triple("v:person", "age", "v:age")
)
This works but is semantically misleading, since you're not grouping by anything. The proposed Collect predicate (see below) makes this intent explicit.
Collect — gathering bindings into a list (proposed)
Collect is the inverse of Member. Where Member unpacks a list into individual bindings, Collect runs a sub-query to exhaustion and gathers all values of a template variable into a list.
// Collect all ages into a list
WOQL.collect(
WOQL.var("age"),
WOQL.var("all_ages"),
WOQL.triple(WOQL.var("person"), "age", WOQL.var("age"))
)
// Result: all_ages = [28, 35, 28, 42, 31]
Scoping behavior:
- The query parameter is a self-contained sub-query. It runs independently.
- The into variable receives the resulting list in the outer scope.
- The template specifies what to collect from each solution (last to support fluent queries).
- Variables in the sub-query that are not in the template are existentially quantified: they exist only to help generate solutions.
The key insight: Collect must control the backtracking. You cannot collect values from a variable that is already backtracking in the outer scope — the sub-query that generates the values must be inside the Collect. This is the same constraint as group_by and the findall Prolog predicate.
// This does NOT collect outer backtracking into a list:
and(
triple("v:person", "age", "v:age"), // generates many bindings
WOQL.collect("v:age", ???, "v:list") // too late — v:age is already bound
)
// This DOES work — the generating query is inside Collect:
WOQL.collect(
WOQL.var("age"),
WOQL.triple(WOQL.var("person"), "age", WOQL.var("age")),
WOQL.var("all_ages")
)
member — unpacking a list into bindings
Member is the inverse of Collect. Given a list, it generates one solution per element:
and(
eq("v:fruits", ["apple", "banana", "cherry"]),
member("v:fruit", "v:fruits")
)
// Result: 3 rows: fruit = "apple", fruit = "banana", fruit = "cherry"
Member is non-deterministic: it creates a choice point with as many branches as there are list elements. Each branch produces one binding.
The Member/Collect symmetry:
| Predicate | Direction | Behavior |
|---|---|---|
| Member | List → Bindings | Non-deterministic: generates one solution per element |
| Collect | Bindings → List | Deterministic: runs sub-query to exhaustion, produces one solution |
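The symmetry in the table can be sketched with a generator and a function that drains it. Both helpers below are simplified models written for this illustration; in particular, this `collect` takes only the sub-query and ignores the template and target variable.

```javascript
// member: list -> bindings (lazy, one solution per element).
function* member(list) {
  for (const item of list) yield item;
}

// collect: bindings -> list. It receives the generating sub-query itself,
// so it controls the iteration and can run it to exhaustion.
function collect(subQuery) {
  return [...subQuery()];
}

const fruits = ["apple", "banana", "cherry"];
const roundTrip = collect(() => member(fruits));

// Inside an outer loop only the current binding is visible. There is no way
// to reach "all values" from here, which is why the generator has to sit
// inside collect and not before it.
const seenOneAtATime = [];
for (const fruit of member(fruits)) {
  seenOneAtATime.push([fruit]);
}
console.log(roundTrip, seenOneAtATime);
```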
distinct — deduplication within a scope
distinct eliminates duplicate values for a specified variable while preserving backtracking for everything else:
distinct("v:city",
triple("v:person", "city", "v:city")
)
// Returns each city once, even if multiple people live there
Scoping behavior: distinct observes the values of the specified variable as the inner query generates them, and suppresses solutions that would repeat a previously seen value. The inner query still runs normally; distinct is a filter on the output stream.
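A filter of this kind can be modelled as a generator that remembers what it has already emitted. This is a minimal sketch with a hypothetical `cities` source, not the engine's code.

```javascript
// Streams rows through, dropping any row whose key value was seen before.
function* distinct(key, solutions) {
  const seen = new Set(); // grows with the number of distinct values
  for (const row of solutions) {
    if (seen.has(row[key])) continue; // suppress a repeated value
    seen.add(row[key]);
    yield row; // a first occurrence is passed on immediately
  }
}

// Hypothetical inner query results.
function* cities() {
  yield { person: "p1", city: "Oslo" };
  yield { person: "p2", city: "Rome" };
  yield { person: "p3", city: "Oslo" }; // repeated city
}

const unique = [...distinct("city", cities())].map((row) => row.city);
console.log(unique);
```

The output is produced incrementally, but the set of seen values still has to be kept for the lifetime of the query.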
not — negation as failure
not succeeds when its inner query fails (produces zero solutions). It does not bind any variables:
and(
isa("v:person", "Person"),
triple("v:person", "name", "v:name"),
not(triple("v:person", "email", "v:email"))
)
// People without an email address
Critical scoping rule: Variables that appear only inside not are existentially quantified. not checks whether any solution exists; it does not expose which specific values matched. If you need to know which values were excluded, use a different pattern (e.g., opt + filter).
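Negation as failure can be modelled as "ask the sub-query for one solution and succeed only if there is none". The data and helper names below are hypothetical and serve only this sketch.

```javascript
// Hypothetical data: only p1 has an email address.
const emails = new Map([["p1", "ada@example.org"]]);
const people = [
  { person: "p1", name: "Ada" },
  { person: "p2", name: "Bo" },
];

// Sub-query: yields the email of a person, if any.
function* emailOf(person) {
  if (emails.has(person)) yield emails.get(person);
}

// Succeeds (returns true) only when the sub-query has no solution.
// It returns a boolean, so no inner binding can leak out.
function not(subQuery) {
  return subQuery().next().done === true;
}

const withoutEmail = people.filter(({ person }) => not(() => emailOf(person)));
console.log(withoutEmail);
```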
opt — optional matching
opt tries to match its inner query. If the inner query succeeds, its bindings flow to the outer scope. If it fails, the outer query continues with those variables unbound:
and(
triple("v:person", "name", "v:name"),
opt(triple("v:person", "email", "v:email"))
)
// All people, with email if they have one
Scoping behavior: Unlike not, opt does expose inner variable bindings when the inner query succeeds. When it fails, those variables remain unbound (empty in the result).
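For comparison with not, here is a rough model of opt: it passes inner bindings through when there are any, and otherwise yields a single empty set of bindings so the outer query can continue. Names and data are hypothetical.

```javascript
// Hypothetical data: only p1 has an email address.
const emails = new Map([["p1", "ada@example.org"]]);
const people = [
  { person: "p1", name: "Ada" },
  { person: "p2", name: "Bo" },
];

// Sub-query: yields a binding for email, if the person has one.
function* emailOf(person) {
  if (emails.has(person)) yield { email: emails.get(person) };
}

// Pass inner solutions through; if there are none, continue once with nothing bound.
function* opt(subQuery) {
  let matched = false;
  for (const bindings of subQuery()) {
    matched = true;
    yield bindings;
  }
  if (!matched) yield {};
}

const rows = [];
for (const row of people) {
  for (const extra of opt(() => emailOf(row.person))) {
    rows.push({ ...row, ...extra });
  }
}
console.log(rows);
```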
Structuring Queries for Performance
Principle 1: Put the most selective constraint first
The engine evaluates left to right. Constraints that eliminate more possibilities early mean less backtracking overall:
// Slow: scan all triples, then filter
and(
triple("v:person", "name", "v:name"),
triple("v:person", "city", "v:city"),
triple("v:person", "age", "v:age"),
greater("v:age", 60)
)
// Faster in principle: apply the age constraint first, if few people are over 60.
// (But only if you can: triple doesn't support filtering and will visit every triple for the non-grounded variable.)
In practice, triple always scans all matching triples for a given predicate. The real optimization opportunity is choosing between triple and triple_slice.
Principle 2: Use triple_slice for range constraints
triple iterates every triple for a predicate. triple_slice binary-searches the sorted value dictionary and only touches values in the range:
// O(n) — scans all timestamps, then filters
and(
triple("v:doc", "timestamp", "v:time"),
greater("v:time", literal("2025-01-01T00:00:00Z", "xsd:dateTime")),
less("v:time", literal("2025-02-01T00:00:00Z", "xsd:dateTime"))
)
// O(log n) — binary search, only touches matching values
triple_slice("v:doc", "timestamp", "v:time",
"2025-01-01T00:00:00Z", "2025-02-01T00:00:00Z")
Both return the same results. The performance difference grows with dataset size. For time-series data or large numeric ranges, triple_slice is almost always the right choice.
See the triple_slice guide for full details on binding modes, type inference, and worked examples.
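The difference between scanning and binary-searching a sorted collection can be illustrated on a plain array. This sketch makes no claim about TerminusDB's storage layout; the `sorted` array stands in for the sorted value dictionary, and the half-open interval [low, high) is an assumption chosen for the example.

```javascript
// Sorted values, standing in for the sorted value dictionary.
const sorted = [3, 8, 15, 21, 34, 55, 89];

// Index of the first element >= target, found in O(log n) steps.
function lowerBound(values, target) {
  let lo = 0;
  let hi = values.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (values[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range lookup: two binary searches, then touch only the matching values.
function slice(values, low, high) {
  return values.slice(lowerBound(values, low), lowerBound(values, high));
}

// Scan-and-filter: visits every value regardless of the range.
function scan(values, low, high) {
  return values.filter((v) => v >= low && v < high);
}

console.log(slice(sorted, 10, 40), scan(sorted, 10, 40));
```

Both functions return the same values; only the amount of data visited differs.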
Principle 3: Understand what streams and what materializes
WOQL predicates fall into two performance categories:
Streaming predicates yield one solution at a time, using constant memory regardless of how many total solutions exist:
- triple / quad
- triple_slice / quad_slice
- member
- sequence
- path
- and, or (composition)
Materializing predicates must collect all solutions from their sub-query before producing output. Memory usage scales with the number of inner solutions:
- group_by: groups all solutions, then emits groups one at a time
- Collect: collects all template values into a list in memory
- distinct: tracks all seen values to detect duplicates
- order_by: must see all solutions before it can sort
- aggregate / count / sum / length (on query results)
The practical consequence: If a sub-query inside group_by or Collect produces millions of solutions, those millions are held in memory. For very large result sets, consider whether you can narrow the sub-query with limit, triple_slice, or more selective constraints.
Principle 4: Streaming responses — results flow as they complete
When you execute a WOQL query via the API, TerminusDB can stream results back as they are found. The client receives the first solution as soon as it is computed — it does not have to wait for all solutions to be found.
This means:
- Streaming queries (built from triple, and, or, etc.) start returning results immediately. Time-to-first-result is fast even for queries with millions of solutions.
- Materializing queries (containing group_by, order_by, etc.) must finish their inner computation before the first result streams out. Time-to-first-result depends on the inner query's total execution time.
Designing for responsiveness: If your application can process results incrementally (e.g., displaying a table row by row), prefer streaming query shapes. If you need aggregation, do it as late as possible in the query so the streaming portion runs first.
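The effect on time-to-first-result can be made concrete by counting how many inner solutions are consumed before the first output appears. The generators below are a toy model: `source`, `streamingFilter` and `orderBy` are hypothetical helpers, not engine components.

```javascript
let pulled = 0; // number of inner solutions consumed so far

// Hypothetical inner query producing n solutions lazily.
function* source(n) {
  for (let i = 0; i < n; i++) {
    pulled++;
    yield (i * 7) % n;
  }
}

// Streaming shape: passes matching solutions on as they arrive.
function* streamingFilter(inner) {
  for (const value of inner) {
    if (value % 2 === 0) yield value;
  }
}

// Materializing shape: must see every solution before emitting the first.
function* orderBy(inner) {
  const all = [...inner];
  all.sort((a, b) => a - b);
  yield* all;
}

pulled = 0;
streamingFilter(source(1000)).next(); // ask for the first result only
const pulledByStreaming = pulled;

pulled = 0;
orderBy(source(1000)).next(); // ask for the first result only
const pulledByOrderBy = pulled;

console.log(pulledByStreaming, pulledByOrderBy);
```

In this model the streaming shape consumes a single inner solution before answering, while the sorting shape consumes all of them.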
Principle 5: Leverage bound variables for index access
triple uses different internal access patterns depending on which arguments are bound. For example:
| Subject | Predicate | Object | Access pattern |
|---|---|---|---|
| bound | bound | unbound | Direct lookup — fast |
| bound | unbound | unbound | Iterate predicates for one subject |
| unbound | bound | unbound | Iterate subjects for one predicate |
| unbound | unbound | unbound | Full scan — slowest |
The engine is fastest when it can use bound arguments to narrow the search. A common pattern is to bind the subject first (via a previous triple or isa), then look up specific properties:
// Good: v:person is bound by isa, then used to look up properties directly
and(
isa("v:person", "Person"),
triple("v:person", "name", "v:name"),
triple("v:person", "age", "v:age")
)
When you use and, the shared variable v:person is bound by the first predicate and used as an index key by the subsequent ones. This is the natural join pattern, and it is efficient because each subsequent triple does a direct lookup rather than a scan.
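The benefit of a bound subject can be illustrated with a nested map used as an index. This is a deliberately simple model with hypothetical data; TerminusDB's actual index structures are different.

```javascript
// Hypothetical facts.
const facts = [
  ["p1", "name", "Ada"], ["p1", "age", 36],
  ["p2", "name", "Bo"], ["p2", "age", 29],
];

// Index: subject -> predicate -> list of objects.
const index = new Map();
for (const [s, p, o] of facts) {
  if (!index.has(s)) index.set(s, new Map());
  const byPredicate = index.get(s);
  if (!byPredicate.has(p)) byPredicate.set(p, []);
  byPredicate.get(p).push(o);
}

// Subject and predicate bound: direct lookup, independent of the total number of facts.
function lookup(subject, predicate) {
  return index.get(subject)?.get(predicate) ?? [];
}

// Only the predicate bound: every fact has to be inspected.
function scan(predicate) {
  return facts
    .filter(([, p]) => p === predicate)
    .map(([s, , o]) => ({ subject: s, object: o }));
}

console.log(lookup("p1", "name"), scan("name"));
```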
Principle 6: Compose Collect with list operators
Rather than building custom aggregation logic, use Collect to bridge query results into lists, then apply existing list operators:
// Min salary
and(
WOQL.collect(
WOQL.var("salary"),
WOQL.triple(WOQL.var("emp"), "salary", WOQL.var("salary")),
WOQL.var("salaries")
),
WOQL.range_min(WOQL.var("salaries"), WOQL.var("lowest"))
)
// Count of results
and(
WOQL.collect(
WOQL.var("doc"),
WOQL.isa(WOQL.var("doc"), "Person"),
WOQL.var("docs")
),
WOQL.length(WOQL.var("docs"), WOQL.var("count"))
)
// Window of results
and(
WOQL.collect(
WOQL.var("x"),
WOQL.sequence(WOQL.var("x"), 0, 100),
WOQL.var("all")
),
WOQL.slice(WOQL.var("all"), WOQL.var("window"), 10, 20)
)
This pattern keeps the API surface small: one new predicate (Collect) unlocks all existing list operators (RangeMin, RangeMax, Slice, Length, Sum, ListToSet, SetDifference, SetIntersection, SetUnion) for use over query results.
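The same composition can be mimicked in plain JavaScript, which may help when reasoning about what each step produces. Everything here is a stand-in: `collect` drains a generator, and the end-exclusive bounds used for `sequence` and `slice` are assumptions of this sketch.

```javascript
// Drain a sub-query (a generator function) into a list.
const collect = (subQuery) => [...subQuery()];

// Hypothetical sub-queries.
function* salaries() {
  yield* [5200, 4100, 6300];
}
function* sequence(start, end) {
  for (let i = start; i < end; i++) yield i; // end-exclusive by assumption
}

// Min salary: collect, then apply an ordinary list operation.
const lowest = Math.min(...collect(salaries));

// Count of results: collect, then take the length.
const count = collect(salaries).length;

// Window of results: collect, then slice (end-exclusive by assumption).
const windowed = collect(() => sequence(0, 100)).slice(10, 20);

console.log(lowest, count, windowed);
```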
Predicate Quick Reference: Control Flow Properties
| Predicate | Deterministic? | Streams? | Scopes variables? | Description |
|---|---|---|---|---|
| and | — | Yes | No (shared scope) | All constraints must hold |
| or | No | Yes | No (shared scope) | Any branch can hold; one row per successful branch |
| triple | No | Yes | No | Pattern match on subject-predicate-object |
| triple_slice | No | Yes | No | Range-constrained triple with O(log n) access |
| triple_slice_rev | No | Yes | No | Range-constrained triple with O(log n) access, reversed |
| triple_first | No | Yes | No | Convenience range function, it's a limit(1) of the slice |
| triple_last | No | Yes | No | Convenience range function, it's a limit(1) of the reverse slice |
| member | No | Yes | No | Generates one binding per list element |
| sequence | No | Yes | No | Generates values in a range |
| path | No | Yes | No | Graph traversal with regex-like path expressions |
| select | — | Yes | Yes — hides non-selected variables | Filter which variables are exposed |
| distinct | — | Partial | No | Deduplicates; must track seen values |
| not | Yes | — | Yes — inner variables are existential | Succeeds when inner query fails |
| opt | — | Yes | Partial — bindings escape on success | Optional match; continues on failure |
| group_by | Yes | No (materializes) | Yes — template variables are scoped | Groups results, collects into lists |
| collect | Yes | No (materializes) | Yes — sub-query is self-contained | Gathers all template values into a list |
| order_by | — | No (materializes) | No | Sorts all solutions |
| limit | — | Yes | No | Caps the number of solutions |
| eq | Yes | — | No | Unifies two values |
greater / less | Yes | — | No | Strict comparison filter |
Patterns and Anti-Patterns
Pattern: Narrow early, expand late
// Good: isa + triple_slice narrows to a small set, then join
and(
isa("v:doc", "SensorReading"),
triple_slice("v:doc", "timestamp", "v:time",
"2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"),
triple("v:doc", "temperature", "v:temp")
)
Anti-pattern: Collect everything, then filter
// Bad: materializes all readings into memory, then filters client-side
WOQL.collect(
WOQL.var("temp"),
WOQL.var("all_temps"),
WOQL.triple(WOQL.var("doc"), "temperature", WOQL.var("temp"))
)
// Then filter all_temps in application code
Instead, filter inside the sub-query:
// Better: only collect what you need
WOQL.collect(
WOQL.var("temp"),
WOQL.var("daily_temps"),
WOQL.and(
WOQL.triple_slice(WOQL.var("doc"), "timestamp", WOQL.var("time"),
"2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"),
WOQL.triple(WOQL.var("doc"), "temperature", WOQL.var("temp"))
),
)
Pattern: Use Member and Collect as inverses
// Generate a list from a query
WOQL.collect(
WOQL.var("city"),
WOQL.var("all_cities"),
WOQL.and(
WOQL.isa(WOQL.var("person"), "Person"),
WOQL.triple(WOQL.var("person"), "city", WOQL.var("city"))
),
)
// Later: iterate over that list
WOQL.member(WOQL.var("city"), WOQL.var("all_cities"))
Pattern: Streaming joins for responsive UIs
When your UI can display results incrementally, prefer streaming query shapes:
// Streams results immediately — first row arrives fast
and(
isa("v:person", "Person"),
triple("v:person", "name", "v:name"),
triple("v:person", "age", "v:age")
)
Over materializing shapes:
// Must compute all results before returning any
order_by("v:age", "asc",
and(
isa("v:person", "Person"),
triple("v:person", "name", "v:name"),
triple("v:person", "age", "v:age")
)
)
Both return the same data. The first starts streaming immediately; the second waits until all results are sorted.
Summary
| Concept | Key takeaway |
|---|---|
| Backtracking | The engine explores all possibilities by going forward on success and backward on failure |
| Choice points | Non-deterministic predicates (triple, member, or, sequence) create branches |
| Variable scope | Variables are global by default; select, group_by, Collect, and not create scoped boundaries |
| Streaming vs. materializing | triple, member, and the and combinator stream; group_by, Collect, and order_by materialize |
| Performance | Use triple_slice for ranges, bind variables early, filter inside sub-queries |
| Member/Collect symmetry | Member unpacks lists into bindings; Collect packs bindings into lists |
Understanding these mechanics lets you write WOQL queries that are not just correct, but efficient — queries that leverage the engine's strengths rather than fighting against them.