WOQL Control Flow: Backtracking, Scope, and Performance

Open inAnthropic

WOQL queries look declarative — you describe what you want and the engine figures out the rest. But underneath, the engine makes choices about how to find answers, and those choices follow specific rules. Understanding these rules is the difference between a query that streams results in milliseconds and one that materializes an entire dataset into memory before returning anything.

This page explains the execution model from the ground up: how solutions are generated, how variables move between inner and outer scopes, and how to structure queries so the engine can do its job efficiently.

Prerequisites: This page assumes you've completed the WOQL Tutorial and understand basic concepts like and, or, triple, and eq. If terms like "binding" and "variable" are unfamiliar, start there.

See also: WOQL Explanation | WOQL Tutorial | Range Queries with triple_slice | Time Handling | WOQL Class Reference


The Execution Model: Solutions and Backtracking

What is a solution?

A WOQL query produces a table of results. Each row is a solution — a complete assignment of values to variables that satisfies every constraint in the query. The engine's job is to find all such solutions.

Example: JavaScript
// This query has exactly 1 solution:
eq("v:x", 42)
// Result: x = 42

// This query has 0 solutions:
and(eq("v:x", 42), eq("v:x", 99))
// No value of x satisfies both constraints simultaneously

How backtracking works

The engine evaluates a query left to right. When it reaches a predicate that can produce multiple results — like triple, or, or member — it picks the first result and continues forward. If a later constraint fails, the engine backtracks to the most recent choice point and tries the next option.

Think of it as exploring a tree of possibilities:

Example: Text
and(
  triple(v.person, "city", v.city),     ← choice point: many people
  triple(v.person, "age", v.age),       ← for each person, look up age
  greater(v.age, 30)                    ← filter: does age > 30?
)

The engine:

  1. Picks the first person/city triple.
  2. Looks up that person's age.
  3. Checks if age > 30. If yes, emits a solution. If no, backtracks to step 1.
  4. Picks the next person/city triple. Repeats.

This is backtracking — the engine systematically explores all possibilities by going forward when constraints succeed and going backward when they fail.

Deterministic vs. non-deterministic predicates

Every WOQL predicate falls into one of two categories:

CategoryBehaviorExamples
Non-deterministic (generators)Can produce multiple solutions via backtrackingtriple, triple_slice, member, or, sequence, path
Deterministic (filters/transforms)Produce exactly one solution or faileq, greater, less, not, typecast, length, sum, group_by, limit, distinct

Non-deterministic predicates are choice points — they are where the engine branches. Deterministic predicates either pass or block a solution, but never multiply the number of solutions.

Understanding this distinction is crucial for reasoning about how many rows a query will return and where the computational work happens.


Variable Scope: Inner and Outer Queries

The default: all variables are globally scoped

In a flat WOQL query, all variables live in the same scope. A binding made anywhere is visible everywhere:

Example: JavaScript
and(
  eq("v:x", 42),
  eq("v:y", "v:x")   // v:x is already bound to 42, so v:y = 42
)

This is simple and intuitive for flat queries. It gets more interesting with sub-queries.

select — restricting which variables escape

select runs a sub-query but only exposes the listed variables to the outer query. Variables inside the sub-query that are not selected are invisible outside:

Example: JavaScript
and(
  select("v:name",
    and(
      triple("v:person", "name", "v:name"),
      triple("v:person", "age", "v:age")     // v:age is computed but not exposed
    )
  ),
  eq("v:age", "unknown")  // v:age is unbound here — select hid it
)

When to use select: When a sub-query binds helper variables that would pollute the outer scope or cause unintended unification. It is the primary tool for variable hygiene.

group_by — grouping with scoped aggregation

group_by runs a sub-query, groups the results by one or more variables, and collects the grouped values into lists. The sub-query's variables are scoped — only the grouping variables and the collected list are visible outside.

Example: JavaScript
// Count people per city
and(
  group_by(
    ["city"],                                          // group by these variables
    ["person"],                                        // collect these into lists
    "v:city_group",                                    // the collected list
    and(
      isa("v:person", "Person"),
      triple("v:person", "city", "v:city")
    )
  ),
  length("v:city_group", "v:count")
)

Key scoping behavior:

  • v:city is the grouping variable — it is bound in the outer scope (one value per group).
  • v:person is the template variable — its individual values are not visible outside, only the collected list is.
  • v:city_group is the result list — visible in the outer scope.
  • The inner query runs to exhaustion for each group.

Common pitfall: Using group_by with an empty grouping key just to collect all values into a list:

Example: JavaScript
// Verbose way to collect all ages into a list
group_by(
  [],
  ["age"],
  "v:all_ages",
  triple("v:person", "age", "v:age")
)

This works but is semantically misleading — you're not grouping by anything. The proposed Collect predicate (see below) makes this intent explicit.

Collect — gathering bindings into a list (proposed)

Collect is the inverse of Member. Where Member unpacks a list into individual bindings, Collect runs a sub-query to exhaustion and gathers all values of a template variable into a list.

Example: JavaScript
// Collect all ages into a list
WOQL.collect(
  WOQL.var("age"),
  WOQL.var("all_ages"),
  WOQL.triple(WOQL.var("person"), "age", WOQL.var("age"))
)
// Result: all_ages = [28, 35, 28, 42, 31]

Scoping behavior:

  • The query parameter is a self-contained sub-query. It runs independently.
  • The into variable receives the resulting list in the outer scope.
  • The template specifies what to collect from each solution (last to support fluent queries)
  • Variables in the sub-query that are not in the template are existentially quantified — they exist only to help generate solutions.

The key insight: Collect must control the backtracking. You cannot collect values from a variable that is already backtracking in the outer scope — the sub-query that generates the values must be inside the Collect. This is the same constraint as group_by and the findall Prolog predicate.

Example: JavaScript
// This does NOT collect outer backtracking into a list:
and(
  triple("v:person", "age", "v:age"),   // generates many bindings
  WOQL.collect("v:age", ???, "v:list")  // too late — v:age is already bound
)

// This DOES work — the generating query is inside Collect:
WOQL.collect(
  WOQL.var("age"),
  WOQL.triple(WOQL.var("person"), "age", WOQL.var("age")),
  WOQL.var("all_ages")
)

member — unpacking a list into bindings

Member is the inverse of Collect. Given a list, it generates one solution per element:

Example: JavaScript
and(
  eq("v:fruits", ["apple", "banana", "cherry"]),
  member("v:fruit", "v:fruits")
)
// Result: 3 rows — fruit = "apple", fruit = "banana", fruit = "cherry"

Member is non-deterministic — it creates a choice point with as many branches as there are list elements. Each branch produces one binding.

The Member/Collect symmetry:

PredicateDirectionBehavior
MemberList → BindingsNon-deterministic: generates one solution per element
CollectBindings → ListDeterministic: runs sub-query to exhaustion, produces one solution

distinct — deduplication within a scope

distinct eliminates duplicate values for a specified variable while preserving backtracking for everything else:

Example: JavaScript
distinct("v:city",
  triple("v:person", "city", "v:city")
)
// Returns each city once, even if multiple people live there

Scoping behavior: distinct observes the values of the specified variable as the inner query generates them, and suppresses solutions that would repeat a previously seen value. The inner query still runs normally — distinct is a filter on the output stream.

not — negation as failure

not succeeds when its inner query fails (produces zero solutions). It does not bind any variables:

Example: JavaScript
and(
  isa("v:person", "Person"),
  triple("v:person", "name", "v:name"),
  not(triple("v:person", "email", "v:email"))
)
// People without an email address

Critical scoping rule: Variables that appear only inside not are existentially quantified. not checks whether any solution exists — it does not expose which specific values matched. If you need to know which values were excluded, use a different pattern (e.g., opt + filter).

opt — optional matching

opt tries to match its inner query. If the inner query succeeds, its bindings flow to the outer scope. If it fails, the outer query continues with those variables unbound:

Example: JavaScript
and(
  triple("v:person", "name", "v:name"),
  opt(triple("v:person", "email", "v:email"))
)
// All people, with email if they have one

Scoping behavior: Unlike not, opt does expose inner variable bindings when the inner query succeeds. When it fails, those variables remain unbound (empty in the result).


Structuring Queries for Performance

Principle 1: Put the most selective constraint first

The engine evaluates left to right. Constraints that eliminate more possibilities early mean less backtracking overall:

Example: JavaScript
// Slow: scan all triples, then filter
and(
  triple("v:person", "name", "v:name"),
  triple("v:person", "city", "v:city"),
  triple("v:person", "age", "v:age"),
  greater("v:age", 60)
)

// Faster: if few people are over 60, this is easiest to reason about
// (But only if you can — triple doesn't support filtering and will materialize every triple for the non-grounded variable)

In practice, triple always scans all matching triples for a given predicate. The real optimization opportunity is choosing between triple and triple_slice.

Principle 2: Use triple_slice for range constraints

triple iterates every triple for a predicate. triple_slice binary-searches the sorted value dictionary and only touches values in the range:

Example: JavaScript
// O(n) — scans all timestamps, then filters
and(
  triple("v:doc", "timestamp", "v:time"),
  greater("v:time", literal("2025-01-01T00:00:00Z", "xsd:dateTime")),
  less("v:time", literal("2025-02-01T00:00:00Z", "xsd:dateTime"))
)

// O(log n) — binary search, only touches matching values
triple_slice("v:doc", "timestamp", "v:time",
  "2025-01-01T00:00:00Z", "2025-02-01T00:00:00Z")

Both return the same results. The performance difference grows with dataset size. For time-series data or large numeric ranges, triple_slice is almost always the right choice.

See the triple_slice guide for full details on binding modes, type inference, and worked examples.

Principle 3: Understand what streams and what materializes

WOQL predicates fall into two performance categories:

Streaming predicates yield one solution at a time, using constant memory regardless of how many total solutions exist:

  • triple / quad
  • triple_slice / quad_slice
  • member
  • sequence
  • path
  • and, or (composition)

Materializing predicates must collect all solutions from their sub-query before producing output. Memory usage scales with the number of inner solutions:

  • group_by — groups all solutions, then emits groups one at a time
  • Collect — collects all template values into a list in memory
  • distinct — tracks all seen values to detect duplicates
  • order_by — must see all solutions before it can sort
  • aggregate / count / sum / length (on query results)

The practical consequence: If a sub-query inside group_by or Collect produces millions of solutions, those millions are held in memory. For very large result sets, consider whether you can narrow the sub-query with limit, triple_slice, or more selective constraints.

Principle 4: Streaming responses — results flow as they complete

When you execute a WOQL query via the API, TerminusDB can stream results back as they are found. The client receives the first solution as soon as it is computed — it does not have to wait for all solutions to be found.

This means:

  • Streaming queries (built from triple, and, or, etc.) start returning results immediately. Time-to-first-result is fast even for queries with millions of solutions.
  • Materializing queries (containing group_by, order_by, etc.) must finish their inner computation before the first result streams out. Time-to-first-result depends on the inner query's total execution time.

Designing for responsiveness: If your application can process results incrementally (e.g., displaying a table row by row), prefer streaming query shapes. If you need aggregation, do it as late as possible in the query so the streaming portion runs first.

Principle 5: Leverage bound variables for index access

triple uses different internal access patterns depending on which arguments are bound, examples:

SubjectPredicateObjectAccess pattern
boundboundunboundDirect lookup — fast
boundunboundunboundIterate predicates for one subject
unboundboundunboundIterate subjects for one predicate
unboundunboundunboundFull scan — slowest

The engine is fastest when it can use bound arguments to narrow the search. A common pattern is to bind the subject first (via a previous triple or isa), then look up specific properties:

Example: JavaScript
// Good: v.person is bound by isa, then used to look up properties directly
and(
  isa("v:person", "Person"),
  triple("v:person", "name", "v:name"),
  triple("v:person", "age", "v:age")
)

When you use and, the shared variable v:person is bound by the first predicate and used as an index key by the subsequent ones. This is the natural join pattern — and it is efficient because each subsequent triple does a direct lookup rather than a scan.

Principle 6: Compose Collect with list operators

Rather than building custom aggregation logic, use Collect to bridge query results into lists, then apply existing list operators:

Example: JavaScript
// Min salary
and(
  WOQL.collect(
    WOQL.var("salary"),
    WOQL.triple(WOQL.var("emp"), "salary", WOQL.var("salary")),
    WOQL.var("salaries")
  ),
  WOQL.range_min(WOQL.var("salaries"), WOQL.var("lowest"))
)

// Count of results
and(
  WOQL.collect(
    WOQL.var("doc"),
    WOQL.isa(WOQL.var("doc"), "Person"),
    WOQL.var("docs")
  ),
  WOQL.length(WOQL.var("docs"), WOQL.var("count"))
)

// Window of results
and(
  WOQL.collect(
    WOQL.var("x"),
    WOQL.sequence(WOQL.var("x"), 0, 100),
    WOQL.var("all")
  ),
  WOQL.slice(WOQL.var("all"), WOQL.var("window"), 10, 20)
)

This pattern keeps the API surface small — one new predicate (Collect) unlocks all existing list operators (RangeMin, RangeMax, Slice, Length, Sum, ListToSet, SetDifference, SetIntersection, SetUnion) for use over query results.


Predicate Quick Reference: Control Flow Properties

PredicateDeterministic?Streams?Scopes variables?Description
andYesNo (shared scope)All constraints must hold
orNoYesNo (shared scope)Any branch can hold; one row per successful branch
tripleNoYesNoPattern match on subject-predicate-object
triple_sliceNoYesNoRange-constrained triple with O(log n) access
triple_slice_revNoYesNoRange-constrained triple with O(log n) access, reversed
triple_firstNoYesNoConvenience range function, it's a limit(1) of the slice
triple_lastNoYesNoConvenience range function, it's a limit(1) of the reverse slice
memberNoYesNoGenerates one binding per list element
sequenceNoYesNoGenerates values in a range
pathNoYesNoGraph traversal with regex-like path expressions
selectYesYes — hides non-selected variablesFilter which variables are exposed
distinctPartialNoDeduplicates; must track seen values
notYesYes — inner variables are existentialSucceeds when inner query fails
optYesPartial — bindings escape on successOptional match; continues on failure
group_byYesNo (materializes)Yes — template variables are scopedGroups results, collects into lists
collectYesNo (materializes)Yes — sub-query is self-containedGathers all template values into a list
order_byNo (materializes)NoSorts all solutions
limitYesNoCaps the number of solutions
eqYesNoUnifies two values
greater / lessYesNoStrict comparison filter

Patterns and Anti-Patterns

Pattern: Narrow early, expand late

Example: JavaScript
// Good: isa + triple_slice narrows to a small set, then join
and(
  isa("v:doc", "SensorReading"),
  triple_slice("v:doc", "timestamp", "v:time",
    "2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"),
  triple("v:doc", "temperature", "v:temp")
)

Anti-pattern: Collect everything, then filter

Example: JavaScript
// Bad: materializes all readings into memory, then filters client-side
WOQL.collect(
  WOQL.var("temp"),
  WOQL.var("all_temps")
  WOQL.triple(WOQL.var("doc"), "temperature", WOQL.var("temp")),
)
// Then filter all_temps in application code

Instead, filter inside the sub-query:

Example: JavaScript
// Better: only collect what you need
WOQL.collect(
  WOQL.var("temp"),
  WOQL.var("daily_temps"),
  WOQL.and(
    WOQL.triple_slice(WOQL.var("doc"), "timestamp", WOQL.var("time"),
      "2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"),
    WOQL.triple(WOQL.var("doc"), "temperature", WOQL.var("temp"))
  ),
)

Pattern: Use Member and Collect as inverses

Example: JavaScript
// Generate a list from a query
WOQL.collect(
  WOQL.var("city"),
  WOQL.var("all_cities"),
  WOQL.and(
    WOQL.isa(WOQL.var("person"), "Person"),
    WOQL.triple(WOQL.var("person"), "city", WOQL.var("city"))
  ),
)

// Later: iterate over that list
WOQL.member(WOQL.var("city"), WOQL.var("unique_cities"))

Pattern: Streaming joins for responsive UIs

When your UI can display results incrementally, prefer streaming query shapes:

Example: JavaScript
// Streams results immediately — first row arrives fast
and(
  isa("v:person", "Person"),
  triple("v:person", "name", "v:name"),
  triple("v:person", "age", "v:age")
)

Over materializing shapes:

Example: JavaScript
// Must compute all results before returning any
order_by("v:age", "asc",
  and(
    isa("v:person", "Person"),
    triple("v:person", "name", "v:name"),
    triple("v:person", "age", "v:age")
  )
)

Both return the same data. The first starts streaming immediately; the second waits until all results are sorted.


Summary

ConceptKey takeaway
BacktrackingThe engine explores all possibilities by going forward on success and backward on failure
Choice pointsNon-deterministic predicates (triple, member, or, sequence) create branches
Variable scopeVariables are global by default; select, group_by, Collect, and not create scoped boundaries
Streaming vs. materializingtriple, member, and stream; group_by, Collect, order_by materialize
PerformanceUse triple_slice for ranges, bind variables early, filter inside sub-queries
Member/Collect symmetryMember unpacks lists into bindings; Collect packs bindings into lists

Understanding these mechanics lets you write WOQL queries that are not just correct, but efficient — queries that leverage the engine's strengths rather than fighting against them.

Was this helpful?