The WOQL beginners guide
The thinking behind the Web Object Query Language is well explained on the WOQL explanation page. As mentioned there, it is a formal language for querying and updating TerminusDB databases. As a language is builds on a declarative datalog foundation, and is evaluated by binding variables as the abstract syntax tree is read.
This means that some logical behaviours depend on the order of binding the variables to values in the current instance or schema graph that is addressed.
A foundation to get started
This page provides examples of WOQL queries and explanations for how to use it in practice. In general we recommend using the functional style as if has been shown to be easier to reason about in general for practitioners. Senior WOQLers tend to use both based on elegance and the best way to express a particular problem.
In the tutorial we will use the Javascript/Typescript dialect of WOQL. The logic is the same, but the syntax for Python follows traditional Python patterns. For details, refer to the syntax of each.
All examples have been tested in the TerminusDB Logical Studio provided by DFRNT. The reader will be expected to have already built a few first classes and stored documents in TerminusDB before embarking on the examples in the tutorial.
Predicates, literals and variables in WOQL
WOQL as a language has three main parts, predicates, literals and variables.
Predicates, the "functions" of WOQL
Predicates such as triple()
, eq()
and read_document()
enable logical constraints to be put to match information stored in TerminusDB. The retrieved information can be seen as the full set of combinations, possibilities or solutions from the query that is sent.
Literals, the "values" of WOQL
Literals are values of a datatype. A variable can be bound to a value using eq("v:MyVariable", "John Doe")
to let the variable MyVariable
be bound to the literal John Doe
. Valid solutions are only those where the variable MyVariable
will be equal to the literal or value John Doe
.
Variables, the "glue" of WOQL
Variables can be specified in two ways in the Typescript and Python TerminusDB SDK, either they can be generated using the Vars
function as a native variable in the language to leverage syntax coloring and a pure approach. Or, they can be specified using the v:
prefix. In this tutorial we will use the v:
prefix as we will need less boilerplate code and it's easy to copy and paste.
Think of each "binding" with a set of bound variables that is returned from TerminusDB as a solution to the logical constraints that were put in the query. The logical constraints are not limited to reading information, updates, creations and deletions can also be made in the same query.
How WOQL sees the world
The world that one WOQL query operates on is a closed world as defined by the layer that the query is active on, usually the most recently defined layer, but it could also operate in read-only mode on a commit in the past.
All operations happen atomically in one go in the query and nothing is created, updated, or deleted in the world that the query operates on. Every action is recorded during the query and are applied in creating the next layer in the instance and schema graphs. This means that a query can reference many layers in the past, but can only create a new world that exists after the query completes.
This means a query can't operate on information that comes out of the current query. How the query logically sees the world is an important consideration when working with declarative logic in a datalog language like WOQL.
Reading documents and triples from the instance graph
Information stored in TerminusDB is stored as hierarchical documents with a defined shape, similar in structure to JSON-LD documents. Each document is made up of frames, sets of triples that are connected, that some call such frames field sets or FieldSets. Triples that are disconnected from documents are not allowed.
All triples that form documents exist in the instance graph. There are two graph types in TerminusDB, the instance graph and the schema graph. The instance graph contains the technical documents that are stored in the database as a graph of connected triples of a certain shape.
The minimal document, a single defined triple
Documents can be minimal. The smallest possible document to define consists of a single Resource Description Framework triple. It contains the @id
of the document as the subject, a field or property called rdf:type
as the predicate and as the object of the triple, the type of the document, such as @schema:Entity
.
The @schema:
prefix to Entity
is a prefix that is shared between schema and instance graphs of TerminusDB data products. It enables the properties of types to be looked up in the schema graph using quad()
that will be addressed separately. The schema graph contains the schema that defines the data model for all documents stored in the instance graph.
The schema enforces a structure, the shape of documents, that all triples connecting the information stored in the instance graph. Reading documents and triples from the instance graph is done using the read_document()
predicate.
Triples are the foundation for the WOQL language
Documents are the basic unit to exchange facts and hierarchical records. They can be read into WOQL variables. WOQL however mostly operates on triples once the TerminusDB engine has ensured that a new instance layer can be recorded for the future.
This means we need a bit of foundation for what a document is, and what triple is. A triple is in essence three glued part that mirrors the world or that stand on their own: a subject, a predicate and an object. The subject is identified by an @id
. The predicate indicates something about the subject, and the object is the value of the predicate.
Example of a minimal document of the Person
type:
{
"@id": "Person/JohnDoe",
"@type": "Person"
}
This is how the document is seen by WOQL in the triple store when the prefixes have been applied from the context.
Subject | Predicate | Object |
---|---|---|
Person/JohnDoe | rdf:type | @schema:Person |
This is the triple as it is actually stored on file.
Subject | Predicate | Object |
---|---|---|
terminusdb:///data/Person/JohnDoe | rdf:type | terminusdb:///schema#Person |
The first table shows how default prefixes in TerminusDB enable WOQL to make for concise queries without the full IRI. The second row shows the full URL that is used internally inthe instance graph if the context has not been altered.
If we would state that John Doe has an age, this is how it would look to WOQL. 34 is a literal value and could be an integer or one of the other datatypes in TerminusDB.
Subject | Predicate | Object |
---|---|---|
Person/JohnDoe | age | 34 |
We could also use the equivalent values below when quering, or even write out the full schema IRI, depending on our needs.
Subject | Predicate | Object |
---|---|---|
terminusdb:///data/Person/JohnDoe | @schema:age | 34 |
The default context for TerminusDB is shown below.
{
"@base": "terminusdb:///data/",
"@schema": "terminusdb:///schema#"
}
The reason this is important is that the foundation of TerminusDB rests with the Semantic Web, but with a closed world interpretation of the Resource Description Framework, RDF. This means that the context defines the data that is stored in TerminusDB, and that TerminusDB is authoritative for the information within it's world, it is expected to have correct answers reasoning about that world it knows about, a "closed world".
WOQL operates as a datalog semantic query language that enables declarative logic to be used for the semantic web and to traverse documents no matter their shape or how they are connected. The variables in WOQL use either shorthand IRIs, prefixed IRIs or the full IRI when resolving connected triples.
Let's continue the exporation by quering for documents.
Reading the minimal document as a triple
First, let's look at how documents are structured in TerminusDB. A document is assembled from triples by TerminusDB into a set of JSON-LD-like documents, without the context object, including @id
and @type
for each frame or field set. Each frame is represented as a JSON object.
Documents in TerminusDB are hierarchical, where concrete types that can exist on their own are called documents, and documents that form parts of a document are called subdocuments. They are nested framed objects in the structure of a document. What connects the "levels" of a hierarchical document are triples that connect one document to a subdocument.
Querying a document in TerminusDB
Querying a document in TerminusDB is done using the read_document("v:id", "v:document")
predicate. It takes two arguments, the first is the document id to read and the second is the variable to bind the read document to in the response. Both arguments are variables, and both will be returned in the bindings.
The whole nested document will be resolved into a complete object in the bindings WOQL result, with all the subdocuments assembled automatically. In a regular relational database, the client bears the responsibility to resolve the nested structure of documents using various kinds of joins as part of the query, where as the unfolding structure of documents can be specified elegantly in the TerminusDB schema.
To only get the v:read_document_variable
back, leverage the select()
predicate that filters the bindings to only include the variables that specified by the arguments to select()
. This is a common pattern when only interested in a specific part of a document.
Here we want just a list of documents, so we select the v:docs
variable to not get the data in the v:docId
variable back in our query bindings.
WOQL.select("v:docs").read_document("v:docId", "v:docs");
If the individual WOQL keywords have been added to the Javascript context, the query can be written as below in the fluent style, where the WOQL parts are joined together in a flow-like manner. The easiest way to think of it is that the next WOQL keyword is either the last argument or flows from the first predicate to the next.
select("v:docs").read_document("v:docId", "v:docs");
The query can also be written as below in the functional style that we will be using mainly. Note the use of v:
for specifying variables. Variables used in multiple places are bound by the how deterministic the predicate is and whether the variable becomes "grounded".
select("v:docs", read_document("v:docId", "v:docs"));
In this example, v:docs
was bound by the read_document()
predicate ("read into the variable"), and v:docId
is free floating. It should be noted that the first argument of read_document()
is not fully floating, it only matches the first document it finds, unless it is preceded by a predicate that is fully free floating, such as triple()
or quad()
that bind variables to content in the triple store.
To match all documents thus, the following query can be used to match the document equal to the @id
of Person/JohnDoe
, read_document it into the variable v:docs
, and filter out to only retrieve v:docs
in the bindings of the query.
select("v:docs").and(
eq("v:docId", "Person/JohnDoe")
triple("v:docId", "rdf:type", "@schema:Person"),
read_document("v:docId", "v:docs")
);
Changing the order of the eq, triple and read_document predicates can change performance characteristics and sometimes also the meaning of the query.
Using and, or and opt and not in queries
An important part of querying is to use boolean logic to filter out the data you are interested in. WOQL provides the and()
, or()
, opt()
, and not()
predicates to enable this.
The and() predicate
The and()
predicate is used to combine multiple predicates into a single predicate where all predicates must be true for its solutions to match and return. The use of and()
is straightforward and is a good way to structure the query into logical blocks that must hold true.
No row will be returned for the variable v:var
from this example:
and(
eq("v:var", "a"),
eq("v:var", "b")
)
One row will be returned for the variable v:var
from this example, where v:var_a
is grounded to "a" and v:var_b
is grounded to "b".
and(
eq("v:var_a", "a"),
eq("v:var_b", "b")
)
The or() predicate
The or()
predicate is used to combine multiple predicates into a single predicate where at least one predicate must be true for its solutions to match and return. The use of or()
can be a bit tricky and sometimes it is better to use the opt()
, optional, predicate.
The use of the or()
predicate drives the cardinality of the solutions, where each possible solution stands on its own, meaning that more than one solution to the query will be returned. It is often the case that the query author means to use that certain parts of the query are optional and that the variable can be left ungrounded if the predicate does not hold true or fill a variable with a value.
Two rows will be returned for the variable v:var
from this example:
or(
eq("v:var_a", "a"),
eq("v:var_b", "b")
)
One row where v:var_a
is grounded to "a" and one row where v:var_b
is grounded to "b". The values where v:var_a
and v:var_b
are ungrounded have no values.
The opt() predicate
The opt()
predicate means that the predicate is options, if possible, it will ground the variable, but if it is already grounded or does not hold true, the predicate will be skipped. An example is to fill up the ungrounded values of the or statement, which could be done as follows:
and(
or(
eq("v:var_a", "a"),
eq("v:var_b", "b")
),
opt().eq("v:var_a", "v:var_b")
)
Here we get one solution where both v:var_a
and v:var_b
are grounded to "a". And one solution where v:var_a
isand v:var_b
are grounded to "b". The opt()
predicate is optional, and applied where possible, i.e. filling the ungrounded value through equality. This is useful for handling optional values and to avoid "exploding" cardinality.
The not() predicate
The not()
predicate is used to negate a predicate, i.e. to make sure that something should not match. This can be used to find all document subjects, where we need to filter out lists and subdocuments:
select(
"v:subject",
"v:type",
and(
triple("v:subject", "rdf:type", "v:type"),
not(quad("v:type", "sys:subdocument", "v:select", "schema")),
not(eq("v:type", "rdf:List")),
),
)
Here we select subject and type for the resulting bindings. We match all subjects of documents, but there are certain special types of documents, such as Lists and Subdocuments, that are not top level documents and should thus be exluded from the result.
Further Reading
WOQL Explanation
WOQL Explanation for a more in-depth explanation of WOQL.
WOQL Reference
JavaScript and Python WOQL Reference guides
How-to guides
See the How-to Guides for further examples of using WOQL.
Documents
Documents in a knowledge graph and how to use them.