Oak SQL-2 Query Grammar


Query

SELECT [ DISTINCT ] { * | { column [ , ... ] } }
FROM { selector [ join ... ] }
[ WHERE constraint ]
[ UNION [ ALL ] query ]
[ ORDER BY { ordering [ , ... ] } ]
[ queryOptions ]

All queries should have a path restriction (even if it’s just, for example, “/content”), as this makes it possible to shrink indexes.

“distinct” ensures each row is only returned once.

“union” combines the result of this query with the result of another query and removes duplicate rows; “union all” does not remove duplicates.

“order by” may use an index. If there is no index for the given sort order, then the result is fully read in memory and sorted before returning the first row.

Examples:

select * from [sling:Folder] as a where [sling:resourceType] = 'x' and isdescendantnode(a, '/content')
select [jcr:path] from [oak:QueryIndexDefinition] as a where [type] = 'lucene' and isdescendantnode(a, '/') order by [reindexCount] desc
select [jcr:path], [jcr:score], * from [nt:base] as a where [type] = 'report' and isdescendantnode(a, '/etc') option(traversal fail)
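
The difference between “union” and “union all” can be illustrated with a small Python sketch that models each query result as a list of rows. The function names and the sample rows are hypothetical and only serve the illustration; this is not how Oak implements it.

```python
def union_all(left, right):
    """Concatenate both results, keeping duplicates (like UNION ALL)."""
    return list(left) + list(right)


def union(left, right):
    """Concatenate both results and drop duplicate rows (like UNION)."""
    seen = set()
    result = []
    for row in union_all(left, right):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical rows: each row holds just the path of a matching node.
rows_x = [("/content/a",), ("/content/b",)]
rows_y = [("/content/b",), ("/content/c",)]

print(len(union_all(rows_x, rows_y)))  # 4: "/content/b" appears twice
print(len(union(rows_x, rows_y)))      # 3: each path appears once
```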

Column

{ [ selectorName . ] { propertyName | * }
| EXCERPT([selectorName])
| rep:spellcheck()
} [ AS aliasName ]

It is recommended to enclose property names in square brackets.

Not listed above are “special” properties such as “[jcr:path]” (the path), “[jcr:score]” (the score), “[rep:suggest()]”.

Examples:

*
[jcr:path]
[jcr:score]
a.*
a.[sling:resourceType]

Selector

nodeTypeName [ AS selectorName ]

The nodetype name can be either a primary nodetype or a mixin nodetype. It is recommended to specify the nodetype name in square brackets.

Examples:

[sling:Folder] as a

Join

{ INNER | { LEFT | RIGHT } OUTER } JOIN rightSelector ON
{ selectorName . propertyName = joinSelectorName . joinPropertyName }
| { ISSAMENODE( selectorName , joinSelectorName [ , selectorPathName ] ) }
| { ISCHILDNODE( childSelectorName , parentSelectorName ) }
| { ISDESCENDANTNODE( descendantSelectorName , ancestorSelectorName ) }

An “inner join” only returns entries if matching nodes are found for both the left and the right selector. A “left outer join” also returns entries that don’t have matching nodes on the right selector. A “right outer join” also returns entries that don’t have matching nodes on the left selector. For outer joins, all the properties of the selector that doesn’t have a matching node are null.

Examples:

All nodes below /oak:index that don’t have a child node:

select a.* from [oak:QueryIndexDefinition] as a 
  left outer join [nt:base] as b on ischildnode(b, a)
  where isdescendantnode(a, '/oak:index') 
  and b.[jcr:primaryType] is null 
  order by a.[jcr:path]
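
To show why the “is null” check finds nodes without children, here is a minimal Python sketch of a left outer join on ischildnode. It assumes absolute, normalized paths; the helper names and the sample paths are hypothetical.

```python
def is_child_node(child_path, parent_path):
    """True if child_path is a direct child of parent_path."""
    if child_path == "/":
        return False
    return (child_path.rsplit("/", 1)[0] or "/") == parent_path


def left_outer_join(left_paths, right_paths):
    """Yield (a, b) pairs; b is None when a has no matching child."""
    for a in left_paths:
        children = [b for b in right_paths if is_child_node(b, a)]
        if children:
            for b in children:
                yield a, b
        else:
            yield a, None  # all properties of b are null


# Hypothetical repository content.
index_defs = ["/oak:index/lucene", "/oak:index/uuid"]
all_nodes = index_defs + ["/oak:index/lucene/indexRules"]

# "where b.[jcr:primaryType] is null" keeps only rows without a match.
childless = sorted(a for a, b in left_outer_join(index_defs, all_nodes) if b is None)
print(childless)  # ['/oak:index/uuid']
```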

Constraint

andCondition [ { OR andCondition } [...] ]

“or” conditions of the form “[x]=1 or [x]=2” are automatically converted to “[x] in(1, 2)”, and can use the same index.

“or” conditions of the form “[x]=1 or [y]=2” are more complicated. Oak compares two options: running the query as written, and running it as a “union” query (one query with x=1, and a second query with y=2). If using “union” results in a lower estimated cost, then “union” is used. This can be the case, for example, if there are two distinct indexes, one on x, and another on y.
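
The two cases can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the conditions are limited to numeric equality checks, and the costs are made-up inputs rather than values computed by Oak.

```python
def rewrite_or_of_equalities(conditions):
    """conditions: list of (property, value) pairs joined by OR.

    If all pairs use the same property, return an IN condition;
    otherwise return the OR condition unchanged.
    """
    properties = {prop for prop, _ in conditions}
    if len(properties) == 1:
        prop = conditions[0][0]
        values = ", ".join(str(value) for _, value in conditions)
        return f"[{prop}] in({values})"
    return " or ".join(f"[{prop}] = {value}" for prop, value in conditions)


def use_union(cost_as_written, cost_union):
    """Pick the "union" form only if its estimated cost is lower."""
    return cost_union < cost_as_written


print(rewrite_or_of_equalities([("x", 1), ("x", 2)]))  # [x] in(1, 2)
print(rewrite_or_of_equalities([("x", 1), ("y", 2)]))  # [x] = 1 or [y] = 2
print(use_union(cost_as_written=1000.0, cost_union=4.0))  # True
```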


And Condition

condition [ { AND condition } [...] ]

A special case (not found in relational databases) is “and” conditions of the form “[x]=1 and [x]=2”. They will match nodes with multi-valued properties, where the property value contains both 1 and 2.
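
A small Python sketch of this special case, assuming that an equality condition matches when any value of a multi-valued property is equal to the operand. Nodes are modeled as plain dictionaries, and the helper name is hypothetical.

```python
def equals(node, prop, expected):
    """[prop] = expected: true if any value of the property matches."""
    values = node.get(prop, [])
    if not isinstance(values, list):
        values = [values]
    return expected in values


multi = {"x": [1, 2, 3]}   # multi-valued property
single = {"x": 1}          # single-valued property

# [x] = 1 and [x] = 2
print(equals(multi, "x", 1) and equals(multi, "x", 2))    # True
print(equals(single, "x", 1) and equals(single, "x", 2))  # False
```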


Condition

comparison
| inComparison
| NOT constraint
| ( constraint )
| [ selectorName . ] propertyName IS [ NOT ] NULL
| CONTAINS( { { [ selectorName . ] propertyName } | { selectorName . * } } , staticOperand )
| { ISSAMENODE | ISCHILDNODE | ISDESCENDANTNODE } ( [ selectorName , ] pathString )
| SIMILAR ( [ selectorName . ] { propertyName | * } , staticOperand )
| NATIVE ( [ selectorName , ] language , staticOperand )
| SPELLCHECK ( [ selectorName , ] staticOperand )
| SUGGEST ( [ selectorName , ] staticOperand )

“not” conditions typically cannot use an index.

“contains”: see Full-Text Queries.

“similar”: see Similarity Queries.

“native”: see Native Queries.

“spellcheck”: see Spellchecking.

“suggest”: see Suggestions.

Examples:

select [jcr:path] from [nt:base] where similar(*, '/test/a') 
select [jcr:path] from [nt:base] where native('solr', 'name:(Hello OR World)')
select [rep:suggest()] from [nt:base] where suggest('in ') and issamenode('/')
select [rep:spellcheck()] from [nt:base] as a where spellcheck('helo') and issamenode(a, '/')
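
The three path conditions can be pictured as simple predicates on the node path. The following Python sketch assumes absolute, normalized paths without trailing slashes; the function names mirror the grammar but are otherwise hypothetical.

```python
def is_same_node(path, target):
    """issamenode: the node at exactly this path."""
    return path == target


def is_child_node(path, parent):
    """ischildnode: direct children of the given parent path."""
    if path == "/":
        return False
    return (path.rsplit("/", 1)[0] or "/") == parent


def is_descendant_node(path, ancestor):
    """isdescendantnode: any node below the ancestor, at any depth."""
    if ancestor == "/":
        return path != "/"
    return path.startswith(ancestor + "/")


print(is_descendant_node("/content/a/b", "/content"))  # True
print(is_child_node("/content/a/b", "/content"))       # False
print(is_descendant_node("/content", "/content"))      # False
```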

Comparison

dynamicOperand { = | <> | < | <= | > | >= | LIKE } staticOperand

“like”: when comparing with LIKE, the wildcard characters are ‘_’ (any one character) and ‘%’ (any characters). An index is used, except if the operand starts with a wildcard. To search for the characters ‘%’ and ‘_’, the characters need to be escaped using ‘\’ (backslash).

Comparison using <, >, >=, and <= can use an index if the property in the index is ordered.

Examples:

[name] like '%: 100 \%'
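
To make the wildcard and escaping rules concrete, the following Python sketch translates a LIKE pattern into a regular expression. It is an approximation for illustration, not Oak's implementation; the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escapes the next character, e.g. \% or \_
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any characters
        elif ch == "_":
            parts.append(".")    # any one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# The pattern from the example above: ends with the literal text ": 100 %".
print(like("Discount: 100 %", "%: 100 \\%"))        # True
print(like("Discount: 100 percent", "%: 100 \\%"))  # False
```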

In Comparison

dynamicOperand IN ( staticOperand [, ...] )

Examples:

[status] in('active', 'inactive')

Static Operand

literal
| $ bindVariableName
| CAST ( literal AS type )

A string (text) literal starts and ends with a single quote. Two single quotes can be used to create a single quote inside a string.

Examples:

'John''s car'
$uuid
cast('2020-12-01T20:00:00.000' as date)
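
When a query string is assembled in application code, the quoting rule above can be applied with a small helper. This Python sketch is an assumption about how a caller might do it (the function name is hypothetical). Where possible, a bind variable avoids the need to embed the value in the query text at all.

```python
def quote_literal(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


name = "John's car"
query = "select * from [nt:base] where [title] = " + quote_literal(name)
print(query)
# select * from [nt:base] where [title] = 'John''s car'
```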

Ordering

dynamicOperand [ ASC | DESC ]

Ordering by an indexed property will use that index if possible. If there is no index that can be used for the given sort order, then the result is fully read in memory and sorted there.

As a special case, sorting by “jcr:score” in descending order is ignored (removed from the list), as this is what the fulltext index does anyway (and if no fulltext index is used, then the score doesn’t apply). If for some reason you want to enforce sorting by “jcr:score”, then you can use the workaround to order by “LOWER([jcr:score]) DESC”.

Examples:

[lastName]
[price] desc

Dynamic Operand

[ selectorName . ] propertyName
| LENGTH( dynamicOperand )
| { NAME | LOCALNAME | SCORE } ( [ selectorName ] )
| { LOWER | UPPER } ( dynamicOperand )
| COALESCE ( dynamicOperand1, dynamicOperand2 )
| PROPERTY ( propertyName, type )

The selector name is only needed if the query contains multiple selectors.

“coalesce”: this returns the first operand if it is not null, and the second operand otherwise. @since Oak 1.8

“property”: This feature is rarely used. It allows filtering on all properties of a given type. Example: the condition property(*, Reference) = $uuid will search for any property of type Reference.

“lower”, “upper”, “length”: Indexes on functions are supported @since Oak 1.6, see OAK-3574.

Examples:

lower([firstName])
coalesce([lastName], name())
length(coalesce([lastName], name()))
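
How the last example evaluates can be sketched in Python. The sketch assumes string values, models null as None, and uses a hypothetical helper that receives the node name and the node's properties.

```python
def coalesce(first, second):
    """Return the first operand unless it is null (None here)."""
    return first if first is not None else second


def length_of_last_name_or_node_name(node_name, properties):
    """Evaluate length(coalesce([lastName], name())) for one node."""
    return len(coalesce(properties.get("lastName"), node_name))


# Hypothetical nodes: (node name, properties).
print(length_of_last_name_or_node_name("jdoe", {"lastName": "Doe"}))  # 3
print(length_of_last_name_or_node_name("jdoe", {}))                   # 4
```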

Type


STRING
| BINARY
| DATE
| LONG
| DOUBLE
| DECIMAL
| BOOLEAN
| NAME
| PATH
| REFERENCE
| WEAKREFERENCE
| URI

This is the list of all JCR property types.


Options

OPTION( {
   TRAVERSAL { OK | WARN | FAIL | DEFAULT } |
   INDEX TAG tagName
} [ , ... ] )

“traversal”: by default, queries without index will log a warning, except if the configuration option QueryEngineSettings.failTraversal is changed. The traversal option can be used to change the behavior of the given query: “ok” to not log a warning, “warn” to log a warning, “fail” to fail the query, and “default” to use the default setting.

“index tag”: by default, queries will use the index with the lowest expected cost (as in relational databases). To only consider some of the indexes, add tags (a multi-valued String property) to the index(es) of choice, and specify this tag in the query.

Examples:

option(traversal fail)
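
The effect of “index tag” on index selection can be pictured with the following Python sketch. The index records, names, tags, and costs are all hypothetical, and the sketch simply returns None when no index carries the tag; what Oak does in that case is not covered here.

```python
def pick_index(indexes, tag=None):
    """Pick the lowest-cost index, optionally limited to one tag."""
    candidates = [ix for ix in indexes if tag is None or tag in ix["tags"]]
    if not candidates:
        return None
    return min(candidates, key=lambda ix: ix["cost"])


# Hypothetical index definitions with estimated costs for one query.
indexes = [
    {"name": "indexA", "tags": [], "cost": 10.0},
    {"name": "indexB", "tags": ["myTag"], "cost": 50.0},
]

print(pick_index(indexes)["name"])           # indexA (lowest cost overall)
print(pick_index(indexes, "myTag")["name"])  # indexB (only tagged indexes)
```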

Explain Query

EXPLAIN [MEASURE] { query }

Does not run the query, but only computes and returns the query plan. With EXPLAIN MEASURE, the expected cost is calculated as well. In both cases, the query result will only have one column called ‘plan’, and one row that contains the plan.

Examples:

explain measure 
select * from [nt:base] where [jcr:uuid] = 1

Result:

plan = [nt:base] as [nt:base] 
/* property uuid = 1 where [nt:base].[jcr:uuid] = 1 */  
cost: { "nt:base": 2.0 } 

This means the property index named “uuid” is used for this query. The expected cost (roughly the number of uncached I/O operations) is 2.


Measure

MEASURE { query }

Runs the query, but instead of returning the result, returns the number of rows traversed. The query result has two columns, one called ‘selector’ and one called ‘scanCount’. The result has at least two rows, one that represents the total (selector set to ‘query’), and one per selector used in the query.

Examples:

measure 
select * from [nt:base] where [jcr:uuid] = 1

Result:

selector = query
scanCount = 0
selector = nt:base
scanCount = 0

In this case, the scanCount is zero because the query did not find any nodes.