A property index is useful whenever a query has a property constraint that is not full-text:
SELECT * FROM [nt:base] WHERE [jcr:uuid] = $id
To define a property index, you have to add an index definition node that:
It is recommended to index one property per index. (If multiple properties are indexed within one index, the index contains all nodes that have any one of those properties, which can make queries less efficient and can cause the query engine to pick the wrong index.)
Optionally you can specify:
Example:
{
    NodeBuilder index = root.child("oak:index");
    index.child("uuid")
        .setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME)
        .setProperty("type", "property")
        .setProperty("propertyNames", "jcr:uuid")
        .setProperty("declaringNodeTypes", "mix:referenceable")
        .setProperty("unique", true)
        .setProperty("reindex", true);
}
Alternatively, to simplify this, you can use one of the existing IndexUtils#createIndexDefinition helper methods:
{
    NodeBuilder index = IndexUtils.getOrCreateOakIndex(root);
    IndexUtils.createIndexDefinition(index, "myProp", true, false, ImmutableList.of("myProp"), null);
}
Usually, reindexing is only needed if the configuration of an index is changed such that the index should contain more or different data. For example, reindexing is needed if the property to be indexed is changed, if a nodetype is added to declaringNodeTypes, or if includedPaths is changed. It is not strictly needed if less data is to be indexed, for example if a nodetype is removed. However, to save space, it might make sense to reindex even in that case. Typically, if a query does not return the expected result, reindexing does not help; the cause is more likely to be found elsewhere, and disabling the index should be tried first.
Reindexing a property index happens synchronously when the reindex flag is set to true. This means that the first #save call triggers a full repository traversal to build the index content, which might take a long time.
Asynchronous reindexing of a property index is available as of OAK-1456. It works by pushing the property index updates to a background job; when the indexing process is done, the index definition is switched back to synchronous update mode. To enable this async reindex behaviour, first set both the reindex-async and reindex flags to true and call #save. You can verify that the initial setup worked by refreshing the index definition node and looking for the async = async-reindex property. Next, start the dedicated background job via a JMX call to the PropertyIndexAsyncReindex#startPropertyIndexAsyncReindex MBean.
Example:
{
    NodeBuilder index = root.child("oak:index");
    index.child("property")
        .setProperty("reindex-async", true)
        .setProperty("reindex", true);
}
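The JMX step described above can also be triggered programmatically. The following is only a minimal sketch using the standard javax.management API. It assumes the code runs in the same JVM as the repository, and the ObjectName pattern is an assumption that should be checked against the name shown in your JMX console; only the operation name startPropertyIndexAsyncReindex is taken from the text above. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class StartAsyncReindex {

    // ASSUMPTION: the domain and key properties of the MBean name are a guess;
    // verify the real ObjectName in your JMX console before relying on this.
    static final String NAME_PATTERN =
            "org.apache.jackrabbit.oak:type=PropertyIndexAsyncReindex,*";

    /**
     * Invokes startPropertyIndexAsyncReindex on every MBean matching the
     * pattern and returns how many MBeans were invoked (0 if none is registered).
     */
    static int startAll(MBeanServer server) {
        try {
            Set<ObjectName> names = server.queryNames(new ObjectName(NAME_PATTERN), null);
            for (ObjectName name : names) {
                // The operation takes no arguments, so both arrays are empty.
                server.invoke(name, "startPropertyIndexAsyncReindex",
                        new Object[0], new String[0]);
            }
            return names.size();
        } catch (JMException e) {
            throw new IllegalStateException("Could not start async reindex", e);
        }
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        System.out.println("MBeans invoked: " + startAll(server));
    }
}
```

If the method reports that no MBean was invoked, the name pattern does not match your deployment, or the repository runs in a different JVM, in which case a remote JMX connection or a JMX console is needed instead.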
When recovering from a failed async reindex, special care needs to be taken with respect to the created checkpoint and the async property. The checkpoint should be released via the CheckpointManager MBean, and the async property needs to be deleted manually while also setting the reindex flags to true, to make sure the index returns to a consistent state, in sync with the head revision.
When running a query, the property index reports its estimated cost to the query engine, and then the query engine picks the index with the lowest cost (cost-based query optimization). The algorithm to calculate the estimated cost is roughly as follows (a bit simplified):
For example, for a query with path restriction “/content/products/t-shirts” and property restriction “color = ‘red’”, if there is an index for the property “color”, then the entry count approximation is read from the index. Let’s say it is 10’000 for this value. Then the approximate number of nodes in the subtree “/content/products/t-shirts” is read (let’s say it is 20’000), and the approximate number of nodes in the repository (let’s say it is 1 million). The subtree therefore holds about one fiftieth of all nodes, so the estimated number of entries is scaled down (divided by 50) from 10’000 to 200. Adding the overhead of 2 gives an estimated cost of 202.
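The arithmetic in this example can be reproduced with a few lines of Java. This is a simplified sketch of the scaling step only, not Oak's actual cost implementation; the class, method, and parameter names are hypothetical, and the guard for missing or implausible node counts is an assumption added for illustration. The overhead of 2 is the value used in the example above.

```java
final class PropertyIndexCostEstimate {

    // Fixed overhead, as used in the worked example above.
    static final double OVERHEAD = 2.0;

    /**
     * Scales the entry count by the fraction of the repository covered by the
     * path restriction, then adds the fixed overhead.
     */
    static double estimateCost(long entryCount, long nodesInSubtree, long nodesInRepository) {
        double estimatedEntries = entryCount;
        // ASSUMPTION: only scale down when both counts are known and the
        // subtree is smaller than the repository.
        if (nodesInSubtree > 0 && nodesInRepository > nodesInSubtree) {
            estimatedEntries = (double) entryCount * nodesInSubtree / nodesInRepository;
        }
        return OVERHEAD + estimatedEntries;
    }

    public static void main(String[] args) {
        // 10'000 entries for color = 'red', 20'000 nodes in the subtree,
        // 1 million nodes in the repository.
        System.out.println(estimateCost(10_000, 20_000, 1_000_000)); // prints 202.0
    }
}
```

Without the path restriction, the same index would report a cost of 10’002 for this value, which shows how much a selective path restriction can lower the estimate.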