Filtering
An ISliceFilter is a way to restrict the data to be considered on a per-column basis. The set of filters is quite small:
- AndFilter: an AND boolean operation over underlying ISliceFilters. If there is no underlying filter, this is a .matchAll.
- OrFilter: an OR boolean operation over underlying ISliceFilters. If there is no underlying filter, this is a .matchNone.
- NotFilter: a ! or NOT boolean operation over an underlying ISliceFilter.
- IColumnFilter: an operator over a specific column with a given IValueMatcher.
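The semantics of these four filters, including the empty AND/OR cases, can be sketched with plain JDK predicates. This is only a model: `SliceFilterSketch`, its `Filter` interface and the `Map`-based slice are hypothetical stand-ins, not Adhoc's actual classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical model of the filter semantics; these are NOT Adhoc's real types.
final class SliceFilterSketch {
    // A slice is modelled as a column -> value map (an assumption of this sketch).
    interface Filter extends Predicate<Map<String, Object>> {}

    // AND over the operands; with no operand, every slice matches (matchAll).
    static Filter and(List<Filter> operands) {
        return slice -> operands.stream().allMatch(f -> f.test(slice));
    }

    // OR over the operands; with no operand, no slice matches (matchNone).
    static Filter or(List<Filter> operands) {
        return slice -> operands.stream().anyMatch(f -> f.test(slice));
    }

    // NOT over a single operand.
    static Filter not(Filter operand) {
        return slice -> !operand.test(slice);
    }

    // Column filter: applies a value matcher to the value of one column.
    static Filter column(String column, Predicate<Object> matcher) {
        return slice -> matcher.test(slice.get(column));
    }

    public static void main(String[] args) {
        Map<String, Object> slice = Map.of("ccy", "EUR", "country", "FR");
        Filter isEur = column("ccy", v -> Objects.equals(v, "EUR"));
        Filter isUs = column("country", v -> Objects.equals(v, "US"));

        System.out.println(and(List.of(isEur, not(isUs))).test(slice)); // true
        System.out.println(or(List.of()).test(slice)); // false
    }
}
```

An empty AND behaving as matchAll and an empty OR behaving as matchNone follows directly from `allMatch` and `anyMatch` over an empty stream.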
An IValueMatcher applies to any Object. The variety of IValueMatcher is quite large, and easily extensible:
- EqualsMatcher: true if the input is equal to some pre-defined Object.
- NullMatcher: true if the input is null.
- LikeMatcher: true if the input .toString representation matches the registered LIKE expression (see www.w3schools.com).
- RegexMatcher: true if the input .toString representation matches the registered regex expression.
- etc
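As an illustration of these matchers, here is a minimal JDK-only sketch. `ValueMatcherSketch` and its method names are hypothetical. The sketch also assumes that LIKE is case-sensitive, that `%` and `_` are the only wildcards, that the regex must match the whole `.toString` representation, and that a null input never matches LIKE or regex. The actual Adhoc matchers may behave differently on each of these points.

```java
import java.util.Objects;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical model of the value matchers; these are NOT Adhoc's real classes.
final class ValueMatcherSketch {
    // True if the input equals the pre-defined operand.
    static Predicate<Object> equalTo(Object operand) {
        return input -> Objects.equals(input, operand);
    }

    // True if the input is null.
    static Predicate<Object> isNull() {
        return Objects::isNull;
    }

    // True if input.toString() fully matches the regex (assumption: full match, null never matches).
    static Predicate<Object> regex(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return input -> input != null && pattern.matcher(input.toString()).matches();
    }

    // Translates a LIKE expression into a regex: '%' is any sequence, '_' is any single character,
    // every other character is taken literally.
    static Predicate<Object> like(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern pattern = Pattern.compile(regex.toString(), Pattern.DOTALL);
        return input -> input != null && pattern.matcher(input.toString()).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("a%").test("azure")); // true
        System.out.println(like("a_c").test("abbc")); // false
        System.out.println(regex("\\d+").test(123)); // true
    }
}
```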
Implementing custom operators
Adhoc provides a StandardOperatorFactory including generic operators (e.g. SUM).
- It can resolve custom operators, using their Class.getName() as key.
- Your custom IAggregation/ICombination/IDecomposition/ISliceFilterEditor should then have:
  - Either an empty constructor
  - Or a Map<String, ?> single-parameter constructor.
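The constructor convention above can be illustrated with plain reflection. This is a sketch, not the factory's actual code: `OperatorInstantiationSketch`, `WeightedSum`, `Count` and the `weight` option are hypothetical, and trying the Map constructor before the empty one is an assumption.

```java
import java.util.Map;

// Hypothetical sketch of resolving an operator from its Class.getName() key.
final class OperatorInstantiationSketch {
    // Hypothetical custom operator with a Map<String, ?> single-parameter constructor.
    public static final class WeightedSum {
        public final double weight;

        public WeightedSum(Map<String, ?> options) {
            Object raw = options.get("weight");
            this.weight = raw instanceof Number ? ((Number) raw).doubleValue() : 1.0;
        }
    }

    // Hypothetical custom operator with only an empty constructor.
    public static final class Count {
        public Count() {}
    }

    // Assumption: the Map constructor is tried first, then the empty constructor.
    static Object instantiate(String className, Map<String, ?> options) {
        try {
            Class<?> clazz = Class.forName(className);
            try {
                return clazz.getConstructor(Map.class).newInstance(options);
            } catch (NoSuchMethodException e) {
                return clazz.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object sum = instantiate(WeightedSum.class.getName(), Map.of("weight", 2));
        Object count = instantiate(Count.class.getName(), Map.of());
        System.out.println(sum.getClass().getSimpleName() + " / " + count.getClass().getSimpleName());
    }
}
```

A class offering neither constructor fails fast with an `IllegalArgumentException` in this sketch.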
One may also define a custom IOperatorsFactory:
- by extending it
- by creating your own IOperatorsFactory and combining with CompositeOperatorFactory
- by adding a fallback strategy with DummyOperatorFactory
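The combination and fallback strategies can be pictured as a chain of factories tried in order. The sketch below is a simplified model: the `Factory` interface, the `MAX` operator and the NaN-returning fallback are hypothetical. What the actual CompositeOperatorFactory and DummyOperatorFactory return is not specified here.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

// Hypothetical model of chaining operator factories; NOT Adhoc's real interfaces.
final class OperatorFactorySketch {
    // Simplified factory: resolves a key into an aggregation function, if it knows the key.
    interface Factory {
        Optional<BinaryOperator<Double>> make(String key);
    }

    // Stand-in for the standard factory: only knows SUM.
    static final Factory STANDARD = key -> {
        if ("SUM".equals(key)) {
            BinaryOperator<Double> sum = Double::sum;
            return Optional.of(sum);
        }
        return Optional.empty();
    };

    // Stand-in for a project-specific factory: only knows MAX.
    static final Factory CUSTOM = key -> {
        if ("MAX".equals(key)) {
            BinaryOperator<Double> max = Math::max;
            return Optional.of(max);
        }
        return Optional.empty();
    };

    // Stand-in for a fallback: accepts any key and returns a placeholder operator yielding NaN.
    static final Factory FALLBACK = key -> {
        BinaryOperator<Double> placeholder = (l, r) -> Double.NaN;
        return Optional.of(placeholder);
    };

    // Composite: the first factory able to resolve the key wins.
    static Factory composite(List<Factory> factories) {
        return key -> {
            for (Factory factory : factories) {
                Optional<BinaryOperator<Double>> operator = factory.make(key);
                if (operator.isPresent()) {
                    return operator;
                }
            }
            return Optional.empty();
        };
    }

    public static void main(String[] args) {
        Factory factory = composite(List.of(CUSTOM, STANDARD, FALLBACK));
        System.out.println(factory.make("SUM").get().apply(1.0, 2.0)); // 3.0
        System.out.println(factory.make("MAX").get().apply(1.0, 2.0)); // 2.0
    }
}
```

Placing the fallback last ensures it only handles keys that no other factory recognizes.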
About performance
Humans are generally happier when things go faster. Adhoc enables split-second queries over the underlying table. Very large queries can be performed with limited resources (e.g. a JVM with a few GB of RAM), though they may take seconds or minutes.
The limiting factor in terms of performance is generally the underlying table, which executes aggregations at the granularity requested by Adhoc: the granularity induced by the User GROUP BY, plus those implied by some formulas (e.g. a Partitionor by Currency).
Hence, we do not target absolute performance in Adhoc. In other words, we prefer things to remain slightly slower, as long as it enables this project to remain simpler, given that a query is generally slow due to the underlying ITableWrapper.
Adhoc performance can be improved by:
- Scaling horizontally: each Adhoc instance is stateless, and can operate a User-query independently of other shards. There is no plan to enable a single Adhoc query to be dispatched across a cluster of Adhoc instances, but it may be considered if some project would benefit from such a feature.
- Enabling caching (e.g. CubeQueryStep caching).
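To give an idea of what step-level caching buys, here is a generic memoization sketch. It is not Adhoc's caching mechanism: `StepCacheSketch` is hypothetical, the cache key stands in for a CubeQueryStep, and eviction and invalidation are ignored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical memoizing cache: each distinct step is computed at most once.
final class StepCacheSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;
    // Counts actual computations, to make cache hits observable.
    final AtomicInteger computations = new AtomicInteger();

    StepCacheSketch(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K step) {
        // computeIfAbsent invokes the function only when the key is missing.
        return cache.computeIfAbsent(step, s -> {
            computations.incrementAndGet();
            return compute.apply(s);
        });
    }

    public static void main(String[] args) {
        StepCacheSketch<String, Integer> cache = new StepCacheSketch<>(String::length);
        cache.get("sum(amount) by ccy");
        cache.get("sum(amount) by ccy");
        System.out.println(cache.computations.get()); // 1
    }
}
```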
Concurrency
Concurrency is enabled by default (StandardQueryOptions.CONCURRENT), but it can be disabled through StandardQueryOptions.SEQUENTIAL.
Non-concurrent queries are executed in the calling-thread (e.g. MoreExecutors.newDirectExecutorService()).
Concurrent queries are executed in Adhoc's own Executors.newWorkStealingPool. It can be customized through AdhocUnsafe.adhocCommonPool.
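The difference between the two execution modes can be sketched with JDK executors only. `ExecutorChoiceSketch` and `executorFor` are hypothetical names. `Runnable::run` is used here as a JDK-only stand-in for Guava's direct executor, since it also runs each task in the calling thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of choosing an executor from a concurrent/sequential option.
final class ExecutorChoiceSketch {
    // Runs each task immediately in the calling thread.
    static final Executor DIRECT = Runnable::run;
    // Work-stealing pool; its threads are daemon threads, so they do not block JVM exit.
    static final ExecutorService POOL = Executors.newWorkStealingPool();

    static Executor executorFor(boolean concurrent) {
        return concurrent ? POOL : DIRECT;
    }

    public static void main(String[] args) {
        Thread caller = Thread.currentThread();
        Thread sequential = CompletableFuture.supplyAsync(Thread::currentThread, executorFor(false)).join();
        Thread concurrent = CompletableFuture.supplyAsync(Thread::currentThread, executorFor(true)).join();
        System.out.println("sequential ran in caller: " + (sequential == caller));
        System.out.println("concurrent ran in caller: " + (concurrent == caller));
    }
}
```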
Concurrent sections are:
- subQueries in a CompositeCubesTableWrapper: each subCube may be queried concurrently.
- CubeQuerySteps in a DAG: independent tasks may be executed concurrently.
- tableQueries induced by leaves CubeQuerySteps: independent tableQueries may be executed concurrently.
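The idea of running independent DAG tasks concurrently can be illustrated with `CompletableFuture`. This is a generic sketch, not Adhoc's scheduler: `DagConcurrencySketch` is hypothetical, and the two constant-returning leaves stand in for independent table queries.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Hypothetical sketch: two independent leaves feed one dependent step.
final class DagConcurrencySketch {
    static double run(Executor executor) {
        // The leaves do not depend on each other, so the executor may run them concurrently.
        CompletableFuture<Double> leafA = CompletableFuture.supplyAsync(() -> 10.0, executor);
        CompletableFuture<Double> leafB = CompletableFuture.supplyAsync(() -> 32.0, executor);
        // The dependent step runs only once both leaves have completed.
        return leafA.thenCombine(leafB, Double::sum).join();
    }

    public static void main(String[] args) {
        System.out.println(run(Executors.newWorkStealingPool())); // 42.0
        System.out.println(run(Runnable::run)); // 42.0
    }
}
```

The result is the same with a pool or with a calling-thread executor; only the scheduling differs.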
If you encounter a case whose performance would be much improved by multi-threading, please report its specifics through a new issue. A benchmark / unitTest demonstrating the case would be very helpful.
Parallelism can be configured in AdhocUnsafe.parallelism or through -Dadhoc.parallelism=16.
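A system property such as adhoc.parallelism is typically read as follows. This is a sketch of the general JDK mechanism, not of AdhocUnsafe itself: `ParallelismSketch` is hypothetical, and defaulting to the number of available processors is an assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of sizing a pool from -Dadhoc.parallelism=N.
final class ParallelismSketch {
    static int parallelism() {
        // Assumption: default to the number of available processors when the property is absent or invalid.
        return Integer.getInteger("adhoc.parallelism", Runtime.getRuntime().availableProcessors());
    }

    static ExecutorService newPool() {
        return Executors.newWorkStealingPool(parallelism());
    }

    public static void main(String[] args) {
        System.out.println("parallelism=" + parallelism());
    }
}
```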