Foundations
Adhoc reuses ideas from several mature areas of computer science. You can use it without ever naming them, but knowing which concept underpins which API makes the surface dramatically more predictable. This page is the short reference: the four families of ideas you will see again and again across the codebase, and where in the source they actually surface.
This page does not restate the API. For that, see Concepts and Lexicon. The point here is the why behind the design — the assumptions custom-measure authors must keep in mind, and the failure modes that arise when those assumptions don't hold for a particular use case.
OLAP basics
Online Analytical Processing. A class of read-only queries that compute aggregated indicators over multidimensional data. The mental model:
- Dimensions (a.k.a. columns, or in some literature axes): categorical attributes used to slice the data, e.g. country, city, asOf. In Adhoc these are represented by IAdhocColumn.
- Coordinates: a value picked along a dimension, e.g. country=FR, asOf=2026-05-04.
- Slice: a tuple of coordinates, one per dimension in scope, e.g. {country=FR, city=Paris}. Implemented in Adhoc as ISlice, backed by IAdhocMap. See Slice and IAdhocMap.
- Measure: a numeric (or domain-typed) indicator computed per slice, e.g. revenue = SUM(amount). In Adhoc these implement IMeasure.
- Cube: the conceptual multidimensional array indexed by slices, with one or more measures at each cell. In Adhoc, an ICubeWrapper pairs a table (the data) with a measure forest (the indicator definitions).
A query, in this model, is a (filter, groupBy, measures) triple: which slices of the cube
should be returned, grouped at what granularity, and which measure values to compute on each.
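To make the (filter, groupBy, measures) triple concrete, here is a minimal sketch that evaluates one such query over in-memory rows. It does not use Adhoc's types: MiniCube, row, and sumBy are hypothetical names chosen for illustration, and the only measure supported is a SUM over one column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical stand-in for a cube query; none of these names come from Adhoc.
final class MiniCube {
    // A row binds column names to values, e.g. {country=FR, city=Paris, amount=10.0}.
    static Map<String, Object> row(String country, String city, double amount) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("country", country);
        r.put("city", city);
        r.put("amount", amount);
        return r;
    }

    // Computes SUM(measureColumn) per slice, where a slice is the row's coordinates
    // on the groupBy columns, over the rows accepted by the filter.
    static Map<Map<String, Object>, Double> sumBy(
            List<Map<String, Object>> rows,
            Predicate<Map<String, Object>> filter,
            List<String> groupBy,
            String measureColumn) {
        Map<Map<String, Object>, Double> result = new LinkedHashMap<>();
        for (Map<String, Object> r : rows) {
            if (!filter.test(r)) {
                continue;
            }
            Map<String, Object> slice = new TreeMap<>();
            for (String column : groupBy) {
                slice.put(column, r.get(column));
            }
            result.merge(slice, (Double) r.get(measureColumn), Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                row("FR", "Paris", 10.0), row("FR", "Lyon", 5.0), row("US", "NYC", 7.0));
        // filter = everything, groupBy = country, measure = SUM(amount)
        System.out.println(sumBy(rows, r -> true, List.of("country"), "amount"));
    }
}
```

Changing only the groupBy argument (for example to country and city) returns the same data at a finer granularity, which is the sense in which a query selects slices of one conceptual cube.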
Why it matters for custom-measure authors
When writing a custom measure, the key decision is: for each output slice, what do I need from underlying measures? The answer falls in one of three categories:
- Same slice, transformed value. The output value at slice S is some function of the underlying value(s) at slice S. This is the Combinator shape; see ICombination.
- Different slice(s), values composed. The output value at slice S depends on values at other slices: typically S with a coordinate edited (Shiftor), a wider slice that S is a part of (Unfiltrator), or a narrower partition (Partitionor).
- Routing/dispatch. The output value at slice S is the value of one of N underlyings at S, with the choice depending on S itself. See Routing Measure.
Each shape maps cleanly onto a built-in archetype or a custom measure. The wrong shape
costs you in getUnderlyingSteps() complexity.
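The three shapes can be sketched as plain functions over "slice to value" maps. This is an illustration only: MeasureShapes and its methods are hypothetical, slices are modelled as ordinary maps rather than ISlice, and missing values default to 0.0 for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the three shapes; not Adhoc's measure API.
final class MeasureShapes {
    // Shape 1: same slice, transformed value (combinator-like).
    static double margin(Map<Map<String, String>, Double> revenue,
                         Map<Map<String, String>, Double> cost,
                         Map<String, String> slice) {
        return revenue.getOrDefault(slice, 0.0) - cost.getOrDefault(slice, 0.0);
    }

    // Shape 2: different slice, here the same slice with one coordinate edited (shift-like).
    static double shifted(Map<Map<String, String>, Double> underlying,
                          Map<String, String> slice, String column, String newCoordinate) {
        Map<String, String> edited = new HashMap<>(slice);
        edited.put(column, newCoordinate);
        return underlying.getOrDefault(edited, 0.0);
    }

    // Shape 3: routing, where the slice itself decides which underlying is read.
    static double routed(Map<Map<String, String>, Double> legacy,
                         Map<Map<String, String>, Double> current,
                         Map<String, String> slice, String cutoff) {
        // ISO-8601 dates compare correctly as strings.
        boolean useLegacy = slice.get("asOf").compareTo(cutoff) < 0;
        return (useLegacy ? legacy : current).getOrDefault(slice, 0.0);
    }

    public static void main(String[] args) {
        Map<String, String> today = Map.of("desk", "rates", "asOf", "2026-05-04");
        Map<String, String> yesterday = Map.of("desk", "rates", "asOf", "2026-05-03");
        Map<Map<String, String>, Double> pnl = Map.of(yesterday, 3.0, today, 5.0);
        System.out.println(shifted(pnl, today, "asOf", "2026-05-03"));
    }
}
```

Only the second shape reads a slice other than the one being produced, which is why it is the one that needs extra underlying steps.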
DAGs and the measure tree
Adhoc evaluates queries as a directed acyclic graph (DAG) of CubeQuerySteps. Every
node is a (measure, filter, groupBy, options) tuple — the smallest unit of work the engine
can plan, cache, or skip. Edges go from a measure to its underlyings.
user query: pnl.routed @ groupBy=desk @ filter=date >= 2026-01-01
                    │
                    ▼
           ┌────────────────┐
           │   pnl.routed   │   ◄─ root step (the user's measure)
           └────────┬───────┘
                    │ getUnderlyingSteps()
         ┌──────────┴──────────┐
         ▼                     ▼
┌────────────────┐    ┌────────────────┐
│   pnl.legacy   │    │    pnl.new     │   ◄─ underlying steps
│ filter=before  │    │  filter=after  │      (filters narrowed
└────────┬───────┘    └────────┬───────┘      by the routing logic)
         │                     │
         ▼                     ▼
┌────────────────┐    ┌────────────────┐
│   Aggregator   │    │   Aggregator   │   ◄─ leaves: actual table queries
└────────────────┘    └────────────────┘
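The DAG is built in two passes: the Cube DAG (measure logic) and the Table DAG (database queries). See CubeQueryEngine for the wiring; for the purposes of this page, what matters is why the graph is a DAG rather than a tree, why it must be acyclic, and why its nodes are steps rather than measures.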
Why DAG, not tree
Two distinct measures can share the same underlying step. If pnl.fr_share = pnl.fr / pnl.total
and pnl.us_share = pnl.us / pnl.total, the pnl.total step is a single node referenced
twice — computed once, consumed by both. The "same (measure, filter, groupBy, options)"
identity rule is what makes this deduplication possible. Custom measures benefit from it
for free as long as their getUnderlyingSteps() returns canonical, equality-comparable
steps (the standard CubeQueryStep.edit(step).measure(...).build() builder does this).
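The deduplication argument can be sketched with a value-comparable step type. Everything below is a hypothetical model, not Adhoc code: the Step record carries only three fields, withMeasure stands in for the edit-style builder, and the share formulas are hard-coded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; the real CubeQueryStep has more fields and its own builder.
final class StepDedup {
    // Records compare by value, so two steps with equal fields are the same map key.
    record Step(String measure, String filter, List<String> groupBy) {
        // Mimics the "edit" idea: copy every field, change only the measure.
        Step withMeasure(String newMeasure) {
            return new Step(newMeasure, filter, groupBy);
        }
    }

    // Assumed formulas: x_share = x / pnl.total, at the same filter and groupBy.
    static List<Step> underlyings(Step step) {
        String numerator = step.measure().replace("_share", "");
        return List.of(step.withMeasure(numerator), step.withMeasure("pnl.total"));
    }

    // Returns each distinct underlying step with the number of roots referencing it.
    static Map<Step, Integer> plan(List<Step> roots) {
        Map<Step, Integer> referenceCounts = new LinkedHashMap<>();
        for (Step root : roots) {
            for (Step underlying : underlyings(root)) {
                referenceCounts.merge(underlying, 1, Integer::sum);
            }
        }
        return referenceCounts;
    }

    public static void main(String[] args) {
        List<String> groupBy = List.of("desk");
        Step fr = new Step("pnl.fr_share", "date >= 2026-01-01", groupBy);
        Step us = new Step("pnl.us_share", "date >= 2026-01-01", groupBy);
        plan(List.of(fr, us)).forEach((step, refs) ->
                System.out.println(step.measure() + " referenced " + refs + " time(s)"));
    }
}
```

Four underlying references collapse into three nodes because the two pnl.total steps are equal field by field. If either root had used a different filter or groupBy, the two pnl.total steps would be distinct nodes.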
Why acyclic
A measure that depended on itself would either loop forever or require a fixed-point
iteration, neither of which the engine offers. The forest's addMeasure rejects cycles at
registration time. If you need conditional self-reference (e.g. "use this measure unless we
already computed it for a parent slice"), the right pattern is Shiftor or a custom
measure that re-issues a step for a different slice — same measure, different step, no
cycle.
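A registration-time cycle check of the kind described above could look like the following. This is a sketch under assumptions, not the forest's actual implementation: MiniForest is a hypothetical class, measures are identified by name only, and underlyings may be registered later than the measures that reference them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical registry; Adhoc's forest has its own implementation and API.
final class MiniForest {
    private final Map<String, List<String>> underlyingsByMeasure = new HashMap<>();

    // Registers a measure, refusing it if one of its underlyings can already reach it.
    void addMeasure(String name, List<String> underlyings) {
        for (String underlying : underlyings) {
            if (underlying.equals(name) || reaches(underlying, name)) {
                throw new IllegalArgumentException(
                        "Cycle: " + name + " -> " + underlying + " -> ... -> " + name);
            }
        }
        underlyingsByMeasure.put(name, List.copyOf(underlyings));
    }

    // Graph search over the edges registered so far.
    private boolean reaches(String from, String target) {
        Deque<String> pending = new ArrayDeque<>(List.of(from));
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(target)) {
                return true;
            }
            if (seen.add(current)) {
                pending.addAll(underlyingsByMeasure.getOrDefault(current, List.of()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MiniForest forest = new MiniForest();
        forest.addMeasure("a", List.of("b"));
        try {
            forest.addMeasure("b", List.of("a"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check works on measure names, so it rejects any measure-level self-reference. The Shiftor-style pattern stays legal because it is expressed at the step level, which this name-based sketch does not model.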
Why "step", not "measure"
The same logical measure (e.g. pnl.legacy) can appear at multiple steps with different
filters and groupBys. Caching, induced-query optimization, and ratio-style measures (which
re-issue an underlying with a different filter — see
RatioByCombinator)
all rely on this. The DAG node is the step, not the measure.
Why this matters for custom-measure authors
getUnderlyingSteps() is your declaration of edges in the DAG. Three rules:
- Build steps via CubeQueryStep.edit(step), not CubeQueryStep.builder() from scratch. The former preserves identity for fields you don't change (filter, groupBy, options) so the engine can dedupe across measures. The latter would create a "different" step that happens to be equal, which fails identity-based caching.
- Return a stable list per step. The same step, called twice, must produce the same underlying steps. Random ordering or non-determinism breaks the DAG's idempotence and makes cached results non-reproducible.
- Underlyings declared in getUnderlyingNames() are a superset of what getUnderlyingSteps() may return. The planner uses the names list to build the DAG's shape; the steps list is consulted at evaluation time. If your runtime steps reference a measure not in the names list, the DAG has no node for it and the engine fails.
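The second and third rules lend themselves to a unit-test style check. The helper below is a hypothetical sketch, not part of Adhoc: it takes the declared names as a set and the runtime underlyings as a supplier of measure names, so it can be wired to a custom measure however your tests construct steps.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical test helper; names and signature are assumptions, not Adhoc API.
final class UnderlyingContract {
    // Returns null when the contract holds, otherwise a description of the violation.
    static String check(Set<String> declaredNames, Supplier<List<String>> stepMeasures) {
        // Rule 2: the same call, made twice, must yield the same list in the same order.
        List<String> first = stepMeasures.get();
        List<String> second = stepMeasures.get();
        if (!first.equals(second)) {
            return "non-deterministic underlyings: " + first + " vs " + second;
        }
        // Rule 3: every runtime underlying must appear in the declared names.
        for (String measure : first) {
            if (!declaredNames.contains(measure)) {
                return "undeclared underlying: " + measure;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("pnl.legacy", "pnl.new");
        System.out.println(check(declared, () -> List.of("pnl.legacy")));
        System.out.println(check(declared, () -> List.of("pnl.other")));
    }
}
```

Calling the supplier twice only catches gross non-determinism; a measure whose ordering depends on hash iteration order may pass here and still vary between JVM runs.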
Boolean algebra of filters
ISliceFilter is a Boolean algebra over slices. A slice either matches a filter or
doesn't; filters compose with the standard operators:
| Adhoc construct | Boolean meaning | Identity element |
|---|---|---|
| AndFilter | conjunction (AND) | MATCH_ALL (empty AND) |
| OrFilter | disjunction (OR) | MATCH_NONE (empty OR) |
| NotFilter | negation (NOT) | n/a |
| ColumnFilter | atom: predicate on one column | n/a |
| MATCH_ALL | tautology (⊤) | unit of AND |
| MATCH_NONE | contradiction (⊥) | unit of OR |
The standard identities all hold: AND(x, MATCH_ALL) ≡ x, OR(x, MATCH_NONE) ≡ x,
AND(x, MATCH_NONE) ≡ MATCH_NONE, OR(x, MATCH_ALL) ≡ MATCH_ALL, NOT(NOT(x)) ≡ x,
De Morgan, etc. FilterBuilder.optimize() applies these rewrites and several others; do
not bypass it when constructing filters in custom measures.
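To show what such rewrites amount to, here is a small self-contained filter algebra with an optimize function. It is a sketch, not FilterBuilder.optimize(): the Filter interface and its nested records are hypothetical, and only the unit, absorbing-element, idempotence, and double-negation identities are implemented (De Morgan and column-level reasoning are left out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini filter algebra; names echo Adhoc's but this is not its implementation.
interface Filter {
    Filter MATCH_ALL = new Const(true);
    Filter MATCH_NONE = new Const(false);

    record Const(boolean value) implements Filter {}
    record Eq(String column, Object value) implements Filter {}
    record And(List<Filter> operands) implements Filter {}
    record Or(List<Filter> operands) implements Filter {}
    record Not(Filter operand) implements Filter {}

    // Rewrites bottom-up so that operands are already simplified when inspected.
    static Filter optimize(Filter f) {
        if (f instanceof Not not) {
            Filter inner = optimize(not.operand());
            if (inner instanceof Not nested) {
                return nested.operand(); // NOT(NOT(x)) = x
            }
            if (inner instanceof Const c) {
                return new Const(!c.value());
            }
            return new Not(inner);
        }
        if (f instanceof And and) {
            return simplify(and.operands(), true);
        }
        if (f instanceof Or or) {
            return simplify(or.operands(), false);
        }
        return f;
    }

    // AND: unit is MATCH_ALL, absorbing element is MATCH_NONE. OR: the reverse.
    private static Filter simplify(List<Filter> operands, boolean isAnd) {
        Filter unit = new Const(isAnd);
        Filter absorbing = new Const(!isAnd);
        List<Filter> kept = new ArrayList<>();
        for (Filter operand : operands) {
            Filter o = optimize(operand);
            if (o.equals(absorbing)) {
                return absorbing;
            }
            if (!o.equals(unit) && !kept.contains(o)) {
                kept.add(o); // drops units and duplicate operands
            }
        }
        if (kept.isEmpty()) {
            return unit; // empty AND is MATCH_ALL, empty OR is MATCH_NONE
        }
        return kept.size() == 1 ? kept.get(0) : (isAnd ? new And(kept) : new Or(kept));
    }

    static void main(String[] args) {
        Filter x = new Eq("country", "FR");
        System.out.println(optimize(new And(List.of(x, MATCH_ALL))));
        System.out.println(optimize(new Or(List.of(x, MATCH_ALL))));
    }
}
```

Because the records compare by value, two filters that simplify to the same tree are equal objects, which is the property step deduplication relies on.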
Why this matters for custom-measure authors
Custom measures that manipulate filters (e.g. routing, partitioning, shifting) routinely construct ANDs of the user's filter with a measure-side filter:
ISliceFilter narrowed = FilterBuilder.and(step.getFilter(), measureSideFilter).optimize();
optimize() is where:
- MATCH_NONE short-circuits propagate up, e.g. if the user filtered to country=FR and the measure-side filter is country=US, the AND collapses to MATCH_NONE. Catch this before issuing an underlying step (see RatioByCombinatorQueryStep's pattern of detecting the empty branch).
- Equivalent filters become structurally equal, which lets the DAG dedupe steps that arose from different measure-construction paths.
Filters are also useful as values you can read inside routeFunction of a
RoutingMeasure: FilterHelpers.asMap(step.getFilter()) extracts an
AND-of-equalities into a Map<String, Object>, which is the simplest case to dispatch on.
For non-AND-of-equalities (ranges, OR, regex), you'll need to inspect the filter tree
directly via FilterHelpers.visit(filter, visitor).
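The AND-of-equalities case, including the country=FR versus country=US collapse, can be sketched without any Adhoc types. The class below is hypothetical and is not FilterHelpers.asMap: it assumes the conjunction has already been flattened into a list of equality atoms, and it signals a contradictory conjunction (the MATCH_NONE case) with an empty Optional.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not FilterHelpers.asMap itself: models an AND of equality atoms.
final class EqualityConjunction {
    record Eq(String column, Object value) {}

    // Returns the column-to-value bindings, or empty when two atoms contradict each
    // other, in which case the conjunction matches no slice at all.
    static Optional<Map<String, Object>> asMap(List<Eq> conjunction) {
        Map<String, Object> bindings = new LinkedHashMap<>();
        for (Eq atom : conjunction) {
            Object previous = bindings.putIfAbsent(atom.column(), atom.value());
            if (previous != null && !previous.equals(atom.value())) {
                return Optional.empty();
            }
        }
        return Optional.of(bindings);
    }

    public static void main(String[] args) {
        // User filter country=FR combined with a measure-side filter country=US.
        System.out.println(asMap(List.of(new Eq("country", "FR"), new Eq("country", "US"))));
        // A consistent conjunction yields a map that is easy to dispatch on.
        System.out.println(asMap(List.of(new Eq("country", "FR"), new Eq("city", "Paris"))));
    }
}
```

A routing function would typically branch on the empty case first and skip the underlying step entirely, then read the routing column from the map.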
Filters are not slices
A common confusion: a ColumnFilter carries a column name and a value matcher; an
ISlice carries column-to-coordinate bindings. They look similar — both attach values to
columns — but they answer different questions:
- A filter answers "does this slice belong to my set?"
- A slice answers "what are my coordinates?"
FilterHelpers.asMap exists because for the special case of AND-of-equalities, a filter is
shaped like a slice. Most of the time it isn't, and treating it as one in your custom
measure is the wrong abstraction — see Filtering for the full surface.
Linearity and why SUM is special
An aggregation f is linear when, given any partition of the rows into disjoint groups A and B, f(A ∪ B) = f(A) ⊕ f(B) for some merging operator ⊕ that needs only the two partial results. SUM and COUNT are linear, with ordinary addition as the merge operator. MIN and MAX also qualify, with min and max as the merge. Most other useful aggregations do not.
| Aggregation | Linear? | Merge of disjoint partitions |
|---|---|---|
| SUM | yes | f(A∪B) = f(A) + f(B) |
| COUNT | yes | f(A∪B) = f(A) + f(B) |
| AVG | no | needs sum + count separately, then divide |
| MIN / MAX | yes (idempotent merge) | min(f(A), f(B)) / max(f(A), f(B)) |
| RANK / top-K | no | needs the raw values from both partitions |
| MEDIAN, PERCENTILE | no | needs the full distribution |
| STDDEV | no (without extra state) | needs sum of squares + sum + count |
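The table's SUM, MAX, and MEDIAN rows can be checked numerically. The sketch below uses plain lists of doubles and hypothetical helper names; it shows two pairs of partitions whose medians are identical (2 and 4) while the medians of their unions differ, so no merge operator on the two medians alone can be correct.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative helpers only; not tied to Adhoc's IAggregation.
final class MergeDemo {
    static double sum(List<Double> xs) {
        double total = 0;
        for (double x : xs) {
            total += x;
        }
        return total;
    }

    static double max(List<Double> xs) {
        return Collections.max(xs);
    }

    static double median(List<Double> xs) {
        List<Double> sorted = new ArrayList<>(xs);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2) : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    static List<Double> union(List<Double> a, List<Double> b) {
        List<Double> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }

    public static void main(String[] args) {
        List<Double> a = List.of(1.0, 2.0, 100.0);
        List<Double> b = List.of(3.0, 4.0, 5.0);
        // SUM and MAX: the partial results are enough.
        System.out.println(sum(union(a, b)) == sum(a) + sum(b));
        System.out.println(max(union(a, b)) == Math.max(max(a), max(b)));
        // MEDIAN: same partial medians as (a, b), different median of the union.
        List<Double> a2 = List.of(2.0, 2.0, 2.0);
        List<Double> b2 = List.of(4.0, 4.0, 4.0);
        System.out.println(median(union(a, b)) + " vs " + median(union(a2, b2)));
    }
}
```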
This is more than vocabulary. Several patterns in Adhoc — and several use cases users want to write — work cleanly only when the aggregation is linear:
- Decomposition into disjoint sub-queries. A custom measure that splits a query into two filters whose union is the original (e.g. country=FR ∨ country=US), runs them separately, and sums the results, is doing this for free with SUM. Trying the same trick with MAX of two disjoint partitions still works (because MAX is idempotent on the merge: max(max(A), max(B)) = max(A∪B)). Trying it with MEDIAN does not: there is no way to recover the median of A∪B from the medians of A and B alone.
- Caching at intermediate levels. A linear aggregation can be cached at one granularity and rolled up to a coarser one by re-applying the merge. Non-linear aggregations cannot; caching them is only valid for the exact granularity they were computed at.
- Routing across boundaries. RoutingMeasure explicitly does not support cross-boundary queries when the routing column isn't in the groupBy. The reason is precisely linearity: even if the underlying measure sums cleanly, the routing measure would have to combine values from two different physical aggregations, which is only safe when those aggregations are linear in the same merge sense.
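The caching point can be illustrated with AVG, which is not linear on its own but becomes mergeable once the cache stores a (sum, count) pair instead of the finished average. The sketch below is hypothetical and independent of Adhoc's caching: cache keys are plain "country/city" strings and the roll-up target is the country.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative roll-up of cached partial states; not Adhoc's cache.
final class RollupDemo {
    // Mergeable state for AVG: the pair merges by addition, the average itself does not.
    record SumCount(double sum, long count) {
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double average() {
            return sum / count;
        }
    }

    // Rolls a (country, city) cache up to country by merging states that share a country.
    static Map<String, SumCount> rollUpToCountry(Map<String, SumCount> byCountryAndCity) {
        Map<String, SumCount> byCountry = new LinkedHashMap<>();
        byCountryAndCity.forEach((key, state) -> {
            String country = key.substring(0, key.indexOf('/'));
            byCountry.merge(country, state, SumCount::merge);
        });
        return byCountry;
    }

    public static void main(String[] args) {
        Map<String, SumCount> cache = new LinkedHashMap<>();
        cache.put("FR/Paris", new SumCount(30.0, 3)); // average 10
        cache.put("FR/Lyon", new SumCount(20.0, 1));  // average 20
        double rolledUp = rollUpToCountry(cache).get("FR").average();
        double averageOfAverages = (10.0 + 20.0) / 2;
        System.out.println(rolledUp + " (correct) vs " + averageOfAverages + " (naive)");
    }
}
```

The same idea extends to STDDEV with a (sum of squares, sum, count) state, while MEDIAN has no such fixed-size state.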
Why this matters for custom-measure authors
When you design a custom measure, ask:
- "Can I express my output as a function of underlying measure values at the same slice?" → Combinator-shaped, no linearity question, see ICombination.
- "Do I need to combine values from different slices into one output?" → Linearity matters. If your underlying is SUM you're fine; if it's MAX you need to think about whether merge-by-MAX gives the right answer; if it's MEDIAN, refuse the operation loudly rather than producing a silently-wrong number.
- "Does my measure need to partition the data and aggregate the partitions?" → That's Partitionor's job; see Partitionor. It works because the engine delegates the per-partition aggregation to the same IAggregation as the underlying, which preserves correctness as long as that aggregation is associative.
The single biggest correctness trap is to treat all aggregations as if they were SUM. SUM is associative, commutative, has an identity (0), distributes over partitions, caches at any granularity, and is its own merge operator. Almost no other aggregation has all these properties. Whenever you see code that combines values across slices or partitions in a custom measure, the question to ask is: "would this still be correct if the underlying were RANK?" If the answer is no, document the constraint.
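One way to make the "refuse loudly" advice concrete is to attach the merge operator to the aggregation and fail when there is none. The enum below is a hypothetical sketch, not Adhoc's IAggregation: the set of constants and the combine method are assumptions made for illustration.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical catalogue of aggregations and their merge operators.
enum Agg {
    SUM(Double::sum),
    MAX(Math::max),
    MIN(Math::min),
    MEDIAN(null), // no merge from partial results alone
    RANK(null);   // no merge from partial results alone

    private final DoubleBinaryOperator merge;

    Agg(DoubleBinaryOperator merge) {
        this.merge = merge;
    }

    // Combines two partial results, or fails instead of returning a wrong number.
    double combine(double left, double right) {
        if (merge == null) {
            throw new UnsupportedOperationException(
                    name() + " cannot be merged from partial results");
        }
        return merge.applyAsDouble(left, right);
    }

    public static void main(String[] args) {
        System.out.println(SUM.combine(2, 3));
        try {
            MEDIAN.combine(2, 4);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A custom measure that combines values across slices can call such a guard with the underlying's aggregation, turning a silent correctness bug into an explicit failure.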
See also
- Concepts: concrete architecture and query flow.
- Lexicon: Adhoc-specific terms and ActivePivot/Atoti translations.
- Filtering: full surface of ISliceFilter and IValueMatcher.
- Custom Aggregations: implementing IAggregation (associativity is the only hard requirement).
- Custom Measures: the abstract pattern guide.
- Routing Measure: concrete walkthrough that exercises every concept above.
- CubeQueryEngine: two-DAG workflow (Cube DAG of measures, Table DAG of database queries).