Cube Analytics

While Ted Codds, 12 Rules of OLAP have stood the test of time, they have run into problems with the 13th item that was not mentioned: Accuracy, or more specifically point-in-time accuracy, and ability to recognize changes. Early OLAP data stores (three decades ago) achieved on-line performance by pre-calculating almost every conceivable aggregate value over-night (MOLAP) from reconciled batch extracts from operational data stores, but quickly ran into problems with Rule:12 (Unlimited dimensions and aggregation levels) as data grew exponentially. The next generation of OLAP data stores introduced Relational OLAP (ROLAP) where data is only calculated at the most detailed level and stored as Facts in a Star Schema, with references to Dimension or a Snowflake schema for hierarchies of Dimensions. The third generation added hybrid caching (HOLAP) of aggregated in addition to the star schema database. HOLAP has had mixed results because of the difficulty refreshing the HOLAP cache when the ROLAP store changes.


Hiperspace's Cube functionality addressed the 12 rules plus the 13th (Accuracy) by storing the Dimension/Fact data with the raw transactional information and uses Hiperspace History to capture point-in-time changes and streamed updates to aggregates. Hiperspace introduces a 14th rule history of changes to Facts and aggregates.

Rule Name Hiperspace feature
1 multi-dimensional @CubeDimension can be added to any entity
2 Transparency _Fact and _Cube elements created for each @CubeFact entity
3 Accessibility Cubes/Facts are accessible using SQL queries or ODATA for BI
4 Performance Calculated Cubes/Facts are stored in Hiperspace for fast access
5 Remote Accessible via Hiperspace.DB
6 Generic There are no special dimensions
7 Sparse Data Facts are calculated at the lowest level and synchronized, Cube summaries are updated when used
8 Multi-User Via Hiperspace.DB
9 Unrestricted Any SQL operation can be used, querying Cube and other elements
10 Intuitive ODATA support via Hiperspace.DB provides Excel, PowerBI, Tableau live access
11 Flexible reporting Navigational access via C#/F# in browser, desktop server, or parquet SQL extract
12 Drilldown @CubeHierarchy provides unlimited level of aggregation for any dimension drilling down to other dimensions
13 Accuracy Hiperspace.DB aggregates whenever data changes
14 History Every Cube/Fact retains history until pruned

Enhanced OLAP

Rules {13,14} are difficult to provide using traditional OLAP in a different era when durable storage was extremely expensive and computer memory limited. The challenge of aggregating a cube was enough without considering history.

Hiperspace achieves history by using the cheapest tier of raw SSD storage for all data, using RocksDB for access to Exabytes of data (with transparent partitioned storage) using google protobuf encoding and Snappy compression.

History is the secret sauce that enables Hiperspace.DB to stream updates to Facts and Cubes from changes to raw data, without slowing down overall performance. When a source fact changes, a delta is calculated by projecting the before-image and after-image as an OLAP Fact to determine what change needs to be made without a full aggregation. The History of Fact and Cube enable changes to be made without locking all aggregates for the duration of the update.

Unrestricted is highlighted above because Hiperspace supports two additional aggregations that are not possible using most OLAP stores: Standard Deviation and Percentile

Name Note
Standard Deviation At every level of aggregation, every observation must be considered )
Percentile The 95% percentile is commonly used in risk aggregation for Expected Loss (PFE)

Example

The Cube trading example includes a simple case of database with a number of elements

Name Type Note
Contract @CubeFact base transactions
Account
Customer
Instrument
Price Contract Value is dependent on price
Country @CubeDimension
Product @CubeDimension Hierarchy of Products in Tiers
Portfolio @CubeDimension Hierarchical grouping of contract into books
Sector @CubeDimension Hierarchical grouping of Customer sectors

From the model above, Cube.Contact_Fact and Cube.Contract_Cube are created by the HiLang model compiler to hold the Fact level of aggregation (The Red/Yellow/Blue box in the cube above with a bold border) and Cube aggregates.

Aggregate _Fact and _Cube elements are created in Hiperspace for every element that has a @CubeFact property, with a Dimension reference for every path from a @CubeFact element to a @CubeDimension that could yield a single value:

  • Cube.Portfolio is a dimension of Cube.Contact_Fact and Cube.Contract_Cube for Cube.Contract -> Cube.Portfolio
  • Cube.Country is a dimension of Cube.Contact_Fact and Cube.Contract_Cube for Cube.Contract -> Cube.Instrument -> Cube.Country
  • Cube.Product is a dimension of Cube.Contact_Fact and Cube.Contract_Cube for Cube.Contract -> Cube.Instrument -> Cube.Product
  • Cube.Sector is a dimension of Cube.Contact_Fact and Cube.Contract_Cube for Cube.Contract -> Cube.Account -> Cube.Customer -> Cube.Sector

If Cube.Contract had the @CubeDimension property, there would be a direct equivilence between Cube.Contract and Cube.Contract_Fact, but without it each row in Cube.Contract_Fact can represent millions of Contracts.

Change triggering

The Cube.CubeSpace subspace contains meta-data for all the dependencies between elements to trigger changes:

  • Amending the quantity of a single contract triggers a calculation of the before and after delta for the cube, and an update to the Contact_Fact (without recalculating aggregates for the millions of other contract in the Fact), and all current aggregates in Contract_Cube that reference the Fact.
  • Restructuring the Portfolio Book hierarchy will trigger the re-calculation of all Contract_Cube elements that refer to the change
  • Changing an Instrument's product will change the Contract_Cube for Country and Product
  • Changing a Customer will change the Contract_Cube for Sector
  • Changing the Price of an instrument will change all Contract_Facts for those instruments

Change triggering is tracked by Hiperspace.DB and recalculated using its Calculation service, which also schedules periodic re-aggregation of Facts and cubes


Other considerations

If there are separate database views configured (using ContextLabel and Horizon filters) for different departments like {trading, operations, risk, treasury, compliance} they will each see only the Facts and Cubes that are visible from their Horizon.

If the model includes Graph views for Node and Edge, Cube.Contract_Cube will also appear in Graph results (e.g. using HiperspaceDB's GPU-optimised Graph Search) with the EdgeType "Cube.Drilldown" from Node to Cube and "Cube.Dimension" from Cube to Node

Copyright © Cepheis 2024