Hiperspace

High Performance Memory

Posted by steve on July 03, 2023

Introduction

Hiperspace is an object technology that uses a key-addressable store to expand an application data model beyond the limits of what can be directly referenced in main memory. Elements are neither duplicated nor changed to match database shapes.

Elements are serialized directly using “Protocol Buffers” to and from a key/value structure for storage in memory stores, including CXL expanded and pooled memory, shared cache, local SSD or durable key/value databases. Elements that are not currently in use are released from main memory and transparently (and quickly) reloaded when referenced. Memory stores allow exabytes of data to be addressed.

Hiperspace uses compile-time code generation to map domain Elements to key/value stores, with lazy loading of references and an Entity Framework-consistent context for complex queries.
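
As a rough illustration of lazy loading (not Hiperspace's generated code), the Python sketch below keeps elements as serialized bytes in a key/value store and loads a referenced element only when it is first touched. The names `KeyValueStore` and `Account`, the key layout, and the use of JSON in place of Protocol Buffers are all assumptions made to keep the example self-contained.

```python
import json

class KeyValueStore:
    """Hypothetical stand-in for a memory store: bytes keys to bytes values."""
    def __init__(self):
        self._data = {}
        self.reads = 0          # counts loads, to show when lazy loading happens

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> bytes:
        self.reads += 1
        return self._data[key]

class Account:
    """Holds only the key of its owner; the customer is loaded on first use."""
    def __init__(self, store, account_id, owner_id):
        self._store = store
        self.account_id = account_id
        self._owner_id = owner_id
        self._owner = None      # not loaded yet

    @property
    def owner(self):
        if self._owner is None:
            raw = self._store.get(f"customer/{self._owner_id}".encode())
            self._owner = json.loads(raw)
        return self._owner

store = KeyValueStore()
store.put(b"customer/1", json.dumps({"id": 1, "name": "Acme"}).encode())

account = Account(store, account_id=10, owner_id=1)
reads_before = store.reads           # creating the Account loaded nothing
owner_name = account.owner["name"]   # first access triggers one load
_ = account.owner                    # second access reuses the loaded value
```

The reference costs nothing until it is followed, which is what allows a model far larger than main memory to be navigated as if it were all resident.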

Mutability

Modern high-performance systems avoid mutability for the sake of concurrency, but emulate the experience of mutability for applications. In memory, “copy on write” is used to avoid partial changes, while databases write to new locations and update indexes once the write is complete.

Hiperspace uses two techniques to provide the appearance of mutability while retaining lockless concurrency: Aspect, for single properties added later, and Segment, for multiple values added as needed.
Immutability avoids the need for complex database systems at runtime.
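
The idea can be sketched in Python with an append-only store, in which "changes" are new keys and existing values are never overwritten. The `AppendOnlyStore` class and the tuple key layout are hypothetical; the source does not describe how Aspect and Segment keys are actually encoded.

```python
class AppendOnlyStore:
    """Hypothetical store that never overwrites an existing key."""
    def __init__(self):
        self._data = {}

    def add(self, key: tuple, value) -> bool:
        if key in self._data:
            return False            # existing values are never changed
        self._data[key] = value
        return True

    def get(self, key: tuple):
        return self._data.get(key)

    def scan(self, prefix: tuple):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        n = len(prefix)
        hits = [(k, v) for k, v in self._data.items() if k[:n] == prefix]
        return sorted(hits, key=lambda kv: kv[0])

store = AppendOnlyStore()

# The entity itself is written once.
store.add(("Instrument", "XYZ"), {"name": "XYZ 5% 2030"})

# Aspect-like: a single property added later under its own key.
store.add(("Instrument", "XYZ", "Valuation"), {"pv": 101.2})

# Segment-like: many values added as needed, each under a distinct key.
store.add(("Instrument", "XYZ", "Price", "2023-07-01"), 100.9)
store.add(("Instrument", "XYZ", "Price", "2023-07-03"), 101.2)

prices = [v for _, v in store.scan(("Instrument", "XYZ", "Price"))]
overwrite_accepted = store.add(("Instrument", "XYZ"), {"name": "changed"})
```

Because no write ever replaces another, readers need no locks: a value that has been read cannot change underneath them.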

Use Cases

FRTB

Fundamental Review of the Trading Book (FRTB) is a challenging financial services regulation because it requires the historical retention of information for back testing of model changes. Back testing provides banks and regulators with confidence that value-at-risk projections are consistent with the risk observed once time has moved on. The standard approach is to warehouse each daily dataset separately in a data lake and reuse it as needed. Hiperspace addresses the need using:

  • Schema evolution: historical data is never changed or copied, but can be extended or projected as views.
  • Version evolution: changes to prices and portfolios are not duplicated, but filtered as needed. If an illiquid price has not changed for a long time, it is not duplicated.
  • Horizon filters allow global as-at views to be seen without the need to copy data. Hiperspace is able to maintain multiple views of the same data because the underlying storage is so fast and access paths are direct. To examine a model against the data as it stood 200 days ago, a horizon filter can be applied to exclude anything changed after that date. Subspaces are used for side-by-side comparisons with different horizons.
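
A minimal sketch of an as-at view follows, assuming each fact is stored once together with the date it was recorded. The field names and the `as_at` helper are illustrative and are not part of the Hiperspace API.

```python
from datetime import date

# Each fact is stored once with the date it became known; nothing is overwritten.
prices = [
    {"instrument": "A", "as_of": date(2023, 1, 10), "price": 100.0},
    {"instrument": "A", "as_of": date(2023, 6, 1), "price": 104.5},
    {"instrument": "B", "as_of": date(2022, 11, 3), "price": 55.0},  # illiquid, unchanged since
]

def as_at(facts, horizon):
    """Latest fact per instrument, ignoring anything recorded after horizon."""
    latest = {}
    for fact in facts:
        if fact["as_of"] > horizon:
            continue                      # the horizon filter excludes later changes
        current = latest.get(fact["instrument"])
        if current is None or fact["as_of"] > current["as_of"]:
            latest[fact["instrument"]] = fact
    return {name: fact["price"] for name, fact in latest.items()}

view_march = as_at(prices, date(2023, 3, 1))
view_july = as_at(prices, date(2023, 7, 3))
```

Both views read the same three stored facts. The illiquid price for B is stored once and appears in every view dated after it was recorded.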

Durable Memory

Hiperspace uses the fastest available serialization technology to convert in-memory objects to key/value form for memory drivers. Summary or simulation models that are computationally intensive to build can be stored in a local Hiperspace and reused as needed, irrespective of size (when large CXL or SSD is used).

Time-Series Data

Hiperspace uses low-latency LSM stores such as RocksDB for storage, with performant and efficient serialization.

Document Database

Hiperspace stores objects ranging from simple key/value facts to arbitrarily complex hierarchical documents that can run to many gigabytes per object. Unlike in conventional document databases, a document can be stored as:

  • A single large object
  • A table-of-contents document, with each section stored as a separate segment
  • A document with common tables and references stored once, with multiple references without duplication
  • A document with references to other documents stored without duplication

Where a conventional document database will update an entire document, Hiperspace can store only the parts that have changed. For a complex contract document whose price changes many times, the price can be marked as a version segment, with only the additional price being stored for each change.
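
The effect on write volume can be sketched as follows. The dict-based store, the key names and the JSON encoding are assumptions for illustration only; the point is that a price change writes one small value rather than the whole contract.

```python
import json

store = {}           # hypothetical key/value store: str key -> bytes
bytes_written = []   # size of every write, to compare the two approaches

def put(key, obj):
    raw = json.dumps(obj).encode()
    store[key] = raw
    bytes_written.append(len(raw))

# Table-of-contents document: each section is stored under its own key.
put("contract/42", {"sections": ["terms", "schedule"]})
put("contract/42/terms", {"text": "long legal text " * 200})
put("contract/42/schedule", {"text": "payment schedule " * 200})

# A price change adds one small versioned value; the sections are untouched.
put("contract/42/price/2023-07-03", {"price": 101.25})

price_write = bytes_written[-1]
whole_document = sum(bytes_written[:3])
```

Here `price_write` is a few bytes, while rewriting the document would cost `whole_document` bytes on every change.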

Features

Constraints and Horizon Filtering

With Hiperspace, rules like “a Customer name is mandatory” are not enforced as constraints when an Element is created, but through two complementary features:

  • Expression attributes are derived when an Element is bound to a Hiperspace
  • Horizon filters are applied before an element is retrieved from or added to a store

Constraints are implemented using an expression attribute to highlight validation and a horizon filter to ensure that only valid elements are added. The combination ensures that only valid data is normally stored, while the Element structure can still be used for error handling.
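
The combination can be sketched in Python as a derived `valid` property plus a predicate applied by the space. `FilteredSpace` and its methods are hypothetical names; the actual Hiperspace interfaces are not shown in this article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Customer:
    id: Optional[int]
    name: Optional[str]

    @property
    def valid(self) -> bool:
        """Expression attribute: derived from the other values, never stored."""
        return self.id is not None and bool(self.name)

class FilteredSpace:
    """Hypothetical space that applies a horizon filter on add and on read."""
    def __init__(self, horizon):
        self._horizon = horizon
        self._items = {}

    def add(self, customer: Customer) -> bool:
        if not self._horizon(customer):
            return False        # rejected, but the caller still holds the element
        self._items[customer.id] = customer
        return True

    def find(self):
        return [c for c in self._items.values() if self._horizon(c)]

space = FilteredSpace(horizon=lambda c: c.valid)
accepted = space.add(Customer(id=1, name="Acme"))
draft = Customer(id=2, name=None)
rejected = space.add(draft)
```

The invalid `draft` can still be constructed and inspected, so a user interface can report what is missing, but it never reaches the store.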

Schema Evolution

Hiperspace supports multiple versions of a data schema through Views that allow the structure of an entity to appear to be updated without altering the data. Hiperspace supports evolution in two ways:

  • Additional properties can be added as Aspect extents that are not known to older code and are effectively stored as an addendum by new code
  • Views that present the new structure, with different implementations for different versions

Adding an Aspect does not change the stored structure of an entity because it is stored as an addendum; older code is unaware that the aspect is available:

entity Instrument (…) {…} [RWA : Valuation];

If an Aspect needs a different implementation for different versions of the schema, views can be used:

entity Instrument #1 (…) {…}; 

becomes

view Instrument #2 {…, IPV};
entity InstrumentV1 = Instrument () #1 (…) {…} [IPV = retroPV(Price)];
entity InstrumentV2 = Instrument () #3 (…) {…} [IPV : Valuation];

Older code is unaware of the change, while newer code adds to InstrumentV2. Reading Instrument yields all entities that present the view.
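
In Python terms, the arrangement resembles two record types that expose the same `ipv` attribute, one derived and one stored, read through a common view. This is only an analogy under stated assumptions: `retro_pv` is a placeholder for the `retroPV(Price)` expression above, whose formula is not given, and `instrument_view` is a hypothetical helper.

```python
from dataclasses import dataclass

def retro_pv(price: float) -> float:
    """Placeholder for retroPV(Price); the real formula is not given in the text."""
    return price

@dataclass(frozen=True)
class InstrumentV1:
    id: str
    price: float

    @property
    def ipv(self) -> float:
        return retro_pv(self.price)   # derived for old data, never stored

@dataclass(frozen=True)
class InstrumentV2:
    id: str
    price: float
    ipv: float                        # stored for new data

def instrument_view(*extents):
    """Yield (id, ipv) from every version that presents the view."""
    for extent in extents:
        for item in extent:
            yield item.id, item.ipv

old = [InstrumentV1("A", 99.0)]
new = [InstrumentV2("B", 101.0, 100.4)]
rows = list(instrument_view(old, new))
```

The old records are read exactly as they were written; only the code that presents them changes.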

Version Evolution

Version evolution uses segments to contain each version. Because versioning is such a common case, there is built-in support for Versioned:

entity Instrument (…) {…} [TermSheet : Versioned<ByDate<TermSheet>>];

In this scenario, TermSheet is a complex document that uses the Versioned<> segment (with a ByDate<> timestamp) and stores every version in a segment. The Value property always returns the latest version, and Versions is a set of dated historical TermSheets. As-at historical views are possible using horizon filters.
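
The behaviour described for Value and Versions can be approximated with a small Python class. This is a sketch of the semantics only; the class below is not the Hiperspace Versioned<> type, and the `as_at` method is an assumed stand-in for a horizon filter.

```python
from datetime import date

class Versioned:
    """Hypothetical stand-in for a versioned segment keyed by date."""
    def __init__(self):
        self._versions = {}           # date -> value, append only

    def add(self, as_of: date, value) -> None:
        self._versions.setdefault(as_of, value)   # an existing version is kept

    @property
    def versions(self):
        """All (date, value) pairs in date order."""
        return sorted(self._versions.items(), key=lambda dv: dv[0])

    @property
    def value(self):
        """The latest version, or None when nothing has been stored."""
        return self.versions[-1][1] if self._versions else None

    def as_at(self, horizon: date):
        """The latest version dated on or before horizon."""
        visible = [v for d, v in self.versions if d <= horizon]
        return visible[-1] if visible else None

term_sheet = Versioned()
term_sheet.add(date(2023, 1, 5), "term sheet v1")
term_sheet.add(date(2023, 6, 20), "term sheet v2")
```

With this data, `term_sheet.value` is the June version, while `term_sheet.as_at(date(2023, 3, 1))` returns the January one.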

Architecture

HiperSpace

The core runtime Hiperspace component is very small (20k), containing the minimal classes and interfaces needed to connect a domain model to an underlying storage driver. Hiperspace includes a utility driver (GenerationSpace) that chains together multiple read drivers with a single write driver, allowing different storage tiers to be used for historical data.
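
The chaining idea can be sketched as below. The class names, the method names, and the choice to search the write driver first are assumptions; the real GenerationSpace interface is not described in this article.

```python
class DictDriver:
    """Hypothetical driver backed by a dict; real drivers wrap a store."""
    def __init__(self, initial=None):
        self.data = dict(initial or {})

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value

class GenerationChain:
    """Reads fall through an ordered list of drivers; writes go to one driver."""
    def __init__(self, writer, readers):
        self._writer = writer
        self._readers = [writer, *readers]   # newest generation is searched first

    def get(self, key):
        for driver in self._readers:
            value = driver.get(key)
            if value is not None:
                return value
        return None

    def put(self, key, value):
        self._writer.put(key, value)

archive_2022 = DictDriver({"trade/1": "booked 2022"})
archive_2021 = DictDriver({"trade/0": "booked 2021"})
current = DictDriver()

space = GenerationChain(writer=current, readers=[archive_2022, archive_2021])
space.put("trade/2", "booked 2023")
```

Older generations can then sit on cheaper, read-only storage, while all new data lands in the current tier.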

Drivers

All drivers inherit from the HiperSpace abstract base class to insulate domain models from the underlying technology.

Hiperspace.Rocks

RocksDB is a remarkable technology, originally developed by Google (as LevelDB) and optimized by Facebook for the lowest possible latency when writing to SSD devices. RocksDB uses a Log-Structured Merge (LSM) tree to stream updates while maintaining fast key access. It is used both as a key/value database and as a storage engine for relational databases, message stores, blockchains and various analytical services. The use of LSM optimizes the performance and life of SSD devices. Hiperspace.Rocks uses RocksDB to store elements in durable SSD memory.
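
For readers unfamiliar with LSM, the toy Python store below shows the general shape of the technique: writes go to an in-memory table, full tables are flushed as immutable sorted runs, and reads consult the newest data first. It is a teaching sketch only and does not reflect RocksDB's actual implementation, file formats or API.

```python
import bisect

class TinyLSM:
    """Toy log-structured merge store; illustrative only."""
    def __init__(self, memtable_limit=2):
        self._memtable = {}           # recent writes, held in memory
        self._runs = []               # immutable (keys, values) runs, newest last
        self._limit = memtable_limit

    @property
    def run_count(self):
        return len(self._runs)

    @staticmethod
    def _to_run(mapping):
        items = sorted(mapping.items(), key=lambda kv: kv[0])
        return [k for k, _ in items], [v for _, v in items]

    def put(self, key, value):
        self._memtable[key] = value
        if len(self._memtable) >= self._limit:
            # A flush writes the whole memtable sequentially as one sorted run.
            self._runs.append(self._to_run(self._memtable))
            self._memtable = {}

    def get(self, key):
        if key in self._memtable:
            return self._memtable[key]
        for keys, values in reversed(self._runs):     # newest run wins
            i = bisect.bisect_left(keys, key)         # binary search in a sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value for each key."""
        merged = {}
        for keys, values in self._runs:               # oldest first, later runs overwrite
            merged.update(zip(keys, values))
        self._runs = [self._to_run(merged)] if merged else []

db = TinyLSM()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
```

Writes never modify a run in place, which is why the pattern suits SSDs: data is written sequentially in large blocks and tidied later by compaction.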

Hiperspace.Heap

The Heap driver provides the simplest hiperspace, storing objects in the managed process heap. It exists for testing purposes, but also for benchmarking the performance of other drivers. The Heap driver uses more memory and is slower than the Rocks driver.

Hiperspace.Redis

The Redis driver uses the shared in-memory caching technology provided as a service in Azure, AWS, GCP and other cloud platforms. For durable caching, Redis uses RocksDB to optimize performance of SSD devices; for this reason Redis should only be used for transient elements.

Hiperspace.PIM

At the cutting edge of technology are “High Bandwidth Memory” and “Processing-in-Memory”, which bring together:

  • High Bandwidth Memory that stacks memory chips in a 3D structure to provide terabytes of memory rather than gigabytes
  • Compute Express Link (CXL) to connect HBM with CPU and GPU either in-server or in dedicated memory servers
  • Processing-in-Memory: a combination of CXL and HBM with a local GPGPU to pre-filter memory before returning to the host CPU/GPU. PIM uses a key/value interface to offload search to memory devices

PIM is intended for the huge quantities of data needed for AI training, but can be applied to any data-processing problem that uses key/value access. While PIM is currently niche, it will provide a foundation for a future generation of high-performance databases. Hiperspace.PIM will provide a bridge from the high-level view of Hiperspace to emerging KV-SSD.

Language

HiLang is a minimal high-level language to describe the schema of a domain, taking inspiration from protobuf (.proto models) for hierarchical structures and SQL DML for entities, relations and views.

Elements can have keys (…), values {…} and extensions […] of arbitrary complexity. An example shows the benefit of the language:

entity Customer = Node (SKey = Skey, Name = Name, NodeType = "Customer" ) #100
(
    Id      : int #1
)
{
    Name    : string  #2,
    Address : Address #3,
    SIC     : int     #4,
    valid   = Id = null || Name = null || Limit = null ? false : true #5
}
[
    Accounts    : Account (Owner = self)    #105,
    Orders      : Order (Customer = self)   #106
];

In this example, Customer can be viewed as a Node, has an Id key, several data properties, a derived property for validation, and extension relationships to Account and Order. Address is either stored with Customer or as a key reference (depending on the declaration of Address). HiLang distinguishes between keys and values to enable code generation of serialization and lazy loading of referenced elements. Extensions appear as properties of Customer with lookup code behind them.

"#2" indicates the proto number used to serialize Name to and from the store.
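
To show what a proto number means on the wire, the sketch below hand-encodes two fields using the standard Protocol Buffers wire format, where the tag is `(field_number << 3) | wire_type`. How Hiperspace lays out keys and values beyond this is not stated here, so the example covers only the generic protobuf encoding, and the helper names are illustrative.

```python
def encode_varint(value: int) -> bytes:
    """Protocol Buffers base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (field number, wire type 2), then length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag (field number, wire type 0), then the value as a varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

name_bytes = encode_string_field(2, "Acme")   # Name : string #2
id_bytes = encode_int_field(1, 7)             # Id   : int    #1
```

Because the field name never appears in the bytes, a property can be renamed freely as long as its number stays the same, which is what makes the numbers the stable part of the schema.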

Tools

HiLang

The HiLang parser, validator, transformer and generator are implemented in F#, with Roslyn Source Generator integration for simple use in development projects.

.NET developers (the primary target) can include HiLang files within a standard C# project by referencing the HiLang NuGet package, with any source errors highlighted directly in the development environment.

Compile-time generation enables older versions of code to reference the stored data, while newer versions use an updated (but compatible) schema.

hilangc

For targets other than .NET, hilangc is a command-line program that generates source code for {Java, C++, XMI, documentation}.