Blisteringly fast Butterfly

GPGPU technology for Graph problems

Posted by steve on April 28, 2021

Graph theory is a mathematical topic that has become incredibly important in recent years to Information Systems. Graphs allows complex problems to be described as things (nodes, points, objects, vertices) that are related (edges, links, association) to other things. You can infer for example that if Bob is related to Mary is related to Jane then Bob is also related to Jane (Bob->Mary->Jane => Bob->[Mary;Jane]). The topic became mainstream when Google applied graph theory to the problem of a web search engine with PageRank and launched a billion dollar company.

Graph databases have become popular, in part because of the opportunity to replicate Google search within organisations, but also because hierarchical structures (e.g. Bill of Materials) are difficult to process using relational databases. Graph engines do not fail when relationships are cyclic (Bob->Mary->Jane->Bob will fail in SQL) and avoid joins to expand from one level to another.

Loading data into a Graph database is often a better solution when you need to query complex relationships.

What is a graph problem?

Mathematically every problem is a graph problem; all question can be expressed as a graph whether it is “Pizza restaurants in Napoli, open on a Sunday”, “transaction trails in Financial Crime” or “Pandemic contact tracking”. However, a Graph database is not always the best solution:

  • Graph databases are excellent for Web-search.

  • Graph databases are excellent for forensic financial crime analysis, but perform poorly if the need is real-time.

  • Graph databases are poor for pandemic contact tracking due to the quantity of data and rate of change.

For Financial Crime the most common approach is to cache all recent transactions in memory and use fast multi-processor computers to search for patterns in real-time.

Butterfly

Functionally graph problems are either:

  • Trivial (e.g. Bob->Mary->Jane): Don't need to be described as a graph to be understand.

  • Finite (e.g. Bill of Materials) : Can be processed in memory.

  • Unbounded (e.g. Contact tracing) : Cannot generally be processed in memory

Butterfly addresses finite graphs and uses modern commodity cloud servers, originally designed for AI/ML. General Purpose Graphical Processors Units (GPGPU) utilise thousands of processors to perform operations in parallel; cheap enough for mobile phones, powerful enough for climate simulation, and flexible enough for Artificial Intelligence

Butterfly uses an in-memory cache of edges, GPGPU kernel functions and ray-tracing techniques to trace every possible connection from a Source through every edge via a Path before filtering to the Target of interest.

The efficiency of the algorithms and performance of modern technology means it is quicker to scan all relationships in memory than to query a special purpose graph database or data-warehouse. Butterfly presents the result of the graph search as familiar tables that can be used by traditional tools like Power BI and Tableau

> Tests show significant savings in time, development and infrastructure.