Low Latency.

AI engine written in C++ from the ground up. No slow components, no bloat.

Modular.

Actor-model-inspired platform with zero-allocation, zero-copy message passing between modules.
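This is an illustration only. The sketch below shows how zero-allocation, zero-copy message passing between two modules can work, assuming a single sender and a single receiver. The `Message` and `Mailbox` names are hypothetical and are not part of the engine's API. All slots are preallocated inside the mailbox. The sender writes its message directly into a slot, and the receiver reads that same memory, so nothing is allocated or copied per message.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message type: plain data, written in place by the sender.
struct Message {
    std::uint32_t kind;
    float payload[16];
};

// Hypothetical single-producer/single-consumer mailbox. All slots live inside
// the object, so sending and receiving never allocate. Sender and receiver
// work on the same slot memory, so the payload is never copied.
template <std::size_t Capacity>
class Mailbox {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer: reserve the next free slot, or nullptr if the mailbox is full.
    Message* try_reserve() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head - tail_.load(std::memory_order_acquire) == Capacity) return nullptr;
        return &slots_[head & (Capacity - 1)];
    }
    // Producer: make the reserved slot visible to the consumer.
    void publish() { head_.fetch_add(1, std::memory_order_release); }

    // Consumer: look at the oldest published slot, or nullptr if empty.
    const Message* try_peek() const {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return nullptr;
        return &slots_[tail & (Capacity - 1)];
    }
    // Consumer: hand the slot back to the producer once processing is done.
    void release() {
        assert(tail_.load(std::memory_order_relaxed) !=
                   head_.load(std::memory_order_acquire) &&
               "release() called with no message pending");
        tail_.fetch_add(1, std::memory_order_release);
    }

private:
    std::array<Message, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

A fixed-capacity ring like this trades flexibility for predictability. When the mailbox is full, `try_reserve()` returns `nullptr`, and the sender has to decide whether to retry or drop the message. The mailbox never grows on the heap.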

Journaled.

Internal messages can be persisted to disk for auditability and analysis.
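Here is a minimal sketch of message journaling. It assumes messages are plain fixed-size records that can be appended to a file verbatim. `JournalRecord`, `Journal` and `read_journal` are hypothetical names for illustration, since the engine's actual on-disk format is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical on-disk record: a sequence number, the sending module and the message data.
struct JournalRecord {
    std::uint64_t sequence;
    std::uint32_t source_module;
    std::uint32_t kind;
    float payload[4];
};
static_assert(std::is_trivially_copyable_v<JournalRecord>,
              "records are written as raw bytes");

// Append-only journal: each record is written verbatim at the end of the file.
class Journal {
public:
    explicit Journal(const std::string& path)
        : out_(path, std::ios::binary | std::ios::app) {}

    void append(const JournalRecord& rec) {
        assert(out_ && "journal stream is not writable");
        out_.write(reinterpret_cast<const char*>(&rec), sizeof rec);
    }
    void flush() { out_.flush(); }   // hand buffered records to the OS
    bool ok() const { return static_cast<bool>(out_); }

private:
    std::ofstream out_;
};

// Replay a journal file for auditing or offline analysis.
// A missing file simply yields an empty list.
std::vector<JournalRecord> read_journal(const std::string& path) {
    std::vector<JournalRecord> records;
    std::ifstream in(path, std::ios::binary);
    JournalRecord rec{};
    while (in.read(reinterpret_cast<char*>(&rec), sizeof rec)) {
        records.push_back(rec);
    }
    return records;
}
```

Writing raw struct bytes is simple and fast. However, it ties the file to one compiler's struct layout and one byte order, so a production journal would likely want a versioned, explicitly encoded format.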

Calculating.

A lightweight, template-based tensor library is included for AI and other numerical calculations, with CPU and CUDA support.
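Here is a CPU-only sketch of what a template tensor library can look like in practice. The tensor shape is a template parameter, so the compiler rejects a mismatched matrix product. `Tensor2` and `matmul` are hypothetical names. The included library's real interface and its CUDA path are not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-shape 2-D tensor. The shape is part of the type, so
// mismatched matrix products fail to compile instead of failing at run time.
template <typename T, std::size_t Rows, std::size_t Cols>
struct Tensor2 {
    std::array<T, Rows * Cols> data{};   // row-major, zero-initialized, no heap allocation

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < Rows && c < Cols);
        return data[r * Cols + c];
    }
    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < Rows && c < Cols);
        return data[r * Cols + c];
    }
};

// Matrix product (M x K) * (K x N) -> (M x N). The shared K is enforced by the types.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
Tensor2<T, M, N> matmul(const Tensor2<T, M, K>& a, const Tensor2<T, K, N>& b) {
    Tensor2<T, M, N> out;   // starts at zero
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k) {
            const T aik = a(i, k);
            for (std::size_t j = 0; j < N; ++j) out(i, j) += aik * b(k, j);
        }
    return out;
}
```

For example, `matmul(Tensor2<float, 2, 3>{}, Tensor2<float, 4, 2>{})` does not compile, because the inner dimensions 3 and 4 differ.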

Transformational.

Transformer model implementation with support for real-time Perceiver setups, 4.25-bit weight quantization, LoRA, and loading of Llama/Mistral LLMs. Memory usage is optimized for inference, fine-tuning, and online training.
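The source does not say how 4.25 bits per weight is reached. One common layout gives exactly that figure, and it is assumed here for illustration. Weights are stored in blocks of 64 as 4-bit integers, plus one 16-bit scale per block, which works out to 4 + 16/64 = 4.25 bits per weight. The sketch below uses symmetric quantization to signed 4-bit codes. `QuantBlock`, `quantize` and `dequantize` are hypothetical names, and the scale is kept as a `float` in memory for simplicity.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Assumed layout (not confirmed by the source): blocks of 64 weights, each
// stored as 4-bit integers plus one 16-bit scale per block.
constexpr std::size_t kBlock = 64;
constexpr double kBitsPerWeight = 4.0 + 16.0 / kBlock;   // = 4.25
static_assert(kBitsPerWeight == 4.25);

struct QuantBlock {
    float scale;                                  // assumed fp16 on disk; float here
    std::array<std::uint8_t, kBlock / 2> packed;  // two 4-bit codes per byte
};

// Symmetric quantization of one block to signed 4-bit codes in [-8, 7].
QuantBlock quantize(const std::array<float, kBlock>& w) {
    float max_abs = 0.f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    QuantBlock q{};   // packed bytes start at zero
    q.scale = max_abs > 0.f ? max_abs / 7.f : 1.f;
    for (std::size_t i = 0; i < kBlock; ++i) {
        const int code = std::clamp(static_cast<int>(std::lround(w[i] / q.scale)), -8, 7);
        const auto nibble = static_cast<std::uint8_t>(code & 0x0F);
        q.packed[i / 2] |= (i % 2 == 0) ? nibble : static_cast<std::uint8_t>(nibble << 4);
    }
    return q;
}

// Recover approximate float weights from a quantized block.
std::array<float, kBlock> dequantize(const QuantBlock& q) {
    assert(q.scale > 0.f);
    std::array<float, kBlock> w{};
    for (std::size_t i = 0; i < kBlock; ++i) {
        int nibble = (i % 2 == 0) ? (q.packed[i / 2] & 0x0F) : (q.packed[i / 2] >> 4);
        if (nibble >= 8) nibble -= 16;   // sign-extend the 4-bit code
        w[i] = static_cast<float>(nibble) * q.scale;
    }
    return w;
}
```

Each reconstructed weight is within half a scale step of the original. The per-block scale keeps that step small when a block's weights are small, at a cost of only 0.25 extra bits per weight.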

Proven.

Successfully powers Neuryte LLLM.

Contact us!

Contact us by email for more information.