Lightweight Modular Transformer Engine.
Developed from scratch in-house.
Low Latency.
AI engine written in C++ from the ground up. No slow components, no bloat.
Modular.
Actor-model-inspired platform with zero-allocation, zero-copy message passing between modules.
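
As a rough illustration of what zero-allocation, zero-copy message passing can look like, here is a minimal single-threaded sketch. It is not the engine's actual code: `Message`, `MessagePool`, and `Mailbox` are hypothetical names, and the message layout is an assumption. All storage is reserved up front, and modules exchange pointers to pool slots instead of copying payloads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message type; the real message layout is not described here.
struct Message {
    std::uint32_t type;
    float payload[4];
};

// Fixed-capacity pool: every slot exists from construction onward,
// so acquiring or releasing a message never touches the heap.
template <std::size_t N>
class MessagePool {
public:
    MessagePool() {
        for (std::size_t i = 0; i < N; ++i) free_[i] = &slots_[i];
    }
    MessagePool(const MessagePool&) = delete;            // slots must not move
    MessagePool& operator=(const MessagePool&) = delete;

    // Returns nullptr when the pool is exhausted.
    Message* acquire() { return free_count_ ? free_[--free_count_] : nullptr; }

    void release(Message* m) {
        assert(free_count_ < N);
        free_[free_count_++] = m;
    }

    std::size_t available() const { return free_count_; }

private:
    std::array<Message, N> slots_{};
    std::array<Message*, N> free_{};
    std::size_t free_count_ = N;
};

// Ring-buffer mailbox that stores pointers only: the receiver reads the
// very same memory the sender wrote, so the payload is never copied.
// Single-threaded for brevity; a real actor system would need a
// thread-safe queue here.
template <std::size_t N>
class Mailbox {
public:
    bool push(Message* m) {
        if (count_ == N) return false;                   // mailbox full
        ring_[(head_ + count_) % N] = m;
        ++count_;
        return true;
    }

    Message* pop() {
        if (count_ == 0) return nullptr;                 // mailbox empty
        Message* m = ring_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return m;
    }

private:
    std::array<Message*, N> ring_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

In this sketch a sender calls `acquire()`, fills the message in place, and calls `push()` on the receiver's mailbox; the receiver calls `pop()`, processes the message, and hands the slot back with `release()`.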
Journaled.
Internal messages can be persisted to disk for auditability and analysis.
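
To make the journaling idea concrete, the sketch below appends messages to a stream as length-prefixed binary records and reads them back for analysis. The record layout (`type`, `length`, payload, in host byte order) and the names `JournalRecord`, `append_record`, and `replay` are assumptions for illustration only; the engine's on-disk format is not documented here.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record layout: [u32 type][u32 length][payload bytes].
struct JournalRecord {
    std::uint32_t type;
    std::string payload;
};

// Appends one record to any output stream, for example a std::ofstream
// opened with std::ios::binary | std::ios::app.
void append_record(std::ostream& out, const JournalRecord& r) {
    // The length field is 32 bits wide, so larger payloads cannot be stored.
    assert(r.payload.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t len = static_cast<std::uint32_t>(r.payload.size());
    out.write(reinterpret_cast<const char*>(&r.type), sizeof r.type);
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(r.payload.data(), len);
}

// Reads records back in order. A truncated final record (for example
// after a crash in the middle of a write) is dropped.
std::vector<JournalRecord> replay(std::istream& in) {
    std::vector<JournalRecord> records;
    std::uint32_t type = 0;
    std::uint32_t len = 0;
    while (in.read(reinterpret_cast<char*>(&type), sizeof type) &&
           in.read(reinterpret_cast<char*>(&len), sizeof len)) {
        std::string payload(len, '\0');
        if (len > 0 && !in.read(&payload[0], len)) break;  // truncated tail
        records.push_back({type, std::move(payload)});
    }
    return records;
}
```

Because both functions take generic streams, the same code works against a file on disk or an in-memory `std::stringstream`.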
Calculating.
Includes a lightweight, template-based tensor library for AI and other calculations, with CPU and CUDA support.
Transformational.
Transformer model implementation with support for real-time Perceiver setups, 4.25-bit weight quantization, LoRA, and loading Llama/Mistral LLMs. Memory usage is optimized for inference, fine-tuning, and online training.
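
One common way to arrive at a fractional figure such as 4.25 bits per weight is block-wise quantization: each weight is stored as a 4-bit code, and every block of weights shares one scale factor whose storage cost is spread across the block. The sketch below assumes 128 weights per block and one 32-bit float scale (4 + 32/128 = 4.25); a 64-weight block with a 16-bit scale gives the same figure. The block size, the symmetric rounding scheme, and all names are assumptions, not a description of the engine's actual format.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Assumed layout: 128 weights per block, 4 bits each,
// plus one 32-bit float scale shared by the whole block.
constexpr std::size_t kBlockSize = 128;

struct QuantBlock {
    float scale;                          // shared by all weights in the block
    std::uint8_t packed[kBlockSize / 2];  // two 4-bit codes per byte
};

// Effective storage cost per weight, including the shared scale.
double bits_per_weight() {
    return static_cast<double>(sizeof(QuantBlock) * 8) / kBlockSize;
}

// Symmetric quantization: each weight maps to an integer in [-8, 7],
// stored with an offset of 8 so that it fits an unsigned 4-bit code.
QuantBlock quantize(const float* w) {
    QuantBlock b{};
    float max_abs = 0.0f;
    for (std::size_t i = 0; i < kBlockSize; ++i)
        max_abs = std::max(max_abs, std::fabs(w[i]));
    b.scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;

    for (std::size_t i = 0; i < kBlockSize; ++i) {
        int q = static_cast<int>(std::lround(w[i] / b.scale));
        q = std::max(-8, std::min(7, q));
        const unsigned code = static_cast<unsigned>(q + 8);
        assert(code < 16u);
        const unsigned shift = (i % 2 == 0) ? 0u : 4u;   // low nibble first
        b.packed[i / 2] =
            static_cast<std::uint8_t>(b.packed[i / 2] | (code << shift));
    }
    return b;
}

// Recovers the approximate value of weight i from its 4-bit code.
float dequantize(const QuantBlock& b, std::size_t i) {
    const unsigned byte = b.packed[i / 2];
    const unsigned code = (i % 2 == 0) ? (byte & 0x0Fu) : (byte >> 4);
    return static_cast<float>(static_cast<int>(code) - 8) * b.scale;
}
```

With this scheme the reconstruction error of each weight is at most half the block's scale, which is why smaller blocks (at a higher per-weight cost for the scale) generally give better accuracy.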
Proven.
Successfully powers Neuryte LLLM.