Neuryte AI | Transformer Engine

Low Latency.

AI engine written in C++ from the ground up. No slow components, no bloat.

Actor-model-inspired platform with zero-allocation, zero-copy message passing between modules.
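As a rough illustration of what zero-allocation, zero-copy message passing can look like, here is a minimal sketch of a single-producer, single-consumer mailbox. It is not the engine's actual code: the names `Message`, `Mailbox`, `begin_write`, `commit_write`, `begin_read` and `end_read` are hypothetical. All slots are preallocated, the producer fills a slot in place, and the consumer reads that same slot through a pointer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size message, so that no heap allocation is ever needed.
struct Message {
    std::uint32_t type;
    std::uint32_t length;
    float payload[16];
};

// Single-producer / single-consumer ring of preallocated slots (illustrative only).
template <typename T, std::size_t Capacity>
class Mailbox {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer: returns a slot to fill in place, or nullptr if the ring is full.
    T* begin_write() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return nullptr;
        return &slots_[head & (Capacity - 1)];
    }
    // Producer: publishes the slot handed out by the last begin_write().
    void commit_write() {
        head_.store(head_.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
    }
    // Consumer: returns the oldest unread message in place, or nullptr if empty.
    const T* begin_read() const {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return nullptr;
        return &slots_[tail & (Capacity - 1)];
    }
    // Consumer: hands the slot back to the producer.
    void end_read() {
        tail_.store(tail_.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
    }
private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

The pointer returned by `begin_read()` refers to the same slot the producer wrote into, so the payload is never copied between modules.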

Internal messages can be persisted to disk for auditability and analysis.

Lightweight template-based tensor library included, for AI and other numerical calculations. CPU and CUDA support.
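To give a feel for the template-based approach, here is a minimal CPU-only sketch. It assumes shapes are template parameters, which may differ from the actual library; `Tensor2` and `matmul` are hypothetical names. Storage has a fixed size, and mismatched shapes are rejected at compile time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2-D tensor whose shape is part of its type.
template <typename T, std::size_t Rows, std::size_t Cols>
struct Tensor2 {
    std::array<T, Rows * Cols> data{};  // row-major, zero-initialized

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < Rows && c < Cols);
        return data[r * Cols + c];
    }
    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < Rows && c < Cols);
        return data[r * Cols + c];
    }
};

// Matrix product; the shared inner dimension K is enforced by the signature.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
Tensor2<T, M, N> matmul(const Tensor2<T, M, K>& a, const Tensor2<T, K, N>& b) {
    Tensor2<T, M, N> out;
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t j = 0; j < N; ++j)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}
```

With this design, multiplying a `Tensor2<float, 2, 3>` by a `Tensor2<float, 2, 3>` fails to compile instead of failing at run time.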

Transformer model implementation with support for real-time Perceiver setups, 4.25-bit weight quantization, LoRA, and loading Llama/Mistral LLMs. Optimized memory usage for inference, fine-tuning, and online training.
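A fractional figure such as 4.25 bits per weight typically comes from block-wise quantization, where a group of weights shares one scale factor. The engine's exact format is not described here, so the sketch below is only one assumed layout that lands on the same number: 64 weights per block, each stored as a 4-bit code, plus one 16-bit scale (the upper half of a float32), giving (64 × 4 + 16) / 64 = 4.25 bits. The names `QuantBlock`, `quantize` and `dequantize` are hypothetical. Requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlockSize = 64;  // assumed block size, not taken from the engine

struct QuantBlock {
    std::uint16_t scale_bf16;              // shared scale: upper 16 bits of a float32
    std::uint8_t  packed[kBlockSize / 2];  // two 4-bit codes per byte
};
// 34 bytes = 272 bits for 64 weights, i.e. 17/4 = 4.25 bits per weight.
static_assert(sizeof(QuantBlock) * 8 * 4 == kBlockSize * 17, "expected 4.25 bits/weight");

inline std::uint16_t to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);  // truncate the low mantissa bits
}

inline float from_bf16(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Symmetric absmax quantization of kBlockSize floats to codes in [-8, 7].
inline QuantBlock quantize(const float* w) {
    float absmax = 0.0f;
    for (std::size_t i = 0; i < kBlockSize; ++i)
        absmax = std::max(absmax, std::fabs(w[i]));

    QuantBlock q{};
    q.scale_bf16 = to_bf16(absmax / 7.0f);
    const float scale = from_bf16(q.scale_bf16);  // use the stored (rounded) scale

    for (std::size_t i = 0; i < kBlockSize; ++i) {
        int code = 0;
        if (scale > 0.0f) code = static_cast<int>(std::lround(w[i] / scale));
        code = std::clamp(code, -8, 7);
        const unsigned nibble = static_cast<unsigned>(code + 8);  // 0..15
        const unsigned shifted = (i % 2 == 0) ? nibble : (nibble << 4);
        q.packed[i / 2] = static_cast<std::uint8_t>(q.packed[i / 2] | shifted);
    }
    return q;
}

inline void dequantize(const QuantBlock& q, float* out) {
    const float scale = from_bf16(q.scale_bf16);
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        const std::uint8_t byte = q.packed[i / 2];
        const int nibble = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        out[i] = static_cast<float>(nibble - 8) * scale;
    }
}
```

The reconstruction error per weight is bounded by roughly half the block's scale, which is why smaller blocks (at a higher bits-per-weight cost) generally give better accuracy.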

Successfully powers Neuryte LLLM.