Dart + Flutter local inference runtime

Documentation for product engineers and maintainers shipping local LLM features across Android, iOS, macOS, Linux, Windows, and web.

Start here

- Start in 10 minutes: install, load a GGUF model, and stream your first response. → Open quickstart
- Build tool calling, structured chat prompts, and streaming UX. → Read guides
- Choose backends and tune context/runtime parameters. → Tune runtime
- Expose local models over HTTP for existing OpenAI clients. → See server example

Core guides

- Predictable loading/unloading flow and resource cleanup patterns. → Lifecycle guide
- Token streaming patterns for CLI apps, servers, and Flutter UIs. → Streaming guide
- Image + text prompting with platform-specific constraints. → Multimodal guide
- Understand native/web support boundaries before shipping. → Support matrix
- Tune context length, threads, and generation settings safely. → Tuning guide
- Fast fixes for model loading, runtime, and platform issues. → Debug issues

Maintainers

- Repository ownership map and routine responsibilities. → Maintainer docs
- Where to change native runtime, web bridge, and assets. → Ownership boundaries
- Versioning, docs cut, and post-release verification sequence. → Release workflow