Dart + Flutter local inference runtime

Build offline-ready AI features with llamadart

Documentation for product engineers and maintainers shipping local LLM features across Android, iOS, macOS, Linux, Windows, and web.

  • Single Dart API across native and browser targets
  • GGUF model lifecycle and streaming-first generation
  • OpenAI-compatible local server example included
Android · iOS · macOS · Linux · Windows · Web

Start here

Choose a path based on what you are shipping

Start in 10 minutes

Install, load a GGUF model, and stream your first response.

Open quickstart
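
A minimal sketch of that first run. The symbol names (Llama.load, generateStream, dispose) and the model path are placeholders, not necessarily llamadart's actual API; the quickstart has the real install steps and symbols.

```dart
import 'dart:io';

import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  // Load a GGUF model from disk (placeholder path).
  final model = await Llama.load('models/llama-3.2-1b-q4_k_m.gguf');

  // Print tokens as the model streams them.
  await for (final token in model.generateStream('Write a haiku about Dart.')) {
    stdout.write(token);
  }

  // Release native resources when finished.
  await model.dispose();
}
```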

Ship chat and tools

Build tool calling, structured chat prompts, and streaming UX.

Read guides

Tune for production

Choose backends and tune context/runtime parameters.

Tune runtime

Run OpenAI-style server

Expose local models over HTTP for existing OpenAI clients.

See server example
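
Illustrative client call against the example server, assuming it listens locally and follows the standard OpenAI chat-completions shape; the host, port, and model name are made up for the sketch.

```dart
import 'dart:convert';

import 'package:http/http.dart' as http;

Future<void> main() async {
  // Placeholder endpoint; the server example documents the real address.
  final response = await http.post(
    Uri.parse('http://localhost:8080/v1/chat/completions'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({
      'model': 'local-gguf',
      'messages': [
        {'role': 'user', 'content': 'Summarize llamadart in one sentence.'},
      ],
    }),
  );

  // Standard OpenAI-style response shape: choices[0].message.content.
  final body = jsonDecode(response.body) as Map<String, dynamic>;
  print(body['choices'][0]['message']['content']);
}
```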

Core guides

Reference docs for real production workflows

Model lifecycle

Predictable loading/unloading flow and resource cleanup patterns.

Lifecycle guide
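
A sketch of the cleanup pattern the guide covers, using placeholder names (Llama.load, generate, dispose) rather than the documented API: dispose in a finally block so native memory is released even when generation throws.

```dart
import 'package:llamadart/llamadart.dart';

Future<void> runOnce(String prompt) async {
  // Placeholder symbols; the lifecycle guide documents the real API.
  final model = await Llama.load('models/example.gguf');
  try {
    final reply = await model.generate(prompt);
    print(reply);
  } finally {
    // Always release native resources, even if generation fails.
    await model.dispose();
  }
}
```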

Generation and streaming

Token streaming patterns for CLI apps, servers, and Flutter UIs.

Streaming guide
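
For Flutter UIs, the usual shape is a StreamBuilder over the accumulated response text; textStream here stands in for whatever stream the guide derives from llamadart's token output.

```dart
import 'package:flutter/material.dart';

// Renders a reply as it grows while tokens arrive.
class StreamingReply extends StatelessWidget {
  const StreamingReply({super.key, required this.textStream});

  // Stream of the response text accumulated so far (assumed, not llamadart's type).
  final Stream<String> textStream;

  @override
  Widget build(BuildContext context) {
    return StreamBuilder<String>(
      stream: textStream,
      builder: (context, snapshot) => Text(snapshot.data ?? ''),
    );
  }
}
```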

Multimodal

Image + text prompting with platform-specific constraints.

Multimodal guide

Platform matrix

Understand native/web support boundaries before shipping.

Support matrix

Performance tuning

Tune context length, threads, and generation settings safely.

Tuning guide

Troubleshooting

Fast fixes for model loading, runtime, and platform issues.

Debug issues

Maintainers

llamadart-specific maintenance and release operations

Maintainer overview

Repository ownership map and routine responsibilities.

Maintainer docs

Release checklist

Versioning, docs cut, and post-release verification sequence.

Release workflow