Introduction
llamadart is a Dart and Flutter plugin for running llama.cpp models from
GGUF files on both native and web targets.
Who this is for
- App developers building local-first AI features in Dart/Flutter.
- Teams that need OpenAI-style HTTP compatibility from local models.
- Maintainers who need predictable native/web runtime integration.
Core primitives
- LlamaEngine: stateless generation API.
- ChatSession: stateful chat wrapper over LlamaEngine.
- LlamaBackend: platform backend abstraction used by the engine.
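A minimal sketch of how these primitives might fit together. The import path and the method names used here (loadModel, generate, send) are illustrative assumptions for this sketch, not confirmed llamadart API; only the class names LlamaEngine and ChatSession come from the list above.

```dart
// Assumed import path for the plugin.
import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  // Stateless one-shot generation through the engine.
  // loadModel/generate are assumed method names for illustration.
  final engine = LlamaEngine();
  await engine.loadModel('model.gguf');
  final text = await engine.generate('Write a haiku about Dart.');
  print(text);

  // Stateful multi-turn chat layered over the same engine;
  // the session, not the engine, tracks conversation history.
  final chat = ChatSession(engine);
  print(await chat.send('Hello!'));
  print(await chat.send('Say that again in French.'));
}
```

The split mirrors the table above: keep LlamaEngine free of conversation state so it can be shared, and let each ChatSession own its own history.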
Read by workflow
- First setup: Installation
- First inference: Quickstart
- Multi-turn chat: First Chat Session
- Function calling: Tool Calling
- Template diagnostics: Chat Templates and Parsing
- Template internals: Template Engine Internals
- LoRA runtime workflows: LoRA Adapters
- Performance work: Performance Tuning
- Platform/backend planning: Platform & Backend Matrix
- Upgrade planning: Upgrade Checklist
- Maintainer operations: Maintainer Overview