Low-Level vs High-Level API
llamadart exposes two API layers:
- High-level API (`LlamaEngine` + `ChatSession`) for most application code.
- Low-level API (`LlamaBackend`) for advanced runtime control.
High-Level API
Use this by default. It handles model lifecycle, template routing, streaming, and chat history management.
Key Components:
- `LlamaEngine`: Loads/unloads models and runs stateless chat completions.
- `ChatSession`: Keeps message history for multi-turn conversation flows (see the multi-turn sketch after the example below).
Advantages:
- Simplicity: Work with message/content objects instead of low-level backend calls.
- Template-aware: Applies the model's chat template and output parsing automatically.
- Tool support: Works with structured tool-call outputs.
Example:

```dart
import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  final LlamaEngine engine = LlamaEngine(LlamaBackend());
  try {
    // Load the model once; the engine owns its lifecycle from here on.
    await engine.loadModel('model.gguf');

    final ChatSession session = ChatSession(engine)
      ..systemPrompt = 'You are a concise assistant.';

    // create() streams completion chunks; each chunk carries a text delta.
    await for (final LlamaCompletionChunk chunk in session.create([
      LlamaTextContent('Hello! Give me one sentence about local inference.'),
    ])) {
      final String? text = chunk.choices.first.delta.content;
      if (text != null) {
        print(text);
      }
    }
  } finally {
    // Always release native resources, even if generation throws.
    await engine.dispose();
  }
}
```
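Because `ChatSession` keeps the message history, multi-turn conversation is just repeated calls to `create()` on the same session. A minimal sketch, assuming (as the component description states) that the session records each exchange; the `collectReply` helper is hypothetical glue, not part of llamadart, that concatenates streamed deltas into one string:

```dart
import 'package:llamadart/llamadart.dart';

/// Hypothetical helper (not part of llamadart): drains a completion
/// stream into a single string by concatenating text deltas.
Future<String> collectReply(Stream<LlamaCompletionChunk> chunks) async {
  final StringBuffer buffer = StringBuffer();
  await for (final LlamaCompletionChunk chunk in chunks) {
    final String? text = chunk.choices.first.delta.content;
    if (text != null) {
      buffer.write(text);
    }
  }
  return buffer.toString();
}

Future<void> main() async {
  final LlamaEngine engine = LlamaEngine(LlamaBackend());
  try {
    await engine.loadModel('model.gguf');
    final ChatSession session = ChatSession(engine);

    // First turn: the session records both the question and the reply.
    print(await collectReply(session.create([
      LlamaTextContent('Name one benefit of local inference.'),
    ])));

    // Second turn: "it" resolves against the stored history, so earlier
    // messages do not need to be re-sent by hand.
    print(await collectReply(session.create([
      LlamaTextContent('Explain it in one more sentence.'),
    ])));
  } finally {
    await engine.dispose();
  }
}
```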
Low-Level API
`LlamaBackend` gives direct access to model/context handles and raw generation streams.
Key Components:
- `LlamaBackend`: Exposes explicit model/context creation and byte-stream generation.
Advantages:
- Granular control: Manage handles and pipeline steps directly (see the multi-context sketch after the example below).
- Integration flexibility: Useful for specialized runtime integrations.
Example:

```dart
import 'dart:convert';

import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  final LlamaBackend backend = LlamaBackend();

  // Explicit handle management: you create and free the
  // model/context handles yourself.
  const ModelParams modelParams = ModelParams();
  final int modelHandle = await backend.modelLoad('model.gguf', modelParams);
  final int contextHandle = await backend.contextCreate(
    modelHandle,
    modelParams,
  );
  try {
    // generate() emits raw bytes; decode them to text as they arrive.
    final Stream<String> textStream = backend
        .generate(
          contextHandle,
          'Hello from low-level API',
          const GenerationParams(),
        )
        .transform(const Utf8Decoder());
    await for (final String text in textStream) {
      print(text);
    }
  } finally {
    // Free in reverse order of creation: context, then model, then backend.
    await backend.contextFree(contextHandle);
    await backend.modelFree(modelHandle);
    await backend.dispose();
  }
}
```
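One concrete payoff of handle-level control is sharing a single loaded model across several contexts, paying the model-load cost once. A minimal sketch, assuming `contextCreate` can be called repeatedly on the same model handle (the doc shows it takes a handle, but multi-context support is an assumption):

```dart
import 'dart:convert';

import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  final LlamaBackend backend = LlamaBackend();
  const ModelParams modelParams = ModelParams();

  // Pay the model-load cost once...
  final int modelHandle = await backend.modelLoad('model.gguf', modelParams);

  // ...then fan out into contexts that share the loaded model.
  final int contextA = await backend.contextCreate(modelHandle, modelParams);
  final int contextB = await backend.contextCreate(modelHandle, modelParams);
  try {
    for (final (int context, String prompt) in [
      (contextA, 'Summarize GGUF in one sentence.'),
      (contextB, 'Summarize quantization in one sentence.'),
    ]) {
      // join() collects the decoded stream into one string per prompt.
      final String reply = await backend
          .generate(context, prompt, const GenerationParams())
          .transform(const Utf8Decoder())
          .join();
      print(reply);
    }
  } finally {
    // Free every context before the model they share.
    await backend.contextFree(contextA);
    await backend.contextFree(contextB);
    await backend.modelFree(modelHandle);
    await backend.dispose();
  }
}
```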
Which should you choose?
Start with the high-level API. Move down to `LlamaBackend` only when you need explicit handle-level control that `LlamaEngine`/`ChatSession` do not provide.