# Quickstart

This quickstart covers the core `LlamaEngine` API: loading a GGUF model, streaming token generation, and stateless chat completions.

## Minimal generation example

```dart
import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  // Construct the engine with a backend.
  final LlamaEngine engine = LlamaEngine(LlamaBackend());

  try {
    // Load a GGUF model from disk before generating.
    await engine.loadModel('path/to/model.gguf');

    // generate() streams tokens as they are produced.
    await for (final String token in engine.generate(
      'Write one short sentence about local inference.',
    )) {
      print(token);
    }
  } finally {
    // Release native resources even if loading or generation throws.
    await engine.dispose();
  }
}
```
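
If you want the whole completion as one string instead of printing tokens as they arrive, you can fold the stream into a buffer. Here is a minimal sketch using only the `generate` API shown above; the `runPrompt` helper is illustrative and not part of the package:

```dart
import 'package:llamadart/llamadart.dart';

/// Hypothetical helper (not part of llamadart): folds the token
/// stream from generate() into a single string.
Future<String> runPrompt(LlamaEngine engine, String prompt) async {
  final StringBuffer buffer = StringBuffer();
  await for (final String token in engine.generate(prompt)) {
    buffer.write(token);
  }
  return buffer.toString();
}
```

Streaming keeps first-token latency low; buffering like this trades that away for a simpler call site.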

## Stateless chat completions

For OpenAI-style message arrays, use `engine.create(...)`. Each call is stateless: the engine keeps no conversation history between calls, so send the full message list every time:

```dart
// Reuses the engine and loaded model from the example above.
final messages = [
  LlamaChatMessage.fromText(
    role: LlamaChatRole.user,
    text: 'Give me three bullet points about Dart.',
  ),
];

// Each chunk is an OpenAI-style delta; delta.content can be null,
// so check before printing.
await for (final chunk in engine.create(messages)) {
  final text = chunk.choices.first.delta.content;
  if (text != null) {
    print(text);
  }
}
```
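
Because `create` is stateless, a multi-turn conversation is just a growing message list: collect the assistant's reply, append it, then add the next user turn. A sketch under the assumption that `LlamaChatRole` also defines an `assistant` value (only `user` appears above):

```dart
// Accumulate the assistant's streamed reply.
final StringBuffer reply = StringBuffer();
await for (final chunk in engine.create(messages)) {
  final text = chunk.choices.first.delta.content;
  if (text != null) reply.write(text);
}

// Append the reply and the follow-up turn, then call create() again
// with the full history; the engine keeps no state between calls.
messages
  ..add(LlamaChatMessage.fromText(
    role: LlamaChatRole.assistant, // assumed; only `user` is shown above
    text: reply.toString(),
  ))
  ..add(LlamaChatMessage.fromText(
    role: LlamaChatRole.user,
    text: 'Now condense those bullets into one sentence.',
  ));

await for (final chunk in engine.create(messages)) {
  final text = chunk.choices.first.delta.content;
  if (text != null) print(text);
}
```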

## Next steps