# Quickstart

This quickstart covers the core `LlamaEngine` API: loading a GGUF model, streaming token generation, and stateless chat completions.

## Minimal generation example

```dart
import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  // Construct the engine with a backend.
  final LlamaEngine engine = LlamaEngine(LlamaBackend());

  try {
    // Load a GGUF model from disk before generating.
    await engine.loadModel('path/to/model.gguf');

    // generate() streams tokens as they are produced.
    await for (final String token in engine.generate(
      'Write one short sentence about local inference.',
    )) {
      print(token);
    }
  } finally {
    // Release native resources even if loading or generation throws.
    await engine.dispose();
  }
}
```
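
If you want the whole completion as one string instead of printing tokens as they arrive, you can fold the stream into a buffer. Here is a minimal sketch using only the `generate` API shown above; the `runPrompt` helper is illustrative and not part of the package:

```dart
import 'package:llamadart/llamadart.dart';

/// Hypothetical helper (not part of llamadart): folds the token
/// stream from generate() into a single string.
Future<String> runPrompt(LlamaEngine engine, String prompt) async {
  final StringBuffer buffer = StringBuffer();
  await for (final String token in engine.generate(prompt)) {
    buffer.write(token);
  }
  return buffer.toString();
}
```

Streaming keeps first-token latency low; buffering like this trades that away for a simpler call site.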

## Stateless chat completions

For OpenAI-style message arrays, use `engine.create(...)`. Each call is stateless: the engine keeps no conversation history between calls, so send the full message list every time:

```dart
// Reuses the engine and loaded model from the example above.
final messages = [
  LlamaChatMessage.fromText(
    role: LlamaChatRole.user,
    text: 'Give me three bullet points about Dart.',
  ),
];

// Each chunk is an OpenAI-style delta; delta.content can be null,
// so check before printing.
await for (final chunk in engine.create(messages)) {
  final text = chunk.choices.first.delta.content;
  if (text != null) {
    print(text);
  }
}
```
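
Because `create` is stateless, a multi-turn conversation is just a growing message list: collect the assistant's reply, append it, then add the next user turn. A sketch under the assumption that `LlamaChatRole` also defines an `assistant` value (only `user` appears above):

```dart
// Accumulate the assistant's streamed reply.
final StringBuffer reply = StringBuffer();
await for (final chunk in engine.create(messages)) {
  final text = chunk.choices.first.delta.content;
  if (text != null) reply.write(text);
}

// Append the reply and the follow-up turn, then call create() again
// with the full history; the engine keeps no state between calls.
messages
  ..add(LlamaChatMessage.fromText(
    role: LlamaChatRole.assistant, // assumed; only `user` is shown above
    text: reply.toString(),
  ))
  ..add(LlamaChatMessage.fromText(
    role: LlamaChatRole.user,
    text: 'Now condense those bullets into one sentence.',
  ));

await for (final chunk in engine.create(messages)) {
  final text = chunk.choices.first.delta.content;
  if (text != null) print(text);
}
```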

## Next steps