Chat Templates and Parsing

llamadart routes chat rendering/parsing through template handlers aligned to llama.cpp behavior.

Parity model

llamadart reimplements llama.cpp-style template detection, rendering, workarounds, grammar wiring, and parse behavior in Dart. Because this logic lives in Dart rather than in a platform-specific layer, engine.create(...) and engine.chatTemplate(...) behave consistently across native and web backends.

Template rendering is powered by dinja, the Dart Jinja runtime used by llamadart for llama.cpp-compatible template execution.

For internals and pipeline details, see Template Engine Internals.

Core API

Use engine.chatTemplate(...) when you need:

prompt preview,
grammar and stop-sequence inspection,
format-aware rendering diagnostics.

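// `messages` and `tools` are assumed to be defined earlier in the calling code.
final result = await engine.chatTemplate(
  messages,
  tools: tools,
  toolChoice: ToolChoice.auto,
  parallelToolCalls: false,
  customTemplate: null, // no per-call template override
  chatTemplateKwargs: const {'use_builtin_tools': true}, // extra template globals
);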

print(result.prompt);
print(result.format);
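
For stop-sequence inspection, it can help to check which stop sequence would fire first on a sample completion. The helper below is a hypothetical sketch in plain Dart, not part of llamadart. It assumes the stop sequences exposed by the result (for example result.stopSequences) can be iterated as strings.

```dart
/// Hypothetical helper (not a llamadart API): returns the stop sequence that
/// occurs earliest in [text], or null when none of them appears.
String? earliestStop(String text, Iterable<String> stopSequences) {
  String? best;
  var bestIndex = -1;
  for (final stop in stopSequences) {
    // An empty stop sequence would match at index 0 and hide real matches.
    if (stop.isEmpty) continue;
    final index = text.indexOf(stop);
    if (index < 0) continue;
    if (bestIndex < 0 || index < bestIndex) {
      best = stop;
      bestIndex = index;
    }
  }
  return best;
}
```

Calling earliestStop(sampleOutput, result.stopSequences) on a captured completion shows which boundary the template would cut on, under the assumption above about the element type.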

Useful parameters

customTemplate: per-call template override.
chatTemplateKwargs: additional template globals.
templateNow: deterministic time injection for tests.
sourceLangCode / targetLangCode: TranslateGemma-style metadata.
responseFormat: structured-output schema hints.
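
As a rough illustration of templateNow, the sketch below renders the same messages twice with a pinned clock and compares the prompts. It assumes templateNow accepts a DateTime and that result.prompt is a String; neither type is stated above, so check both against the API reference. The function name is hypothetical.

```dart
import 'package:llamadart/llamadart.dart';

/// Hypothetical test helper: returns true when two renders with the same
/// pinned time produce identical prompts.
Future<bool> rendersDeterministically(
  LlamaEngine engine,
  List<LlamaChatMessage> messages,
) async {
  // Assumption: templateNow takes a DateTime. A fixed UTC value keeps any
  // date-dependent template output stable between runs.
  final pinned = DateTime.utc(2025, 1, 1);

  final first = await engine.chatTemplate(messages, templateNow: pinned);
  final second = await engine.chatTemplate(messages, templateNow: pinned);

  return first.prompt == second.prompt;
}
```

The engine passed in must already have a model loaded, as in the custom template example later on this page.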

When to inspect template output

Inspect template output when debugging:

tool-call shape mismatches,
stop-sequence behavior,
model-specific reasoning/content boundaries,
template routing differences after upgrades.
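
When checking for template routing differences after an upgrade, one option is to save result.prompt from the old version and compare it with the new render. The following is a minimal sketch in plain Dart; PromptDiff and firstPromptDiff are hypothetical names, not llamadart APIs.

```dart
/// Hypothetical value type: the first line at which two prompts diverge.
class PromptDiff {
  const PromptDiff(this.line, this.before, this.after);

  /// 1-based line number of the first difference.
  final int line;

  /// Line content in the old prompt, or null if the old prompt ended earlier.
  final String? before;

  /// Line content in the new prompt, or null if the new prompt ended earlier.
  final String? after;
}

/// Returns the first differing line between [before] and [after],
/// or null when both prompts are identical line by line.
PromptDiff? firstPromptDiff(String before, String after) {
  final oldLines = before.split('\n');
  final newLines = after.split('\n');
  final longest =
      oldLines.length > newLines.length ? oldLines.length : newLines.length;

  for (var i = 0; i < longest; i++) {
    final left = i < oldLines.length ? oldLines[i] : null;
    final right = i < newLines.length ? newLines[i] : null;
    if (left != right) return PromptDiff(i + 1, left, right);
  }
  return null;
}
```

A null return means the two renders match; otherwise the line number points at the first place where the prompts diverge, which is usually where a different handler or template branch took over.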

Custom template overrides

For application code, the supported customization path is customTemplate on engine.chatTemplate(...).

import 'package:llamadart/llamadart.dart';

// Minimal Jinja template: emits each message as "role: content", then an "Assistant:" cue.
const String customTemplate = '''
{% for message in messages %}
{{ message['role'] }}: {{ message['content'] }}
{% endfor %}
Assistant:
''';

Future<void> main() async {
  final LlamaEngine engine = LlamaEngine(LlamaBackend());

  try {
    await engine.loadModel('model.gguf');
    final messages = [
      LlamaChatMessage.fromText(
        role: LlamaChatRole.user,
        text: 'Explain local inference in one sentence.',
      ),
    ];

    final rendered = await engine.chatTemplate(
      messages,
      customTemplate: customTemplate,
      addAssistant: true,
    );

    print(rendered.prompt);
    print(rendered.stopSequences);
  } finally {
    await engine.dispose();
  }
}

About custom handlers

ChatTemplateHandler is an internal extension point used by built-in format implementations.

There is currently no public API to register custom handlers globally from application code. If you need first-class support for a new template format, open an issue with a minimal reproducible template and sample outputs.
