Template Engine Internals
llamadart reimplements the llama.cpp chat-template/render/parse stack in
Dart so routing and parser behavior stay consistent across native and web
targets.
Design goal
The template system prioritizes llama.cpp parity:
- format detection behavior
- handler routing logic
- tool-grammar attachment rules
- parse behavior for thinking and tool-call envelopes
End-to-end pipeline
Main components
1. Format detection
ChatTemplateEnginedetects format from template signatures.- Each detected format maps to a concrete
ChatTemplateHandler. - Handlers live under
lib/src/core/template/handlers/.
2. Template capabilities and routing
TemplateCapsestimates whether a template supports system role, tools, parallel tool calls, typed content, and thinking channels.JinjaAnalyzeraugments regex checks with AST analysis and probe rendering.- Routing workarounds mirror llama.cpp behavior for schema mode, tool-choice behavior, and system-message adaptation.
3. Render stage
- Handler
render(...)builds the final prompt and metadata payload. - Result includes:
- prompt text
- stop sequences
- optional grammar
- optional PEG parser payload
- preserved tokens and lazy grammar triggers
4. Parse stage
- During streaming, partial output is parsed incrementally for content/thinking deltas and tool-call envelopes.
- On completion, final parse produces stable tool-call structures and finish reason semantics.
- PEG-backed parse paths are used when parser payloads are present.
dinja integration
llamadart uses dinja, the Dart Jinja
runtime used as the execution layer for model-provided chat templates
(tokenizer.chat_template).
dinja was built in the llamadart ecosystem as a Dart port of the
llama.cpp-style minimal Jinja execution model, then used as the foundation of
the template engine in this package.
Inside llamadart, the jinja/ integration layer acts as the Dinja-plugin
surface: it wires llama.cpp-specific globals and capability analysis into
template execution.
Why this matters:
- no Python runtime dependency in app environments
- on-device template rendering in pure Dart
- reusable lexer/parser access for capability analysis (
JinjaAnalyzer)
In practice, our template integration stack is:
dinjatemplate execution for render.llamadartrouting/parity logic around it.llamadartparser/grammar infrastructure for streamed output.
Practical debugging flow
- Call
engine.chatTemplate(...)to inspect prompt/format/stops. - Verify tool schema and grammar expectations before generation.
- Compare parsed output in partial vs final streaming stages.
- Re-test after model/runtime upgrades to catch routing shifts early.