# LoRA Adapters
This guide covers practical LoRA usage in llamadart with runtime adapter
management APIs.
llamadart itself is an inference/runtime library. LoRA training is done in a
separate training workflow, then adapters are loaded at inference time.
## Runtime API surface
`LlamaEngine` exposes three LoRA operations:

- `setLora(path, scale: ...)`: load an adapter, or update its scale if the path is already active.
- `removeLora(path)`: remove one adapter from the active set.
- `clearLoras()`: remove all active adapters from the current context.
## Basic runtime flow
```dart
import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  final engine = LlamaEngine(LlamaBackend());
  try {
    await engine.loadModel('/models/base-model.gguf');
    await engine.setLora('/models/lora/domain.gguf', scale: 0.7);

    await for (final chunk in engine.generate(
      'Answer as a domain specialist in one paragraph.',
    )) {
      print(chunk);
    }
  } finally {
    await engine.dispose();
  }
}
```
## Stacking adapters
You can activate multiple adapters on the same loaded model:
```dart
await engine.setLora('/models/lora/style.gguf', scale: 0.35);
await engine.setLora('/models/lora/domain.gguf', scale: 0.70);
```
- Calling `setLora(...)` again with the same path updates that adapter's scale (shown below).
- Use `removeLora(path)` to disable one adapter.
- Use `clearLoras()` to reset to base-model behavior.
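A quick sketch of these management calls in context (the paths are placeholders):

```dart
// Re-calling setLora with an already-active path only updates its scale.
await engine.setLora('/models/lora/style.gguf', scale: 0.5);

// Drop just the style adapter; the domain adapter stays active.
await engine.removeLora('/models/lora/style.gguf');

// Remove every active adapter and return to base-model behavior.
await engine.clearLoras();
```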
## Training your own LoRA adapters
For end-to-end training and conversion, start with the official notebook.
Recommended workflow:
- Pick a base model family that you will also serve in llamadart.
- Train LoRA weights (for example, the QLoRA/PEFT flow in the notebook).
- Export adapter artifacts from training.
- Convert adapter artifacts into llama.cpp-compatible GGUF adapter files.
- Validate outputs in a native test run, then load adapters with `setLora(...)`.
Practical compatibility checks:
- Keep tokenizer/model family aligned between base model and adapter.
- Validate adapter behavior on the same quantized base model class you deploy.
- Keep a small golden-prompt set to compare base vs adapter output drift, as in the sketch below.
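A minimal sketch of that golden-prompt comparison, assuming `generate` yields `String` chunks; `collectCompletion` and `compareGoldenPrompts` are hypothetical helper names, not llamadart APIs:

```dart
// Hypothetical helper: drain the token stream into one string.
Future<String> collectCompletion(LlamaEngine engine, String prompt) async {
  final buffer = StringBuffer();
  await for (final chunk in engine.generate(prompt)) {
    buffer.write(chunk);
  }
  return buffer.toString();
}

// Hypothetical helper: print base vs adapter output for each golden prompt.
Future<void> compareGoldenPrompts(LlamaEngine engine) async {
  const prompts = [
    'Summarize the product in one sentence.',
    'List three common failure modes.',
  ];
  for (final prompt in prompts) {
    await engine.clearLoras(); // base-model baseline
    final base = await collectCompletion(engine, prompt);

    await engine.setLora('/models/lora/domain.gguf', scale: 0.7);
    final adapted = await collectCompletion(engine, prompt);

    print('PROMPT : $prompt');
    print('BASE   : $base');
    print('ADAPTER: $adapted');
  }
}
```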
## Scale tuning guidance
- Start around `0.4` to `0.8` for first-pass evaluation.
- Lower scales (`0.1` to `0.3`) help preserve base-model behavior.
- Higher scales can over-steer outputs; validate with representative prompts, for example with the sweep below.
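A sweep over candidate scales makes the first pass systematic; a sketch with placeholder prompt and path, relying on the fact that re-calling `setLora` with the same path updates the scale in place:

```dart
for (final scale in [0.2, 0.4, 0.6, 0.8]) {
  // Same path, new scale: updates the active adapter in place.
  await engine.setLora('/models/lora/domain.gguf', scale: scale);

  final buffer = StringBuffer();
  await for (final chunk in
      engine.generate('Answer as a domain specialist in one paragraph.')) {
    buffer.write(chunk);
  }
  print('scale=$scale -> $buffer');
}
```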
## Lifecycle notes
- LoRA activation is tied to the active context.
- `unloadModel()` or `dispose()` releases model/context resources and clears active adapter state.
- Re-apply adapters after reloading a model, as in the sketch below.
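A minimal sketch of the reload pattern, reusing the placeholder paths from the earlier examples:

```dart
// Adapter state does not survive a reload; re-apply it explicitly.
await engine.unloadModel();
await engine.loadModel('/models/base-model.gguf');
await engine.setLora('/models/lora/domain.gguf', scale: 0.7);
```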
## Platform notes
- Native backends implement runtime LoRA operations.
- The web bridge runtime exposes no-op LoRA operations in this release; do not assume LoRA has any effect on web targets yet.
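If the same codebase also targets the web, one defensive pattern is to gate adapter calls on the platform. A sketch assuming a Flutter app, using `kIsWeb` from `package:flutter/foundation.dart`; the `applyAdapters` helper is a hypothetical name:

```dart
import 'package:flutter/foundation.dart' show kIsWeb;
import 'package:llamadart/llamadart.dart';

Future<void> applyAdapters(LlamaEngine engine) async {
  if (kIsWeb) {
    // LoRA operations are no-ops on the web bridge in this release;
    // skip them so the difference stays explicit rather than silent.
    return;
  }
  await engine.setLora('/models/lora/domain.gguf', scale: 0.7);
}
```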
## Troubleshooting
- If `setLora(...)` fails, verify the adapter path is accessible at runtime (see the preflight sketch below).
- Ensure adapter/base-model compatibility (architecture/family alignment).
- When behavior seems unchanged, confirm you are testing on a native target and not a web fallback path.
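A quick preflight check separates path problems from compatibility problems; a sketch using `dart:io`, so native targets only (the `applyDomainLora` helper and its path are hypothetical):

```dart
import 'dart:io';

import 'package:llamadart/llamadart.dart';

Future<void> applyDomainLora(LlamaEngine engine) async {
  const adapterPath = '/models/lora/domain.gguf';

  // Fail fast with a clear error when the file is not reachable at runtime
  // (bundled asset paths often differ from development paths).
  if (!File(adapterPath).existsSync()) {
    throw StateError('LoRA adapter not found at $adapterPath');
  }
  await engine.setLora(adapterPath, scale: 0.7);
}
```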