OKR Micro-Model (ASMS)
A 15M-parameter decoder-only transformer trained from scratch to handle OKR (Objectives and Key Results) management via tool calls to the Keyflow MCP API.
Built using Agent-Specific Model Synthesis (ASMS), a pipeline that treats large LLMs as compilers, not runtimes. Instead of routing every OKR query through Claude or GPT-4, this micro-model handles workflow routing and tool-call generation locally on Apple Silicon.
Model Details
| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | 15M (13.5M after quantization overhead) |
| Layers | 6 |
| Hidden Dim | 384 |
| Attention Heads | 6 |
| FFN Dim | 768 |
| Max Sequence Length | 512 |
| Vocabulary | 6,000 tokens (task-specific BPE) |
| Framework | Apple MLX |
| Quantized Size | 10.1 MB (INT4) |
| FP16 Size | 26.9 MB |
| Training Data | 5,759 synthetic examples |
| Training Time | 58 minutes on M3 Pro |
| Training Cost | $0 (local hardware, in-session corpus generation) |
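The parameter count in the table can be sanity-checked from the architecture dimensions. The sketch below is a rough back-of-envelope estimate under assumptions not confirmed by the repo (untied input/output embeddings, no bias terms, a three-projection SwiGLU FFN, RMSNorm gain vectors):

```python
# Rough parameter-count estimate from the dimensions in the table above.
# Assumptions (illustrative, not confirmed by the repo): untied embeddings,
# no biases, SwiGLU FFN with gate/up/down projections, RMSNorm gains.
VOCAB, D, LAYERS, FFN = 6000, 384, 6, 768

embed = VOCAB * D                  # token embedding table
attn = 4 * D * D                   # Wq, Wk, Wv, Wo projections
ffn = 3 * D * FFN                  # gate, up, down (SwiGLU)
norms = 2 * D                      # two RMSNorm gains per layer
per_layer = attn + ffn + norms
lm_head = D * VOCAB                # untied output projection

total = embed + LAYERS * per_layer + D + lm_head  # final-norm gain adds D
print(f"{total:,}")  # ~13.5M under these assumptions
```

Under these assumptions the estimate lands near the 13.5M figure quoted above; weight tying, biases, or extra heads would shift it.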
Performance
| Metric | Score |
|---|---|
| Workflow Routing | 8/10 (80%) |
| Valid JSON Tool Calls | 5/10 (50%) |
| Best Val Loss | 1.14 |
| Inference Latency | 100-250ms on M3 Pro |
The model correctly routes queries to 6 OKR workflows (goal_to_okr, view_okrs, check_in, reports, onboard, align) and generates valid, parseable Keyflow MCP tool calls:
{"tool": "objective", "action": "list", "params": {"cycleId": "cyc_q2_2026", "ownerId": "usr_107"}}
{"tool": "key_result", "action": "check_in", "params": {"keyResultId": "kr_102", "value": 1}}
{"tool": "report", "action": "health_check", "params": {"cycleId": "cyc_q2_2026"}}
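Since half of the generated tool calls fail the valid-JSON check, a caller would typically validate before dispatching to the MCP API. A minimal sketch follows; the tool/action whitelist is built only from the examples above and is illustrative, not the full Keyflow MCP surface:

```python
import json

# Tools and actions seen in the examples above; extend as the Keyflow MCP
# API actually defines them (this whitelist is illustrative, not exhaustive).
KNOWN_TOOLS = {
    "objective": {"list"},
    "key_result": {"check_in"},
    "report": {"health_check"},
}

def validate_tool_call(raw: str) -> dict:
    """Parse a model-generated tool call and reject malformed output."""
    call = json.loads(raw)  # raises ValueError on invalid JSON (the 50% metric)
    if call.get("tool") not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    if call.get("action") not in KNOWN_TOOLS[call["tool"]]:
        raise ValueError(f"unknown action: {call.get('action')!r}")
    if not isinstance(call.get("params"), dict):
        raise ValueError("params must be a JSON object")
    return call

call = validate_tool_call(
    '{"tool": "objective", "action": "list", '
    '"params": {"cycleId": "cyc_q2_2026", "ownerId": "usr_107"}}'
)
print(call["tool"], call["action"])  # objective list
```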
How to Use
With MLX (recommended)
import mlx.core as mx
import mlx.nn as nn
import sentencepiece as spm
import json
# Load model
from architecture import OKRModelConfig, create_model
with open("config.json") as f:
    config_dict = json.load(f)
config = OKRModelConfig(**{k: v for k, v in config_dict.items() if k in OKRModelConfig.__dataclass_fields__})
model = create_model(config)
weights = mx.load("model.safetensors")
model.load_weights(list(weights.items()))
mx.eval(model.parameters())
# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.Load("okr_tokenizer.model")
# Inference
query = "Show me my OKRs"
context = json.dumps({"userId": "usr_001", "activeCycleId": "cyc_q2_2026"})
text = f"QUERY: {query} CONTEXT: {context} "
tokens = mx.array([[sp.bos_id()] + sp.Encode(text)])
output = model.generate(tokens, max_new_tokens=256, temperature=0.0)
print(sp.Decode(output[0].tolist()))
With the ASMS Server
git clone https://github.com/chan4lk/timm
cd timm
uv sync
uv run deploy/server.py model/checkpoints/best
# Open http://localhost:8800 for the chat UI
Files
| File | Description | Size |
|---|---|---|
| `model.safetensors` | FP16 model weights | 51 MB |
| `model_q4.safetensors` | INT4 quantized weights | 10 MB |
| `config.json` | Model architecture config (FP16) | 181 B |
| `config_q4.json` | Model architecture config (INT4) | 242 B |
| `okr_tokenizer.model` | SentencePiece BPE tokenizer (6K vocab) | 325 KB |
| `architecture.py` | MLX model definition | - |
Training
Trained using the ASMS (Agent-Specific Model Synthesis) pipeline:
- Role Specification: 5 Keyflow MCP tools, 20 operations, 6 workflows, ~500 effective decision paths
- Corpus Generation: 5,759 synthetic examples generated by Claude Sonnet 4.6 agents (80% normal, 15% edge, 5% adversarial)
- Tokenizer: SentencePiece BPE, 6,000 vocabulary tokens
- Architecture: 6-layer decoder-only transformer, 384 hidden dim, 6 heads, SwiGLU FFN, RoPE, RMSNorm
- Training: Curriculum learning (normal → edge → adversarial), AdamW with cosine LR schedule, 30 epochs, batch_size=16
- Hardware: Apple M3 Pro, MLX with Metal acceleration, 58 minutes total
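The cosine LR schedule in the training recipe can be sketched as below; the warmup length and peak learning rate are illustrative assumptions, not values taken from the actual run:

```python
import math

def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 3e-4, warmup: int = 100,
              min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    peak_lr and warmup are illustrative defaults, not the training values.
    """
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The LR peaks exactly at the end of warmup and decays smoothly to `min_lr` at the final step, which pairs naturally with AdamW.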
Key Findings
- Model capacity matters more than data volume. Scaling from 5.7M to 15M params on the same data improved routing +60% and valid JSON +150%.
- Tokenizer must be frozen. Rebuilding the tokenizer between corpus versions resets all learned patterns.
- Early stopping is essential. Best checkpoint at epoch 3-4, not final epoch.
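The early-stopping finding above can be implemented with a small tracker; the patience value here is an illustrative assumption, not the one used in training:

```python
class EarlyStopper:
    """Track the best validation loss; signal stop after `patience` bad epochs."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Return True when training should stop.

        Checkpoint whenever val_loss improves; per the finding above,
        the best checkpoint landed at epoch 3-4, not the final epoch.
        """
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

For example, feeding it per-epoch losses `[2.0, 1.5, 1.2, 1.14, 1.2, 1.25, 1.3]` stops on the third non-improving epoch while retaining the 1.14 best loss.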
Citation
@article{ranaweera2026asms,
  title={Agent-Specific Model Synthesis: Compiling Task-Bounded Intelligence from Large Language Models into CPU-Deployable Micro-Models},
  author={Ranaweera, Chandima},
  year={2026},
  note={Draft v0.2, Bistec Global}
}
Paper
Agent-Specific Model Synthesis (ASMS). Ranaweera, C. (2026). Draft v0.2.
License
Apache 2.0