OKR Micro-Model (ASMS)

A 15M parameter decoder-only transformer trained from scratch to handle OKR (Objectives and Key Results) management via tool calls to the Keyflow MCP API.

Built using Agent-Specific Model Synthesis (ASMS), a pipeline that treats large LLMs as compilers, not runtimes. Instead of routing every OKR query through Claude or GPT-4, this micro-model handles workflow routing and tool-call generation locally on Apple Silicon.

Model Details

| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | 15M (13.5M after quantization overhead) |
| Layers | 6 |
| Hidden Dim | 384 |
| Attention Heads | 6 |
| FFN Dim | 768 |
| Max Sequence Length | 512 |
| Vocabulary | 6,000 tokens (task-specific BPE) |
| Framework | Apple MLX |
| Quantized Size | 10.1 MB (INT4) |
| FP16 Size | 26.9 MB |
| Training Data | 5,759 synthetic examples |
| Training Time | 58 minutes on M3 Pro |
| Training Cost | $0 (local hardware, in-session corpus generation) |
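As a sanity check, the architecture numbers above (6 layers, 384 hidden dim, 768 SwiGLU FFN dim, 6,000-token vocab) can be tallied directly. This sketch assumes no bias terms, untied input/output embeddings, and per-layer RMSNorm weights; the real `architecture.py` may differ in detail.

```python
# Rough parameter tally from the table's architecture numbers.
# Assumptions (not confirmed by this card): no biases, untied
# input/output embeddings, two RMSNorms per layer, SwiGLU FFN.
VOCAB, D, FFN, LAYERS = 6000, 384, 768, 6

embed = VOCAB * D                      # token embedding
attn = 4 * D * D                       # Q, K, V, O projections
ffn = 3 * D * FFN                      # SwiGLU: gate, up, down
norms = 2 * D                          # two RMSNorm weight vectors
per_layer = attn + ffn + norms

total = embed + LAYERS * per_layer + D + VOCAB * D  # + final norm + output head
print(f"{total / 1e6:.2f}M params, ~{total * 2 / 1e6:.1f} MB in FP16")
# → 13.46M params, ~26.9 MB in FP16
```

Under these assumptions the tally lands on the table's 13.5M count and 26.9 MB FP16 size, which suggests the headline 15M is a rounded figure.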

Performance

| Metric | Score |
|---|---|
| Workflow Routing | 8/10 (80%) |
| Valid JSON Tool Calls | 5/10 (50%) |
| Best Val Loss | 1.14 |
| Inference Latency | 100-250 ms on M3 Pro |

The model correctly routes queries to 6 OKR workflows (goal_to_okr, view_okrs, check_in, reports, onboard, align) and generates valid, parseable Keyflow MCP tool calls:

```json
{"tool": "objective", "action": "list", "params": {"cycleId": "cyc_q2_2026", "ownerId": "usr_107"}}
{"tool": "key_result", "action": "check_in", "params": {"keyResultId": "kr_102", "value": 1}}
{"tool": "report", "action": "health_check", "params": {"cycleId": "cyc_q2_2026"}}
```
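Since only 50% of generated tool calls are valid JSON, output like the above should be parsed and schema-checked before it reaches the MCP server. A minimal validator might look like this; the tool/action pairs are taken from the examples above, but the full Keyflow MCP schema is an assumption.

```python
import json

# Tool/action pairs seen in the examples above; the full Keyflow MCP
# schema is assumed, not confirmed by this card.
KNOWN_ACTIONS = {
    "objective": {"list"},
    "key_result": {"check_in"},
    "report": {"health_check"},
}

def parse_tool_call(raw: str) -> dict:
    """Parse one line of model output into a validated tool call."""
    call = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if call.get("tool") not in KNOWN_ACTIONS:
        raise ValueError(f"unknown tool: {call.get('tool')}")
    if call.get("action") not in KNOWN_ACTIONS[call["tool"]]:
        raise ValueError(f"unknown action: {call.get('action')}")
    if not isinstance(call.get("params"), dict):
        raise ValueError("params must be a JSON object")
    return call

call = parse_tool_call(
    '{"tool": "report", "action": "health_check", '
    '"params": {"cycleId": "cyc_q2_2026"}}'
)
print(call["tool"], call["action"])  # → report health_check
```

Rejected calls can be retried or routed to a fallback model instead of being sent to the API blindly.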

How to Use

With MLX (recommended)

```python
import mlx.core as mx
import mlx.nn as nn
import sentencepiece as spm
import json

# Load model
from architecture import OKRModelConfig, create_model

with open("config.json") as f:
    config_dict = json.load(f)
config = OKRModelConfig(**{k: v for k, v in config_dict.items() if k in OKRModelConfig.__dataclass_fields__})
model = create_model(config)
weights = mx.load("model.safetensors")
model.load_weights(list(weights.items()))
mx.eval(model.parameters())

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.Load("okr_tokenizer.model")

# Inference
query = "Show me my OKRs"
context = json.dumps({"userId": "usr_001", "activeCycleId": "cyc_q2_2026"})
text = f"QUERY: {query} CONTEXT: {context} "
tokens = mx.array([[sp.bos_id()] + sp.Encode(text)])
output = model.generate(tokens, max_new_tokens=256, temperature=0.0)
print(sp.Decode(output[0].tolist()))
```
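`model.generate` is defined in `architecture.py`; with `temperature=0.0` it presumably reduces to greedy decoding. The loop below sketches that logic in plain Python with a stub standing in for the model, purely to illustrate the argmax-until-EOS behavior.

```python
def greedy_decode(next_token_fn, prompt, eos_id, max_new_tokens=256):
    """Greedy decoding: repeatedly append the highest-probability
    token until EOS or the token budget is hit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token_fn(tokens)  # stands in for argmax over model logits
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens

# Stub "model" that emits tokens 5, 6, then EOS (id 2).
script = iter([5, 6, 2])
out = greedy_decode(lambda toks: next(script), prompt=[1, 4], eos_id=2)
print(out)  # → [1, 4, 5, 6, 2]
```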

With the ASMS Server

```bash
git clone https://github.com/chan4lk/timm
cd timm
uv sync
uv run deploy/server.py model/checkpoints/best
# Open http://localhost:8800 for the chat UI
```

Files

| File | Description | Size |
|---|---|---|
| model.safetensors | FP16 model weights | 51 MB |
| model_q4.safetensors | INT4 quantized weights | 10 MB |
| config.json | Model architecture config (FP16) | 181 B |
| config_q4.json | Model architecture config (INT4) | 242 B |
| okr_tokenizer.model | SentencePiece BPE tokenizer (6K vocab) | 325 KB |
| architecture.py | MLX model definition | - |

Training

Trained using the ASMS (Agent-Specific Model Synthesis) pipeline:

  1. Role Specification: 5 Keyflow MCP tools, 20 operations, 6 workflows, ~500 effective decision paths
  2. Corpus Generation: 5,759 synthetic examples generated by Claude Sonnet 4.6 agents (80% normal, 15% edge, 5% adversarial)
  3. Tokenizer: SentencePiece BPE, 6,000 vocabulary tokens
  4. Architecture: 6-layer decoder-only transformer, 384 hidden dim, 6 heads, SwiGLU FFN, RoPE, RMSNorm
  5. Training: Curriculum learning (normal → edge → adversarial), AdamW with cosine LR schedule, 30 epochs, batch_size=16
  6. Hardware: Apple M3 Pro, MLX with Metal acceleration, 58 minutes total
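The cosine schedule in step 5 takes only a few lines. The card names AdamW plus cosine decay but not the peak learning rate or warmup, so those values below are illustrative assumptions.

```python
import math

def cosine_lr(step, total_steps, peak_lr=3e-4, warmup=100, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.
    peak_lr/warmup are illustrative; the card does not state them."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# 30 epochs over 5,759 examples at batch_size=16, as in steps above.
total = 30 * (5759 // 16)
print(f"start={cosine_lr(0, total):.1e} "
      f"peak={cosine_lr(100, total):.1e} "
      f"end={cosine_lr(total, total):.1e}")
```

The same callable can be handed to the optimizer as a per-step learning rate.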

Key Findings

  1. Model capacity matters more than data volume. Scaling from 5.7M to 15M params on the same data improved routing +60% and valid JSON +150%.
  2. Tokenizer must be frozen. Rebuilding the tokenizer between corpus versions resets all learned patterns.
  3. Early stopping is essential. Best checkpoint at epoch 3-4, not final epoch.
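Finding 3 (keep the checkpoint with the best validation loss rather than the last one) is a few lines of bookkeeping. This is a generic patience-based sketch, not the repo's actual training loop.

```python
def best_checkpoint(val_losses, patience=5):
    """Track the best-val-loss epoch; stop once it hasn't improved
    for `patience` epochs. Generic sketch, not the repo's loop."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss  # save checkpoint here
        elif epoch - best_epoch >= patience:
            break  # early stop: validation loss has plateaued
    return best_epoch, best_loss

# A validation curve that bottoms out early, as the card describes
# (1.14 best val loss around epoch 3; the curve itself is made up).
losses = [2.0, 1.4, 1.2, 1.14, 1.18, 1.25, 1.31, 1.40, 1.52, 1.66]
print(best_checkpoint(losses))  # → (3, 1.14)
```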

Citation

```bibtex
@article{ranaweera2026asms,
  title={Agent-Specific Model Synthesis: Compiling Task-Bounded Intelligence from Large Language Models into CPU-Deployable Micro-Models},
  author={Ranaweera, Chandima},
  year={2026},
  note={Draft v0.2, Bistec Global}
}
```

Paper

Agent-Specific Model Synthesis (ASMS). Ranaweera, C. (2026). Draft v0.2.

License

Apache 2.0
