OKR Micro-Model (ASMS)

A 15M parameter decoder-only transformer trained from scratch to handle OKR (Objectives and Key Results) management via tool calls to the Keyflow MCP API.

Built using Agent-Specific Model Synthesis (ASMS), a pipeline that treats large LLMs as compilers, not runtimes. Instead of routing every OKR query through Claude or GPT-4, this micro-model handles workflow routing and tool-call generation locally on Apple Silicon.

Model Details

| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | 15M (13.5M after quantization overhead) |
| Layers | 6 |
| Hidden Dim | 384 |
| Attention Heads | 6 |
| FFN Dim | 768 |
| Max Sequence Length | 512 |
| Vocabulary | 6,000 tokens (task-specific BPE) |
| Framework | Apple MLX |
| Quantized Size | 10.1 MB (INT4) |
| FP16 Size | 26.9 MB |
| Training Data | 5,759 synthetic examples |
| Training Time | 58 minutes on M3 Pro |
| Training Cost | $0 (local hardware, in-session corpus generation) |
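As a sanity check, the architecture numbers above (6 layers, 384 hidden dim, 768 SwiGLU FFN dim, 6,000-token vocab) can be tallied directly. This sketch assumes no bias terms, untied input/output embeddings, and per-layer RMSNorm weights; the real `architecture.py` may differ in detail.

```python
# Rough parameter tally from the table's architecture numbers.
# Assumptions (not confirmed by this card): no biases, untied
# input/output embeddings, two RMSNorms per layer, SwiGLU FFN.
VOCAB, D, FFN, LAYERS = 6000, 384, 768, 6

embed = VOCAB * D                      # token embedding
attn = 4 * D * D                       # Q, K, V, O projections
ffn = 3 * D * FFN                      # SwiGLU: gate, up, down
norms = 2 * D                          # two RMSNorm weight vectors
per_layer = attn + ffn + norms

total = embed + LAYERS * per_layer + D + VOCAB * D  # + final norm + output head
print(f"{total / 1e6:.2f}M params, ~{total * 2 / 1e6:.1f} MB in FP16")
# → 13.46M params, ~26.9 MB in FP16
```

Under these assumptions the tally lands on the table's 13.5M count and 26.9 MB FP16 size, which suggests the headline 15M is a rounded figure.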

Performance

| Metric | Score |
|---|---|
| Workflow Routing | 8/10 (80%) |
| Valid JSON Tool Calls | 5/10 (50%) |
| Best Val Loss | 1.14 |
| Inference Latency | 100-250 ms on M3 Pro |

The model correctly routes queries to 6 OKR workflows (goal_to_okr, view_okrs, check_in, reports, onboard, align) and generates valid, parseable Keyflow MCP tool calls:

```json
{"tool": "objective", "action": "list", "params": {"cycleId": "cyc_q2_2026", "ownerId": "usr_107"}}
{"tool": "key_result", "action": "check_in", "params": {"keyResultId": "kr_102", "value": 1}}
{"tool": "report", "action": "health_check", "params": {"cycleId": "cyc_q2_2026"}}
```
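Since only 50% of generated tool calls are valid JSON, output like the above should be parsed and schema-checked before it reaches the MCP server. A minimal validator might look like this; the tool/action pairs are taken from the examples above, but the full Keyflow MCP schema is an assumption.

```python
import json

# Tool/action pairs seen in the examples above; the full Keyflow MCP
# schema is assumed, not confirmed by this card.
KNOWN_ACTIONS = {
    "objective": {"list"},
    "key_result": {"check_in"},
    "report": {"health_check"},
}

def parse_tool_call(raw: str) -> dict:
    """Parse one line of model output into a validated tool call."""
    call = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if call.get("tool") not in KNOWN_ACTIONS:
        raise ValueError(f"unknown tool: {call.get('tool')}")
    if call.get("action") not in KNOWN_ACTIONS[call["tool"]]:
        raise ValueError(f"unknown action: {call.get('action')}")
    if not isinstance(call.get("params"), dict):
        raise ValueError("params must be a JSON object")
    return call

call = parse_tool_call(
    '{"tool": "report", "action": "health_check", '
    '"params": {"cycleId": "cyc_q2_2026"}}'
)
print(call["tool"], call["action"])  # → report health_check
```

Rejected calls can be retried or routed to a fallback model instead of being sent to the API blindly.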

How to Use

With MLX (recommended)

```python
import mlx.core as mx
import mlx.nn as nn
import sentencepiece as spm
import json

# Load model
from architecture import OKRModelConfig, create_model

with open("config.json") as f:
    config_dict = json.load(f)
config = OKRModelConfig(**{k: v for k, v in config_dict.items() if k in OKRModelConfig.__dataclass_fields__})
model = create_model(config)
weights = mx.load("model.safetensors")
model.load_weights(list(weights.items()))
mx.eval(model.parameters())

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.Load("okr_tokenizer.model")

# Inference
query = "Show me my OKRs"
context = json.dumps({"userId": "usr_001", "activeCycleId": "cyc_q2_2026"})
text = f"QUERY: {query} CONTEXT: {context} "
tokens = mx.array([[sp.bos_id()] + sp.Encode(text)])
output = model.generate(tokens, max_new_tokens=256, temperature=0.0)
print(sp.Decode(output[0].tolist()))
```
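`model.generate` is defined in `architecture.py`; with `temperature=0.0` it presumably reduces to greedy decoding. The loop below sketches that logic in plain Python with a stub standing in for the model, purely to illustrate the argmax-until-EOS behavior.

```python
def greedy_decode(next_token_fn, prompt, eos_id, max_new_tokens=256):
    """Greedy decoding: repeatedly append the highest-probability
    token until EOS or the token budget is hit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token_fn(tokens)  # stands in for argmax over model logits
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens

# Stub "model" that emits tokens 5, 6, then EOS (id 2).
script = iter([5, 6, 2])
out = greedy_decode(lambda toks: next(script), prompt=[1, 4], eos_id=2)
print(out)  # → [1, 4, 5, 6, 2]
```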

With the ASMS Server

```bash
git clone https://github.com/chan4lk/timm
cd timm
uv sync
uv run deploy/server.py model/checkpoints/best
# Open http://localhost:8800 for the chat UI
```

Files

| File | Description | Size |
|---|---|---|
| model.safetensors | FP16 model weights | 51 MB |
| model_q4.safetensors | INT4 quantized weights | 10 MB |
| config.json | Model architecture config (FP16) | 181 B |
| config_q4.json | Model architecture config (INT4) | 242 B |
| okr_tokenizer.model | SentencePiece BPE tokenizer (6K vocab) | 325 KB |
| architecture.py | MLX model definition | - |

Training

Trained using the ASMS (Agent-Specific Model Synthesis) pipeline:

  1. Role Specification: 5 Keyflow MCP tools, 20 operations, 6 workflows, ~500 effective decision paths
  2. Corpus Generation: 5,759 synthetic examples generated by Claude Sonnet 4.6 agents (80% normal, 15% edge, 5% adversarial)
  3. Tokenizer: SentencePiece BPE, 6,000 vocabulary tokens
  4. Architecture: 6-layer decoder-only transformer, 384 hidden dim, 6 heads, SwiGLU FFN, RoPE, RMSNorm
  5. Training: Curriculum learning (normal → edge → adversarial), AdamW with cosine LR schedule, 30 epochs, batch_size=16
  6. Hardware: Apple M3 Pro, MLX with Metal acceleration, 58 minutes total
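The cosine schedule in step 5 takes only a few lines. The card names AdamW plus cosine decay but not the peak learning rate or warmup, so those values below are illustrative assumptions.

```python
import math

def cosine_lr(step, total_steps, peak_lr=3e-4, warmup=100, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.
    peak_lr/warmup are illustrative; the card does not state them."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# 30 epochs over 5,759 examples at batch_size=16, as in steps above.
total = 30 * (5759 // 16)
print(f"start={cosine_lr(0, total):.1e} "
      f"peak={cosine_lr(100, total):.1e} "
      f"end={cosine_lr(total, total):.1e}")
```

The same callable can be handed to the optimizer as a per-step learning rate.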

Key Findings

  1. Model capacity matters more than data volume. Scaling from 5.7M to 15M params on the same data improved routing +60% and valid JSON +150%.
  2. Tokenizer must be frozen. Rebuilding the tokenizer between corpus versions resets all learned patterns.
  3. Early stopping is essential. Best checkpoint at epoch 3-4, not final epoch.
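Finding 3 (keep the checkpoint with the best validation loss rather than the last one) is a few lines of bookkeeping. This is a generic patience-based sketch, not the repo's actual training loop.

```python
def best_checkpoint(val_losses, patience=5):
    """Track the best-val-loss epoch; stop once it hasn't improved
    for `patience` epochs. Generic sketch, not the repo's loop."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss  # save checkpoint here
        elif epoch - best_epoch >= patience:
            break  # early stop: validation loss has plateaued
    return best_epoch, best_loss

# A validation curve that bottoms out early, as the card describes
# (1.14 best val loss around epoch 3; the curve itself is made up).
losses = [2.0, 1.4, 1.2, 1.14, 1.18, 1.25, 1.31, 1.40, 1.52, 1.66]
print(best_checkpoint(losses))  # → (3, 1.14)
```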

Citation

```bibtex
@article{ranaweera2026asms,
  title={Agent-Specific Model Synthesis: Compiling Task-Bounded Intelligence from Large Language Models into CPU-Deployable Micro-Models},
  author={Ranaweera, Chandima},
  year={2026},
  note={Draft v0.2, Bistec Global}
}
```

Paper

Agent-Specific Model Synthesis (ASMS). Ranaweera, C. (2026). Draft v0.2.

License

Apache 2.0
