How to use answerdotai/ModernBERT-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
```
TemporalMesh Transformer: 29.4 PPL at 48% compute — beats Mamba, new open-source architecture
Discussion #86, opened by vigneshwar234
New transformer architecture: TMT — dynamic graph attention + per-token adaptive depth
TemporalMesh Transformer (TMT), with 120M parameters, reaches 29.4 PPL on WikiText-2 while using 48% of the compute, outperforming Mamba (31.8 PPL), RWKV (33.1), Longformer (39.6), and a vanilla transformer (42.1).
Five innovations, unified in a single forward pass:
| Innovation | What it does | Cost |
|---|---|---|
| Mesh Attention | Dynamic kNN graph per layer from cosine similarity | O(S·k) vs O(S²) |
| Temporal Decay | Learned multiplicative attenuation post-softmax | ~0 overhead |
| Adaptive Exit | Per-token gate: punctuation exits layer 2, rare words layer 12 | −52% compute |
| Dual-Stream FFN | Syntax + semantic parallel MLP streams | Same FLOPs |
| EMA Anchors | 16 persistent fast-weight vectors, β=0.99 | 32KB params |
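The Mesh Attention and Temporal Decay rows can be illustrated together. This is a minimal pure-Python sketch, not the TMT code: it assumes a causal language model (each token only links to itself and earlier positions), builds each token's neighbour set from cosine similarity of the layer input, skips the learned Q/K/V projections for brevity, and applies a hypothetical decay of exp(−rate·|i − j|) to the post-softmax weights without renormalising. The names `knn_graph`, `mesh_attention` and `decay_rate` are illustrative only. The neighbour search here is brute force (all pairs); only the attention step touches k keys per token, so a real implementation would presumably need a cheaper neighbour search to keep the whole layer at O(S·k).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def knn_graph(h, k):
    """Hypothetical causal kNN graph: for token i, the (up to) k most cosine-similar
    positions among 0..i, returned as sorted lists. Brute force, for illustration."""
    graph = []
    for i in range(len(h)):
        candidates = sorted(range(i + 1), key=lambda j: cosine(h[i], h[j]), reverse=True)
        graph.append(sorted(candidates[:k]))
    return graph

def mesh_attention(h, k, decay_rate):
    """Attention restricted to each token's kNN neighbours, followed by a
    multiplicative post-softmax decay exp(-decay_rate * |i - j|).
    Raw hidden states serve as queries, keys and values (no projections)."""
    d = len(h[0])
    graph = knn_graph(h, k)
    out = []
    for i, nbrs in enumerate(graph):
        # Scaled dot-product logits over the k neighbours only: O(k * d) per token.
        logits = [sum(a * b for a, b in zip(h[i], h[j])) / math.sqrt(d) for j in nbrs]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        # Softmax, then attenuate each weight by token distance (not renormalised).
        weights = [e / z * math.exp(-decay_rate * abs(i - j)) for e, j in zip(exps, nbrs)]
        out.append([sum(w * h[j][c] for w, j in zip(weights, nbrs)) for c in range(d)])
    return out, graph

h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
out, graph = mesh_attention(h, k=2, decay_rate=0.1)
print(graph)
```

With k = 2, each token ends up attending to itself and its most similar earlier token, so the two "topics" in the toy input never mix.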
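The Adaptive Exit row describes a per-token gate that lets easy tokens leave early. Below is a hedged sketch of one common early-exit form (a sigmoid gate on the hidden state, checked after every layer), plus how a compute fraction such as the 48% figure would be derived from per-token exit depths. The gate weights, the 0.5 threshold and the toy layers are hypothetical, and the sketch ignores how exited tokens stay visible as keys and values to deeper layers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_with_adaptive_exit(h, layers, gate_w, gate_b, threshold=0.5):
    """Apply layers to one token's hidden state, stopping once a (hypothetical)
    linear exit gate on the current state exceeds `threshold`.
    Returns (hidden_state, number_of_layers_executed)."""
    for depth, layer in enumerate(layers, start=1):
        h = layer(h)
        score = sigmoid(sum(w * x for w, x in zip(gate_w, h)) + gate_b)
        if score > threshold:
            return h, depth
    return h, len(layers)

def compute_fraction(depths, num_layers):
    """Share of full-depth layer compute spent, assuming every layer costs the same per token."""
    return sum(depths) / (len(depths) * num_layers)

# Toy setup: 12 identical layers that add 1.0 to each feature; the gate fires once h[0] > 3.
layers = [lambda h: [x + 1.0 for x in h] for _ in range(12)]
tokens = [[0.0], [2.0], [-20.0]]
results = [run_with_adaptive_exit(t, layers, gate_w=[1.0], gate_b=-3.0) for t in tokens]
depths = [d for _, d in results]
print(depths, compute_fraction(depths, len(layers)))
```

Here the three toy tokens exit after 4, 2 and 12 layers, i.e. half of the full-depth compute; a 48% figure would correspond to an average exit depth of about 5.8 of 12 layers under the same equal-cost assumption.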
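The "Same FLOPs" entry for the Dual-Stream FFN holds if, for example, each of the two streams uses half the intermediate width of a standard FFN. That split is an assumption; the post does not say how the streams are sized. The arithmetic counts 2 FLOPs per multiply-add and ignores biases and activations; the 768/3072 sizes are a generic BERT-base-style example, not TMT's configuration.

```python
def ffn_flops(d_model, d_ff):
    """FLOPs per token for a two-matmul FFN (d_model -> d_ff -> d_model), 2 FLOPs per multiply-add."""
    return 2 * (d_model * d_ff + d_ff * d_model)

def dual_stream_flops(d_model, d_ff):
    """Two parallel streams (e.g. 'syntax' and 'semantic'), each with half the intermediate width."""
    half = d_ff // 2
    return ffn_flops(d_model, half) + ffn_flops(d_model, half)

d_model, d_ff = 768, 3072  # illustrative sizes, not TMT's actual configuration
standard = ffn_flops(d_model, d_ff)
dual = dual_stream_flops(d_model, d_ff)
print(standard, dual, standard == dual)
```

Because FFN cost is linear in the intermediate width, any split whose widths sum to d_ff keeps the FLOP count unchanged.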
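For the EMA Anchors row, the standard exponential moving average with β = 0.99 is anchor ← β·anchor + (1 − β)·x. The post does not say what each anchor is averaged over, so this sketch borrows the nearest-code EMA update used for VQ-VAE codebooks: each hidden state is assigned to the anchor with the highest dot product, and each anchor moves toward the mean of its assigned states. Treat it as an illustration of the β = 0.99 update, not as TMT's rule; `ema_update_anchors` is a hypothetical name.

```python
def ema_update_anchors(anchors, hidden_states, beta=0.99):
    """One EMA step: assign each hidden state to its highest-dot-product anchor,
    then move each anchor toward the mean of its assigned states.
    Anchors that receive no states are left unchanged."""
    d = len(anchors[0])
    sums = [[0.0] * d for _ in anchors]
    counts = [0] * len(anchors)
    for h in hidden_states:
        best = max(range(len(anchors)),
                   key=lambda a: sum(x * y for x, y in zip(anchors[a], h)))
        counts[best] += 1
        for c in range(d):
            sums[best][c] += h[c]
    updated = []
    for anchor, total, n in zip(anchors, sums, counts):
        if n == 0:
            updated.append(list(anchor))
        else:
            # anchor <- beta * anchor + (1 - beta) * mean(assigned states)
            updated.append([beta * a + (1.0 - beta) * s / n for a, s in zip(anchor, total)])
    return updated

# The post uses 16 anchors; two 2-D anchors here keep the example readable.
anchors = [[1.0, 0.0], [0.0, 1.0]]
batch = [[2.0, 0.0], [4.0, 0.0]]
anchors = ema_update_anchors(anchors, batch)
print(anchors)
```

With β = 0.99 each step moves an anchor only 1% of the way toward new data, which is what makes the anchors slowly changing "persistent" memory.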
Cross-benchmark results:
- WikiText-103: 36.1 PPL vs 38.4 Mamba
- LongBench: 53.4 vs 51.3 Mamba (score; higher is better)
- C4: 27.4 PPL vs 30.1 Mamba
- The Pile: 35.8 PPL
- 226 tests passing, 3 seeds (42/1337/2024), full ablations
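Relative margins are easier to compare across benchmarks than raw numbers. This short sketch converts the Mamba comparisons reported in the post (including the WikiText-2 headline figure) into percentage improvements, treating perplexity as lower-is-better and the LongBench score as higher-is-better; The Pile is omitted because no Mamba number is given for it.

```python
# (benchmark, TMT, Mamba, lower_is_better), numbers as reported in the post
RESULTS = [
    ("WikiText-2 PPL", 29.4, 31.8, True),
    ("WikiText-103 PPL", 36.1, 38.4, True),
    ("C4 PPL", 27.4, 30.1, True),
    ("LongBench score", 53.4, 51.3, False),
]

def relative_improvement(tmt, mamba, lower_is_better):
    """Percent by which TMT beats Mamba; positive means TMT is better."""
    if lower_is_better:
        return 100.0 * (mamba - tmt) / mamba
    return 100.0 * (tmt - mamba) / mamba

improvements = {name: relative_improvement(t, m, low) for name, t, m, low in RESULTS}
for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}% better than Mamba")
```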
Superadditive synergy: the full model's combined gain is 12.7 PPL, versus 8.6 PPL when the individual components' gains are summed.
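The 12.7 PPL combined gain equals the gap between the vanilla transformer (42.1) and TMT (29.4) on WikiText-2, so it appears to be measured against that baseline. The arithmetic below makes the superadditivity explicit: about 4.1 PPL of the improvement is not accounted for by summing the single-component gains.

```python
def gain(baseline_ppl, model_ppl):
    """PPL reduction relative to a baseline (positive = better)."""
    return baseline_ppl - model_ppl

vanilla, tmt = 42.1, 29.4
combined = round(gain(vanilla, tmt), 1)      # gain of the full model over the vanilla baseline
sum_of_parts = 8.6                           # reported sum of single-component gains
synergy = round(combined - sum_of_parts, 1)  # extra gain beyond additivity
ratio = combined / sum_of_parts              # > 1 means superadditive
print(combined, synergy, round(ratio, 2))
```

A ratio of roughly 1.48 means the components together deliver about half again as much as their isolated gains would predict.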
📄 Paper: https://zenodo.org/records/20287390 (DOI: 10.5281/zenodo.20287197)
💻 Code: https://github.com/vignesh2027/TemporalMesh-Transformer
🎮 Live demo: https://huggingface.co/spaces/vigneshwar234/TemporalMesh-Transformer-Demo
🤗 Model + benchmarks: https://huggingface.co/vigneshwar234/TemporalMesh-Transformer