Pre-6T-era datasets and other legacy work kept for reproducibility. Not part of current pipelines.
Top-K influence results and analysis figures. Full score matrices are stored in HF Buckets (see inventory.json).
Stratified working samples (5 sizes, seed=42) plus a 100K random preconditioner sample.
OLMES benchmark evaluation results across OLMo-3-7B and SmolLM-3-3B model variants.
OLMES evaluation queries as attribution targets for OLMo-3-7B variants.
Dolma3 6T source corpus, dedup state, and unified per-doc manifests for the attribution project.