ModelLens — Trained Recommender Checkpoint

📄 Paper: ModelLens: Finding the Best Model for Your Task from Myriads of Models  ·  🤗 Collection: luisrui/modellens  ·  💻 Code: github.com/luisrui/ModelLens

This is the released ModelLens checkpoint: a metric-aware ranker that, given a dataset description, a task, and a metric, returns a ranked list of HuggingFace models likely to perform well on that dataset. It requires no fine-tuning and no forward pass on the target dataset.

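This repo ships only the weights. The model source code is at github.com/luisrui/ModelLens, and a ready-to-run inference setup (inference_lib.py and recommend.py) is in the Space.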

What's in here

| File | Size | Description |
|---|---|---|
| ModelLens.pt | ~709 MB | Trained recommender weights (slim, inference-ready; ~3 unused parent-class buffers dropped) |
| args.json | ~2 KB | Training-time hyperparameters (model dims, num_models / num_tasks / num_metrics / etc.) |

Provenance

  • Trained on: luisrui/ModelLens-corpus-v2 — 1,807,133 (model × dataset × metric × value) records
  • Coverage: 47,242 HuggingFace models · 2,581 tasks · 3,714 metrics · ~86k datasets
  • Architecture: MLPMetricFull (the paper model — see github repo)
  • Loss: ensemble (listwise + pairwise + pointwise, λ_list=0.5, λ_pair=1.0, w_point=0.1)
  • Training: 30 epochs, DDP × 4 GPUs, bs=8, lr=1e-3, wd=1e-4, learnable τ
  • Slimmed checkpoint: inference-unused parent-class buffers + train-set dataset_desc_matrix stripped (load with strict=False).
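
The loss bullet above names three terms and their weights but not their exact forms. As a rough sketch only, assume a ListNet-style softmax cross-entropy for the listwise term, a logistic (RankNet-style) loss for the pairwise term, and mean squared error for the pointwise term. None of these choices is confirmed by this card; the GitHub repo holds the real definitions. The function names are illustrative, and `tau` is a fixed argument here, whereas the released model learns τ.

```python
import math

def softmax(xs, tau=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / tau for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(scores, targets, tau=1.0):
    # ASSUMPTION: cross-entropy between softmax(targets) and softmax(scores / tau).
    p = softmax(targets)
    q = softmax(scores, tau)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def pairwise_loss(scores, targets):
    # ASSUMPTION: logistic loss averaged over pairs (i, j) with targets[i] > targets[j].
    n = len(scores)
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(n)
        for j in range(n)
        if targets[i] > targets[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0

def pointwise_loss(scores, targets):
    # ASSUMPTION: mean squared error between predicted and observed values.
    return sum((s - t) ** 2 for s, t in zip(scores, targets)) / len(scores)

def ensemble_loss(scores, targets, tau=1.0,
                  lam_list=0.5, lam_pair=1.0, w_point=0.1):
    # Weighted sum using the weights listed on this card.
    return (lam_list * listwise_loss(scores, targets, tau)
            + lam_pair * pairwise_loss(scores, targets)
            + w_point * pointwise_loss(scores, targets))

targets = [0.9, 0.5, 0.1]
well_ordered = ensemble_loss([0.9, 0.5, 0.1], targets)
reversed_order = ensemble_loss([0.1, 0.5, 0.9], targets)
print(well_ordered < reversed_order)  # True
```

Under these assumptions, all three terms penalize the reversed ordering more than the correct one, so the combined loss does too.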

Loading

import json

import torch
from huggingface_hub import hf_hub_download

# Download the weights and the training-time hyperparameters from the Hub.
ckpt_path = hf_hub_download("luisrui/ModelLens", "ModelLens.pt")
args_path = hf_hub_download("luisrui/ModelLens", "args.json")

with open(args_path) as f:
    args = json.load(f)

# Load onto CPU first; move the model to an accelerator after it is built.
state = torch.load(ckpt_path, map_location="cpu")

# Build the model from source (see github.com/luisrui/ModelLens) and load:
# model = MLPMetricFull(**args_to_kwargs(args))
# model.load_state_dict(state, strict=False)   # strict=False is intentional: stripped buffers are absent

For a complete, ready-to-run setup including the candidate model pool + metadata, see inference_lib.py and recommend.py in the Space.
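
Because the checkpoint is slimmed, `strict=False` skips any absent key without complaint, including keys that should have been there. One way to guard against that is to compare key sets before loading. The helper below is a minimal sketch in plain Python: `check_slim_keys` and the toy key names are hypothetical, and only `dataset_desc_matrix` comes from this card. PyTorch's `load_state_dict` reports the same information through the `missing_keys` and `unexpected_keys` fields of its return value.

```python
def check_slim_keys(model_keys, ckpt_keys, allowed_missing):
    """Compare a model's expected keys against a checkpoint's keys.

    Returns (missing, unexpected, problems). `problems` holds the missing
    keys that are NOT on the allow-list of intentionally stripped entries.
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    allowed = set(allowed_missing)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    problems = [k for k in missing if k not in allowed]
    return missing, unexpected, problems

# Toy key sets (hypothetical names); with a real model, pass
# model.state_dict().keys() and state.keys() instead.
expected = ["encoder.weight", "encoder.bias", "dataset_desc_matrix"]
in_ckpt = ["encoder.weight", "encoder.bias"]

missing, unexpected, problems = check_slim_keys(
    expected, in_ckpt, allowed_missing={"dataset_desc_matrix"}
)
print(missing, unexpected, problems)
# ['dataset_desc_matrix'] [] []
```

An empty `problems` list means every missing key was one you expected to be stripped.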

How it works

  1. The dataset description is embedded with OpenAI text-embedding-3-small (1536-dim — same encoder used at training time).
  2. The ranker scores every candidate model conditioned on (dataset_embedding, task_id, metric_id, model_size_bucket, model_family_id, model_id).
  3. Returns the top-K candidates, optionally filtered by param count / "HF-hosted only" / "official pretrained only".
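
Step 3 can be sketched as a filter followed by a top-K selection. This is an illustration in plain Python, not the Space's code: the `Candidate` fields, the `top_k` function, and its parameters are hypothetical names, and the scores and model IDs are made up.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Candidate:
    model_id: str
    score: float            # ranker output; higher means a better predicted fit
    num_params: int         # parameter count
    hf_hosted: bool = True  # whether the weights are hosted on the Hub

def top_k(cands: Iterable[Candidate], k: int = 5,
          max_params: Optional[int] = None,
          hf_only: bool = False) -> List[Candidate]:
    # Apply the optional filters first, then keep the k highest-scoring survivors.
    kept = (
        c for c in cands
        if (max_params is None or c.num_params <= max_params)
        and (not hf_only or c.hf_hosted)
    )
    return heapq.nlargest(k, kept, key=lambda c: c.score)

pool = [
    Candidate("org/model-a", 0.91, 7_000_000_000),
    Candidate("org/model-b", 0.88, 350_000_000),
    Candidate("org/model-c", 0.42, 110_000_000, hf_hosted=False),
    Candidate("org/model-d", 0.75, 125_000_000),
]

picks = top_k(pool, k=2, max_params=1_000_000_000, hf_only=True)
print([c.model_id for c in picks])
# ['org/model-b', 'org/model-d']
```

Filtering before selecting matters: taking the top-K first and filtering afterwards could return fewer than K models even when enough eligible candidates exist.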

Intended use

  • Picking a starting model for a new task / dataset, without running every candidate.
  • Cheap pre-filter ahead of a more expensive transferability estimator or partial fine-tune.

Limitations

  • Knowledge is bounded by what's in corpus-v2 (up to early 2026).
  • Models / datasets that don't appear in the corpus fall back to text similarity over their descriptions — useful but weaker than the full signal available for in-corpus entities.
  • Scores are relative — the ranking is what matters; the absolute numbers are not calibrated to any specific metric scale.
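
The card does not spell out how the text-similarity fallback works. One plausible reading, sketched below under that assumption, is a nearest-neighbor lookup by cosine similarity between description embeddings, so that an unseen entity is matched to its most similar in-corpus neighbors. `cosine` and `nearest_in_corpus` are hypothetical helpers, the corpus names are invented, and the 3-dimensional vectors stand in for real 1536-dimensional embeddings.

```python
import math

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_in_corpus(query_vec, corpus, k=1):
    # Rank in-corpus entries by similarity to the query, most similar first.
    scored = [(name, cosine(query_vec, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (ASSUMPTION: real ones come from the description encoder).
corpus = {
    "sentiment-reviews": [0.9, 0.1, 0.0],
    "image-classification": [0.0, 0.2, 0.9],
}
unseen_dataset = [0.8, 0.2, 0.1]

best_name, best_sim = nearest_in_corpus(unseen_dataset, corpus, k=1)[0]
print(best_name)
# sentiment-reviews
```

As the limitation above notes, a match found this way carries less signal than an entity the ranker saw during training, so treat recommendations for out-of-corpus inputs with more caution.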

Citation

@article{cai2026modellens,
  title={ModelLens: Finding the Best Model for Your Task from Myriads of Models},
  author={Cai, Rui and Mo, Weijie Jacky and Wen, Xiaofei and Ma, Qiyao and Zhu, Wenhui and Chen, Xiwen and Chen, Muhao and Zhao, Zhe},
  journal={arXiv preprint arXiv:2605.07075},
  year={2026}
}

License

MIT.
