
Grayson D.

GraysonD-XYZ
·
https://GraysonD.XYZ
  • GraysonD-XYZ
  • graysondodson

AI & ML interests

None yet

Recent Activity

reacted to salma-remyx's post with 🚀 3 days ago
In that benchmark comparison, do you even have the sample size to distinguish two models, or are you making decisions based on statistical noise? "Resolution Diagnostics for Paired LLM Evaluation" offers a simple check: a per-pair resolution ratio q = N/N* that flags when a displayed ranking sits below the resolution floor regardless of p-value. arXiv: https://arxiv.org/abs/2605.30315v1 Outrider automatically matched this paper to our fork of lm-evaluation-harness and opened a PR implementing the diagnostic. Configure the action to find new methods tailored to your repo: https://github.com/remyxai/outrider

Organizations

None yet

models 0

None public yet

datasets 0

None public yet