interview-eval

university

https://github.com/interview-eval/interview-eval

Recent Activity

EunsuKim authored a paper about 18 hours ago

Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation

EunsuKim authored a paper about 18 hours ago

"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

EunsuKim submitted a paper about 21 hours ago

"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

authored 2 papers about 18 hours ago

Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation

Paper • 2506.19352 • Published Jun 24, 2025

"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

Paper • 2605.21363 • Published 3 days ago • 2

submitted a paper to Daily Papers about 21 hours ago

"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

Paper • 2605.21363 • Published 3 days ago • 2

submitted a paper to Daily Papers about 2 months ago

Composer 2 Technical Report

Paper • 2603.24477 • Published Mar 25 • 17

authored a paper 6 months ago

World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models

Paper • 2511.22787 • Published Nov 27, 2025 • 10

authored a paper 7 months ago

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Paper • 2510.19028 • Published Oct 21, 2025 • 8

authored 4 papers 12 months ago

Diffusion Models Through a Global Lens: Are They Culturally Inclusive?

Paper • 2502.08914 • Published Feb 13, 2025

When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts

Paper • 2503.16826 • Published Mar 21, 2025

MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language

Paper • 2505.14395 • Published May 20, 2025 • 6

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation

Paper • 2506.00482 • Published May 31, 2025 • 8

authored 2 papers about 1 year ago

Crosslingual Reasoning through Test-Time Scaling

Paper • 2505.05408 • Published May 8, 2025 • 8

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published Apr 29, 2025 • 54

authored 3 papers about 1 year ago

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

Paper • 2410.17578 • Published Oct 23, 2024 • 1

LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Paper • 2412.10424 • Published Dec 10, 2024 • 2

Trillion 7B Technical Report

Paper • 2504.15431 • Published Apr 21, 2025 • 38

authored a paper over 1 year ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31, 2025 • 126

authored 3 papers over 1 year ago

Uncovering Factor Level Preferences to Improve Human-Model Alignment

Paper • 2410.06965 • Published Oct 9, 2024

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

Paper • 2406.09948 • Published Jun 14, 2024 • 2

LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Paper • 2412.10424 • Published Dec 10, 2024 • 2

updated a Space over 1 year ago

Configuration Card Sharing Space