Precise-Debugging-Benchmarking/PDB-Single-Hard Viewer • Updated about 18 hours ago • 5.75k • 48
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating? Paper • 2604.17338 • Published 3 days ago • 2
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems Paper • 2210.15037 • Published Oct 26, 2022 • 1
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7, 2024 • 19
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 37
VisualLens: Personalization through Visual History Paper • 2411.16034 • Published Nov 25, 2024 • 18