Inbox

oeohomos's Collections

updated Oct 17, 2025

RuCCoD: Towards Automated ICD Coding in Russian

Paper • 2502.21263 • Published Feb 28, 2025 • 133
Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7, 2025 • 124
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7, 2025 • 27
Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32
SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6, 2025 • 21
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

Paper • 2503.05639 • Published Mar 7, 2025 • 26
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7, 2025 • 38
Learning from Failures in Multi-Attempt Reinforcement Learning

Paper • 2503.04808 • Published Mar 4, 2025 • 18
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

Paper • 2503.04872 • Published Mar 6, 2025 • 15
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Paper • 2503.05638 • Published Mar 7, 2025 • 20
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

Paper • 2503.05652 • Published Mar 7, 2025 • 11
ProReflow: Progressive Reflow with Decomposed Velocity

Paper • 2503.04824 • Published Mar 5, 2025 • 9
An Empirical Study on Eliciting and Improving R1-like Reasoning Models

Paper • 2503.04548 • Published Mar 6, 2025 • 9
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

Paper • 2503.05447 • Published Mar 7, 2025 • 8
LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding

Paper • 2503.04359 • Published Mar 6, 2025 • 6
SAGE: A Framework of Precise Retrieval for RAG

Paper • 2503.01713 • Published Mar 3, 2025 • 7
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Paper • 2503.01840 • Published Mar 3, 2025 • 9
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles

Paper • 2502.18968 • Published Feb 26, 2025 • 3
LoRACode: LoRA Adapters for Code Embeddings

Paper • 2503.05315 • Published Mar 7, 2025 • 13
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

Paper • 2503.04504 • Published Mar 6, 2025 • 5
YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11, 2025 • 72
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Paper • 2503.08625 • Published Mar 11, 2025 • 27
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

Paper • 2503.07703 • Published Mar 10, 2025 • 37
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12, 2025 • 77
Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation

Paper • 2503.09427 • Published Mar 12, 2025 • 6
Video Action Differencing

Paper • 2503.07860 • Published Mar 10, 2025 • 33
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization

Paper • 2503.08619 • Published Mar 11, 2025 • 20
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

Paper • 2503.08686 • Published Mar 11, 2025 • 19
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Paper • 2503.08644 • Published Mar 11, 2025 • 16
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru

Paper • 2503.07587 • Published Mar 10, 2025 • 11
"Principal Components" Enable A New Language of Images

Paper • 2503.08685 • Published Mar 11, 2025 • 12
^RFLAV: Rolling Flow matching for infinite Audio Video generation

Paper • 2503.08307 • Published Mar 11, 2025 • 9
BiasEdit: Debiasing Stereotyped Language Models via Model Editing

Paper • 2503.08588 • Published Mar 11, 2025 • 7
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

Paper • 2503.08417 • Published Mar 11, 2025 • 8
AI-native Memory 2.0: Second Me

Paper • 2503.08102 • Published Mar 11, 2025 • 13
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol

Paper • 2503.05860 • Published Mar 7, 2025 • 11
LocAgent: Graph-Guided LLM Agents for Code Localization

Paper • 2503.09089 • Published Mar 12, 2025 • 13
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents

Paper • 2503.08684 • Published Mar 11, 2025 • 5
Referring to Any Person

Paper • 2503.08507 • Published Mar 11, 2025 • 7
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

Paper • 2503.04388 • Published Mar 6, 2025 • 17
Quantizing Large Language Models for Code Generation: A Differentiated Replication

Paper • 2503.07103 • Published Mar 10, 2025 • 8
Cost-Optimal Grouped-Query Attention for Long-Context LLMs

Paper • 2503.09579 • Published Mar 12, 2025 • 5
Self-Taught Self-Correction for Small Language Models

Paper • 2503.08681 • Published Mar 11, 2025 • 15
Multi Agent based Medical Assistant for Edge Devices

Paper • 2503.05397 • Published Mar 7, 2025 • 9
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System

Paper • 2503.09600 • Published Mar 12, 2025 • 4
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?

Paper • 2503.05333 • Published Mar 7, 2025 • 8
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14, 2025 • 28
API Agents vs. GUI Agents: Divergence and Convergence

Paper • 2503.11069 • Published Mar 14, 2025 • 36
Group-robust Machine Unlearning

Paper • 2503.09330 • Published Mar 12, 2025 • 1
Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published Mar 16, 2025 • 44
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16, 2025 • 35
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation

Paper • 2503.13070 • Published Mar 17, 2025 • 10
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18, 2025 • 154
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Paper • 2503.12271 • Published Mar 15, 2025 • 9
Pensez: Less Data, Better Reasoning -- Rethinking French LLM

Paper • 2503.13661 • Published Mar 17, 2025 • 5
PyGDA: A Python Library for Graph Domain Adaptation

Paper • 2503.10284 • Published Mar 13, 2025 • 4
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving

Paper • 2503.08683 • Published Mar 11, 2025 • 2
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
STEVE: A Step Verification Pipeline for Computer-use Agent Training

Paper • 2503.12532 • Published Mar 16, 2025 • 17
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction

Paper • 2503.11227 • Published Mar 14, 2025 • 25
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Paper • 2503.15478 • Published Mar 19, 2025 • 14
ELTEX: A Framework for Domain-Driven Synthetic Data Generation

Paper • 2503.15055 • Published Mar 19, 2025 • 6
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 96
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Paper • 2503.16365 • Published Mar 20, 2025 • 41
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published Mar 20, 2025 • 52
Why Do Multi-Agent LLM Systems Fail?

Paper • 2503.13657 • Published Mar 17, 2025 • 49
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving

Paper • 2503.16905 • Published Mar 21, 2025 • 54
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization

Paper • 2503.16874 • Published Mar 21, 2025 • 45
Can Large Vision Language Models Read Maps Like a Human?

Paper • 2503.14607 • Published Mar 18, 2025 • 10
A Comprehensive Survey on Long Context Language Modeling

Paper • 2503.17407 • Published Mar 20, 2025 • 49
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27, 2025 • 62
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27, 2025 • 83
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation

Paper • 2503.21729 • Published Mar 27, 2025 • 29
Exploring the Evolution of Physics Cognition in Video Generation: A Survey

Paper • 2503.21765 • Published Mar 27, 2025 • 11
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation

Paper • 2503.22675 • Published Mar 28, 2025 • 36
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29, 2025 • 10
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 305
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Paper • 2504.02587 • Published Apr 3, 2025 • 32
One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7, 2025 • 110
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2, 2025 • 87
MM-IFEngine: Towards Multimodal Instruction Following

Paper • 2504.07957 • Published Apr 10, 2025 • 35
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11, 2025 • 33
WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published Apr 16, 2025 • 35
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17, 2025 • 20
ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16, 2025 • 49
UFO2: The Desktop AgentOS

Paper • 2504.14603 • Published Apr 20, 2025 • 29
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

Paper • 2504.15521 • Published Apr 22, 2025 • 64
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22, 2025 • 64
MR. Video: "MapReduce" is the Principle for Long Video Understanding

Paper • 2504.16082 • Published Apr 22, 2025 • 5
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22, 2025 • 36
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24, 2025 • 124
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8, 2025 • 187
Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8, 2025 • 88
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges

Paper • 2505.04769 • Published May 7, 2025 • 10
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 263
Understanding Tool-Integrated Reasoning

Paper • 2508.19201 • Published Aug 26, 2025 • 32
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 130
Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 129
