my_read_book
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper
• 2407.08083
• Published • 34
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper
• 2408.11039
• Published • 63
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper
• 2408.15237
• Published • 42
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper
• 2409.11355
• Published • 30
OmniGen: Unified Image Generation
Paper
• 2409.11340
• Published • 115
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper
• 2409.12183
• Published • 39
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper
• 2409.12568
• Published • 50
Imagine yourself: Tuning-Free Personalized Image Generation
Paper
• 2409.13346
• Published • 69
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published • 140
MaskBit: Embedding-free Image Generation via Bit Tokens
Paper
• 2409.16211
• Published • 17
Emu3: Next-Token Prediction is All You Need
Paper
• 2409.18869
• Published • 98
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper
• 2412.09626
• Published • 21
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published • 108
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper
• 2412.11815
• Published • 26
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper
• 2412.18319
• Published • 39
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper
• 2501.06186
• Published • 65
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published • 55
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published • 302
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Paper
• 2501.06751
• Published • 32
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published • 447
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Paper
• 2501.12202
• Published • 50
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Paper
• 2502.00299
• Published • 3
Region-Adaptive Sampling for Diffusion Transformers
Paper
• 2502.10389
• Published • 53
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Paper
• 2502.18364
• Published • 36
Transformers without Normalization
Paper
• 2503.10622
• Published • 172
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
Paper
• 2503.18886
• Published • 24
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published • 11
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Paper
• 2503.10772
• Published • 19
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Paper
• 2503.12271
• Published • 9
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Paper
• 2504.16080
• Published • 15
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Paper
• 2503.10618
• Published • 19
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper
• 2504.20966
• Published • 31
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
• 2505.05470
• Published • 88
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper
• 2505.04588
• Published • 65
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Paper
• 2505.04601
• Published • 29
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published • 191
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Paper
• 2506.14603
• Published • 19
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning
Paper
• 2506.02327
• Published • 20
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Paper
• 2506.09985
• Published • 31
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
Paper
• 2506.05010
• Published • 80
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
Paper
• 2506.17450
• Published • 64
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper
• 2508.05004
• Published • 131
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper
• 2508.02193
• Published • 138
Representation Shift: Unifying Token Compression with FlashAttention
Paper
• 2508.00367
• Published • 16
Qwen-Image Technical Report
Paper
• 2508.02324
• Published • 274
Task structure and nonlinearity jointly determine learned representational geometry
Paper
• 2401.13558
• Published
DCPO: Dynamic Clipping Policy Optimization
Paper
• 2509.02333
• Published • 22
DoPE: Denoising Rotary Position Embedding
Paper
• 2511.09146
• Published • 98
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Paper
• 2511.20714
• Published • 50
Distribution Matching Distillation Meets Reinforcement Learning
Paper
• 2511.13649
• Published • 6
SD3.5-Flash: Distribution-Guided Distillation of Generative Flows
Paper
• 2509.21318
• Published • 11
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published • 76
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Paper
• 2512.04810
• Published • 26
Distribution Matching Variational AutoEncoder
Paper
• 2512.07778
• Published • 29
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper
• 2601.02204
• Published • 63
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation
Paper
• 2601.04823
• Published • 7
Phi-4-reasoning-vision-15B Technical Report
Paper
• 2603.03975
• Published • 20