SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images

Authors: Jinyuan Qu, Hongyang Li, and Lei Zhang.

Overview

SegVGGT is a unified feed-forward framework for joint 3D reconstruction and 3D instance segmentation from unposed multi-view RGB images. It integrates object queries into a geometry-grounded transformer and introduces the FADA module to guide instance-aware attention, so that accurate 3D reconstruction and instance segmentation are both produced in a single forward pass.
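The overview does not spell out how object queries interact with image tokens. As a rough illustration only, the following sketch assumes a generic query-based design, similar in spirit to query-based segmenters such as Mask2Former. Learnable object-query vectors are concatenated with image patch tokens and passed through one self-attention layer. Each patch is then assigned to the query whose output embedding has the highest dot product with it. The helper names (`joint_attention`, `segment`), the identity Q/K/V projections, and the hard argmax assignment are hypothetical simplifications. The sketch does not model the FADA module or SegVGGT's actual architecture.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def joint_attention(tokens):
    # Single-head self-attention with identity Q/K/V projections (a simplification):
    # every token, whether object query or image patch, attends to every token.
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def segment(patch_tokens, object_queries):
    # Hypothetical helper: attend jointly over [queries; patches], then score
    # every (query, patch) pair by dot product and take the argmax per patch.
    n_q = len(object_queries)
    mixed = joint_attention(object_queries + patch_tokens)
    queries, patches = mixed[:n_q], mixed[n_q:]
    logits = [[dot(q, p) for p in patches] for q in queries]  # n_q x n_patches
    assignment = [max(range(n_q), key=lambda j: logits[j][i]) for i in range(len(patches))]
    return assignment, logits


# Toy example: two object queries and three 2-D patch tokens.
assignment, _ = segment([[4.0, 0.0], [0.0, 4.0], [3.0, 0.5]], [[5.0, 0.0], [0.0, 5.0]])
print(assignment)  # → [0, 1, 0]
```

In a trained model the projections would be learned, and query-based segmenters typically turn the logits into soft per-query masks (for example with a sigmoid) rather than a hard argmax.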

Citation

If you find this work helpful for your research, please cite:

@article{qu2026segvggt,
  title={SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images},
  author={Qu, Jinyuan and Li, Hongyang and Zhang, Lei},
  journal={arXiv preprint arXiv:2603.19926},
  year={2026}
}

License

See the LICENSE file for details about the license under which this code is made available.

Paper

The paper accompanying JinyuanQu/SegVGGT is "SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images" (arXiv 2603.19926), published Mar 20.