SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images
Paper: arXiv:2603.19926
Authors: Jinyuan Qu, Hongyang Li, and Lei Zhang.
SegVGGT is a unified feed-forward framework for joint 3D reconstruction and 3D instance segmentation from unposed multi-view RGB images. It integrates object queries into a geometry-grounded transformer and introduces the FADA module to guide instance-aware attention, enabling accurate reconstruction and segmentation in a single forward pass.
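To make the role of object queries concrete, here is a minimal sketch of generic query-based mask decoding: each query embedding is compared against per-point features, and every 3D point is assigned to its best-scoring query. This is not the SegVGGT implementation and does not model the FADA module. The function name `assign_instances`, the `score_threshold` parameter, and the toy feature values are all assumptions made for illustration.

```python
import numpy as np


def assign_instances(point_feats, query_embeds, score_threshold=0.5):
    """Assign each point to at most one object query (hypothetical helper).

    point_feats:  (N, C) array of per-point features.
    query_embeds: (Q, C) array of object query embeddings.
    Returns an (N,) integer array holding the winning query index,
    or -1 where no query scores at or above the threshold.
    """
    # (Q, N) mask logits: dot product between every query and every point.
    logits = query_embeds @ point_feats.T
    # Sigmoid turns each logit into an independent mask probability.
    probs = 1.0 / (1.0 + np.exp(-logits))
    best_query = probs.argmax(axis=0)
    best_prob = probs.max(axis=0)
    # Points that no query claims confidently are left unassigned.
    return np.where(best_prob >= score_threshold, best_query, -1)


# Toy data (assumed values): 4 points, 2 queries, 2 feature channels.
point_feats = np.array([[4.0, 0.0], [3.0, 0.0], [0.0, 5.0], [-2.0, -2.0]])
query_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = assign_instances(point_feats, query_embeds)
print(labels.tolist())
```

In this toy case the first two points go to query 0, the third to query 1, and the last point stays unassigned because neither query scores above the threshold.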
If you find this work helpful for your research, please cite:
@article{qu2026segvggt,
  title={SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images},
  author={Qu, Jinyuan and Li, Hongyang and Zhang, Lei},
  journal={arXiv preprint arXiv:2603.19926},
  year={2026}
}
See the LICENSE file for the terms under which this code is released.