FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation
Abstract
FaithfulFaces is a pose-faithful facial identity preservation framework that improves identity consistency in text-to-video generation through pose-shared alignment and explicit Euler angle embeddings.
Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose FaithfulFaces, a pose-faithful facial identity preservation learning framework that improves IPT2V in complex dynamic scenes. The core of FaithfulFaces is a pose-shared identity aligner that refines and aligns facial poses across distinct views via a pose-shared dictionary and a pose variation-identity invariance constraint. By mapping single-view inputs into a global facial pose representation with explicit Euler angle embeddings, FaithfulFaces provides a pose-faithful facial prior that guides generative foundation models toward robust identity-preserving generation. In addition, we develop a specialized pipeline to curate a high-quality video dataset featuring substantial facial pose diversity. Extensive experiments demonstrate that FaithfulFaces achieves state-of-the-art performance, maintaining superior identity consistency and structural clarity even under pose changes and occlusions.
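The abstract does not say how the explicit Euler angle embeddings are computed. As a rough illustration only, the sketch below assumes a sinusoidal encoding of the three head-pose angles (yaw, pitch, roll), in the style of positional encodings. The function name `euler_angle_embedding`, the `dim_per_angle` parameter, and the choice of frequencies are all hypothetical and are not taken from the paper.

```python
import math


def euler_angle_embedding(yaw, pitch, roll, dim_per_angle=8):
    """Hypothetical sinusoidal embedding of head-pose Euler angles.

    Angles are given in degrees. This is an assumed encoding for
    illustration, not the method described in the paper.
    """
    if dim_per_angle <= 0 or dim_per_angle % 2 != 0:
        raise ValueError("dim_per_angle must be a positive even number")

    embedding = []
    for angle_deg in (yaw, pitch, roll):
        angle = math.radians(angle_deg)
        # One (sin, cos) pair per frequency; frequencies double each step.
        for k in range(dim_per_angle // 2):
            freq = 2.0 ** k
            embedding.append(math.sin(freq * angle))
            embedding.append(math.cos(freq * angle))
    return embedding


if __name__ == "__main__":
    # A face turned 30 degrees to the side and tilted 10 degrees down.
    vec = euler_angle_embedding(30.0, -10.0, 0.0)
    print(len(vec))
```

Because the encoding uses sine and cosine with integer frequency multipliers, it is periodic in each angle, so a yaw of 360 degrees maps to the same vector as a yaw of 0. In a real model such a vector would typically be projected by a learned layer before conditioning the generator; that step is omitted here.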
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References (2026)
- Identity-Consistent Video Generation under Large Facial-Angle Variations (2026)
- LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation (2026)
- Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation (2026)
- AnyCrowd: Instance-Isolated Identity-Pose Binding for Arbitrary Multi-Character Animation (2026)
- Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision (2026)
- Gloria: Consistent Character Video Generation via Content Anchors (2026)
Get this paper in your agent:
hf papers read 2605.04702
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash