BioKinema: Physically Grounded Generative Modeling of All-Atom Biomolecular Dynamics
Introduction
BioKinema is a physically grounded generative model that predicts continuous-time, all-atom biomolecular trajectories at a fraction of the cost of traditional molecular dynamics (MD) simulations. It is built on top of Protenix (ByteDance's AlphaFold 3 reproduction) and extends it with a temporal-attention mechanism derived from Langevin dynamics, so a single model can roll out MD-like trajectories at arbitrary, possibly non-uniform frame intervals.
The temporal-attention bias follows a stretched-exponential decay B_ij = -λ |t_i - t_j|^β, where λ is a per-head learnable decay (ALiBi-initialised) and β is a fixed time-scaling exponent selected per model variant.
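To make the formula concrete, here is a minimal pure-Python sketch of the bias for a handful of non-uniformly spaced frames. It is an illustration, not the model's implementation: the helper names `alibi_slopes` and `temporal_bias` are hypothetical, and the slope formula 2^(-8k/H) is the standard ALiBi choice, assumed here as the meaning of "ALiBi-initialised".

```python
def alibi_slopes(num_heads):
    """Standard ALiBi geometric slopes 2^(-8k/H), k = 1..H (assumed initialisation)."""
    return [2.0 ** (-8.0 * k / num_heads) for k in range(1, num_heads + 1)]


def temporal_bias(times, lambdas, beta):
    """Return bias[h][i][j] = -lambdas[h] * |t_i - t_j| ** beta."""
    return [
        [[-lam * abs(ti - tj) ** beta for tj in times] for ti in times]
        for lam in lambdas
    ]


# Non-uniform frame times (arbitrary units) and four attention heads.
times = [0.0, 1.0, 4.0, 16.0]
lambdas = alibi_slopes(4)  # starting values only; the model learns lambda per head
bias = temporal_bias(times, lambdas, beta=0.5)
```

The bias is zero on the diagonal and grows more negative with the time gap, so frames that are far apart in time attend to each other less. With β < 1 the penalty grows sublinearly in the gap, and a smaller β flattens it further.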
This Hugging Face repository hosts the released weights and the processed training data. For installation, inference, training, the data pipeline, and the manuscript benchmark code, see the BioKinema GitHub repository.
Repository Contents
| File | Description | Size |
|---|---|---|
| `BioKinema_atlas+misato+mdposit_sqrt.pt` | sqrt checkpoint (EMA). For protein–ligand complexes and short-time MD. Trained on Atlas + MISATO + MDposit with β = 0.5. | ~3.9 GB |
| `BioKinema_CATH+octapeptide_beta0.25.pt` | beta=0.25 checkpoint (EMA). For long-time, single-chain protein MD. Trained on MSR (CATH / MegaSim / octapeptides) with β = 0.25; adds a TICA-dynamics loss. | ~3.9 GB |
| `biokinema_codec_bundle.tar` | Processed MISATO / MDposit / unbinding data in a lossless compressed codec (one template bioassembly per trajectory + a stacked-coordinate array). Used by sqrt training. | ~41 GB |
Which Checkpoint to Use
- Complexes / protein–ligand, or short MD → `BioKinema_atlas+misato+mdposit_sqrt.pt` (run with β = 0.5).
- Long single-chain protein MD / kinetics → `BioKinema_CATH+octapeptide_beta0.25.pt` (run with β = 0.25).
The exponent β is fixed per checkpoint, so the value passed at inference time via `--beta` must match the one the checkpoint was trained with.
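One way to avoid a mismatched exponent is to derive β from the checkpoint file name before launching inference. The sketch below is a hypothetical convenience helper, not part of the BioKinema code base; only the two file names and their β values are taken from this page.

```python
import os

# Hypothetical lookup table: file names and beta values as listed on this page.
CHECKPOINT_BETA = {
    "BioKinema_atlas+misato+mdposit_sqrt.pt": 0.5,
    "BioKinema_CATH+octapeptide_beta0.25.pt": 0.25,
}


def beta_for_checkpoint(path):
    """Return the beta a released checkpoint expects, keyed on its file name."""
    name = os.path.basename(path)
    try:
        return CHECKPOINT_BETA[name]
    except KeyError:
        # Unknown or renamed file: refuse to guess rather than pick a default.
        raise ValueError(f"unknown checkpoint {name!r}; choose beta manually") from None


beta = beta_for_checkpoint("./checkpoints/BioKinema_atlas+misato+mdposit_sqrt.pt")
```

Raising on an unknown file name is deliberate: silently falling back to a default β would produce trajectories from a model run with the wrong time-scaling exponent.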
Usage
Clone the BioKinema repository, install the environment, then run inference:
```bash
bash inference.sh \
  --checkpoint_path ./checkpoints/BioKinema_atlas+misato+mdposit_sqrt.pt \
  --dump_dir ./output \
  --input_file ./experiments/atlas_benchmark/init_frames/7lp1_A_R1_0.cif \
  --beta 0.5
```
DDP checkpoints (whose parameter keys carry a `module.` prefix) are handled automatically by the inference runner.
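For readers loading the weights outside the provided runner, the usual way to handle such keys is to strip the prefix before loading the state dict. The following is a generic sketch of that technique, not the runner's actual code; `strip_ddp_prefix` is a hypothetical helper and the toy dictionary stands in for a real state dict of tensors.

```python
def strip_ddp_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DDP prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Toy stand-in for a checkpoint's state dict (real values would be tensors).
ddp_state = {"module.encoder.weight": 1, "module.encoder.bias": 2}
plain_state = strip_ddp_prefix(ddp_state)
```

Keys without the prefix pass through unchanged, so the same helper can be applied to both DDP and non-DDP checkpoints.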
Codec Bundle
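```bash
# tar -C needs the target directory to exist already
mkdir -p "$BIOKINEMA_UNBINDING_ROOT"
tar -xf biokinema_codec_bundle.tar -C "$BIOKINEMA_UNBINDING_ROOT"
# -> $BIOKINEMA_UNBINDING_ROOT/{misato_codec,mdposit_codec,unbinding_codec}
```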
Point each dataset's `bioassembly_dict_dir` at the corresponding `*_codec/` directory; the data loader auto-detects the codec format and decompresses on the fly.
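Before editing the dataset configs, it can help to confirm that the bundle was extracted where expected. The sketch below builds the three directory paths from the layout shown above and reports any that are missing; `codec_dirs` and `missing_codec_dirs` are hypothetical helpers, and the dataset keys are assumed from the directory names rather than taken from the training configs.

```python
import os
from pathlib import Path

# Dataset keys assumed from the *_codec directory names in the bundle.
CODEC_DATASETS = ("misato", "mdposit", "unbinding")


def codec_dirs(root):
    """Map each dataset name to its expected <root>/<name>_codec directory."""
    root = Path(root)
    return {name: root / f"{name}_codec" for name in CODEC_DATASETS}


def missing_codec_dirs(root):
    """List the datasets whose codec directory does not exist under root."""
    return [name for name, path in codec_dirs(root).items() if not path.is_dir()]


# Falls back to the current directory if the environment variable is unset.
root = os.environ.get("BIOKINEMA_UNBINDING_ROOT", ".")
missing = missing_codec_dirs(root)
```

An empty `missing` list means all three directories are in place and each path from `codec_dirs(root)` can be used as the value of `bioassembly_dict_dir` for its dataset.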
Training Data
- Atlas: public ATLAS MD database (preprocessing scripts shipped in the code repository).
- MSR (CATH / Octapeptides / MegaSim): available on Zenodo under DOIs 10.5281/zenodo.15629740, 10.5281/zenodo.15641199, and 10.5281/zenodo.15641184.
- MISATO / MDposit / unbinding: released here as the compressed codec bundle above.
Citation
@article{feng2026physically,
title={Physically Grounded Generative Modeling of All-Atom Biomolecular Dynamics},
author={Feng, Bin and Zhang, Jiying and Zhang, Xinni and Zhang, Ming and Barth, Patrick and Liu, Zijing and Li, Yu},
journal={bioRxiv},
pages={2026--02},
year={2026},
publisher={Cold Spring Harbor Laboratory}
}
Acknowledgements
This project builds on Protenix, an open-source biomolecular structure prediction framework developed by ByteDance.
Contact
For questions or collaborations, please open an issue or contact us at fengbin@idea.edu.cn.