AI-driven protein design | Nature Reviews Bioengineering

Ebrahimi, S. B. & Samanta, D. Engineering protein-based therapeutics through structural and chemical design. Nat. Commun. 14, 2411 (2023).

Chen, K. & Arnold, F. H. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl Acad. Sci. USA 90, 5618–5622 (1993).

Lajoie, M. J. et al. Genomically recoded organisms expand biological functions. Science 342, 357–360 (2013).

Listov, D., Goverde, C. A., Correia, B. E. & Fleishman, S. J. Opportunities and challenges in design and optimization of protein function. Nat. Rev. Mol. Cell Biol. 25, 639–653 (2024).

Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019). UniRep is one of the first protein language models to learn rich evolutionary, structural and biophysical representations from raw, unlabelled protein sequences, demonstrating how such models can power a diverse suite of artificial intelligence-driven tools.

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). AlphaFold 2 is the first model to regularly predict protein 3D structures from amino-acid sequences with near-experimental accuracy, and its high-fidelity structural predictions now underpin artificial intelligence-driven protein design workflows.

Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022). ProteinMPNN solves the inverse folding challenge by generating amino-acid sequences for fixed backbones with accuracy well above physics-based methods and at high throughput, making it a widely adopted cornerstone in artificial intelligence-driven rational design workflows.

Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023). RFdiffusion generates protein backbones that meet specified structural or functional objectives with high success rates across diverse, experimentally validated design settings, including de novo design.

Hamamsy, T. et al. Protein remote homology detection and structural alignment using deep learning. Nat. Biotechnol. 42, 975–985 (2024).

van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).

Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024).

Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).

Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

Hutchison, C. A. et al. Mutagenesis at a specific position in a DNA sequence. J. Biol. Chem. 253, 6551–6560 (1978).

Alber, T., Sun, D. P., Nye, J. A., Muchmore, D. C. & Matthews, B. W. Temperature-sensitive mutations of bacteriophage T4 lysozyme occur at sites with low mobility and low solvent accessibility in the folded protein. Biochemistry 26, 3754–3758 (1987).

Marshall, S. A., Lazar, G. A., Chirino, A. J. & Desjarlais, J. R. Rational design and engineering of therapeutic proteins. Drug Discov. Today 8, 212–221 (2003).

Davey, J. A., Damry, A. M., Goto, N. K. & Chica, R. A. Rational design of proteins that exchange on functional timescales. Nat. Chem. Biol. 13, 1280–1285 (2017).

Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).

Hie, B. L. et al. Efficient evolution of human antibodies from general protein language models. Nat. Biotechnol. 42, 275–283 (2024).

Koh, H. Y., Nguyen, A. T. N., Pan, S., May, L. T. & Webb, G. I. Physicochemical graph neural network for learning protein–ligand interaction fingerprints from sequence data. Nat. Mach. Intell. 6, 673–687 (2024).

Chowdhury, R. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 40, 1617–1623 (2022).

Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

Chai Discovery Team et al. Chai-1: decoding the molecular interactions of life. Preprint at bioRxiv (2024).

Bryant, D. H. et al. Deep diversification of an AAV capsid protein by machine learning. Nat. Biotechnol. 39, 691–696 (2021). This study applies AI-driven directed evolution to generate and screen ~10¹⁰ AAV2 capsid variants, yielding 110,689 viable mutants that exceed natural serotype diversity, and positions AI-driven capsid diversification as a new paradigm in gene-therapy vector engineering.

Ogden, P. J., Kelsic, E. D., Sinai, S. & Church, G. M. Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science 366, 1139–1143 (2019).

Jiang, K. et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 387, eadr6006 (2024). This study optimizes artificial intelligence-driven directed evolution by integrating protein language-model embeddings with sequence-based activity predictors, achieving up to 100-fold improvements in protein activity across diverse targets and streamlining modern directed evolution workflows.

Yang, J. et al. Active learning-assisted directed evolution. Nat. Commun. 16, 714 (2025).

Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023). This study developed a unified artificial intelligence-driven rational design workflow that integrates a 3D geometric network for binding-site prediction, structural database mining and motif-based binder design to generate de novo protein binders against targets such as the SARS-CoV-2 spike with nanomolar affinities.

Grøn, H., Bech, L. M., Branner, S. & Breddam, K. A highly active and oxidation-resistant subtilisin-like enzyme produced by a combination of site-directed mutagenesis and chemical modification. Eur. J. Biochem. 194, 897–901 (1990).

Fleishman, S. J. et al. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science 332, 816–821 (2011).

Varadi, M. et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).

Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023). This study introduces ESM2, one of the most widely adopted protein language models, and ESMFold, which matches AlphaFold 2’s accuracy using only single-sequence inputs without multiple-sequence alignments, enabling substantially faster structure prediction.

Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025).

Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).

Ravindra Kumar, S. et al. Multiplexed Cre-dependent selection yields systemic AAVs for targeting distinct brain cell types. Nat. Methods 17, 541–550 (2020).

Silva, D.-A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature 565, 186–191 (2019).

Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Kaminski, K., Ludwiczak, J., Pawlicki, K., Alva, V. & Dunin-Horkawicz, S. pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models. Bioinformatics 39, btad579 (2023).

Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

Holm, L. Dali server: structural unification of protein families. Nucleic Acids Res. 50, W210–W215 (2022).

The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

Hopf, T. A. et al. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics 35, 1582–1584 (2019).

Burley, S. K. et al. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 51, D488–D508 (2023).

Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).

Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv (2022).

Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–227 (2013).

Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

Weinstein, E. N. et al. Manufacturing-aware generative model architectures enable biological sequence design and synthesis at petascale. Preprint at bioRxiv (2024).

Packer, M. S. & Liu, D. R. Methods for the directed evolution of proteins. Nat. Rev. Genet. 16, 379–394 (2015).

Biswas, S., Khimulya, G., Alley, E. C., Esvelt, K. M. & Church, G. M. Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).

Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023). ProGen shows that large protein language models conditioned on ‘tags’ (short textual annotations such as enzyme function) can generate functional protein sequences across diverse families, enabling rapid tag-driven protein design without explicit structural input.

Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023). This study integrates AI tools such as structure prediction, sequence design and virtual screening into a unified AI-driven rational design workflow to create de novo luciferases that catalyse DTZ chemiluminescence with exceptional specificity.

Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).

Shanker, V. R., Bruun, T. U. J., Hie, B. L. & Kim, P. S. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science 385, 46–53 (2024).

Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008).

Lauko, A. et al. Computational design of serine hydrolases. Science 388, eadu2454 (2025).

Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

Llinares-López, F., Berthet, Q., Blondel, M., Teboul, O. & Vert, J.-P. Deep embedding and alignment of protein sequences. Nat. Methods 20, 104–111 (2023).

Liu, W. et al. PLMSearch: protein language model powers accurate and fast sequence search for remote homology. Nat. Commun. 15, 2775 (2024).

Kim, W. et al. Rapid and sensitive protein complex alignment with Foldseek-Multimer. Nat. Methods 22, 469–472 (2025).

van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) Vol. 30 (Curran Associates, 2017).

Eom, H. et al. Discovery of highly active kynureninases for cancer immunotherapy through protein language model. Nucleic Acids Res. 53, gkae1245 (2025).

Hu, M. et al. Advances in Neural Information Processing Systems Vol. 35 (Curran Associates, Inc., 2022).

Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 21, 1514–1524 (2024).

Ketata, M. A. et al. DiffDock-PP: rigid protein–protein docking with diffusion models. Preprint at (2023).

Qiao, Z., Nie, W., Vahdat, A., Miller, T. F. & Anandkumar, A. State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat. Mach. Intell. 6, 195–208 (2024).

Guo, H.-B. et al. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 12, 10696 (2022).

Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).

He, J., Turzo, S. B. A., Seffernick, J. T., Kim, S. S. & Lindert, S. Prediction of intrinsic disorder using Rosetta ResidueDisorder and AlphaFold2. J. Phys. Chem. B 126, 8439–8446 (2022).

Kurgan, L. et al. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat. Protoc. 18, 3157–3172 (2023).

Vander Meersche, Y., Cretin, G., de Brevern, A. G., Gelly, J.-C. & Galochkina, T. MEDUSA: prediction of protein flexibility from sequence. J. Mol. Biol. 433, 166882 (2021).

Mészáros, B., Erdős, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337 (2018).

Hu, G. et al. flDPnn: accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat. Commun. 12, 4438 (2021).

Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).

Pak, M. A. et al. Using AlphaFold to predict the impact of single mutations on protein stability and function. PLoS ONE 18, e0282689 (2023).

Pudžiuvelytė, I. et al. TemStaPro: protein thermostability prediction using sequence representations from protein language models. Bioinformatics 40, btae157 (2024).

Blaabjerg, L. M. et al. Rapid protein stability prediction using deep learning representations. eLife 12, e82593 (2023).

Zhou, Y., Pan, Q., Pires, D. E. V., Rodrigues, C. H. M. & Ascher, D. B. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res. 51, W122–W128 (2023).

Yin, R., Feng, B. Y., Varshney, A. & Pierce, B. G. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 31, e4379 (2022).

Ferreiro, D. U., Komives, E. A. & Wolynes, P. G. Frustration in biomolecules. Q. Rev. Biophys. 47, 285–363 (2014).

del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 11, e75751 (2022).

Guan, X. et al. Predicting protein conformational motions using energetic frustration analysis and AlphaFold2. Proc. Natl Acad. Sci. USA 121, e2410662121 (2024).

Chakravarty, D. et al. AlphaFold predictions of fold-switched conformations are driven by structure memorization. Nat. Commun. 15, 7296 (2024).

Jing, B., Berger, B. & Jaakkola, T. AlphaFold meets flow matching for generating protein ensembles. In Proc. 41st International Conference on Machine Learning Vol. 235, 22277–22303 (JMLR.org, 2024).

Wang, T. et al. Ab initio characterization of protein molecular dynamics with AI2BMD. Nature 635, 1019–1027 (2024).

Wang, Y. et al. Enhancing geometric representations for molecules with equivariant vector–scalar interactive message passing. Nat. Commun. 15, 313 (2024).

Arnold, C. AlphaFold touted as next big thing for drug discovery — but is it? Nature 622, 15–17 (2023).

Callaway, E. Major AlphaFold upgrade offers boost for drug discovery. Nature 629, 509–510 (2024).

Miller, E. B. et al. Enabling structure-based drug discovery utilizing predicted models. Cell 187, 521–525 (2024).

Jang, Y. J. et al. Accurate prediction of protein function using statistics-informed graph networks. Nat. Commun. 15, 6601 (2024).

You, R. et al. NetGO: improving large-scale protein function prediction with massive network information. Nucleic Acids Res. 47, W379–W387 (2019).

Yao, S. et al. NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information. Nucleic Acids Res. 49, W469–W475 (2021).

Wang, S., You, R., Liu, Y., Xiong, Y. & Zhu, S. NetGO 3.0: protein language model improves large-scale functional annotations. Genom. Proteom. Bioinform. 21, 349–358 (2023).

Le Guilloux, V., Schmidtke, P. & Tuffery, P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinform. 10, 168 (2009).

Porollo, A. & Meller, J. Prediction-based fingerprints of protein–protein interactions. Proteins Struct. Funct. Bioinform. 66, 630–645 (2007).

Murakami, Y. & Mizuguchi, K. Applying the naive Bayes classifier with kernel density estimation to the prediction of protein–protein interaction sites. Bioinformatics 26, 1841–1848 (2010).

Tubiana, J., Schneidman-Duhovny, D. & Wolfson, H. J. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat. Methods 19, 730–739 (2022).

Jiménez, J., Doerr, S., Martínez-Rosell, G., Rose, A. S. & De Fabritiis, G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 33, 3036–3042 (2017).

Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. In International Conference on Learning Representations (2023).

Elliott, S. et al. Enhancement of therapeutic protein in vivo activities through glycoengineering. Nat. Biotechnol. 21, 414–421 (2003).

Hunter, T. The age of crosstalk: phosphorylation, ubiquitination, and beyond. Mol. Cell 28, 730–738 (2007).

Ramazi, S. & Zahiri, J. Post-translational modifications in proteins: resources, tools and prediction methods. Database 2021, baab012 (2021).

Wang, D. et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 33, 3909–3916 (2017).

Wang, D. et al. MusiteDeep: a deep-learning based webserver for protein post-translational modification site prediction and visualization. Nucleic Acids Res. 48, W140–W146 (2020).

Shrestha, P., Kandel, J., Tayara, H. & Chong, K. T. Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model. Nat. Commun. 15, 6699 (2024).

Yan, Y. et al. MIND-S is a deep-learning prediction model for elucidating protein post-translational modifications in human diseases. Cell Rep. Methods 3, 100430 (2023).

Shi, X.-X. et al. PTMdyna: exploring the influence of post-translation modifications on protein conformational dynamics. Brief. Bioinform. 23, bbab424 (2022).

Zhou, N. et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol. 20, 244 (2019).

Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl Acad. Sci. USA 103, 5869–5874 (2006).

Meier, J. et al. Advances in Neural Information Processing Systems Vol. 34, 29287–29303 (Curran Associates, Inc., 2021).

Ferruz, N., Schmidt, S. & Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 13, 4348 (2022).

Unsal, S. et al. Learning functional properties of proteins with language models. Nat. Mach. Intell. 4, 227–245 (2022).

Ferruz, N. & Höcker, B. Controllable protein design with language models. Nat. Mach. Intell. 4, 521–532 (2022).

Truong, T. F. Jr & Bepler, T. PoET: A generative model of protein families as sequences-of-sequences. In Advances in Neural Information Processing Systems (eds Oh, A. et al.) Vol. 36 (Curran Associates, 2023).

Gligorijević, V. et al. Function-guided protein design by deep manifold sampling. Preprint at bioRxiv (2021).

Kucera, T., Togninalli, M. & Meng-Papaxanthos, L. Conditional generative modeling for de novo protein design with hierarchical functions. Bioinformatics 38, 3454–3461 (2022).

Ingraham, J., Garg, V., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems (eds Wallach, H. et al.) Vol. 32 (Curran Associates, 2019).

Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proc. 39th International Conference on Machine Learning 8946–8970 (PMLR, 2022).

Dauparas, J. et al. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 22, 717–723 (2025).

McFerrin, L. & Ratan, U. Highlights from the AWS Life Sciences Executive Symposium 2023: accelerating pharma drug discovery with ML and generative AI. AWS Blogs (31 May 2023).

Goverde, C. A. et al. Computational design of soluble and functional membrane protein analogues. Nature 631, 449–458 (2024).

Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).

Gao, B. et al. Advances in Neural Information Processing Systems Vol. 36 (Curran Associates, Inc., 2023).

Ho, J., Jain, A. & Abbeel, P. Advances in Neural Information Processing Systems Vol. 33 (Curran Associates, Inc., 2020).

Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. Int. Conf. Learn. Represent. ICLR 2022 (2022).

Luo, S. et al. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Adv. Neural Inf. Process. Syst. 35, 9754–9767 (2022).

Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).

Bennett, N. R. et al. Improving de novo protein binder design with deep learning. Nat. Commun. 14, 2625 (2023).

Pacesa, M. et al. BindCraft: one-shot design of functional protein binders. Preprint at bioRxiv (2024).

Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).

Lisanza, S. L. et al. Multistate and functional protein design using RoseTTAFold sequence space diffusion. Nat. Biotechnol. 43, 1288–1298 (2024).

Chu, A. E. et al. An all-atom protein generative model. Proc. Natl Acad. Sci. USA 121, e2311500121 (2024).

McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).

Zhou, Z. et al. Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning. Nat. Commun. 15, 5566 (2024).

Hsu, C., Nisonoff, H., Fannjiang, C. & Listgarten, J. Learning protein fitness models from evolutionary and assay-labeled data. Nat. Biotechnol. 40, 1114–1122 (2022).

Frey, N. C. et al. Lab-in-the-loop therapeutic antibody design with deep learning. Preprint at bioRxiv (2025).

Wu, Z., Kan, S. B. J., Lewis, R. D., Wittmann, B. J. & Arnold, F. H. Machine learning-assisted directed protein evolution with combinatorial libraries. Proc. Natl Acad. Sci. USA 116, 8852–8858 (2019).

Narayanan, H. et al. Machine learning for biologics: opportunities for protein engineering, developability, and formulation. Trends Pharmacol. Sci. 42, 151–165 (2021).

Gentiluomo, L. et al. Application of interpretable artificial neural networks to early monoclonal antibodies development. Eur. J. Pharm. Biopharm. 141, 81–89 (2019).

Gentiluomo, L., Roessner, D. & Frieß, W. Application of machine learning to predict monomer retention of therapeutic proteins after long term storage. Int. J. Pharm. 577, 119039 (2020).

Wang, C. & Zou, Q. Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE. BMC Biol. 21, 12 (2023).

Zhang, X. et al. PLM_Sol: predicting protein solubility by benchmarking multiple protein language models with the updated Escherichia coli protein solubility dataset. Brief. Bioinform. 25, bbae404 (2024).

Planas-Iglesias, J. et al. AggreProt: a web server for predicting and engineering aggregation prone regions in proteins. Nucleic Acids Res. 52, W159–W169 (2024).

Louros, N., Schymkowitz, J. & Rousseau, F. Mechanisms and pathology of protein misfolding and aggregation. Nat. Rev. Mol. Cell Biol. 24, 912–933 (2023).

Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, W449–W454 (2020).

Hashemi, N. et al. Improved prediction of MHC-peptide binding using protein language models. Front. Bioinform. 3, 1207380 (2023).

Müller, M. et al. Machine learning methods and harmonized datasets improve immunogenic neoantigen prediction. Immunity 56, 2650–2663.e6 (2023).

Li, G., Iyer, B., Prasath, V. B. S., Ni, Y. & Salomonis, N. DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity. Brief. Bioinform. 22, bbab160 (2021).

Marks, C., Hummer, A. M., Chin, M. & Deane, C. M. Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics 37, 4041–4047 (2021).

Qiu, Y. & Cheng, F. Artificial intelligence for drug discovery and development in Alzheimer’s disease. Curr. Opin. Struct. Biol. 85, 102776 (2024).

Zambaldi, V. et al. De novo design of high-affinity protein binders with AlphaProteo. Preprint at (2024).

Ostrov, N. et al. Design, synthesis, and testing toward a 57-codon genome. Science 353, 819–822 (2016).

Liu, Y., Yang, Q. & Zhao, F. Synonymous but not silent: the codon usage code for gene expression and protein folding. Annu. Rev. Biochem. 90, 375–401 (2021).

Hanson, G. & Coller, J. Codon optimality, bias and usage in translation and mRNA decay. Nat. Rev. Mol. Cell Biol. 19, 20–30 (2018).

Fu, H. et al. Codon optimization with deep learning to enhance protein expression. Sci. Rep. 10, 17617 (2020).

Sidi, T., Bahiri-Elitzur, S., Tuller, T. & Kolodny, R. Predicting gene sequences with AI to study codon usage patterns. Proc. Natl Acad. Sci. USA 122, e2410003121 (2025).

Constant, D. A. et al. Deep learning-based codon optimization with large-scale synonymous variant datasets enables generalized tunable protein expression. Preprint at bioRxiv (2023).

Ren, Z. et al. CodonBERT: a BERT-based architecture tailored for codon optimization using the cross-attention mechanism. Bioinformatics 40, btae330 (2024).

Fallahpour, A., Gureghian, V., Filion, G. J., Lindner, A. B. & Pandi, A. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Nat. Commun. 16, 3205 (2025).

Weinstein, E. N. et al. Optimal design of stochastic DNA synthesis protocols based on generative sequence models. In Proc. 25th International Conference on Artificial Intelligence and Statistics 7450–7482 (PMLR, 2022).

Stark, H., Padia, U., Balla, J., Diao, C. & Church, G. CodonMPNN for organism specific and codon optimal inverse folding. Preprint at (2024).

Outeiral, C. & Deane, C. M. Codon language embeddings provide strong signals for use in protein engineering. Nat. Mach. Intell. 6, 170–179 (2024).

Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).

Russell, S. et al. Efficacy and safety of voretigene neparvovec (AAV2-hRPE65v2) in patients with RPE65-mediated inherited retinal dystrophy: a randomised, controlled, open-label, phase 3 trial. Lancet 390, 849–860 (2017).

Mendell, J. R. et al. Single-dose gene-replacement therapy for spinal muscular atrophy. N. Engl. J. Med. 377, 1713–1722 (2017).

Ding, F. & Steinhardt, J. Protein language models are biased by unequal sequence sampling across the tree of life. Preprint at bioRxiv (2024).

Volkov, M. et al. On the frustration to predict binding affinities from protein–ligand structures with deep neural networks. J. Med. Chem. 65, 7946–7958 (2022).

Medina-Ortiz, D., Khalifeh, A., Anvari-Kazemabad, H. & Davari, M. D. Interpretable and explainable predictive machine learning models for data-driven protein engineering. Biotechnol. Adv. 79, 108495 (2025).

Simon, E. & Zou, J. InterPLM: discovering interpretable features in protein language models via sparse autoencoders. Preprint at bioRxiv (2025).

AI’s potential to accelerate drug discovery needs a reality check. Nature 622, 217 (2023).

Cuturello, F., Celoria, M., Ansuini, A. & Cazzaniga, A. Enhancing predictions of protein stability changes induced by single mutations using MSA-based language models. Bioinformatics 40, btae447 (2024).

Petti, S. et al. End-to-end learning of multiple sequence alignments with differentiable Smith–Waterman. Bioinformatics 39, btac724 (2023).

Lu, W. et al. DynamicBind: predicting ligand-specific protein–ligand complex structure with a deep equivariant generative model. Nat. Commun. 15, 1071 (2024).

Wohlwend, J. et al. Boltz-1 democratizing biomolecular interaction modeling. Preprint at bioRxiv (2025).

Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).

Luo, F., Wang, M., Liu, Y., Zhao, X.-M. & Li, A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 35, 2766–2773 (2019).

Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978.e3 (2023).

Wang, T. et al. Improved fragment sampling for ab initio protein structure prediction using deep neural networks. Nat. Mach. Intell. 1, 347–355 (2019).

Marchand, A. et al. Targeting protein–ligand neosurfaces with a generalizable deep learning tool. Nature 639, 522–531 (2025).

Ahern, W. et al. Atom level enzyme active site scaffolding using RFdiffusion2. Preprint at bioRxiv (2025).

Wang, X., Terashi, G., Christoffer, C. W., Zhu, M. & Kihara, D. Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics 36, 2113–2118 (2020).

Réau, M., Renaud, N., Xue, L. C. & Bonvin, A. M. J. J. DeepRank-GNN: a graph neural network framework to learn patterns in protein–protein interfaces. Bioinformatics 39, btac759 (2023).

Shuai, R. W., Ruffolo, J. A. & Gray, J. J. IgLM: infilling language modeling for antibody sequence design. Cell Syst. 14, 979–989.e4 (2023).

Montemurro, A. et al. NetTCR-2.0 enables accurate prediction of TCR–peptide binding by using paired TCRα and β sequence data. Commun. Biol. 4, 1–13 (2021).

Lam, J. H. et al. A deep learning framework to predict binding preference of RNA constituents on protein surface. Nat. Commun. 10, 4941 (2019).

Cheng, P. et al. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res. 34, 630–647 (2024).

Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (eds Pereira, F. et al.) Vol. 25 (Curran Associates, 2012).

Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (Curran Associates, 2017).

Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning 8748–8763 (PMLR, 2021).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (ACL, 2019).

Zhang, Z. et al. Protein representation learning by geometric structure pretraining. Int. Conf. Learn. Represent. ICLR 2022 (2022).

Wang, Y. et al. Self-play reinforcement learning guides protein engineering. Nat. Mach. Intell. 5, 845–860 (2023).

Lutz, I. D. et al. Top-down design of protein architectures with reinforcement learning. Science 380, 266–273 (2023).

Rumelhart, D. E. & McClelland, J. L. Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations 318–362 (MIT Press, 1987).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Int. Conf. Learn. Represent. ICLR 2017 (2017).

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A. & Vandergheynst, P. Geometric deep learning: going beyond Euclidean data. IEEE Signal. Process. Mag. 34, 18–42 (2017).
