# Accurate protein sequence design with CarbonDesign for robust results

Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).

Bryan, C. M. et al. Computational design of a synthetic PD-1 agonist. Proc. Natl Acad. Sci. USA 118, e2102164118 (2021).

Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).

Dou, J. et al. De novo design of a fluorescence-activating beta-barrel. Nature 561, 485–491 (2018).

Vorobieva, A. A. et al. De novo design of transmembrane beta barrels. Science 371, eabc8182 (2021).

Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).

Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature https://doi.org/10.1038/s41586-023-06415-8 (2023).

Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proc. of the 40th International Conference on Machine Learning (eds Krause, A. et al.) 40001–40039 (PMLR, 2023).

Ingraham, J. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).

Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proc. of the 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 8946–8970 (PMLR, 2022).

Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).

Liu, Y. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat. Comput. Sci. 2, 451–462 (2022).

Huang, B. et al. Accurate and efficient protein sequence design through learning concise local environment of residues. Bioinformatics 39, btad122 (2023).

Ingraham, J. et al. Generative models for graph-based protein design. In Proc. of Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 15820–15831 (NeurIPS, 2019).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

Carreira, J. et al. Human pose estimation with iterative error feedback. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (eds Bajcsy, R. et al.) 4733–4742 (IEEE, 2016).

Tu, Z. & Bai, X. Auto-context and its application to high-level vision tasks and 3D brain image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1744–1757 (2010).

Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

Robin, X. et al. Continuous Automated Model EvaluatiOn (CAMEO)—perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 89, 1977–1986 (2021).

CASP15. Critical Assessment of Techniques for Protein Structure Prediction, 15th Round. Abstract Book (Protein Structure Prediction Center, 2022); https://predictioncenter.org/casp15/doc/CASP15_Abstracts.pdf

Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, 1988).

Wainwright, M. J. & Jordan, M. I. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1, 1–305 (2008).

Zhang, H. et al. Predicting protein inter-residue contacts using composite likelihood maximization and deep learning. BMC Bioinform. 20, 537 (2019).

Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E 87, 012707 (2013).

Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).

Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).

Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA 89, 10915–10919 (1992).

Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814 (2022).

Chowdhury, R. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 40, 1617–1623 (2022).

Sakuma, K., Koike, R. & Ota, M. Dual-wield NTPases: a novel protein family mined from AlphaFold DB. Protein Sci. 33, e4934 (2024).

Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).

Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

Shin, J.-E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).

Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).

Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Proc. of Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 29287–29303 (NeurIPS, 2021).

Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proc. of the 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 16990–17017 (PMLR, 2022).

Rao, R. M. et al. MSA transformer. In Proc. of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8844–8856 (PMLR, 2021).

Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).

Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 178–190.e8 (2018).

Mighell, T. L., Evans-Dutson, S. & O’Roak, B. J. A saturation mutagenesis approach to understanding PTEN lipid phosphatase activity and genotype-phenotype relationships. Am. J. Hum. Genet. 102, 943–955 (2018).

Jia, X. et al. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. Am. J. Hum. Genet. 108, 163–175 (2021).

Pan, X. et al. Structure of the human voltage-gated sodium channel Nav1.4 in complex with beta1. Science 362, eaau2486 (2018).

Hennig, M., Darimont, B., Sterner, R., Kirschner, K. & Jansonius, J. N. 2.0 Å structure of indole-3-glycerol phosphate synthase from the hyperthermophile Sulfolobus solfataricus: possible determinants of protein stability. Structure 3, 1295–1306 (1995).

Banerjee, S. et al. Protonation state of an important histidine from high resolution structures of lytic polysaccharide monooxygenases. Biomolecules https://doi.org/10.3390/biom12020194 (2022).

Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

Leman, J. K. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).

Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01618-2 (2023).

Hie, B. L. et al. Efficient evolution of human antibodies from general protein language models. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01763-2 (2023).

Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

Mitchell, A. L. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 48, D570–D578 (2020).

Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).

Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9, 173–175 (2012).

Johnson, L. S., Eddy, S. R. & Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform. 11, 431 (2010).

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. of the International Conference on Learning Representations (eds Bengio, Y. et al.) 210–219 (ICLR, 2015).

Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. of Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 8024–8035 (NeurIPS, 2019).

Ren, M., Yu, C., Bu, D. & Zhang, H. Accurate and robust protein sequence design with Carbondesign. Code Ocean https://doi.org/10.24433/CO.5915382.v2 (2024).