# Accurate protein sequence design with CarbonDesign for robust results

Accurate and robust protein sequence design with CarbonDesign

Source link: https://www.nature.com/articles/s42256-024-00838-2
