
Leveraging large language models for drug molecule translation



    Source link: https://www.nature.com/articles/s41598-024-61124-0
