Artificial intelligence in chemistry

Hardly any other term made such waves in 2023 as AI (artificial intelligence). Deep neural networks in the form of large language models have shown impressive performance in answering questions, generating text and even solving exam questions. In science, applications such as medical image interpretation and the 3D structure prediction of proteins have made headlines.

In chemistry, computational technologies of varying complexity have been in use for some time. However, not all of them are based on artificial intelligence. A useful convention is to speak of AI only where deep neural networks are involved. This development opens up a wide range of applications, such as:

  • Predicting chemical properties: AI algorithms can analyze large amounts of chemical data and make predictions about properties such as reactivity, solubility and toxicity. This facilitates the identification of promising candidates for drugs, catalysts and materials.
  • Rational drug design: By combining AI with computer-aided design, researchers can design molecules that address specific biological targets. This speeds up the drug development process while minimizing costs and resource consumption (https://link.springer.com/chapter/10.1007/978-3-658-33597-7_6).
  • Automated synthesis planning: AI can analyze and optimize complex synthesis routes to identify efficient ways to produce target compounds. By integrating machine learning, the synthesis process can be further improved by taking into account previous experience and reaction trends (https://www.nature.com/articles/nature25978).
  • Material design and optimization: In materials science, AI enables the prediction of new materials with tailored properties for applications in areas such as energy, electronics and catalysis. This accelerates the development of innovative materials and contributes to solving global challenges.

All these application examples have one thing in common: they depend on large data sets. The neural networks have to be trained on very large amounts of structured data, which ideally is openly accessible.

Where AI has been successful, for example with AlphaFold in the field of protein structure prediction, large amounts of data were openly available, e.g. in the Protein Data Bank. In chemistry, however, large open databases tend to be the exception. This decades-long lack of open, curated research data, caused by a widespread culture of fear of error, unwillingness to share or ignorance, is one of the biggest barriers to the use of AI in chemistry.

Counter-movements have long since emerged. Plan S is an initiative of cOAlition S, an international consortium of research funders that make open access publication a condition of their funding; the DFG expects this as well. With the National Research Data Infrastructure, financed by the DFG and implemented by the NFDI e.V. and its subject-specific consortia such as NFDI4Chem, work is underway to create not only the cultural but also the technical conditions for processing chemical data digitally over the entire data lifecycle.

And because open data is not necessarily FAIR (findable, accessible, interoperable, reusable) data, we offer many workshops and other training courses to ensure that data is structured and machine-readable.
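To give a rough idea of what "structured and machine-readable" can mean in practice, here is a minimal Python sketch. It is not an NFDI4Chem schema: the field names, the list of required fields and the placeholder identifier are assumptions made purely for illustration, and real repositories define their own metadata standards.

```python
import json

# Hypothetical minimal field set, chosen for illustration only.
# Real repositories and metadata standards define their own required fields.
REQUIRED_FIELDS = ("identifier", "title", "license", "inchi", "measurement")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record: the identifier is a placeholder, not a real DOI.
record = {
    "identifier": "doi:10.0000/placeholder",
    "title": "Boiling point of ethanol at atmospheric pressure",
    "license": "CC-BY-4.0",
    # A standard chemical identifier lets machines recognise the compound.
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    # Value and unit are stored separately instead of as free text.
    "measurement": {"property": "boiling point", "value": 78.4, "unit": "degC"},
}

# Serialise to JSON so that other tools can read the record.
serialized = json.dumps(record, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    print("Missing fields:", missing_fields(record))
    print(serialized)
```

The point of the sketch is the separation of identifier, license, compound and measurement into explicit fields: a record like this can be checked and reused automatically, whereas the same information buried in a PDF table cannot.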

AI will not provide all the answers. But it will significantly shorten the path to knowledge in many relevant areas of chemistry. If you want to benefit from this gain in knowledge in the future, you can contribute now by publishing your research data openly and in line with the FAIR principles. Always remember the title of a presentation by our member Paul Czodrowski (held at the Chemistry Data Days 2023 in Mainz): “No Data, no AI-Party” 😉