Evolutionary algorithm generates tailored “molecular fingerprints”
Research team develops an improved method for explaining machine predictions of chemical reactions
To use machine learning, researchers must first convert molecules into a computer-readable form. Many research groups have already tackled this problem, so there are various ways of performing this task. However, it is difficult to predict which of the available methods is best suited to answering a specific question – for example, determining whether a chemical compound is harmful to humans. The new algorithm is designed to help find the optimal molecular fingerprint in each case. To do this, it starts from many randomly generated molecular fingerprints and gradually selects those that achieve the best prediction results. “Following the example of nature, we use mutations, i.e. random changes to individual components of the fingerprints, or recombine components of two fingerprints,” explains doctoral student Felix Katzenburg.
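The selection loop described above can be illustrated with a minimal Python sketch. It is not the team's implementation: the toy data set, the substructure labels, the fingerprint length, the grouping-based fitness score and all function names are assumptions made purely for illustration. Each fingerprint is a short tuple of substructure labels, and a fingerprint counts as "fitter" the better its bit pattern separates the two classes of molecules.

```python
import random
from collections import Counter, defaultdict

# Toy data set (hypothetical): substructures found in each molecule, plus a binary label.
MOLECULES = [
    ({"nitro", "ring"}, 1),
    ({"nitro", "amine"}, 1),
    ({"nitro", "hydroxyl", "ring"}, 1),
    ({"hydroxyl", "ring"}, 0),
    ({"amine", "methyl"}, 0),
    ({"methyl", "ring"}, 0),
]
SUBSTRUCTURES = sorted({s for feats, _ in MOLECULES for s in feats})
FP_LENGTH = 3  # number of components per fingerprint (assumed)


def encode(feats, fingerprint):
    """Bit vector: 1 where the molecule contains the fingerprint component."""
    return tuple(int(s in feats) for s in fingerprint)


def fitness(fingerprint):
    """Fraction of molecules classified correctly by a majority vote
    among molecules that share the same bit vector (a stand-in score)."""
    groups = defaultdict(Counter)
    for feats, label in MOLECULES:
        groups[encode(feats, fingerprint)][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(MOLECULES)


def mutate(fingerprint, rng):
    """Randomly change one component of the fingerprint."""
    child = list(fingerprint)
    child[rng.randrange(len(child))] = rng.choice(SUBSTRUCTURES)
    return tuple(child)


def recombine(a, b, rng):
    """One-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=20, pop_size=8, seed=0):
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(SUBSTRUCTURES) for _ in range(FP_LENGTH))
        for _ in range(pop_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(recombine(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(best, fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score in this sketch can never decrease from one generation to the next. A real application would replace the toy score with the performance of a trained model on held-out data.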
“In other studies, molecules are often described by quantifiable properties that have been selected and calculated by humans,” adds Frank Glorius. “Since the algorithm we developed automatically identifies the relevant molecular structures, there are no systematic biases caused by human experts.” Another advantage is that the method of encoding makes it possible to understand why a model makes a certain prediction. For example, it is possible to draw conclusions about which parts of a molecule positively or negatively impact the prediction of how a reaction would play out, allowing researchers to change the relevant structures in a targeted manner.
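How such an encoding can make a prediction traceable is sketched below in Python. The article does not say which model the team uses. This example assumes a simple linear model in which every fingerprint component carries one weight, and both the weights and the substructure names are hypothetical. Each substructure present in a molecule then contributes exactly its weight to the score, so positive and negative influences can be read off directly.

```python
# Hypothetical weights of a linear model, one per fingerprint component.
WEIGHTS = {"nitro": 1.5, "hydroxyl": -0.4, "ring": 0.1}


def explain(molecule_feats, weights=WEIGHTS):
    """Return the overall score and the per-substructure contributions,
    ranked by absolute size, for one molecule."""
    contrib = {s: w for s, w in weights.items() if s in molecule_feats}
    score = sum(contrib.values())
    ranked = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


if __name__ == "__main__":
    score, ranked = explain({"nitro", "hydroxyl"})
    for substructure, weight in ranked:
        direction = "raises" if weight > 0 else "lowers"
        print(f"{substructure} {direction} the score by {abs(weight)}")
```

In this toy setting, a researcher who wants a lower score would look at the substructure with the largest positive contribution first.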
The Münster team found that their new method did not always achieve the best results. “When considerable human expertise has gone into selecting particularly relevant molecular properties or very large amounts of data are available, other methods such as neural networks sometimes have the edge,” acknowledges Felix Katzenburg. However, one of the study’s primary goals was to develop a method for encoding molecules that can be applied to any molecular data set and does not require expert knowledge of the underlying relationships.