An evolutionary algorithm for interpretable molecular representations
Pflüger, Philipp M.; Kühnemund, Marius; Katzenburg, Felix; Kuchen, Herbert; Glorius, Frank
Abstract
Encoding molecular structures into a computer-readable, utilizable format is the key step for any machine learning application in all chemical sciences. Current representations vary strongly in complexity and shape, depending on the application. Therefore, the number of domain-specific representations is rapidly growing, with some being altered and retuned constantly. These tailored representations raise the barriers for entry and method adaptation, thus decelerating progress in application. Herein, we present a general algorithm capable of yielding a highly specific representation solely based on a given dataset. The algorithm utilizes structural queries and evolutionary methodologies to generate interpretable molecular fingerprints. These are highly suited for molecular machine learning, enabling the accurate prediction of reactivity, property, and biological activity. We demonstrate its native interpretability, allowing for the extraction of knowledge, such as reactivity trends. We anticipate that the evolutionary multipattern fingerprint (EvoMPF) will be used to discover structure-target relationships in different molecular sciences.
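The general idea of evolving a set of structural queries into a fingerprint can be illustrated with a minimal sketch. This is not the EvoMPF implementation: the toy dataset, the token alphabet, the substring matching (a crude stand-in for real substructure queries), the pair-separation fitness, and all function names are assumptions made purely for illustration.

```python
import random
from itertools import combinations

# Hypothetical toy dataset: SMILES-like strings with a binary target label.
DATA = [("CCO", 1), ("CCN", 0), ("CCCO", 1), ("CCCN", 0), ("OCC", 1), ("NCC", 0)]
TOKENS = ["C", "N", "O"]  # assumed alphabet for building query patterns


def fingerprint(smiles, queries):
    """One bit per query: 1 if the pattern occurs in the string, else 0.
    Plain substring matching stands in for real substructure matching."""
    return [int(q in smiles) for q in queries]


def fitness(queries, data):
    """Fraction of differently labeled molecule pairs that the
    fingerprint tells apart (an assumed, simplistic objective)."""
    pairs = [(a, b) for a, b in combinations(data, 2) if a[1] != b[1]]
    separated = sum(
        fingerprint(a[0], queries) != fingerprint(b[0], queries) for a, b in pairs
    )
    return separated / len(pairs)


def random_query(rng):
    """Random pattern of one or two tokens."""
    return "".join(rng.choice(TOKENS) for _ in range(rng.randint(1, 2)))


def mutate(queries, rng):
    """Copy a query set and replace one randomly chosen query."""
    child = list(queries)
    child[rng.randrange(len(child))] = random_query(rng)
    return child


def evolve(data, n_queries=3, pop_size=8, generations=20, seed=0):
    """Keep the better half of the population each generation and
    refill it with mutated copies of the survivors."""
    rng = random.Random(seed)
    population = [
        [random_query(rng) for _ in range(n_queries)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda q: fitness(q, data), reverse=True)
        survivors = population[: pop_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=lambda q: fitness(q, data))


best = evolve(DATA)
print(best, fitness(best, DATA))
```

Because every bit corresponds to one readable pattern, the evolved query set can be inspected directly, which is the sense in which such a fingerprint is interpretable. In this toy data, for instance, the single query "O" already separates the two classes.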
Keywords
molecular machine learning; molecular representation; evolutionary algorithm; reaction prediction; QAPR; explainable AI