Variational Autoencoders
Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints
This study compares three unsupervised dimensionality reduction techniques (PCA, UMAP, and VAE) as embedding methods for toxicology classification tasks. The embeddings are evaluated against standard models trained directly on molecular fingerprints, and transfer learning is explored by training the embedders on external chemical compound datasets. Across various embedding dimensions and external dataset sizes, the findings show that UMAP can complement established techniques such as PCA and VAE for pre-compression in toxicology. Of the three, however, the generative VAE gives the best classification accuracy when used for pre-compression.
Mario Lovrić, Tomislav Đuričić, Han T.N. Tran, Hussain Hussain, Emanuel Lacić, Morten A. Rasmussen, Roman Kern
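The transfer-learning setup described in the abstract (fit an unsupervised embedder on external compounds, then reuse it to compress the fingerprints of the labeled task set before classification) can be illustrated with a minimal sketch. This is not the authors' implementation: PCA via NumPy's SVD stands in for all three embedders, the fingerprints and toxicity labels are random synthetic placeholders, and the names `fit_pca`, `transform`, and the nearest-centroid classifier are hypothetical choices made only for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA via SVD; return the feature mean and the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def transform(X, mean, components):
    """Project fingerprints onto the learned principal axes."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): random 256-bit "fingerprints".
X_external = rng.integers(0, 2, size=(500, 256)).astype(float)  # unlabeled external compounds
X_task = rng.integers(0, 2, size=(100, 256)).astype(float)      # labeled task compounds
y_task = rng.integers(0, 2, size=100)                           # placeholder toxicity labels

# 1) Train the embedder on the external set only (unsupervised).
mean, components = fit_pca(X_external, n_components=16)

# 2) Reuse the fitted embedder to compress the task fingerprints.
Z_task = transform(X_task, mean, components)

# 3) Fit a classifier on the embeddings; here a simple nearest-centroid rule.
centroids = np.stack([Z_task[y_task == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(Z_task[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)

print(Z_task.shape)  # (100, 16)
```

Varying `n_components` and the number of rows in `X_external` corresponds to the two factors the study reports testing, namely embedding dimension and external dataset size. Because the data here are random, the predictions carry no meaning beyond showing the data flow.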