TY - JOUR
T1 - High-precision prediction of fluorescence wavelength of organic based on ensemble automatic machine learning method and online querying
AU - Zhao, Shao
AU - Li, Jiadong
AU - Wu, Lingjun
AU - Zheng, Xiaoyan
AU - Tang, Anping
AU - Liu, Wanqiang
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/11
Y1 - 2025/11
N2 - Organic fluorescence is widely applied in biomedical imaging, chemical sensing, and environmental monitoring. However, the traditional trial-and-error method for measuring the wavelength of organic fluorescent molecules is both time-consuming and labour-intensive. Ensemble automated machine learning (AutoML) methods provide a convenient way to evaluate the fluorescence properties of organic compounds. In this work, we constructed a comprehensive fluorescence database containing 24798 organic fluorescent compounds. The maximum emission wavelengths (λem) of these compounds range from 240 nm to 1200 nm. The database was compiled from recent peer-reviewed publications. Molecular structures were standardized, and duplicate entries were removed. This dataset was used to train machine learning models for prediction. Among the λem prediction models built using AutoGluon, the WeightedEnsemble_L2 model performed best, with a mean absolute error (MAE) of 10 nm on the test set. Shapley additive explanation (SHAP) analysis revealed critical molecular descriptors governing λem, offering actionable insights for molecular engineering. The model was deployed as an open-access web platform (http://predixct-ednk9cynnprgqjbmskl95f.streamlit.app), enabling rapid screening of fluorophores for optoelectronic and sensing applications. This work bridges the gap between data-driven design and experimental synthesis, providing a robust tool to accelerate the development of tailored fluorescent probes for chemical sensing, bioimaging, and optical diagnostics.
AB - Organic fluorescence is widely applied in biomedical imaging, chemical sensing, and environmental monitoring. However, the traditional trial-and-error method for measuring the wavelength of organic fluorescent molecules is both time-consuming and labour-intensive. Ensemble automated machine learning (AutoML) methods provide a convenient way to evaluate the fluorescence properties of organic compounds. In this work, we constructed a comprehensive fluorescence database containing 24798 organic fluorescent compounds. The maximum emission wavelengths (λem) of these compounds range from 240 nm to 1200 nm. The database was compiled from recent peer-reviewed publications. Molecular structures were standardized, and duplicate entries were removed. This dataset was used to train machine learning models for prediction. Among the λem prediction models built using AutoGluon, the WeightedEnsemble_L2 model performed best, with a mean absolute error (MAE) of 10 nm on the test set. Shapley additive explanation (SHAP) analysis revealed critical molecular descriptors governing λem, offering actionable insights for molecular engineering. The model was deployed as an open-access web platform (http://predixct-ednk9cynnprgqjbmskl95f.streamlit.app), enabling rapid screening of fluorophores for optoelectronic and sensing applications. This work bridges the gap between data-driven design and experimental synthesis, providing a robust tool to accelerate the development of tailored fluorescent probes for chemical sensing, bioimaging, and optical diagnostics.
KW - AutoGluon
KW - Automatic machine learning
KW - Chemical sensor design
KW - Emission wavelength
KW - Organic fluorescent
UR - http://www.scopus.com/pages/publications/105010106867
U2 - 10.1016/j.dyepig.2025.113012
DO - 10.1016/j.dyepig.2025.113012
M3 - Review article
AN - SCOPUS:105010106867
SN - 0143-7208
VL - 242
JO - Dyes and Pigments
JF - Dyes and Pigments
M1 - 113012
ER -