Revolutionizing Molecular Diagnostics: Chong Li, Georgia Tech, and the Convergence of Machine Learning with Surface-Enhanced Spectroscopies
Molecular detection and screening have become indispensable in healthcare, particularly highlighted by the COVID-19 pandemic. Surface-enhanced spectroscopy techniques, such as Surface-Enhanced Raman Scattering (SERS) and surface-enhanced infrared absorption (SEIRA), offer direct insights into molecular constituents, chemical bonds, and configurations through lattice and molecular vibrational fingerprint information. This makes them invaluable tools for non-destructive, label-free molecular diagnostics and screening. However, the escalating complexity of molecular diagnostics, driven by the increasing variety of molecular species, rapid viral spread, and stringent demands for accuracy and sensitivity, presents significant challenges.
The integration of artificial intelligence (AI) and machine learning (ML) techniques holds considerable potential for enhancing SERS and SEIRA, enabling rapid analysis and automated data processing to address these challenges. This article reviews the integration of ML with SERS and SEIRA in terms of concepts, processes, and applications: how ML algorithms can augment these spectroscopic techniques, the general process of their integration, applications in molecular diagnostics and screening, and an outlook on future developments in ML-integrated SEIRA/SERS.
Surface-Enhanced Spectroscopy: A Foundation for Molecular Analysis
Surface-enhanced spectroscopy encompasses several techniques, including Surface-Enhanced Raman Scattering (SERS) spectroscopy, surface-enhanced infrared absorption (SEIRA) spectroscopy, and surface-enhanced fluorescence (SEF). SERS and SEIRA provide lattice and molecular vibrational fingerprint information, respectively, which is directly related to the molecular constituents, chemical bonds, and configuration. This correlation makes them powerful analytical tools for unambiguous, nondestructive, and label-free detection of substances in biology, medicine, electrochemistry, catalysis, materials science, etc. Since the intrinsic mechanism of SEF is quite different from that of the other two, SEF is not discussed in this review.
The discovery of SERS stems from the unprecedentedly intense Raman spectra of molecules adsorbed on specially prepared roughened silver electrodes, as demonstrated by Fleischmann et al. in 1974. This stronger-than-expected spectrum was further investigated and carefully calculated as a million-fold enhancement by Van Duyne et al. in 1977, and it was then dubbed the SERS effect. Several years later (in 1980), a similar phenomenon in infrared spectroscopy was observed by Hartstein et al. using films of randomly distributed silver nanoparticles in the attenuated-total-reflection (ATR) setup; this is known as SEIRA. Since the underlying mechanism of SEIRA and SERS is the interaction between molecules and plasmonic resonance, their excitation is geometry-dependent and their substrate dimensions are on the micro-/nanoscale. Along with the rapid development of nanofabrication and nano-synthesis techniques in recent years, SEIRA and SERS technologies have advanced rapidly, and various types of substrates and applications have been demonstrated, such as single-molecule (SM) SERS, Tip-Enhanced Raman Spectroscopy (TERS), scattering-type scanning near-field optical microscopy (s-SNOM), shell-isolated nanoparticle-enhanced Raman spectroscopy (SHINERS), resonant SEIRA, graphene-based SEIRA, and more.
The Challenges in Surface-Enhanced Spectroscopy
Despite their potential, SERS and SEIRA face several challenges:
- Data Volume and Complexity: The volume of spectral data keeps growing as SERS/SEIRA-based applications become more sophisticated, especially those involving multiple biomarkers or viruses. Processing each set of spectral data is itself complex and time-consuming, and generally includes normalization, baseline calibration, and feature signal extraction.
- Spectral Overlapping: The overlapping of spectra from different molecules limits the application scope of SERS and SEIRA. For instance, almost all protein molecules suffer from IR spectral overlapping between roughly 1500 and 1700 cm−1, where the amide I and amide II vibration bands are located (proteins are special types of amides).
- Anomalies and Artifacts: Anomalies and artifacts reduce accuracy, stability, and reliability. Their causes range from instrumental effects and sample effects to contamination in sampling procedures. Instrumental effects include shifts in the wavenumber scale, multi-passing errors, detector effects, noise effects, dark noise, etc. Sample effects include sample heating, fluorescence interference in Raman spectra, and matrix absorption. Contamination in sampling procedures includes ambient lighting, air, sample support surfaces, sample containers, and sample movement.
- Substrate Design: Manual design of surface-enhanced spectroscopy substrates is inefficient and time-consuming, as different analytes require customized structures to match molecular vibrations and plasmonic resonances, particularly in SEIRA spectroscopy. Therefore, the automatic design of substrates is highly desirable and helps to facilitate the practical application of the technology.
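The preprocessing steps named under Data Volume and Complexity (baseline calibration, normalization, and feature signal extraction) can be sketched in a few lines. This is a minimal illustration on a synthetic spectrum, not a production pipeline: the function names, the straight-line baseline model, the band positions, and the 0.5 peak threshold are all assumptions chosen for the example.

```python
import math

def subtract_linear_baseline(intensities):
    """Remove a straight-line baseline drawn between the first and last points."""
    n = len(intensities)
    first, last = intensities[0], intensities[-1]
    return [y - (first + (last - first) * i / (n - 1))
            for i, y in enumerate(intensities)]

def normalize_max(intensities):
    """Scale so the largest absolute value is 1 (all-zero spectra are returned as-is)."""
    peak = max(abs(y) for y in intensities)
    return list(intensities) if peak == 0 else [y / peak for y in intensities]

def find_peaks(wavenumbers, intensities, threshold=0.5):
    """Return the wavenumbers of local maxima whose intensity exceeds the threshold."""
    return [wavenumbers[i] for i in range(1, len(intensities) - 1)
            if intensities[i] > threshold
            and intensities[i] > intensities[i - 1]
            and intensities[i] >= intensities[i + 1]]

def gaussian(x, center, width):
    return math.exp(-((x - center) / width) ** 2)

# Synthetic spectrum (assumed): two Gaussian bands on a sloped background.
wavenumbers = list(range(1400, 1801, 5))
raw = [0.002 * (x - 1400)
       + 1.0 * gaussian(x, 1550, 15)
       + 0.8 * gaussian(x, 1650, 15)
       for x in wavenumbers]

corrected = normalize_max(subtract_linear_baseline(raw))
peaks = find_peaks(wavenumbers, corrected)
print(peaks)
```

Real spectra usually need a more flexible baseline model (for example polynomial or asymmetric least squares fits) and noise-aware peak picking; the straight-line version is used here only to keep the three steps visible.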
Machine Learning: A Paradigm Shift in Spectroscopic Analysis
One potential solution to these bottlenecks is algorithmic analysis, which was widely employed in early chemometrics to analyze and mine chemical data and to design optimal experiments or choose measurement procedures. Well-known algorithms include principal component analysis (PCA), principal component regression (PCR), multiple linear regression (MLR), linear discriminant analysis (LDA), and more. However, implementing these algorithms requires the support of high-performance computing, so addressing these issues depends not only on the algorithms themselves but also on the assistance of computers.
Artificial intelligence (AI), a branch of computer science focused on enabling machines to perform tasks requiring human intelligence, offers novel strategies for overcoming the challenges in surface-enhanced spectroscopy and can turn common SEIRA and SERS into intelligent tools and analysis platforms. One of the basic requirements for AI is learning, and most researchers agree that there is no AI without learning. Machine learning (ML) is therefore one of the most rapidly developing and significant subfields of AI research. At the very beginning of ML development (1950s-1960s), there were three major branches: symbolic learning proposed by Hunt et al., statistical methods by Nilsson, and neural networks by Rosenblatt. Today, these branches have developed into advanced methods that can be divided into four categories by task: classification, regression, clustering, and dimensionality reduction. Representative algorithms include support vector machine (SVM), k-nearest neighbor (kNN), decision tree (DT), convolutional neural network (CNN), k-means, PCA, etc. These algorithms have been widely employed in SEIRA and SERS.
Researchers have demonstrated many advantages of ML-augmented SEIRA and SERS over conventional approaches. Nevertheless, although ML and surface-enhanced spectroscopy are complementary in terms of technical characteristics, ML-augmented SEIRA and SERS are still in their infancy, and combining them efficiently remains challenging. Landmark work by many researchers has greatly advanced the field, but the technical approaches are diverse and the perspectives differ.
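To make the classification category concrete, the sketch below applies k-nearest neighbor voting to spectra treated as plain intensity vectors. It is an illustrative toy under stated assumptions: the two analyte labels, the Gaussian band shapes, the band centers, and the helper names are hypothetical and not taken from any study cited here.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two spectra sampled on the same axis."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_spectra, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training spectra."""
    ranked = sorted(zip(train_spectra, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def band(center, amplitude, axis, width=10.0):
    """Synthetic single-band spectrum (assumed Gaussian line shape)."""
    return [amplitude * math.exp(-((x - center) / width) ** 2) for x in axis]

axis = list(range(1500, 1701, 10))

# Hypothetical training set: analyte_A absorbs near 1550, analyte_B near 1650.
train_spectra = ([band(1550, a, axis) for a in (0.8, 1.0, 1.2)]
                 + [band(1650, a, axis) for a in (0.8, 1.0, 1.2)])
train_labels = ["analyte_A"] * 3 + ["analyte_B"] * 3

# Queries with slightly shifted band positions and different amplitudes.
prediction_b = knn_predict(train_spectra, train_labels, band(1645, 0.9, axis))
prediction_a = knn_predict(train_spectra, train_labels, band(1555, 1.1, axis))
print(prediction_a, prediction_b)
```

In practice the distance is often computed after dimensionality reduction (for example on PCA scores) rather than on raw intensities, which reduces the influence of noise in uninformative channels.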
Key Benefits of ML in Surface-Enhanced Spectroscopy
- Automated Substrate Design: ML algorithms automate the design of SERS/SEIRA substrates, optimizing sensitivity and matching molecular vibrational frequencies while reducing time-consuming manual iterations. Taking SEIRA antenna design as an example, the first step is to analyze the infrared spectrum of the analyte molecule and obtain the positions of the molecular vibrations. Then, an appropriate antenna structure is chosen to excite plasmonic resonances that match the molecular vibrational frequencies. A lot of repetitive work is involved here. First, selecting the structural shape is a critical and continuous optimization process. It requires designers to use simulation software (such as FDTD Solutions) to compare the figures of merit of different shapes, such as sensitivity, enhancement factor, and bandwidth. While design experience can reduce the number of iterations, it can also lead to design deviations between designers. Another iterative task at this stage is matching antenna resonances to molecular vibrations by tuning the antenna dimensions, since zero detuning allows for more efficient molecule-plasmon coupling. Additionally, a multiband design is necessary to enhance SEIRA's ability to identify molecules if the detection target is a specific molecule in a mixture. The limitations of nanofabrication must also be considered at the device design stage. A good design that accounts for all of these factors takes a lot of effort and time. Fortunately, these time-consuming and repetitive tasks are well suited to ML-assisted design systems. For example, genetic algorithms have been used to automatically generate highly sensitive antenna structures that match molecular peaks well. Furthermore, the number of iterations can be reduced and machine learning efficiency improved by employing the physical constraints of causality to directly learn the response functions of antennas.
In common deep neural networks, the function that the hidden layers use to produce predictions is unknown, like a black box: it works, but it is unknown how or why. By incorporating physical knowledge into the hidden layers, the network is able to learn the physical relationships between the input physical parameters.
- Anomaly and Artifact Reduction: ML algorithms also help reduce anomalies and artifacts in SERS/SEIRA. As mentioned earlier, anomalies and artifacts are critical challenges that limit SERS and SEIRA to low accuracy and poor stability and reliability, and they arise from instrumental effects, sample effects, and contamination in sampling procedures.
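The antenna tuning loop described under Automated Substrate Design (adjust the antenna dimensions until the plasmonic resonance matches a molecular vibration) can be sketched with a small genetic algorithm. This is a toy under loud assumptions: the half-wave dipole formula stands in for a full electromagnetic simulation, the effective index of 1.5 and the 1650 cm−1 target are hypothetical, and none of the names below come from a published design tool.

```python
import random

TARGET_CM1 = 1650.0  # assumed target vibration (amide I region)
N_EFF = 1.5          # assumed effective refractive index (hypothetical)

def resonance_cm1(length_um):
    """Toy half-wave dipole model: lambda_res = 2 * n_eff * L, returned in cm^-1.
    A real workflow would call an electromagnetic solver here."""
    return 1e4 / (2.0 * N_EFF * length_um)

def fitness(length_um):
    """Higher is better: negative detuning between resonance and target."""
    return -abs(resonance_cm1(length_um) - TARGET_CM1)

def evolve(generations=60, population_size=30, sigma_um=0.05,
           bounds=(0.5, 5.0), seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(*bounds) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # Crossover (average of two parents) plus Gaussian mutation.
            child = 0.5 * (a + b) + rng.gauss(0.0, sigma_um)
            children.append(min(max(child, bounds[0]), bounds[1]))
        population = parents + children
    return max(population, key=fitness)

best_length_um = evolve()
print(round(best_length_um, 3), round(resonance_cm1(best_length_um), 1))
```

With a single design variable a direct solve would be simpler; the evolutionary loop only pays off when the fitness comes from an expensive simulation over many coupled geometric parameters, such as shape, gap, and periodicity.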
Linglingzhi Zhu: A Pioneer in ML-Augmented Spectroscopy
Linglingzhi Zhu, currently a Postdoctoral Fellow at the H. Milton Stewart School of Industrial and Systems Engineering (ISyE), Georgia Institute of Technology, exemplifies the cutting edge of research in this area. Working with Professor Yao Xie, and collaborating with Professor Xiuyuan Cheng at Duke University, Zhu's work focuses on leveraging machine learning to enhance the capabilities of surface-enhanced spectroscopies. Zhu received a Ph.D. in Operations Research in 2024 from The Chinese University of Hong Kong (CUHK) under the supervision of Professor Anthony Man-Cho So. Before that, Zhu received an M.S. in Computational Mathematics in 2020 and a B.S. in Mathematics in 2017 from Zhejiang University, where Zhu was advised by Professor Chong Li.
Key Research Contributions
Zhu's research spans a variety of topics, demonstrating a broad expertise in applying optimization and machine learning techniques to diverse problems. These collaborations and publications highlight Zhu's contributions to the field:
- Theoretical Foundations: Zhu has contributed to the theoretical understanding of optimization algorithms, with publications in journals like Mathematical Programming, Series A. Jiajin Li, Linglingzhi Zhu*, Anthony Man-Cho So. Mathematical Programming, Series A (2025) 214(1-2):591-641. Preliminary version appeared in NeurIPS 2022 Workshop on Optimization for Machine Learning (OPT 2022), Oral.
- Signal Processing Applications: Zhu's work has also been published in IEEE Transactions on Signal Processing, indicating expertise in applying signal processing techniques. Jiaojiao Zhang, Linglingzhi Zhu, Dominik Fay, Mikael Johansson. IEEE Transactions on Signal Processing (2025) 73:1518-1531. (α-β)
- Numerical Algorithms and Optimization: Zhu has published in Numerical Algorithms and Applied Mathematics & Optimization, showcasing contributions to numerical methods and optimization techniques. Sangho Kum, Chong Li, Jinhua Wang, Jen-Chih Yao, Linglingzhi Zhu. Numerical Algorithms (2023) 94:1819-1848. (α-β) Yaohua Hu, Chong Li, Jinhua Wang, Xiaoqi Yang, Linglingzhi Zhu. Applied Mathematics & Optimization (2023) 87:52.
- Conference Proceedings: Presentations at prestigious conferences like the IEEE Conference on Decision and Control (CDC) and Advances in Neural Information Processing Systems (NeurIPS) demonstrate Zhu's active engagement with the machine learning and control systems communities. Jiaojiao Zhang, Linglingzhi Zhu, Mikael Johansson. Proceedings of the 63rd IEEE Conference on Decision and Control (CDC 2024), pp. Shangyuan Liu, Linglingzhi Zhu, Anthony Man-Cho So. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), pp. Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, José Blanchet, Jiajin Li. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), pp.
The Principles of SERS and SEIRA
The principles of SERS and SEIRA include electromagnetic field enhancement and the chemical effect. Electromagnetic field enhancement mainly arises from the interaction between molecules and the plasmons excited in a SERS/SEIRA substrate. The chemical effect refers to contributions associated with the transfer of electrons between adsorbed molecules and the SERS/SEIRA substrate, which can occur in the ground or excited states of the molecule-metal system.
Commonly used SERS substrates include anisotropic nanoparticles, core-shell nanoparticles, single-nanoparticle dimers, self-assembled nanoparticles, nanostructures based on hole-mask colloidal lithography, nanopillars, nanostructured dielectrics and hybrids, and so on. SEIRA substrates are sub-wavelength nanoantennas or metamaterials, that is, artificial sub-wavelength structures with extraordinary physical properties distinct from the intrinsic properties of naturally available materials. The nanofabrication technologies for preparing SEIRA/SERS substrates include chemical preparation methods, photolithography, electron beam lithography, magnetron sputtering, electron beam evaporation, and more. Widely used theories for modeling SERS/SEIRA include perturbation theory, temporal coupled-mode theory, coupled harmonic oscillator theory, and so on.
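As a rough guide to the size of the electromagnetic contribution, the commonly quoted approximations can be written as below. The symbols are notation introduced here for illustration and do not appear in the text above: E_loc is the local field at the molecule, E_0 the incident field, ω_L the excitation frequency, and ω_R the Raman-shifted frequency.

```latex
\mathrm{EF}_{\mathrm{SERS}}
  \approx \frac{|E_{\mathrm{loc}}(\omega_L)|^{2}\,|E_{\mathrm{loc}}(\omega_R)|^{2}}{|E_0|^{4}}
  \approx \left|\frac{E_{\mathrm{loc}}}{E_0}\right|^{4},
\qquad
\mathrm{EF}_{\mathrm{SEIRA}} \propto \left|\frac{E_{\mathrm{loc}}}{E_0}\right|^{2}
```

The fourth-power form assumes the Raman shift is small enough that the field enhancement is similar at both frequencies; the SEIRA signal scales with the local intensity because absorption is a one-photon process.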
As mentioned earlier, machine learning is complementary to SERS/SEIRA and offers unparalleled possibilities for solving pressing challenges related to spectral artifacts, overlapping, and huge volumes of spectral data.
Future Directions: The Continued Evolution of ML-Augmented Spectroscopies
The convergence of machine learning and surface-enhanced spectroscopies represents a transformative approach to molecular diagnostics and beyond. As ML algorithms become more sophisticated and computational power increases, the potential for automated, accurate, and rapid molecular analysis will continue to grow. Future research directions include:
- Development of more robust and interpretable ML models: Creating models that not only provide accurate predictions but also offer insights into the underlying physical and chemical processes.
- Integration of multi-modal data: Combining SERS/SEIRA data with other data sources (e.g., imaging, genomics) to provide a more comprehensive understanding of complex biological systems.
- Real-time analysis and diagnostics: Developing portable, point-of-care devices that can perform real-time molecular analysis using ML-augmented SERS/SEIRA.

