TY - CONF
T1 - An Automated Tool for Quality Assessment of Raman Spectra
AU - Zimmermann, Daniel
AU - Lilek, David
AU - Prohaska, Katerina
AU - Herbinger, Birgit
PY - 2022
Y1 - 2022
N2 - High-quality data is the main prerequisite for successful data analysis, as aptly summarized by the phrase “garbage in, garbage out”. In Raman spectroscopy, the first step in ensuring data quality is usually the removal of low-quality spectra. This is especially important in surface-enhanced Raman spectroscopy (SERS), where an inhomogeneous distribution of nanoparticles in the sample leads to high variations in spectral quality. However, identifying and removing low-quality spectra is often still a manual process, which is both time-consuming and inherently subjective. An automated and objective quality assessment procedure is therefore needed to enable the use of SERS as a routine analytical method. [1] For this purpose, we propose an automated tool for estimating the quality of spectra based on the intensity and number of peaks. First, the baseline of each spectrum is estimated and subtracted to remove background fluorescence. Peaks are then identified using a second-derivative Savitzky-Golay filter, which allows the detection of low-intensity peaks that would not be recognizable in the original spectra while simultaneously suppressing noise. The window size of the Savitzky-Golay filter and the threshold value for the second derivative control the sensitivity of peak detection and must be carefully selected. If these parameters are set too large, smaller peaks are smoothed out and are not detected. In contrast, values that are set too small lead to an increasing number of false positives. Finally, the number of peaks and their intensities are used to calculate a quality score. We implemented multiple scoring options, which the user can select depending on whether a higher intensity or a higher number of peaks is considered more important. To test the tool, we applied it in two different scenarios. First, during SERS method development, we used the resulting quality scores to compare the suitability of different nanoparticles.
The mean quality score of 50 spectra was used to select the highest-scoring SERS method. To assess the reproducibility, we also examined the distribution of intensities and peak counts. [2] Second, we applied the tool as part of a data analysis workflow. [3] Principal component analysis (PCA) of the quality-filtered spectra showed a significant reduction in the number of outliers compared to the complete dataset. Similarly, the predictive performance of supervised classification models also increased. Visual inspection of individual spectra shows that our tool accurately reflects the spectral quality, confirming its suitability for use in method development and as part of a data analysis workflow.
AB - High-quality data is the main prerequisite for successful data analysis, as aptly summarized by the phrase “garbage in, garbage out”. In Raman spectroscopy, the first step in ensuring data quality is usually the removal of low-quality spectra. This is especially important in surface-enhanced Raman spectroscopy (SERS), where an inhomogeneous distribution of nanoparticles in the sample leads to high variations in spectral quality. However, identifying and removing low-quality spectra is often still a manual process, which is both time-consuming and inherently subjective. An automated and objective quality assessment procedure is therefore needed to enable the use of SERS as a routine analytical method. [1] For this purpose, we propose an automated tool for estimating the quality of spectra based on the intensity and number of peaks. First, the baseline of each spectrum is estimated and subtracted to remove background fluorescence. Peaks are then identified using a second-derivative Savitzky-Golay filter, which allows the detection of low-intensity peaks that would not be recognizable in the original spectra while simultaneously suppressing noise. The window size of the Savitzky-Golay filter and the threshold value for the second derivative control the sensitivity of peak detection and must be carefully selected. If these parameters are set too large, smaller peaks are smoothed out and are not detected. In contrast, values that are set too small lead to an increasing number of false positives. Finally, the number of peaks and their intensities are used to calculate a quality score. We implemented multiple scoring options, which the user can select depending on whether a higher intensity or a higher number of peaks is considered more important. To test the tool, we applied it in two different scenarios. First, during SERS method development, we used the resulting quality scores to compare the suitability of different nanoparticles.
The mean quality score of 50 spectra was used to select the highest-scoring SERS method. To assess the reproducibility, we also examined the distribution of intensities and peak counts. [2] Second, we applied the tool as part of a data analysis workflow. [3] Principal component analysis (PCA) of the quality-filtered spectra showed a significant reduction in the number of outliers compared to the complete dataset. Similarly, the predictive performance of supervised classification models also increased. Visual inspection of individual spectra shows that our tool accurately reflects the spectral quality, confirming its suitability for use in method development and as part of a data analysis workflow.
M3 - Poster
ER -