On the non-universality of distance metrics in laser-induced breakdown spectroscopy

Authors

Vrábel, Jakub
Képeš, Erik
Nedělník, Pavel
Záděra, Antonín
Pořízka, Pavel
Kaiser, Jozef

Publisher

The Royal Society of Chemistry

Abstract

The ability to measure similarity between high-dimensional spectra is crucial for numerous data processing tasks in spectroscopy. Many popular machine learning algorithms depend on, or directly implement, a form of similarity or distance metric. Although the choice of metric strongly influences algorithm performance and sensitivity to signal fluctuations, it is often neglected within the spectroscopic community. This work aims to shed light on the metric selection process in Laser-Induced Breakdown Spectroscopy (LIBS) and to study its consequences for data analysis and analytical performance in selected applications. We studied six relevant distance metrics: Euclidean, Manhattan, cosine, Siamese, fractional, and mutual information. We assessed their response to changes in sample composition, additive noise, and signal intensity. Our results show specific vulnerabilities of commonly used metrics, such as the Euclidean metric's high sensitivity to additive noise and the cosine metric's sensitivity to spectral shifts. The Siamese metric stood out in the majority of studied cases and outperformed the others in a direct comparison on the spectra classification task. This work provides basic guidelines for selecting metrics in various contexts. The methodology is general and can be directly extended to other spectroscopic techniques that possess comparable data properties.
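
As a rough illustration of the kind of comparison the abstract describes, the sketch below computes four of the named metrics (Euclidean, Manhattan, fractional, and cosine) between a spectrum and two perturbed copies of it: one with its intensity doubled and one with additive Gaussian noise. This is not the authors' implementation. The synthetic two-line spectrum, the noise level, the fractional exponent p = 0.5, and all function and variable names are assumptions made for illustration only. The Siamese and mutual-information metrics are omitted because they require a trained network and a density estimate, respectively.

```python
import numpy as np


def minkowski(a, b, p):
    """Minkowski-type distance: p=2 is Euclidean, p=1 is Manhattan,
    and 0 < p < 1 gives a 'fractional' distance (assumed form)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two spectra."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
channels = np.arange(200)

# Hypothetical synthetic spectrum: two Gaussian lines on a small baseline.
spectrum = (
    np.exp(-0.5 * ((channels - 60) / 2.0) ** 2)
    + 0.5 * np.exp(-0.5 * ((channels - 140) / 2.0) ** 2)
    + 0.01
)

# Two perturbations named in the abstract: signal intensity and additive noise.
scaled = 2.0 * spectrum
noisy = spectrum + rng.normal(0.0, 0.02, size=spectrum.shape)

metrics = {
    "euclidean": lambda a, b: minkowski(a, b, 2.0),
    "manhattan": lambda a, b: minkowski(a, b, 1.0),
    "fractional": lambda a, b: minkowski(a, b, 0.5),
    "cosine": cosine_distance,
}

# Distance from the reference spectrum to each perturbed copy.
results = {
    name: {"scaled": fn(spectrum, scaled), "noisy": fn(spectrum, noisy)}
    for name, fn in metrics.items()
}

for name, row in results.items():
    print(f"{name:10s} scaled={row['scaled']:.4g} noisy={row['noisy']:.4g}")
```

In this toy setting the cosine distance to the rescaled copy is zero up to floating-point error, since it depends only on the direction of the spectrum vector, whereas the Minkowski-type distances grow with the intensity change. The numbers say nothing about real LIBS data and should not be read as reproducing the paper's results.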

Citation

JOURNAL OF ANALYTICAL ATOMIC SPECTROMETRY. 2025, vol. 40, issue 6, p. 1552-1565.
https://pubs.rsc.org/en/content/articlelanding/2025/ja/d4ja00377b

Document type

Peer-reviewed

Document version

Published version

Language of document

en

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 3.0 Unported