Department of Computer Graphics and Multimedia
Recent Submissions
- Speech production under stress for machine learning: multimodal dataset of 79 cases and 8 signals (Springer Nature, 2024-11-12). Pešán, Jan; Juřík, Vojtěch; Růžičková, Alexandra; Svoboda, Vojtěch; Janoušek, Oto; Němcová, Andrea; Bojanovská, Hana; Aldabaghová, Jasmína; Kyslík, Filip; Vodičková, Kateřina; Sodomová, Adéla; Bartys, Patrik; Chudý, Peter; Černocký, Jan. Early identification of cognitive or physical overload is critical in fields where human decision making matters when preventing threats to safety and property. Pilots, drivers, surgeons, and operators of nuclear plants are among those affected by this challenge, as acute stress can impair their cognition. In this context, the significance of paralinguistic automatic speech processing increases for early stress detection. The intensity, intonation, and cadence of an utterance are examples of paralinguistic traits that determine the meaning of a sentence and are often lost in the verbatim transcript. To address this issue, tools are being developed to recognize paralinguistic traits effectively. However, a data bottleneck still exists in the training of paralinguistic speech traits, and the lack of high-quality reference data for the training of artificial systems persists. Regarding this, we present an original empirical dataset collected using the BESST experimental protocol for capturing speech signals under induced stress. With this data, our aim is to promote the development of pre-emptive intervention systems based on stress estimation from speech.
- Pessimistic Off-Policy Optimization for Learning to Rank (IOS Press, 2024-10-21). Čief, Matej; Kompan, Michal. Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants and overcome the limitation of unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines and is both robust and general.
- A 3D Scan Model and Thermal Image Data Fusion Algorithms for 3D Thermography in Medicine (publisher not stated, 2017-11-08). Chromý, Adam; Klíma, Ondřej. Objectives. At present, medical thermal imaging is still considered a merely qualitative tool that lets us distinguish between the physiological and nonphysiological states of the body but lacks the ability to quantify them. Such a capability would, however, facilitate solving the problem of medical quantification, which currently manifests itself within the entire healthcare system. Methods. A generally applicable method to enhance captured 3D spatial data carrying temperature-related information is presented; in this context, all equations required for other data fusions are derived. The method can be utilized for high-density point clouds or detailed meshes at a high resolution but is also conveniently usable for large objects with sparse points. Results. The benefits of the approach are experimentally demonstrated on 3D thermal scans of injured subjects. We obtained diagnostic information inaccessible via traditional methods. Conclusion. Fusing a 3D model with thermal image data allows the quantification of inflammation, facilitating more precise injury and illness diagnostics or monitoring. The technique offers a wide application potential in medicine and multiple technological domains, including electrical and mechanical engineering.
- Indoor and Outdoor Backpack Mapping with Calibrated Pair of Velodyne LiDARs (2019-09-29). Veľas, Martin; Španěl, Michal; Herout, Adam. This paper presents a human-carried mapping backpack based on a pair of Velodyne LiDAR scanners. Our system is a universal solution for both large-scale outdoor and smaller indoor environments. It benefits from the combination of two LiDAR scanners, which makes the odometry estimation more precise. The scanners are mounted at different angles, so a larger space around the backpack is scanned. Fusion with a GNSS/INS subsystem makes possible both the mapping of featureless environments and the georeferencing of the resulting point cloud. By deploying state-of-the-art methods for registration and loop closure optimization, the system provides sufficient precision for many applications in BIM (Building Information Modeling), inventory check, construction planning, etc. In our indoor experiments, we evaluated the proposed backpack against the ZEB-1 solution, using a FARO terrestrial scanner as the reference. The results were similar in terms of precision, while our system provides higher data density, laser intensity readings, and scalability for large environments.
- A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers (MDPI, 2023-05-22). Zuluaga-Gomez, Juan; Prasad, Amrutha; Nigmatulina, Iuliia; Motlíček, Petr; Kleinert, Matthias. In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI)-based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the communication and can also understand its meaning. The output is subsequently sent to a response generator system, which resembles the spoken read-back that pilots give to the ATCo trainees. The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level air traffic control (ATC)-related entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a spoken utterance that resembles a pilot based on the situation of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we develop a robust and modular system with optional submodules that can enhance the system's performance by incorporating real-time surveillance data, metadata related to exercises (such as sectors or runways), or even a deliberate read-back error to train ATCo trainees to identify them. Our ASR system can reach as low as 5.5% and 15.9% absolute word error rates (WER) on high- and low-quality ATC audio. We also demonstrate that adding surveillance data into the ASR can yield a callsign detection accuracy of more than 96%.
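The key idea stated in the "Pessimistic Off-Policy Optimization for Learning to Rank" abstract (compute lower confidence bounds on click-model parameters, then return the list with the highest pessimistic value) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simple model in which each item has one click probability, uses a Hoeffding bound as a stand-in for the frequentist variant, and all names (`hoeffding_lcb`, `pessimistic_ranking`, the `logged` data) are hypothetical.

```python
import math


def hoeffding_lcb(clicks, impressions, delta=0.05):
    """Lower confidence bound on a click probability via Hoeffding's inequality.

    Assumption for illustration: the abstract does not say which bound is used.
    """
    if impressions == 0:
        return 0.0  # no logged data: fall back to the most pessimistic value
    mean = clicks / impressions
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * impressions))
    return max(0.0, mean - radius)


def pessimistic_ranking(logged, k, delta=0.05):
    """Return the k items with the highest lower confidence bounds, best first."""
    lcbs = {item: hoeffding_lcb(c, n, delta) for item, (c, n) in logged.items()}
    ranked = sorted(lcbs, key=lcbs.get, reverse=True)[:k]
    return ranked, lcbs


# Hypothetical logged data: item -> (clicks, impressions).
logged = {
    "a": (1, 1),       # perfect click rate, but a single impression
    "b": (600, 1000),  # 60% click rate, well explored
    "c": (40, 100),    # 40% click rate, moderately explored
}
ranking, lcbs = pessimistic_ranking(logged, k=2)
print(ranking)
```

Ranking by the empirical mean alone would place item `a` first on the strength of one impression. The pessimistic estimate penalizes that uncertainty, so the well-explored items `b` and `c` are returned instead.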