Supervisor's Review
Černocký, Jan
To conclude, we fully recommend Ladislav Mošner’s Ph.D. thesis for defense, wish him
all the best in his professional and personal life, and look forward to continuing
working with him.
Opponent's Review
Delcroix, Marc
I have carefully reviewed the doctoral thesis of Mr. Ladislav Mošner. Despite a few
recommendations and some points I raised for discussion, the thesis represents a significant
contribution to the field of speaker verification (SV) and will provide new opportunities for
research and technological development. The work achieved is original and considerable. The
investigations were carried out with great attention to detail. The material of the thesis is
based on numerous peer-reviewed papers submitted to top international conferences in the field.
For all these reasons, I believe that the doctoral thesis meets the requirements of the proceedings
leading to the conferment of the PhD title.
Questions
- The approach taken by the candidate aligns with research on multi-channel robust
ASR, which has yielded very promising results. However, I wonder whether this is the optimal
approach for SV. Indeed, ASR needs to recover all parts of the speech to transcribe
each uttered word accurately. In contrast, SV may not need to recover the whole speech
content to capture the speaker's identity. It may therefore be better to ignore unreliable
parts of the captured signal (e.g., those corrupted by loud noise) entirely. Could the
candidate comment on this point?
- In Chapter 5, although RCA seems to bring improvements in the conv-TasNet
configuration, the improvement appears small, and its statistical significance is not assessed.
Note that in this chapter, all techniques are evaluated in terms of speech enhancement
metrics, which, as Chapter 6 reveals, are not well correlated with SV performance.
Could the candidate comment on whether speech enhancement metrics are useful for selecting
the front-end for multi-channel SV?
- The trends in the results are not always consistent between the dev and eval sets. Could
the candidate comment on these differences and on how to choose the multi-channel
SV system configuration in practice?
Häb-Umbach, Reinhold
To conclude, in my opinion, the doctoral thesis meets all requirements of the proceedings
leading to the conferment of a PhD.
Questions
- The MultiSV data sets use retransmitted recordings as development and evaluation data.
While this is a good compromise between pure simulation and real recordings of speakers,
I wonder what the candidate's opinion is about the validity of the drawn conclusions
for an application with real speakers. Do you expect head movements, and thus time-varying
transfer characteristics, to be critical, or why do you consider them not to be critical?
- Considering a beamforming front-end for either speaker verification or ASR, do you
expect the downstream task to influence which front-end turns out to perform best
(not considering fine-tuning of the front-end with gradients from the downstream
task)?
- Fine-tuning the beamforming front-end with a speaker verification objective led to
improved performance. Can you see any difference in the obtained beampatterns with
and without fine-tuning?
- What is your overall conclusion/recommendation: should we use a beamforming front-end
or instead employ a multi-channel extension of pretrained models, such as WavLM?
Please compare the two in terms of both verification performance and computational complexity.