Reviews of the Final Thesis

Supervisor's Review

Šalko, Milan

This master's thesis not only fulfils all requirements of the assignment in high quality, but also significantly exceeds them in its approach and scope. The author chose a current and professionally demanding topic that is not only relevant to cybersecurity but also brings an original perspective on the usability of XAI in deepfake detection. Given the quality of the outputs and the significance of the findings, the thesis appears suitable for publication at a scientific conference.

Partial evaluation
Criterion	Grade	Points	Verbal evaluation
Information on the assignment			I rate the difficulty of the assignment as high. The thesis addresses a current and demanding research problem at the intersection of computer security and explainable artificial intelligence (XAI). Completing it required thorough familiarity with the latest techniques for generating deepfake content, the design of a realistic experiment with human participants, and a critical analysis of the results.
Work with literature			The student actively searched for relevant available literature and incorporated it appropriately into the thesis.
Activity during the work, consultations, communication			The student was highly active throughout the work. He attended consultations regularly, came well prepared, and always presented clearly formulated progress. He met agreed deadlines reliably, and cooperation was very good thanks to his efficient and constructive communication.
Activity during completion			The student completed the thesis ahead of schedule, and its content was consulted in detail.
Publication activity, awards			Given the quality of the outputs and the significance of the findings, the thesis appears suitable for publication at a scientific conference.

Proposed grade: A

Points: 100

Opponent's Review

Ukrop, Martin

The thesis tackled a complex problem of a research nature. The results are near publication quality, with a few deficiencies in experiment methodology and results evaluation. Although selected issues would prevent publication as is, most of them can be fixed. Despite my reservations, I have to admit the assignment was more complex than standard and the student did excellent work in selected aspects. Moreover, due to the nature of the work, my own evaluation criteria were at the level relevant for an international research publication. Therefore, although my concluding grade is B, I won't oppose an A in case of a convincing defense.

Partial evaluation
Criterion	Points	Verbal evaluation
Assignment difficulty		The assignment is complex in the breadth of skills it requires: apart from the standard review of related research, it includes deepfake dataset work and custom detector training, user experiment design, conducting the experiment, and evaluation using a mixture of qualitative and quantitative methods (although the focus on qualitative methods is smaller).
Extent of fulfilment of assignment requirements		All parts of the assignment have been fulfilled (although with some reservations in methodology and evaluation, as discussed in more detail below). I find that the thesis significantly surpasses the original assignment in two ways. First, the user study was significantly larger than the minimum set in the assignment (at least 45 participants were required, while over 260 questionnaire completions were collected). Second, although the assignment does not specify any required details of the data evaluation, this has been done very diligently and with a larger scope than usually expected for a Master's thesis.
Extent of the technical report		The thesis is of larger than usual extent yet does not contain unnecessary fluff; the information density is appropriate.
Presentation level of the technical report	100	The structure is fully appropriate and easily comprehensible.
Formal aspects of the technical report	100	The thesis is written in good English of appropriate style and reads well. There are no deficiencies in typography or formal aspects of the thesis.
Work with literature	90	Chapters 2–4 provide an excellent background overview of the topic, giving more than enough context for face deepfakes and detector technology and leading nicely to the chosen experimental methodology. However, given the nature of the study, more focus could have been put on user studies of deepfakes and deepfake detectors. All sources are cited appropriately. With an astounding 123 references, this work far surpasses the citation count of a usual thesis (with all the references still being relevant).
Implementation output	85	In the practical part, I want to particularly praise the following aspects: (1) Research questions and hypotheses set transparently up front. (2) Diligent description of the methodology (especially the dataset and detector training) and up-front discussion of evaluation metrics. (3) Appropriate selection of statistical tests. (4) Study reviewed and acknowledged by the institutional ethics board. (5) Appropriate and well-described data cleaning procedure. (6) A dedicated section structured by research questions/hypotheses that attempts to draw a comprehensive interpretation from multiple analyses, often approaching the problem from multiple angles and comparing them. At the same time, I want to point out the following deficiencies in experiment methodology and data evaluation (in rough order of importance): (1) I disagree with selected conclusions drawn from investigating the individual hypotheses; sometimes the conclusion is overly strong or seems cherry-picked. (2) Study limitations are discussed too superficially (just a few are listed, no mitigation strategies are discussed, and no effort was made to evaluate the extent of the bias). (3) Study data missing from the digital attachment prevents replication of the study (I miss the full training dataset, the trained detectors, the full survey including evaluation images with detector outputs and generated heatmaps, and the full anonymized dataset if allowed by informed consent). (4) Occasional inappropriate statistical evaluation: Likert scales are ordinal (not interval), so you can only do rank operations and cannot handle the values as numbers (you can report medians and modes but not averages). You cannot draw conclusions based on the visual appearance of the graphs; statistical tests need to be run instead (normality, symmetry, time trends, confidence, ...). (5) The language and reporting around statistical testing is imprecise in multiple places ("confirming" the hypothesis based on the test, seeing definite conclusions in cases where the data only suggests or hints at something without strong significance, sometimes omitting the test statistic value when reporting results, ...). (6) Insufficient evaluation of qualitative data (open questions) without any stated rigorous methodology (coding techniques, etc.). (7) No discussion of how (and if) the evaluation took into account that the custom CWS metric penalizes not answering. (8) Further minor details as discussed in the review comments section (not included in the formal review template).
Usability of results		The thesis contains original research that has publication potential. I recommend that the author and the supervisor correct the mentioned deficiencies and attempt publication at a reputable venue, if this is not already in progress.
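
To illustrate the remark above that Likert responses are ordinal, the following is a minimal sketch of a rank-based summary and comparison. The response values, group names, and helper functions are hypothetical and chosen only for illustration; they are not data or code from the thesis.

```python
from statistics import median, multimode

# Hypothetical 5-point Likert responses (1 = strongly disagree, 5 = strongly agree).
# Illustrative values only, not data from the thesis.
group_a = [4, 5, 3, 4, 5, 4, 2]
group_b = [2, 3, 3, 1, 4, 2, 3]


def ordinal_summary(responses):
    """Summarize ordinal data with order-based statistics only (no mean)."""
    return {"median": median(responses), "modes": multimode(responses)}


def mann_whitney_u(x, y):
    """U statistic of x against y: number of (a, b) pairs with a > b, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


summary_a = ordinal_summary(group_a)
summary_b = ordinal_summary(group_b)
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

print(summary_a, summary_b)
print(u_a, u_b)
```

The sketch computes only the U statistic. A significance level would still have to come from a proper test implementation, for example `scipy.stats.mannwhitneyu`, rather than from the raw statistic or from the appearance of a plot.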

Proposed grade: B

Points: 89

Questions

The questionnaire contained two open-ended questions. How were these analyzed and what conclusions does the data suggest?
The custom CWS metric seems to penalize skipped/unanswered cases. How was this taken into account during evaluation?