Reviews of the Final Qualification Thesis

Considering the relative novelty of the topic, the level of understanding Ondrej Kožányi reached during the work, and the quality and usability of the realised system, I propose to grade the work as excellent - A.

Partial evaluation
Criterion	Grade	Points	Verbal evaluation
Assignment information			The student had to become familiar with modern methods of code generation by large language models (LLMs) and with the adaptation of these techniques to other software engineering tasks. Ondrej Kožányi successfully studied key approaches, reviewed existing systems that have appeared in recent months, and realised an interface that provides the expected functionality using existing LLMs. Although adapting local LLMs proved to be infeasible, the results meet the goals of the thesis and can be used for further experimentation. Thus, I am satisfied with the delivered results.
Work with literature			The student collected and studied recent scientific papers relevant to the topic and prepared a convincing survey of the current state of the art.
Activity during the work, consultations, communication			The student was very active in both semesters; he regularly consulted on his activities and progress and continuously delivered updates on the development work.
Activity during completion			The text was finished well in advance of the deadline. I provided feedback on key chapters, which was always reflected in the subsequent updates.
Publications, awards			-

Opponent's review

Aparovich, Maksim

This thesis presents a multi-agent system for code generation. The assignment was fulfilled with minor reservations: insufficient system evaluation and error analysis. While the work demonstrates clear structure and relevant technical implementation, the lack of proper evaluation limits conclusions about its performance and practical usability.

Partial evaluation
Criterion	Points	Verbal evaluation
Difficulty of the assignment		The assignment involves building a multi-agent system that can analyze codebases and generate code and documentation while carefully following user instructions. The assignment additionally involves a literature review, collecting domain data, fine-tuning an LLM for code generation, and evaluating the system. Each of these steps brings its own challenges, and the assignment can overall be regarded as a more difficult one.
Extent of fulfilment of the assignment requirements		The assignment was fulfilled with the exception of the system evaluation component. Although performance measurement for agentic systems is challenging, an error analysis remains both feasible and essential. A thorough analysis that expands beyond the two provided examples would enhance the quality and depth of the work.
Scope of the technical report		The thesis incorporates all essential components, from a comprehensive overview of LLM evolution and code generation methodologies to the detailed exposition of the proposed system and its implementation specifics.
Presentation level of the technical report	80	The work demonstrates a clear and logical structure. Certain concepts are not clearly elaborated, resulting in narrative gaps that detract from the overall understanding of the work. For instance, a more detailed explanation of the graph-based codebase representation and of how agents search and retrieve code would strengthen the solution. Additional specifics regarding the intent classifier's fine-tuning process, including the hyperparameters and the performance metrics achieved, would provide insight into the methodology.
Formal layout of the technical report	85	The work follows a conventional structure featuring well-defined chapter headings, sections, figures, and equations. The presentation of results would benefit from incorporating additional prompt and agent response examples, along with a detailed analysis of each agent's individual performance.
Work with literature	79	The referenced papers are relevant and complement the thesis content. Several citations listed as preprints have been published in conference proceedings. Additionally, the thesis contains minor inaccuracies, such as the statement about the multi-modality of OpenAI's GPT-4 model (it contradicts the official documentation: https://platform.openai.com/docs/models/gpt-4 ).
Implementation output	75	The work presents a multi-agent system designed to execute coding tasks based on provided instructions. The codebase is clear and self-documenting. The technical solution lacks fault tolerance: during the demonstration, the system failed when one agent attempted to execute generated code. Furthermore, the work would benefit from a systematic error analysis of the results to provide an assessment of the proposed approach's performance.
Usability of results		The work presents a system for generating code from textual instructions, taking inspiration from existing methods and proposing an extended approach. The primary limitation is the insufficient evaluation, which makes it harder to draw conclusions about the system's usability and further practical application.

Reviews

Supervisor's review

Smrž, Pavel

Opponent's review

Aparovich, Maksim