KOVAL, M. Efektivnost datových struktur v implementaci automatů [Efficiency of Data Structures in the Implementation of Automata] [online]. Brno: Brno University of Technology, Faculty of Information Technology, 2025.
Apart from the text of the report, I have no information about what the student did. The result I see before me unfortunately seems unsatisfactory. In my opinion, neither the text nor the implementation meets the minimum quality requirements; nevertheless, the thesis may not be far from the minimum passing level. The text contains meaningful parts, and so do the implementation and the evaluation. Let me give a more detailed opinion on the presented implementation result in relation to the assignment and my expectations. It is a rather small implementation and evaluation effort with minimal impact on the performance of the library, and with insufficient evaluation. The student attempted four very simple optimizations, of which only two somewhat improved the performance of the library and are acceptably described and evaluated (the linear allocator and the optimization of the data structure for intersection). Since last year, the student has added two more optimizations: 1) Parallelization, which I do not examine, because I see no evaluation of it in the thesis and examining it would therefore make no sense. 2) Deferred sorting of vectors in the transition function of automata. This optimization makes sense conceptually, but I have no information about the quality of its implementation, and its evaluation seems nonsensical. According to Figure 5.3, this optimization has its largest positive effect in the benchmark noodler-inter, which tests the performance of automata intersection. The automata intersection code, however, should never use the sorting that the student optimized. The question therefore is what the student actually measured. It is possible that the benchmark does not measure only intersection, in which case it would be good to know what was sped up. It is also possible that the student did not isolate the effect of this optimization from the other changes made in Mata, in which case the experiment would be invalid.
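
To make the concern about deferred sorting concrete, here is a minimal C++ sketch. It is not Mata's code: `EagerSortedVec` and `DeferredSortedVec` are hypothetical names, and the sketch assumes that intersection adds elements in already sorted order. Under that assumption the eager container only ever takes its append fast path, so deferring the sort has nothing to save.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical eager container: stays sorted and duplicate-free after every insert.
struct EagerSortedVec {
    std::vector<int> data;
    std::size_t slow_inserts = 0;  // inserts that had to shift existing elements

    void insert(int x) {
        if (data.empty() || data.back() < x) {
            data.push_back(x);  // fast path: appending preserves the order
            return;
        }
        auto it = std::lower_bound(data.begin(), data.end(), x);
        if (it != data.end() && *it == x) { return; }  // already present
        data.insert(it, x);  // slow path: shifts the tail of the vector
        ++slow_inserts;
    }
};

// Hypothetical deferred container: appends blindly, sorts once when read.
struct DeferredSortedVec {
    std::vector<int> data;
    bool dirty = false;

    void insert(int x) {
        data.push_back(x);
        dirty = true;
    }

    const std::vector<int>& sorted() {
        if (dirty) {
            std::sort(data.begin(), data.end());
            data.erase(std::unique(data.begin(), data.end()), data.end());
            dirty = false;
        }
        return data;
    }
};
```

If the input arrives in increasing order, `slow_inserts` stays at zero, which is the case where a measured speedup from deferred sorting would need a separate explanation.
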
| Criterion | Grade | Points | Verbal evaluation |
|---|---|---|---|
| Information on the assignment | | | An assignment of average difficulty. It required understanding the Mata library and its automata algorithms. The core of the library is neither very large nor very complicated, but understanding it well enough to optimize existing code is not a trivial task. The library is also written with an emphasis on efficiency, so finding new optimizations is not entirely easy either. |
| Work with literature | | | The student studied the necessary minimum. |
| Activity during the work, consultations, communication | | | In the first year of the work, the student communicated and worked minimally. In the second year, the student did not respond at all to repeated requests to communicate and got in touch less than a week before the submission deadline, saying that the text could be finished with a one-week extension. I am seeing the text for the first time in its submitted form, and I know nothing about the student's own work beyond what is written in the text. |
| Activity during completion | | | See above. No activity or communication at all, despite repeated requests. |
| Publications, awards | | | We may eventually use the allocator or the data structures for intersection in the library, if it is confirmed that they work. |
The goal of the assignment was to analyse the performance of the automata library Mata, explore the existing approaches to the representation of automata and the data structures used in automata operations, design alternative data structures, and evaluate their performance. The goal was fulfilled only partially: the performance analysis is missing, the existing approaches were not explored, and the four rather simple optimizations seem to be chosen arbitrarily and are insufficiently explained. Three of the four optimizations were evaluated, but the experimental evaluation is poorly presented and analysed. The amount of work does not seem sufficient, and the work done is poorly presented. The thesis falls severely short of both the formal and the factual requirements. However, if better motivation and reasoning for the optimizations were given, and their experimental evaluation were properly presented and explained, the thesis could satisfy the minimal requirements for a passable thesis. For these reasons, I suggest grading the thesis in its current state with an F.
| Criterion | Grade | Points | Verbal evaluation |
|---|---|---|---|
| Difficulty of the assignment | | | The student needed to study the automata library Mata and techniques for optimized data structures for automata representation and algorithms. |
| Extent to which the assignment requirements were fulfilled | | | The analysis of the performance of the library is severely lacking and is supported by no data at all. The exploration of other automata libraries and of the representations they use for automata and automata algorithms is non-existent. The chosen optimizations are insufficiently explained. The work done on the optimizations seems insufficient. The experimental evaluation and results are questionable, poorly presented, and largely unexplained. The amount of work done since the last attempt is minimal (deferred sorting is badly explained and questionably measured, and parallelization remains unevaluated). |
| Length of the technical report | | | The thesis is at the minimal length of 40 standard pages (or slightly under). Numerous chapters are poorly explained, lack explanation of the discussed topic (namely the analysis of the performance of the library, the exploration of other automata libraries, and the experimental evaluation), or are irrelevant to the topic of the thesis. |
| Presentation quality of the technical report | | 40 | The topic of the thesis is poorly presented: the chapters lack deeper explanation of the discussed topics, and the optimizations are proposed without any data supporting the choices. Some chapters are superfluous or seem incomplete. The thesis lacks formal definitions of the terms it uses and references internal implementation objects of Mata without explaining them. There is no performance analysis of the Mata library and no exploration of existing solutions, other automata libraries, or automata algorithms. The thesis states that “an operation Delta::add() [to add a new general transition to an automaton] is the only method to construct an automaton, and hence is a bottleneck”, whereas the whole design of Mata is to call “Delta::add()” as few times as possible (and to use other methods to construct an automaton instead) in order to avoid an expensive traversal of the multi-layered data structures. Chapter 2.5, explaining the cache hierarchy, seems superfluous. While generally a good idea, the deferred sorting in 4.4 is incompletely explained, and the same holds for the linear allocator and ProductStorage: why and how were they chosen, and where were they applied? Without deep knowledge of the library and without consulting the source code, parts of the thesis are hard to understand. Code snippets are not explained. Parallelization stands outside the topic of the thesis, seems an arbitrary choice without any clear motivation supported by data, and is never evaluated. The experimental evaluation is poorly presented and explained. The timeout of 1000 s (instead of the usual 60-120 s for these benchmarks) is excessive. It might skew the results significantly, as no data analysis or cleaning was done to remove outliers. The graphs and tables are hard to understand, and most show no interesting results, as the modifications had little to no effect on performance. The results are obscured and lack explanation (Figure 5.5 is not explained in the text at all). The thesis points to several benchmarks where the performance changed, but it does not explain why the implementations performed better or worse there than on other benchmarks. The results seem to contradict each other: Table 5.2 shows a roughly twofold performance increase on intersection, yet Table 5.1 shows the original implementation to be at most 1.34 times slower, and the twofold performance decrease on trim is not mentioned at all. The deferred sorting optimization is said to double the performance of intersection, yet the graphs show no speedup and Table 5.4 shows a speedup of only 10 ms out of 130 ms. The thesis does not explain where and how the deferred sorting was applied; intersection should not benefit from it, since it mainly appends to the end of sorted containers. More statistics (namely quartiles) are necessary to analyse the results. |
| Formal quality of the technical report | | 55 | The thesis contains numerous typographical and stylistic mistakes. Sections are incorrectly nested, lack text entirely, or reference non-existent data structures. The graphs and tables are poorly presented: they lack annotations (axis units, scale, explanation of table columns), contain empty subgraphs, show no interesting results (wrong scale), or present results that should be shown in a different format. The subsections of section 4.2 are incorrectly nested, and section 4.3 is missing its introductory text (section 4.3.1 starts immediately). Table 5.5 and the corresponding text feature MatrixProductStorageClas[s]ical and ProductStorageClassical, which are never otherwise mentioned in the thesis. MatrixProductStorageContinues (presumably MatrixProductStorageContinuous) is defined twice in the text, with different definitions. |
| Work with literature | | 45 | Numerous claims in the thesis are not supported by any data or external sources, or are imprecise (too general). The thesis references only a few works related to automata theory and its applications, does not reference even the main sources from the assignment, and instead references works on general programming and optimization techniques. The external sources are incorrectly cited. For example, the list of the most prominent applications of NFAs in chapter 2.2.1 should be supported by external sources; it is, moreover, not relevant to the thesis, as its conclusions are not considered in the analysis of what to optimize. Some claims are imprecise: “[NFAs] are especially useful in academic and research contexts where expressiveness and compactness of representation are more important than raw performance”. Chapter 2.4, Comparison with Other Tools, does not discuss the other tools (automata libraries) at all; it only points to some design decisions of Mata and fails to mention the design decisions taken by the other libraries. The thesis claims that “all hopes of parallelization in Mata have to be abandoned”, yet gives no data supporting the claim. |
| Implementation output | | 65 | The implemented changes seem fairly minimal, with little documentation and explanation, but the modified library compiles and runs, and the unit tests pass. The code reasonably satisfies the quality requirements for contributions to the automata library Mata. |
| Usability of the results | | | The thesis introduces a few modifications to the existing algorithms in Mata. Since the experimental results are unclear and questionable, it is hard to assess whether the results should be used in the main branch of Mata. If, however, a proper experimental evaluation is conducted and the linear allocator or the ProductStorage data structures show a performance improvement, the applied optimizations might be usable in the library. |
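
The request for quartiles and for handling of timed-out runs could be met along the following lines. This is only a sketch in C++: `Summary`, `quantile`, and `summarize` are hypothetical names, and it assumes that a run reaching the time limit is recorded with a time equal to or above the timeout. Such runs are counted separately instead of being mixed into the statistics of the finished runs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Summary {
    std::size_t timeouts;   // runs that reached the time limit
    double q1, median, q3;  // quartiles of the runs that finished
};

// Linear-interpolation quantile of a sorted, non-empty vector; p lies in [0, 1].
double quantile(const std::vector<double>& sorted, double p) {
    const double pos = p * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(pos);
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    return sorted[lo] + (pos - static_cast<double>(lo)) * (sorted[hi] - sorted[lo]);
}

// Assumption: any run with time >= timeout_s is treated as a timeout.
Summary summarize(std::vector<double> times_s, double timeout_s) {
    // Move the finished runs to the front, the timed-out runs to the back.
    const auto first_timeout = std::partition(
        times_s.begin(), times_s.end(),
        [timeout_s](double t) { return t < timeout_s; });
    const std::size_t timeouts =
        static_cast<std::size_t>(times_s.end() - first_timeout);
    times_s.erase(first_timeout, times_s.end());
    std::sort(times_s.begin(), times_s.end());
    if (times_s.empty()) { return {timeouts, 0.0, 0.0, 0.0}; }
    return {timeouts, quantile(times_s, 0.25), quantile(times_s, 0.5),
            quantile(times_s, 0.75)};
}
```

Reporting the timeout count next to the quartiles keeps a single 1000 s run from dominating a mean computed over runs that otherwise take milliseconds.
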
eVSKP id 164182