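MANSOOR, M. Statistická analýza výsledků geologického průzkumu (Statistical Analysis of Results of Geological Survey) [online]. Brno: Vysoké učení technické v Brně (Brno University of Technology), Fakulta strojního inženýrství (Faculty of Mechanical Engineering), 2024.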
Mr. Mansoor asked me for a diploma thesis topic in November 2023, after finding that the topic he previously had with my colleague was too difficult for him. That is why I proposed a simple regression modelling task on real data. Allow me to describe what I have observed since we met. In the first month, Mr. Mansoor studied linear regression and prepared the basis of the first, theoretical part of his thesis by converting a PDF of part of Montgomery's book to LaTeX (without any editing). MATLAB was then used to analyse the real data. After a month of struggling to read the data (a problem caused by missing values), we reached the stage at which Mr. Mansoor programmed different design matrices and tried to reproduce the calculations presented in the book, both on the original data (not presented in the thesis because the description would be too lengthy) and on daily averages of the measurements. I required all applied methods to be described in the first, theoretical part of the thesis. Unfortunately, and without my endorsement, I suspect these sections were mostly generated by ChatGPT, as their structure suggests (6.3, 6.4, 7, 8). Chapter 9 is devoted to the presentation and discussion of the results of the real data analysis. Unfortunately, during supervision I was flooded with overly general, empty comments on the graphs, and with texts, including the abstract and conclusion, generated by ChatGPT. At this stage I decided to draw up a list of tasks to be completed before we proceeded further. This ended in repeatedly checking whether each task had been completed, which repeatedly turned out not to be the case. Several tasks were still not done, e.g. commenting on the Box-Cox transformation (the estimated lambda, the necessity of the transformation, correct captions of the graphs), the definition of autocovariance in 7.2, an explanation of the criterion used for the stepwise regression, and use of the LaTeX thesis template provided by our department.
At this stage I gave up on the higher level of achievement, with reasonable comments on the results, that I had hoped for at the beginning. I should also mention that Mr. Mansoor visited me regularly and was always polite, but frequently did not act on my previous notes. His not taking notes during our meetings did not help either. I observed that he worked more intensively during the last few weeks. My impression is that now, in the new era of readily available artificial intelligence, teachers are, and will increasingly be, exposed to text generated by AI and copied by students without proofreading. Anyone who has tried ChatGPT knows that this tool must be used critically. However, every experience is valuable. Mr. Mansoor taught me his IDC (I don't care) approach, for which I am grateful. Every student has a different objective function: some strive to learn something new, others aim to minimize their effort. I believe this lesson will come in handy. I must admit that it is difficult for me to decide whether the thesis should be graded E or F. To give the student the chance to prove that he has mastered the topic, I have decided to grade the thesis E and leave the final decision to the committee.
Criterion | Grade | Points | Verbal assessment |
---|---|---|---|
Fulfilment of the assignment's requirements and objectives | E | | |
Approach and scope of the solution, adequacy of the methods used | E | | |
Own contribution and originality | F | | |
Ability to interpret the results obtained and draw conclusions from them | F | | |
Applicability of the results in practice or theory | F | | |
Logical structure of the thesis and formal requirements | E | | |
Graphic and stylistic presentation, and spelling | E | | |
Use of literature, including citations | E | | |
Student's independence in working on the topic | F | | |
The submitted thesis, "Statistical Analysis of Results of Geological Survey," is deficient in nearly every area under evaluation. Below is a detailed summary of the key remarks and objections. At the time of review, the submitted PDF file did not contain the official thesis assignment. The language quality of the thesis is generally poor. Issues range from typographical and wording errors (for example, in the acknowledgements: "I would like to sincerely thank Hübnerová Zuzana, doc. Mgr, Ph.D., my supervisor, for all their help, encouragement, and guidance during this thesis project." or "regular distribution of residuals") to frequent lengthy digressions from the main topic, containing suggestions that are seldom carried out (for instance, the Shapiro-Wilk test for normality of the residuals is recommended but never computed). This makes the text difficult to understand, even for trained statisticians. The visual presentation of the thesis is seriously deficient, ranging from an "unorthodox" title page to unreadable headers (for instance, on page 29), equations that spill over the margins (such as on page 71), and low-resolution figures (like Figure 8.1). Some figures seem to be copied from other sources without being referenced (e.g. Figure 6.1). Parts of the thesis should have been omitted. Explaining the difference between a histogram and a bar chart is borderline insulting to the reader. This is all the more aggravating given that the author cites a Forbes article for this section, one that suggests this kind of reasoning for explaining the difference to children below high-school age. The main part of the thesis (the computed models and the discussion of their validity) is seriously hindered by the omission of the time coding used: charts are referenced by date, but it is unclear whether the time series starts at 0, at 1, or at some arbitrary number.
This is evident from the discrepancy between Figure 9.2, in which the fitted curve does not start at the origin, and the final model equation y = 5.7807 × 10^(9) t^2 + …. Another point of contention is the Box-Cox transformation. The author mentions applying it, and certain charts show that it was used, yet the author discloses neither the value of λ employed nor whether the two-parameter (shifted) variant was used. Furthermore, the Box-Cox transformation appears to have been applied without demonstrating that it was needed, since no analysis of the original data is presented. The original data also seem to have been restricted for modelling purposes, yet the author gives no reason for doing so. Multicollinearity is never mentioned anywhere in the thesis. This can be a problem, especially given the unclear time coding (even though the only terms used are functions of time). The description of the proposed models lacks clarity, since the coefficient tables contain no labels assigning rows to regression terms. The computed equations in each section do not include a constant term (if the constant term was removed, any discussion of model quality based on R² is invalid). It is unclear whether later sections extend the original model or whether previous results are discarded. The model presented in Case 5 is misleading, since the effect of the constant is confounded with cos²(t) and sin²(t), whose sum is identically 1. The interpretation of the coefficients is highly questionable, as the terms "positive" and "negative" influence/impact are not clearly defined; the author appears merely to restate the signs of the computed regression coefficients. Furthermore, in certain instances, such as Case 6, the model description is incorrect or incomplete. The normality check in Case 7 is confusing: there are two different normal probability plots, and it is unclear which corresponds to which model.
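The Case 5 objection follows from the identity cos²(t) + sin²(t) = 1. A design matrix that contains a constant column together with cos²(t) and sin²(t) columns therefore has linearly dependent columns, so the three coefficients cannot be estimated separately. The NumPy sketch that follows illustrates this. It assumes a hypothetical time coding t = 0, 1, …, 29, since the thesis does not state its coding. The coefficient vectors are also hypothetical. This is an illustration of the identifiability problem, not a reconstruction of the author's model.

```python
import numpy as np

# Hypothetical time coding t = 0, 1, ..., 29 (an assumption: the thesis does not state its coding).
t = np.arange(30, dtype=float)

# Case 5-style design matrix: constant, cos^2(t), sin^2(t).
X = np.column_stack([np.ones_like(t), np.cos(t) ** 2, np.sin(t) ** 2])

# cos^2(t) + sin^2(t) = 1, so the constant column equals the sum of the other two.
max_gap = np.max(np.abs(X[:, 0] - (X[:, 1] + X[:, 2])))

# Only two of the three columns are linearly independent.
rank = np.linalg.matrix_rank(X)

# Two hypothetical coefficient vectors differing by (c, -c, -c) give identical fitted values.
c = 5.0
beta_a = np.array([1.0, 2.0, 3.0])
beta_b = beta_a + np.array([c, -c, -c])
same_fit = np.allclose(X @ beta_a, X @ beta_b)

print(rank, same_fit)  # 2 True
```

Because the rank is 2 rather than 3, adding any c to the constant and subtracting c from both trigonometric coefficients leaves every fitted value unchanged. Dropping the constant or one of the two squared terms removes the dependence. An equivalent fix is to reparametrize with cos(2t), since cos²(t) = (1 + cos(2t))/2.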
In addition, both normal plots show a mild departure in kurtosis from the normal distribution (one with higher kurtosis, the other with lower). Figure 9.17 shows a "textbook example" of heteroskedasticity, but it is never mentioned in the text. Figure 9.22 shows significant negative autocorrelation at lags of 12 or more days; this is never addressed either. It is concerning that the discussions of the presented results read more like unsolicited advice than like comments on the computed values and charts. Considering the aforementioned complaints and reservations, I cannot in good conscience recommend this thesis for defense, at least not without significant revisions and clarifications. Therefore, my final evaluation as a reviewer is an F.
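The residual autocorrelation check that Figure 9.22 calls for can be sketched as follows. It computes the sample autocorrelation r_k of the residuals and flags lags whose |r_k| exceeds the approximate large-sample white-noise band ±1.96/√n. This is a minimal sketch under that approximation. The function names `sample_acf` and `flag_lags` are hypothetical, and the alternating series is synthetic illustration data, not data from the thesis.

```python
import numpy as np


def sample_acf(residuals, max_lag):
    """Sample autocorrelation r_0..r_max_lag of a residual series."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()          # centre the residuals
    denom = np.sum(e * e)     # lag-0 sum of squares
    n = e.size
    return np.array([np.sum(e[: n - k] * e[k:]) / denom for k in range(max_lag + 1)])


def flag_lags(acf, n, z=1.96):
    """Lags k >= 1 whose |r_k| exceeds the approximate white-noise band z / sqrt(n)."""
    bound = z / np.sqrt(n)
    return [k for k in range(1, len(acf)) if abs(acf[k]) > bound]


# Synthetic, strongly alternating residuals (illustration only, not thesis data).
e = np.array([1.0, -1.0] * 50)
acf = sample_acf(e, max_lag=3)     # r_0..r_3 are 1.0, -0.99, 0.98, -0.97
print(flag_lags(acf, n=e.size))    # [1, 2, 3]
```

In a real analysis the same function would be applied to the fitted model's residuals. Lags such as 12 or more days would then be inspected, with the band treated only as a rough guide for regression residuals.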
Criterion | Grade | Points | Verbal assessment |
---|---|---|---|
Fulfilment of the assignment's requirements and objectives | F | | |
Approach and scope of the solution, adequacy of the methods used | F | | |
Own contribution and originality | F | | |
Ability to interpret the results obtained and draw conclusions from them | F | | |
Applicability of the results in practice or theory | F | | |
Logical structure of the thesis and formal requirements | F | | |
Graphic and stylistic presentation, and spelling | F | | |
Use of literature, including citations | F | | |
eVSKP id 162485