- On the ubiquity of the Bayesian paradigm in statistical machine learning and data science (Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky, 2019) Fokoué, Ernest. This paper seeks to provide a thorough account of the ubiquitous nature of the Bayesian paradigm in modern statistics, data science and artificial intelligence. Once maligned, on the one hand by those who philosophically hated the very idea of subjective probability used in prior specification, and on the other hand because of the intractability of the computations needed for Bayesian estimation and inference, the Bayesian school of thought now permeates and pervades virtually all areas of science, applied science, engineering, social science and even liberal arts, often in unsuspected ways. Thanks in part to the availability of powerful computing resources, but also to the literally unavoidable inherent presence of the quintessential building blocks of the Bayesian paradigm in all walks of life, the Bayesian way of handling statistical learning, estimation and inference is not only mainstream but also becoming the most central approach to learning from the data. This paper explores some of the most relevant elements to help the reader appreciate the pervading power and presence of the Bayesian paradigm in statistics, artificial intelligence and data science, with an emphasis on how the Gospel according to Reverend Thomas Bayes has turned out to be the truly good news, and in some cases the amazing saving grace, for all who seek to learn statistically from the data.
- Multi-stage fault warning for large electric grids using anomaly detection and machine learning (Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky, 2019) Raja, Sanjeev; Fokoué, Ernest. In the monitoring of a complex electric grid, it is of paramount importance to provide operators with early warnings of anomalies detected on the network, along with a precise classification and diagnosis of the specific fault type. In this paper, we propose a novel multi-stage early warning system prototype for electric grid fault detection, classification, subgroup discovery, and visualization. In the first stage, a computationally efficient anomaly detection method based on quartiles detects the presence of a fault in real time. In the second stage, the fault is classified into one of nine pre-defined disaster scenarios. The time series data are first mapped to highly discriminative features by applying dimensionality reduction based on temporal autocorrelation. The features are then passed to one of three classification techniques: support vector machine, random forest, or artificial neural network. Finally, in the third stage, intra-class clustering based on dynamic time warping is used to characterize the fault with further granularity. Results on the Bonneville Power Administration electric grid data show that i) the proposed anomaly detector is both fast and accurate; ii) dimensionality reduction leads to dramatic improvement in classification accuracy and speed; iii) the random forest method offers the most accurate, consistent, and robust fault classification; and iv) time series within a given class naturally separate into five distinct clusters which correspond closely to the geographical distribution of electric grid buses.
- On the versatility and polyvalence of certain statistical learning machines (Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky, 2019) Fokoué, Ernest. As data science and its flurry of lucrative career opportunities continue to dominate strategic planning meetings at companies and universities around the world, it is remarkable to notice that mathematics, the queen of all sciences, is still called upon to play a central role. I use mathematics here in senso lato to mean mathematical sciences in general, including algebra, analysis, probability, statistics and theoretical computer science. Indeed all the statistical learning machines and traditional statistical methods permeating the articles of this special issue have in common the fact they all rest on strong mathematical foundations, even though some of the vast mathematical details are not shown here in some cases due to space constraints.
- On a global measure of nonlinearity and its application in parameter estimation in nonlinear regression (Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky, 2019) Khinkis, Leonid. The theoretical and computational challenges in least squares estimation of parameters in nonlinear regression models are well documented in the statistical literature. Measures of nonlinearity are intended to quantify the degree of nonlinearity and to explain the relationship between nonlinearity and the statistical properties of a model. A new measure of nonlinearity reflecting the model’s global behavior is introduced and discussed in this paper. Two new criteria for the global minimum of the sum of squares in nonlinear regression incorporating this measure are presented and illustrated on several published examples.
- What do Asian and non-Asian scriptures have in common? An applied statistical machine learning inquiry (Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky, 2019) Sah, Preeti; Fokoué, Ernest. This paper presents a substantially detailed statistical machine learning approach to the analysis of several aspects of sacred texts from both the Asian and Biblical scriptural canons. The corpus herein considered consists of 4 Asian sacred scriptures, namely the Tao Te Ching, the teachings of the Buddha, the Yogasutras of Patanjali, and the Upanishads, and 4 non-Asian sacred texts, essentially four books from the Bible, namely the Book of Proverbs, the Book of Wisdom, the Book of Ecclesiastes and the Book of Ecclesiasticus. Standard text mining tools are used, like the creation of Document Term Matrices (DTM) to pre-process raw English translations into word frequencies, and both unsupervised and supervised learning methods are used to answer some foundational questions featuring similarities and dissimilarities within each canon and interesting differences between all the canons considered. Despite the vast disparities between the translators of the original texts, our findings reveal sharp differences between Asian and non-Asian scriptures regardless of whether clustering techniques or pattern recognition methods are used. We provide several compelling visualizations to help highlight our striking findings, chief of which are the persistent groupings of the scriptures based on geography.
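
For readers unfamiliar with the Document Term Matrix (DTM) mentioned in the last abstract, the following is a minimal sketch of the general idea, not the authors' pipeline. The function names `tokenize` and `document_term_matrix`, the simple letters-only tokenizer, and the two toy documents are all assumptions made purely for illustration; the paper's actual pre-processing choices are not described in this listing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Build a DTM from a dict mapping document name -> raw text.

    Returns (vocab, matrix), where vocab is the sorted list of distinct terms
    and matrix maps each document name to its row of term counts,
    aligned with vocab.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    matrix = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, matrix


# Hypothetical toy documents; these are NOT excerpts from the paper's corpus.
docs = {
    "doc_a": "the river flows to the sea",
    "doc_b": "the sea meets the sky",
}
vocab, dtm = document_term_matrix(docs)
for name, row in dtm.items():
    print(name, dict(zip(vocab, row)))
```

Each row of the resulting matrix is a word-frequency vector for one document, which is the kind of representation that clustering or classification methods can then operate on.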