Random Subspace Learning (RASSEL) with data driven weighting schemes

Date
2018
Publisher
Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky
Abstract
We present a novel adaptation of the random subspace learning approach to regression analysis and classification of high-dimension, low-sample-size data, in which the individual strength of each explanatory variable is harnessed to achieve a consistent selection of a predictively optimal collection of base learners. In the context of random subspace learning, random forest (RF) occupies a prominent place, as can be seen from the vast number of extensions of the random forest idea and the multiplicity of machine learning applications of random forest. The adaptation of random subspace learning presented in this paper differs from random forest in the following ways: (a) instead of using trees as RF does, we use multiple linear regression (MLR) as our regression base learner and the generalized linear model (GLM) as our classification base learner, and (b) rather than selecting the subset of variables uniformly as RF does, we present the new concept of sampling variables based on a multinomial distribution with weights (success 'probabilities') driven through p independent one-way analysis of variance (ANOVA) tests on the predictor variables. The proposed framework achieves two substantial benefits, namely, (1) the avoidance of the extra computational burden brought by the permutations needed by RF to de-correlate the predictor variables, and (2) the substantial reduction in the average test error gained with the base learners used.
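The weighted sampling idea in the abstract can be illustrated with a minimal Python sketch for the two-class case. It is not the authors' implementation, and several details are assumptions made only for illustration: the weights are taken as the normalised one-way ANOVA F statistics, variables are drawn without replacement within each subset, the GLM base learner is a logistic regression fitted by Newton steps with a small ridge term added for numerical stability, and the ensemble averages predicted probabilities. The function names (`anova_f`, `fit_logistic`, `rassel_fit`, `rassel_predict`) and the simulated data are hypothetical.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F statistic of each column of X across the classes in y."""
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - grand) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)

def fit_logistic(X, y, ridge=1.0, iters=25):
    """Logistic regression by Newton steps (ridge term is an assumed safeguard)."""
    A = np.column_stack([np.ones(X.shape[0]), X])  # prepend an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p) - ridge * beta
        hess = (A * (p * (1 - p))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta

def rassel_fit(X, y, n_learners=50, subset_size=5, seed=0):
    """Fit base learners on variable subsets drawn with ANOVA-driven weights."""
    rng = np.random.default_rng(seed)
    f = anova_f(X, y)
    weights = f / f.sum()  # sampling probabilities over the p variables
    ensemble = []
    for _ in range(n_learners):
        # Assumption: draw distinct variables; RF would use uniform weights here.
        cols = rng.choice(X.shape[1], size=subset_size, replace=False, p=weights)
        ensemble.append((cols, fit_logistic(X[:, cols], y)))
    return weights, ensemble

def rassel_predict(ensemble, X):
    """Average the base learners' probabilities and threshold at 0.5."""
    probs = [1.0 / (1.0 + np.exp(-(b[0] + X[:, cols] @ b[1:])))
             for cols, b in ensemble]
    return (np.mean(probs, axis=0) >= 0.5).astype(int)

# Simulated high-dimension, low-sample-size data: 40 samples, 200 variables,
# of which only the first 5 carry signal.
rng = np.random.default_rng(1)
n, p = 40, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, :5] += 1.5 * y[:, None]

weights, ensemble = rassel_fit(X, y)
pred = rassel_predict(ensemble, X)
```

Keeping `subset_size` well below the sample size keeps each base learner well-posed even though the full design has far more variables than observations; the informative variables receive larger weights and so appear in more subsets than they would under uniform sampling.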
Citation
Mathematics for Applications. 2018, vol. 7, no. 1, pp. 11-30. ISSN 1805-3629
http://ma.fme.vutbr.cz/archiv/7_1/ma_7_1_2_elshrif_fokoue_final.pdf
Document type
Peer-reviewed
Document version
Published version
Language of document
en
Document licence
© Vysoké učení technické v Brně, Fakulta strojního inženýrství, Ústav matematiky