|
|
Abernethy, J., Langford, J. & Warmuth, M.K. Continuous Experts and the Binning Algorithm.
Abstract: We consider the design of online master algorithms for combining the predictions from a set of experts where the absolute loss of the master is to be close to the absolute loss of the best expert. For the case when the master must produce binary predictions, the Binomial Weighting algorithm is known to be optimal when the number of experts is large. It has remained an open problem how to design master algorithms based on binomial weights when the predictions of the master are allowed to be real valued. In this paper we provide such an algorithm and call it the Binning algorithm because it maintains experts in an array of bins. We show that this algorithm is optimal in a relaxed setting in which we consider experts as continuous quantities. The algorithm is efficient and near-optimal in the standard experts setting.
Keywords: Ensembles
|
|
|
|
Albanis, G.T. & Batchelor, R.A., 1999. Combining Heterogeneous Classifiers for Stock Selection.
Abstract: Combining unbiased forecasts of continuous variables necessarily reduces the error variance below that of the median individual forecast. However, this does not necessarily hold for forecasts of discrete variables, or where the costs of errors are not directly related to the error variance. This paper investigates empirically the benefits of combining forecasts of outperforming shares, based on five linear and nonlinear statistical classification techniques, including neural network and recursive partitioning methods. We find that simple “Majority Voting” improves accuracy and profitability only marginally. Much greater gains come from applying the “Unanimity Principle”, whereby a share is not held in the high-performing portfolio unless all classifiers agree.
Keywords: Ensembles
|
|
|
|
Arlot, S. & Celisse, A., 2009. A survey of cross-validation procedures for model selection, arXiv:0907.4728.
Abstract: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.
Keywords: DataMiningGeneral; Ensembles
|
|
|
|
Baram, Y., El-Yaniv, R. & Luz, K., 2004. Online Choice of Active Learning Algorithms, Journal of Machine Learning Research, 5, p. 255–291.
|
|
|
|
Bell, R.M., Koren, Y. & Volinsky, C., 2007. The BellKor solution to the Netflix Prize.
Abstract: Our final solution (RMSE=0.8712) consists of blending 107 individual results. Since many of these results are close variants, we first describe the main approaches behind them. Then, we will move to describing each individual result. The core components of the solution are published in our ICDM'2007 paper [1] (or, KDD-Cup’2007 paper [2]), and also in the earlier KDD'2007 paper [3]. We assume that the reader is familiar with these works and our terminology there.
Keywords: Ensembles
|
|
|
|
Bifet, A. et al., 2009. New Ensemble Methods For Evolving Data Streams. Paris, France: ACM Press.
Abstract: Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is when concepts drift or change completely, is becoming one of the core issues. When tackling non-stationary concepts, ensembles of classifiers have several advantages over single classifier methods: they are easy to scale and parallelize, they can adapt to change quickly by pruning under-performing parts of the ensemble, and they therefore usually also generate more accurate concept descriptions. This paper proposes a new experimental data stream framework for studying concept drift, and two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. Using the new experimental framework, an evaluation study on synthetic and real-world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods.
Keywords: Ensembles
|
|
|
|
Bousquet, O. & Warmuth, M.K., 2002. Tracking a Small Set of Experts by Mixing Past Posteriors, Journal of Machine Learning Research, 3, p. 363–396.
|
|
|
|
Caruana, R., Munson, A. & Niculescu-Mizil, A., 2006. Getting the Most Out of Ensemble Selection.
Abstract: We investigate four previously unexplored aspects of ensemble selection, a procedure for building ensembles of classifiers. First we test whether adjusting model predictions to put them on a canonical scale makes the ensembles more effective. Second, we explore the performance of ensemble selection when different amounts of data are available for ensemble hillclimbing. Third, we quantify the benefit of ensemble selection’s ability to optimize to arbitrary metrics. Fourth, we study the performance impact of pruning the number of models available for ensemble selection. Based on our results we present improved ensemble selection methods that double the benefit of the original method.
Keywords: Ensembles
|
|