|
|
2008. Mathematics for Analysis of Petascale Data. ASCR, Office of Science, Department of Energy.
Keywords: DataMiningGeneral; LargeScaleLearning
|
|
|
|
Bacardit, J., Burke, E.K. & Krasnogor, N., 2009. Improving the scalability of rule-based evolutionary learning, Memetic Computing, 1, p. 55–67.
Abstract: Evolutionary learning techniques are comparable in accuracy with other learning methods such as Bayesian Learning, SVM, etc. These techniques often produce more interpretable knowledge than, e.g. SVM; however, efficiency is a significant drawback. This paper presents a new representation motivated by our observations that Bioinformatics and Systems Biology often give rise to very large-scale datasets that are noisy, ambiguous and usually described by a large number of attributes. The crucial observation is that, in the most successful rules obtained for such datasets, only a few key attributes (from the large number of available ones) are expressed in a rule, hence automatically discovering these few key attributes and only keeping track of them contributes to a substantial speed up by avoiding useless match operations with irrelevant attributes. Thus, in effective terms this procedure is performing a fine-grained feature selection at a rule-wise level, as the key attributes may be different for each learned rule. The representation we propose has been tested within the BioHEL machine learning system, and the experiments performed show that not only the representation has competent learning performance, but that it also manages to reduce considerably the system run-time. That is, the proposed representation is up to 2–3 times faster than state-of-the-art evolutionary learning representations designed specifically for efficiency purposes.
Keywords: GeneticProgramming; LargeScaleLearning
|
|
|
|
Bach, F., 2008. Consistency of the Group Lasso and Multiple Kernel Learning, Journal of Machine Learning Research, 9, p. 1179–1225.
Abstract: We consider the least-square regression problem with regularization by a block ℓ1-norm, that is, a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the ℓ1-norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic group selection consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for non linear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied. Keywords: sparsity, regularization, consistency, convex optimization, covariance operators
Keywords: KernelMethods; LargeScaleLearning
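Note: the block ℓ1-norm regularization described in the abstract above can be written out as follows. This is a sketch under assumed notation that is not taken from the entry itself: n observations, response vector y, m groups with design blocks X_j and coefficient blocks w_j, group weights d_j > 0, and regularization parameter λ ≥ 0.

```latex
\min_{w_1,\dots,w_m}\;
  \frac{1}{2n}\Bigl\| y - \sum_{j=1}^{m} X_j w_j \Bigr\|_2^2
  \;+\; \lambda \sum_{j=1}^{m} d_j \,\| w_j \|_2
```

When every block w_j has dimension one and d_j = 1, the penalty reduces to λ Σ_j |w_j|, which is the ordinary Lasso penalty.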
|
|
|
|
Balakrishnan, S. & Madigan, D., 2008. Algorithms for Sparse Linear Classifiers in the Massive Data Setting, Journal of Machine Learning Research, 9, p. 313–337.
Abstract: Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multipass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.
Keywords: LargeScaleLearning
|
|
|
|
Bell, R.M., Koren, Y. & Volinsky, C., 2007. Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems. San Jose, USA.
Abstract: The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. In this work, we propose novel algorithms for predicting user ratings of items by integrating complementary models that focus on patterns at different scales. At a local scale, we use a neighborhood-based technique that infers ratings from observed ratings by similar users or of similar items. Unlike previous local approaches, our method is based on a formal model that accounts for interactions within the neighborhood, leading to improved estimation quality. At a higher, regional, scale, we use SVD-like matrix factorization for recovering the major structural patterns in the user-item rating matrix. Unlike previous approaches that require imputations in order to fill in the unknown matrix entries, our new iterative algorithm avoids imputation. Because the models involve estimation of millions, or even billions, of parameters, shrinkage of estimated values to account for sampling variability proves crucial to prevent overfitting. Both the local and the regional approaches, and in particular their combination through a unifying model, compare favorably with other approaches and deliver substantially better results than the commercial Netflix Cinematch recommender system on a large publicly available data set.
Keywords: LargeScaleLearning
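Note: the idea of factorizing a rating matrix without imputing its unknown entries can be illustrated with a small sketch. This is not the authors' iterative algorithm. It swaps in plain stochastic gradient descent with an L2 penalty standing in for shrinkage, and fits only the observed (user, item, rating) triples. The function names, hyperparameters, and toy ratings are all hypothetical.

```python
import random


def factorize_observed(ratings, n_users, n_items, rank=2,
                       lam=0.05, lr=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q on observed ratings only.

    ratings: list of (user, item, value) triples. Missing entries are
    never touched, so no imputation is needed. lam is an L2 penalty
    used here as a simple stand-in for shrinkage (an assumption).
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Prediction error on this single observed rating.
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for k in range(rank):
                p, q = P[u][k], Q[i][k]
                # Gradient step on squared error plus L2 penalty.
                P[u][k] += lr * (err * q - lam * p)
                Q[i][k] += lr * (err * p - lam * q)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the given rating triples."""
    sq = [(r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
          for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


# Toy data: 3 users, 3 items, 6 of the 9 entries observed.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

P0, Q0 = factorize_observed(ratings, 3, 3, epochs=0)  # untrained baseline
P, Q = factorize_observed(ratings, 3, 3)              # trained factors
```

Comparing `rmse(ratings, P, Q)` with `rmse(ratings, P0, Q0)` shows how far the fit on the observed entries improves over the random initialization. A real evaluation would use held-out ratings.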
|
|
|
|
Bengio, Y. & LeCun, Y., 2007. Scaling Learning Algorithms towards AI.
Abstract: One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to non-parametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
Keywords: DataMiningGeneral; LargeScaleLearning
|
|
|
|
Boulle, M., 2009. A Parameter-Free Classification Method for Large Scale Learning, Journal of Machine Learning Research, 10, p. 1367–1385.
Abstract: With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of the processing power. However, the main limitation to a wide spread of data mining solutions is the non-increasing availability of skilled data analysts, which play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available central memory. We finally report results on the Large Scale Learning challenge, where our method obtains state of the art performance within practicable computation time.
Keywords: Bayesian; LargeScaleLearning
|
|
|
|
Bradley, J.K. & Schapire, R.E.. FilterBoost: Regression and Classification on Large Datasets.
Keywords: LargeScaleLearning
|
|