|
|
Boutsidis, C., Mahoney, M.W. & Drineas, P., 2008. Unsupervised Featuer Selection for Principal Components Analysis. Las Vegas, Nevada, USA: ACM Press.
Abstract: Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scienti¯c domains. We consider, both theoretically and empirically, the topic of unsuper- vised feature selection for PCA, by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the\best“subset of exactly k columns from an m£n data matrix A, and has been extensively stud- ied in the Numerical Linear Algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm signi¯cantly improves upon the best previously-existing re- sults [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selec- tion strategy in three application domains of modern sta- tistical data analysis: ¯nance, document-term data, and ge- netics. We pay particular attention to how this algorithm may be used to select representative or landmark features from an object-feature matrix in an unsupervised manner. In all three application domains, we are able to identify k landmark features, i.e., columns of the data matrix, that capture nearly the same amount of information as does the subspace that is spanned by the top k \eigenfeatures.”
Keywords: DimReduction
|
|
|
|
Braun, M.L., Buhmann, J.M. & Muller, K.-R., 2008. On Relevant Dimensions in Kernel Feature Spaces, Journal of Machine Learning Research, 9, p. 1875–1908.
Abstract: We show that the relevant information of a supervised learning problem is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem in the sense that it can asymptotically represent the function to be learned and is sufficiently smooth. Thus, kernels do not only transform data sets such that good generalization can be achieved using only linear discriminant functions, but this transformation is also performed in a manner which makes economical use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for supervised learning problems. Practically, we propose an algorithm which enables us to recover the number of leading kernel PCA components relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to aid in model selection, and (3) to denoise in feature space in order to yield better classification results.
Keywords: DimReduction; KernelMethods
|
|
|
|
Chen, J. et al, 2008. Learning Subspace Kernels for Classification. Las Vegas, Nevada, USA: ACM Press.
Abstract: Kernel methods have been applied successfully in many data mining tasks. Subspace kernel learning was recently proposed to discover an effective low-dimensional subspace of a kernel feature space for improved classification. In this paper, we propose to construct a subspace kernel using the Hilbert-Schmidt Independence Criterion (HSIC). We show that the optimal subspace kernel can be obtained efficiently by solving an eigenvalue problem. One limitation of the existing subspace kernel learning formulations is that the kernel learning and classification are independent and the subspace kernel may not be optimally adapted for classification. To overcome this limitation, we propose a joint optimization framework, in which we learn the subspace kernel and subsequent classifiers simultaneously. In addition, we propose a novel learning formulation that extracts an uncorrelated subspace kernel to reduce the redundant information in a subspace kernel. Following the idea from multiple kernel learning, we extend the proposed formulations to the case when multiple kernels are available and need to be combined. We show that the integration of subspace kernels can be formulated as a semidefinite program (SDP) which is computationally expensive. To improve the efficiency of the SDP formulation, we propose an equivalent semi-infinite linear program (SILP) formulation which can be solved efficiently by the column generation technique. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed algorithms.
Keywords: DimReduction
|
|
|
|
Chen, X.-wen & Wasikowski, M., 2008. FAST: A ROC-based Feature Selection Metric for Small Samples and Imbalanced Data Classification Problems. Las Vegas, Nevada, USA: ACM Press.
Abstract: The class imbalance problem is encountered in a large number of practical applications of machine learning and data mining, for example, information retrieval and filtering, and the detection of credit card fraud. It has been widely realized that this imbalance raises issues that are either nonexistent or less severe compared to balanced class cases and often results in a classifier’s suboptimal performance. This is even more true when the imbalanced data are also high dimensional. In such cases, feature selection methods are critical to achieve optimal performance. In this paper, we propose a new feature selection method, Feature Assessment by Sliding Thresholds (FAST), which is based on the area under a ROC curve generated by moving the decision boundary of a single feature classifier with thresholds placed using an even-bin distribution. FAST is compared to two commonly-used feature selection methods, correlation coefficient and RELevance In Estimating Features (RELIEF), for imbalanced data classification. The experimental results obtained on text mining, mass spectrometry, and microarray data sets showed that the proposed method outperformed both RELIEF and correlation methods on skewed data sets and was comparable on balanced data sets; when small number of features is preferred, the classification performance of the proposed method was significantly improved compared to correlation and RELIEF-based methods.
Keywords: DimReduction
|
|
|
|
Claeskens, G., Croux, C. & Van Kerckhoven, J., 2008. An Information Criterion for Variable Selection in Support Vector Machines, Journal of Machine Learning Research, 9, p. 541–558.
Abstract: Support vector machines for classification have the advantage that the curse of dimensionality is circumvented. It has been shown that a reduction of the dimension of the input space leads to even better results. For this purpose, we propose two information criteria which can be computed directly from the definition of the support vector machine. We assess the predictive performance of the models selected by our new criteria and compare them to existing variable selection techniques in a simulation study. The simulation results show that the new criteria are competitive in terms of generalization error rate while being much easier to compute. We arrive at the same findings for comparison on some real-world benchmark data sets.
Keywords: DimReduction
|
|
|
|
d'Aspremont, A., Bach, F. & El Ghaoui, L., 2008. Optimal Solutions for Sparse Principal Component Analysis, Journal of Machine Learning Research, 9, p. 1269–1294.
Abstract: Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of non zero coefficients, with total complexity O(n3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n3) per pattern. We discuss applications in subset selection and sparse recovery and show on artificial examples and biological data that our algorithm does provide globally optimal solutions in many cases.
Keywords: DimReduction
|
|
|
|
Dalalyan, A.S., Juditsky, A. & Spokoiny, V., 2008. A New Algorithm for Estimating the Effective Dimension-Reduction Subspace, Journal of Machine Learning Research, 9, p. 1647–1678.
Abstract: The statistical problem of estimating the effective dimension-reduction (EDR) subspace in the multi-index regression model with deterministic design and additive noise is considered. A new procedure for recovering the directions of the EDR subspace is proposed. Many methods for estimating the EDR subspace perform principal component analysis on a family of vectors, say ˆb 1; : : : ;ˆbL, nearly lying in the EDR subspace. This is in particular the case for the structure-adaptive approach proposed by Hristache et al. (2001a). In the present work, we propose to estimate the projector onto the EDR subspace by the solution to the optimization problem minimize max `=1;:::;L ˆb >` (IA)ˆb` subject to A 2 Am ; where Am is the set of all symmetric matrices with eigenvalues in [0;1] and trace less than or equal to m, with m being the true structural dimension. Under mild assumptions, pn-consistency of the proposed procedure is proved (up to a logarithmic factor) in the case when the structural dimension is not larger than 4. Moreover, the stochastic error of the estimator of the projector onto the EDR subspace is shown to depend on L logarithmically. This enables us to use a large number of vectors ˆb ` for estimating the EDR subspace. The empirical behavior of the algorithm is studied through numerical simulations.
Keywords: DimReduction
|
|
|
|
Dy, J.G. & Brodley, C.E., 2004. Feature Selection for Unsupervised Learning, Journal of Machine Learning Research, 5, p. 845–889.
|
|