Aha, D.W., Kibler, D. & Albert, M.K., 1991. Instance-Based Learning Algorithms, Machine Learning, 6, p. 37–66.
Keywords: BestOfClass; NearestNeighbor
Cleary, J.G. & Trigg, L.E., 1995. K*: an instance-based learner using an entropic distance measure. Morgan Kaufmann, p. 108–114.
Keywords: BestOfClass; NearestNeighbor
Petridis, V. & Kaburlasos, V.G., 2003. FINkNN: A Fuzzy Interval Number k-Nearest Neighbor Classifier for Prediction of Sugar Production from Populations of Samples, Journal of Machine Learning Research, 4, p. 17–37.
Keywords: Fuzzy; NearestNeighbor
Weinberger, K.Q. & Saul, L.K., 2009. Distance Metric Learning for Large Margin Nearest Neighbor Classification, Journal of Machine Learning Research, 10, p. 207–244.
Abstract: The accuracy of k-nearest neighbor (kNN) classification depends significantly on the metric used to compute distances between different examples. In this paper, we show how to learn a Mahalanobis distance metric for kNN classification from labeled examples. The Mahalanobis metric can equivalently be viewed as a global linear transformation of the input space that precedes kNN classification using Euclidean distances. In our approach, the metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. As in support vector machines (SVMs), the margin criterion leads to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our approach requires no modification or extension for problems in multiway (as opposed to binary) classification. In our framework, the Mahalanobis distance metric is obtained as the solution to a semidefinite program. On several data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification. Sometimes these results can be further improved by clustering the training examples and learning an individual metric within each cluster. We show how to learn and combine these local metrics in a globally integrated manner.
Keywords: NearestNeighbor; SupportVectorMachines
Zemke, S., 2003. Data Mining for Prediction. Financial Series Case. Ph.D. thesis. The Royal Institute of Technology, Sweden.
Abstract: Hard problems force innovative approaches and attention to detail, their exploration often contributing beyond the area initially attempted. This thesis investigates the data mining process resulting in a predictor for numerical series. The series experimented with come from financial data – usually hard to forecast. One approach to prediction is to spot patterns in the past, when we already know what followed them, and to test on more recent data. If a pattern is followed by the same outcome frequently enough, we can gain confidence that it is a genuine relationship. Because this approach does not assume any special knowledge or form of the regularities, the method is quite general – applicable to other time series, not just financial. However, the generality puts strong demands on the pattern detection – as to notice regularities in any of the many possible forms. The thesis’ quest for an automated pattern-spotting involves numerous data mining and optimization techniques: neural networks, decision trees, nearest neighbors, regression, genetic algorithms and other. Comparison of their performance on a stock exchange index data is one of the contributions. As no single technique performed sufficiently well, a number of predictors have been put together, forming a voting ensemble. The vote is diversified not only by different training data – as usually done – but also by a learning method and its parameters. An approach is also proposed how to speed-up a predictor fine-tuning. The algorithm development goes still further: A prediction can only be as good as the training data, therefore the need for good data preprocessing. In particular, new multivariate discretization and attribute selection algorithms are presented. The thesis also includes overviews of prediction pitfalls and possible solutions, as well as of ensemble-building for series data with financial characteristics, such as noise and many attributes. The Ph.D. thesis consists of an extended background on financial prediction, 7 papers, and 2 appendices.
Keywords: Bayesian; GeneticProgramming; NearestNeighbor; NeuralNets