2001
Authors
Gama, J;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal. In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on combinations of attributes. The same applies to model tree algorithms in regression domains, which use linear models at leaf nodes. In this paper we study where to use combinations of attributes in regression and classification tree learning. We present an algorithm for multivariate tree learning that combines a univariate decision tree with a linear function by means of constructive induction. This algorithm is able to use decision nodes with multivariate tests and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. The algorithm has been implemented for both classification and regression problems. The experimental evaluation shows that our algorithm has clear advantages in generalization ability when compared against its components and two simplified versions, and competes well against the state-of-the-art in multivariate regression and classification trees. © Springer-Verlag Berlin Heidelberg 2001.
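The functional-leaf idea described in the abstract can be illustrated with a toy regression stump whose leaves predict with a fitted linear model rather than a constant. This is a minimal sketch under simplifying assumptions (one attribute, a fixed split, ordinary least squares at each leaf); the function and tuple layout are invented here, not taken from the paper.

```python
# Toy sketch: a regression tree whose leaves carry linear models ("functional
# leaves") instead of constants. Illustrative only; not the paper's algorithm.

def fit_linear_leaf(xs, ys):
    """Least-squares fit y = a*x + b on the 1-D examples reaching a leaf."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def predict(tree, x):
    """tree is ('split', threshold, left, right) or ('leaf', slope, intercept)."""
    if tree[0] == 'leaf':
        _, a, b = tree
        return a * x + b
    _, thr, left, right = tree
    return predict(left if x <= thr else right, x)

# A stump whose two leaves each fit their own linear model: the target's
# slope changes at x > 2, which a constant-leaf stump could not capture.
xs, ys = [0, 1, 2, 3, 4, 5], [0, 1, 2, 9, 12, 15]
tree = ('split', 2,
        ('leaf', *fit_linear_leaf(xs[:3], ys[:3])),
        ('leaf', *fit_linear_leaf(xs[3:], ys[3:])))
```

Each leaf recovers the locally linear trend exactly, so `predict(tree, 1)` returns 1.0 and `predict(tree, 4)` returns 12.0.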
2010
Authors
Shultz, TR; Fahlman, SE; Craw, S; Andritsos, P; Tsaparas, P; Silva, R; Drummond, C; Ling, CX; Sheng, VS; Drummond, C; Lanzi, PL; Gama, J; Wiegand, RP; Sen, P; Namata, G; Bilgic, M; Getoor, L; He, J; Jain, S; Stephan, F; Jain, S; Stephan, F; Sammut, C; Harries, M; Sammut, C; Ting, KM; Pfahringer, B; Case, J; Jain, S; Wagstaff, KL; Nijssen, S; Wirth, A; Ling, CX; Sheng, VS; Zhang, X; Sammut, C; Cancedda, N; Renders, J; Michelucci, P; Oblinger, D; Keogh, E; Mueen, A;
Publication
Encyclopedia of Machine Learning
Abstract
2008
Authors
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;
Publication
Intelligent Data Analysis
Abstract
Classification is a highly relevant task within the field of data analysis. It is not a trivial task, and different difficulties can arise depending on the nature of the problem. All these difficulties can become worse when the datasets are too large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases; it must alleviate, or solve, the problem of limited time and memory resources. One emergent approach uses concentration bounds to ensure that decisions are made only when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in different ways: simplifying the induced models, adding the ability to deal with continuous data, improving the detection of noise, selecting new criteria for evolving the model, including the use of more powerful prediction techniques, etc. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain performance similar to standard learning algorithms independently of the dataset size, and it can incorporate new information as the basic algorithm does: using a short time per example.
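The concentration-bound idea mentioned in the abstract can be sketched with the Hoeffding bound, as popularized by Hoeffding-tree style stream learners: a decision is committed only once the observed gap between the two best candidates exceeds the bound for the current sample size. The exact bounds used by IADEM may differ; this is an illustrative sketch with invented names.

```python
# Illustrative sketch of deciding from a concentration bound. The Hoeffding
# bound stands in for whatever bound IADEM actually uses; names are invented.
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a
    variable with range value_range is within this epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def enough_evidence(gain_best, gain_second, value_range, delta, n):
    """Commit to the best candidate only once the observed gap between the
    two best candidates exceeds the bound, i.e. enough examples support it."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)
```

With a gap of 0.2 and delta = 1e-7, 10 examples are not enough evidence, but 1000 are: the bound shrinks as more of the stream is seen.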
2000
Authors
Gama, J;
Publication
Intelligent Data Analysis
Abstract
Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. Those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. Experimental evaluation of Iterative Bayes on 27 benchmark datasets shows consistent gains in accuracy. Moreover, the update schema can take costs into account, turning the algorithm cost-sensitive. Unlike stratification, it is applicable to any number of classes and to arbitrary cost matrices. An interesting side effect of our algorithm is that it proves robust to attribute dependencies.
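The abstract's core idea, starting from naive Bayes tables and nudging their entries so each training example's true class gets a higher posterior, can be sketched as follows. The update rule, learning rate, and table layout here are illustrative stand-ins, not the paper's exact schema.

```python
# Toy sketch of iteratively updating naive Bayes tables so the true class of
# each training example gains posterior mass. Illustrative update rule only.

def posterior(tables, priors, x):
    """Normalised class posteriors for example x (a list of attribute values).
    tables[c][i][v] holds P(attribute i = v | class c)."""
    scores = {}
    for c in priors:
        p = priors[c]
        for i, v in enumerate(x):
            p *= tables[c][i].get(v, 1e-6)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def iterative_update(tables, priors, data, lr=0.05, epochs=10):
    """Repeatedly raise table entries for each example's true class and
    lower them (proportionally to posterior) for the competing classes."""
    for _ in range(epochs):
        for x, y in data:
            post = posterior(tables, priors, x)
            delta = lr * (1.0 - post[y])  # larger step when the model is unsure
            for i, v in enumerate(x):
                for c in tables:
                    adj = delta if c == y else -delta * post[c]
                    tables[c][i][v] = max(1e-6, tables[c][i].get(v, 1e-6) + adj)
    return tables
```

On a two-class, one-attribute toy set, a few epochs of updates raise the posterior of each example's true class above its initial value.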
2008
Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;
Publication
Intelligent Data Analysis
Abstract
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, and is yet guaranteed to find a matching that is a close approximation of the matching that would be obtained if the entire stream were processed. The method can be applied to any given (combination of) similarity metrics that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
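The sampling idea in this abstract, estimating a similarity metric from a small sample with a bounded error, can be sketched with a simple value-overlap metric and a Hoeffding-style half-width. The overlap fraction is a stand-in for whichever similarity metrics the paper actually uses; names are invented.

```python
# Sketch: estimate an attribute-similarity metric from a sample of stream
# records rather than the full stream, with a probabilistic error bound.
# The overlap-fraction metric here is an illustrative stand-in.
import math

def sample_estimate(sampled_values, candidate_values):
    """Fraction of sampled stream values also seen under the candidate
    database attribute: a crude instance-level similarity estimate."""
    hits = sum(1 for v in sampled_values if v in candidate_values)
    return hits / len(sampled_values)

def error_bound(n, delta):
    """Hoeffding half-width: with probability at least 1 - delta, the sample
    estimate lies within this epsilon of the full-stream similarity."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

Because the bound shrinks as the sample grows, one can keep sampling until the confidence intervals of rival attribute pairings no longer overlap and the matching is settled.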
2007
Authors
Pimenta, E; Gama, J; Carvalho, A;
Publication
Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007
Abstract
Recent work highlights advantages of decomposing multiclass decision problems into multiple binary problems. Several strategies have been proposed for this decomposition; the most frequently investigated are All-vs-All, One-vs-All, and error-correcting output codes (ECOC). ECOC are binary words (codewords) that can be adapted for use in classification problems, but they must comply with some specific constraints. The codewords can have several dimensions for each number of classes to be represented, and these dimensions grow exponentially with the number of classes of the multiclass problem. Two methods to choose the dimension of an ECOC, which ensure a good trade-off between redundancy and error-correction capacity, are proposed in this paper. The methods are evaluated on a set of benchmark classification problems. Experimental results show that they are competitive with conventional multiclass decomposition methods.