Publications

Publications by João Gama

2005

Bias management of Bayesian network classifiers

Authors
Castillo, G; Gama, J;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
The purpose of this paper is to describe an adaptive algorithm for improving the performance of Bayesian Network Classifiers (BNCs) in an on-line learning framework. Instead of choosing a priori a particular model class of BNCs, our adaptive algorithm scales up the model's complexity by gradually increasing the number of allowable dependencies among features. Starting with the simple Naive Bayes structure, it uses simple decision rules based on qualitative information about the dynamics of the performance to decide when it makes sense to take the next step in the spectrum of feature dependencies and start searching for a more complex classifier. Results of experiments conducted using the class of Dependence Bayesian Classifiers on three large datasets show that our algorithm is able to select a model with the appropriate complexity for the current amount of training data, thus balancing the computational cost of updating a model against the benefits of increased accuracy.

2012

Improving the offline clustering stage of data stream algorithms in scenarios with variable number of clusters

Authors
Faria, ER; Barros, RC; Gama, J; Carvalho, ACPLF;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Many data stream clustering algorithms operate in two well-defined steps: (i) an online statistical data collection stage; and (ii) an offline macro-clustering stage. The well-known k-means algorithm is often employed for performing the offline macro-clustering step. The conventional k-means algorithm assumes that the number of clusters (k) is defined a priori by the user. Given the difficulty of defining the value of k a priori in real-world problems, we describe a new approach that allows estimating k dynamically from streams with a variable number of clusters, which is a common scenario in data with a non-stationary distribution. In addition, we combine our dynamic approach with two different strategies for initializing the centroids during the offline clustering. Analysis of the results suggests that, with the dynamic approach, the k-means++ method for centroid initialization yields better results. © 2012 Authors.
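
As an illustration of the k-means++ initialization mentioned in the abstract, the following is a minimal sketch of the standard seeding step (each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen). It is not the authors' code and does not cover their dynamic estimation of k; the function name `kmeans_pp_init` and its parameters are assumptions made for this example.

```python
import random


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centroids from `points` using k-means++ style seeding.

    `points` is a list of equal-length numeric tuples. The name and
    signature are hypothetical, chosen only for this sketch.
    """
    rng = rng or random.Random(0)
    # The first centroid is drawn uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
            for p in points
        ]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        # Sample the next centroid with probability proportional to d2.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

Points that are already centroids have weight zero, so the seeds tend to spread across distinct regions of the data before the usual k-means iterations begin.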

2012

Very fast decision rules for multi-class problems

Authors
Kosina, P; Gama, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Decision rules are one of the most interpretable and flexible models for data mining prediction tasks. Until now, few works have presented online, any-time and one-pass algorithms for learning decision rules in the stream mining scenario. A quite recent algorithm, Very Fast Decision Rules (VFDR), learns a set of rules, where each rule discriminates one class from all the others. In this work we extend the VFDR algorithm by decomposing a multi-class problem into a set of two-class problems and inducing a set of discriminative rules for each binary problem. The proposed algorithm maintains all properties required when learning from stationary data streams: online and any-time classifiers, processing each example once. Moreover, it is able to learn ordered and unordered rule sets. The new approach is evaluated on various real and artificial datasets. The new algorithm improves the performance of the previous version and is competitive with the state-of-the-art decision tree learning method for data streams. © 2012 ACM.
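
The abstract does not say which decomposition scheme is used. As one possible reading, the sketch below relabels a multi-class target into one binary problem per class (one-vs-rest). The function name `one_vs_rest_labels` is hypothetical, and the rule learner itself is not shown.

```python
def one_vs_rest_labels(labels):
    """Map a list of class labels to one binary label list per class.

    Assumption for this sketch: a one-vs-rest decomposition, where the
    binary problem for class c marks examples of c as 1 and all others as 0.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}
```

In a streaming setting the same relabeling would be applied to each incoming example, so that every binary learner sees each example once.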

2011

Advances in data stream mining for mobile and ubiquitous environments

Authors
Krishnaswamy, S; Gama, J; Gaber, MM;

Publication
International Conference on Information and Knowledge Management, Proceedings

Abstract
The tutorial presents the state of the art in mobile and ubiquitous data stream mining and discusses open research problems, issues, and challenges in this area. © 2011 Authors.

2011

Incremental multi-target model trees for data streams

Authors
Ikonomovska, E; Gama, J; Dzeroski, S;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
As in batch learning, one may identify a class of streaming real-world problems which require the modeling of several targets simultaneously. Due to the dependencies among the targets, simultaneous modeling can be more successful and informative than creating independent models for each target. As a result one may obtain a smaller model able to simultaneously explain the relations between the input attributes and the targets. This problem has not been addressed previously in the streaming setting. We propose an algorithm for inducing multi-target model trees with low computational complexity, based on the principles of predictive clustering trees and probability bounds for supporting splitting decisions. Linear models are computed for each target separately, by incremental training of perceptrons in the leaves of the tree. Experiments are performed on synthetic and real-world datasets. The multi-target regression tree algorithm produces equally accurate and smaller models for simultaneous prediction of all the target attributes, as compared to a set of independent regression trees built separately for each target attribute. When the regression surface is smooth, the linear models computed in the leaves significantly improve the accuracy for all of the targets. © 2011 ACM.
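
To make the idea of incrementally trained linear models in the leaves concrete, here is a minimal sketch of a single leaf that keeps one linear unit per target and updates it with the delta rule, one example at a time. The class names, the learning rate, and the plain delta-rule update are assumptions for illustration; the paper's tree induction and split criteria are not reproduced.

```python
class OnlineLinearModel:
    """A single linear unit trained incrementally with the delta rule."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        # Move the weights a small step in the direction that reduces the error.
        error = y - self.predict(x)
        self.bias += self.learning_rate * error
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]


class MultiTargetLeaf:
    """Hypothetical leaf holding one independent linear model per target."""

    def __init__(self, n_inputs, n_targets, learning_rate=0.1):
        self.models = [
            OnlineLinearModel(n_inputs, learning_rate) for _ in range(n_targets)
        ]

    def predict(self, x):
        return [m.predict(x) for m in self.models]

    def update(self, x, targets):
        # Each target's model is updated separately from the same example.
        for model, y in zip(self.models, targets):
            model.update(x, y)
```

Each call to `update` processes one example and then discards it, which is what keeps the per-example cost low in a streaming setting.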

2010

The next generation of transportation systems, greenhouse emissions, and data mining

Authors
Kargupta, H; Gama, J; Fan, W;

Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract
