
Publications by João Gama

2007

Change detection in learning histograms from data streams

Authors
Sebastiao, R; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we study the problem of constructing histograms from high-speed, time-changing data streams. Learning in this context requires the ability to process examples once, at the rate they arrive, maintaining a histogram consistent with the most recent data and forgetting outdated data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure of the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect whenever a change occurs in the distribution generating the examples. The basic idea consists of monitoring distributions from two different time windows: the reference time window, which reflects the distribution observed in the past, and the current time window, which reflects the distribution observed in the most recent data. We compare both distributions and signal a change whenever their divergence exceeds a threshold value, using three different measures: the Entropy Absolute Difference, the Kullback-Leibler divergence, and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence exhibits a high probability of change detection and fast detection rates, with few false alarms.
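The two-window comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the threshold value, the smoothing constant, and the direction of the divergence are assumptions made for the example.

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence between two discrete distributions
    defined over the same histogram bins (eps avoids log of zero)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def detect_change(reference_counts, current_counts, threshold=0.1):
    """Signal a change when KL(current || reference) exceeds a threshold.

    Bin counts from the reference and current time windows are first
    normalized into probability distributions.
    """
    q = [c / sum(reference_counts) for c in reference_counts]
    p = [c / sum(current_counts) for c in current_counts]
    return kl_divergence(p, q) > threshold

# Same distribution in both windows -> no change signalled;
# a shifted distribution in the current window -> change signalled.
reference = [50, 30, 20]
print(detect_change(reference, [48, 32, 20]))  # False
print(detect_change(reference, [10, 20, 70]))  # True
```

The Entropy Absolute Difference and Cosine Distance variants mentioned in the abstract would slot in as drop-in replacements for `kl_divergence`, each with its own threshold.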

2003

Iterative Bayes

Authors
Gama, J;

Publication
THEORETICAL COMPUTER SCIENCE

Abstract
Naive Bayes is a well-known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. Those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. In this paper we argue that Iterative Bayes minimizes a quadratic loss function instead of the 0-1 loss function that usually applies to classification problems. Experimental evaluation of Iterative Bayes on 27 benchmark data sets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it proves to be robust to attribute dependencies.
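The idea of refining the count tables can be sketched as follows. This is a simplified illustration under stated assumptions: the update here only nudges the true class's counts up in proportion to the prediction error, whereas the update rule in the paper is more involved (it also adjusts the competing classes), so this is not a faithful reimplementation.

```python
import math

class IterativeNaiveBayes:
    """Naive Bayes over discrete attributes, with an iterative
    refinement sweep over the training set (simplified sketch)."""

    def __init__(self, n_classes, n_values_per_attr):
        self.n_classes = n_classes
        # Laplace-style initialization of priors and contingency tables.
        self.class_counts = [1.0] * n_classes
        self.tables = [[[1.0] * nv for nv in n_values_per_attr]
                       for _ in range(n_classes)]

    def _posterior(self, x):
        """Class probabilities for one example, via log-sum-exp."""
        logp = [math.log(self.class_counts[c] / sum(self.class_counts))
                for c in range(self.n_classes)]
        for c in range(self.n_classes):
            for a, v in enumerate(x):
                table = self.tables[c][a]
                logp[c] += math.log(table[v] / sum(table))
        m = max(logp)
        exp = [math.exp(lp - m) for lp in logp]
        z = sum(exp)
        return [e / z for e in exp]

    def fit(self, X, y, sweeps=10):
        # Standard naive Bayes counting pass.
        for x, c in zip(X, y):
            self.class_counts[c] += 1
            for a, v in enumerate(x):
                self.tables[c][a][v] += 1
        # Iterative refinement: bump the true class's entries by the
        # prediction error, so confident correct predictions change
        # the tables little and poor ones change them more.
        for _ in range(sweeps):
            for x, c in zip(X, y):
                err = 1.0 - self._posterior(x)[c]
                for a, v in enumerate(x):
                    self.tables[c][a][v] += err

    def predict(self, x):
        p = self._posterior(x)
        return p.index(max(p))
```

A usage example: `IterativeNaiveBayes(2, [2, 2])` builds a two-class model over two binary attributes; `fit` then counts and refines, and `predict` returns the most probable class index.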

1999

Iterative naive Bayes

Authors
Gama, J;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
Naive Bayes is a well-known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. Those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. Experimental evaluation of Iterative Bayes on 25 benchmark datasets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it proves to be robust to attribute dependencies.

2006

Discretization from data streams: Applications to histograms and data mining

Authors
Gama, J; Pinto, C;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
In this paper we propose a new method to perform incremental discretization. The basic idea is to perform the task in two layers. The first layer receives the sequence of input data and keeps some statistics on the data using many more intervals than required. Based on the statistics stored by the first layer, the second layer creates the final discretization. The proposed architecture processes streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. We experimentally demonstrate that incremental discretization is able to maintain the performance of learning algorithms in comparison to a batch discretization. The proposed method is much more appropriate for incremental learning and for problems where data flows continuously, as in most recent data mining applications.
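The two-layer architecture can be sketched as follows. The fine-bin count, the fixed value range, and the equal-frequency policy for the second layer are illustrative assumptions, not the paper's exact rules.

```python
class TwoLayerDiscretizer:
    """Sketch of two-layer incremental discretization.

    Layer 1 keeps counts over many fine equal-width bins, updated in
    constant time per example; layer 2 derives a coarser equal-frequency
    discretization from those counts on demand.
    """

    def __init__(self, lo, hi, n_fine=100):
        self.lo = lo
        self.n_fine = n_fine
        self.width = (hi - lo) / n_fine
        self.counts = [0] * n_fine

    def update(self, x):
        """Layer 1: route the example to its fine bin (constant time)."""
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), self.n_fine - 1)  # clamp out-of-range values
        self.counts[i] += 1

    def final_cut_points(self, k):
        """Layer 2: k-interval equal-frequency cuts from the fine counts."""
        total = sum(self.counts)
        target = total / k
        cuts, acc = [], 0
        for i, c in enumerate(self.counts):
            acc += c
            if acc >= target and len(cuts) < k - 1:
                cuts.append(self.lo + (i + 1) * self.width)
                acc = 0
        return cuts

# Feeding a uniform stream over [0, 100) yields evenly spaced cuts.
d = TwoLayerDiscretizer(0, 100)
for x in range(100):
    d.update(x)
print(d.final_cut_points(4))  # [25.0, 50.0, 75.0]
```

The key property the abstract claims is visible here: `update` touches one counter per example, so memory and per-example time stay constant no matter how long the stream runs.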

2008

Special track on data streams

Authors
Gama, J; Carvalho, A; Aguilar-Ruiz, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract

2011

Clustering data streams with weightless neural networks

Authors
Cardoso, DO; Lima, PMV; De Gregorio, M; Gama, J; Franca, FMG;

Publication
ESANN 2011 proceedings, 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

Abstract
Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to analyse data points arriving continuously, with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to apply both weightless neural models to the data stream clustering problem.
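The RAM-based neurons underlying WiSARD can be sketched as follows. This is a simplified illustration, not the paper's generalization: the random tuple mapping, the sparse dictionary RAMs, and the counter-based storage (a DRASiW-style extension of the original single-bit RAMs) are assumptions made for the example.

```python
import random

class Discriminator:
    """Sketch of a WiSARD-style discriminator with counting RAM nodes.

    The binary input is split into fixed-size tuples; each tuple
    addresses one RAM node. Storing counters instead of single bits
    (as in DRASiW) is what allows prototypes of the learnt categories
    to be recovered from the RAM contents.
    """

    def __init__(self, input_bits, tuple_size, seed=0):
        rng = random.Random(seed)
        self.mapping = list(range(input_bits))
        rng.shuffle(self.mapping)  # random partition of the input bits
        self.tuple_size = tuple_size
        self.rams = [{} for _ in range(input_bits // tuple_size)]

    def _addresses(self, bits):
        """Yield (ram index, address tuple) for one binary input."""
        for r in range(len(self.rams)):
            chunk = self.mapping[r * self.tuple_size:(r + 1) * self.tuple_size]
            yield r, tuple(bits[i] for i in chunk)

    def train(self, bits):
        """One-shot learning: bump the counter at each addressed position."""
        for r, addr in self._addresses(bits):
            self.rams[r][addr] = self.rams[r].get(addr, 0) + 1

    def score(self, bits):
        """Number of RAM nodes that have seen this pattern's address."""
        return sum(1 for r, addr in self._addresses(bits)
                   if self.rams[r].get(addr, 0) > 0)
```

In the classification setting one discriminator is trained per category and an input is assigned to the highest-scoring one; clustering a stream, as in this work, requires deciding online when an arriving pattern warrants a new discriminator.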
