Publications - INESC TEC

Publications

Publications by João Gama

2004

Learning with drift detection

Authors
Gama, J; Medas, P; Castillo, G; Rodrigues, P;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2004

Abstract
Most work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detecting changes in the probability distribution of examples. The idea behind the drift detection method is to monitor the online error rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the current model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method monitors the trace of the online error of the algorithm. For the current context we define a warning level and a drift level. A new context is declared if, in a sequence of examples, the error increases, reaching the warning level at example k(w) and the drift level at example k(d). This is an indication of a change in the distribution of the examples. The algorithm then learns a new model using only the examples since k(w). The method was tested with a set of eight artificial datasets and a real-world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show good performance in detecting drift and in learning the new concept. We also observe that the method is independent of the learning algorithm.
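
The warning/drift mechanism described in this abstract can be sketched in Python. This is a minimal sketch, not the authors' implementation: the abstract does not give the thresholds, so the class assumes the formulation commonly associated with this method, which tracks the online error rate p and its standard deviation s = sqrt(p * (1 - p) / n), declares a warning when p + s exceeds p_min + 2 * s_min and a drift when it exceeds p_min + 3 * s_min. The class name `DriftDetector`, the 30-example burn-in and the error stream in the usage lines are hypothetical.

```python
import math


class DriftDetector:
    """Minimal sketch of error-rate monitoring with warning and drift levels.

    Assumption (not stated in the abstract): warning when p + s > p_min + 2 * s_min,
    drift when p + s > p_min + 3 * s_min, after a short burn-in period.
    """

    def __init__(self, min_examples=30, warning_k=2.0, drift_k=3.0):
        self.min_examples = min_examples  # hypothetical burn-in length
        self.warning_k = warning_k
        self.drift_k = drift_k
        self.reset()

    def reset(self):
        """Start a new context: forget the error history and the stored minimum."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf
        self.warning_start = None  # index k(w) at which the warning level was reached

    def update(self, error, index):
        """Feed 1 if the current model misclassified example `index`, else 0.

        Returns "stable", "warning" or "drift".
        """
        self.n += 1
        self.errors += error
        p = self.errors / self.n                 # online error rate
        s = math.sqrt(p * (1 - p) / self.n)      # its binomial standard deviation
        if self.n < self.min_examples:
            return "stable"
        if p + s < self.p_min + self.s_min:      # remember the lowest point so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_k * self.s_min:
            if self.warning_start is None:       # drift reached without a prior warning
                self.warning_start = index
            return "drift"
        if p + s > self.p_min + self.warning_k * self.s_min:
            if self.warning_start is None:
                self.warning_start = index
            return "warning"
        self.warning_start = None                # back below the warning level
        return "stable"


# Hypothetical error stream: 10% errors up to example 500, then 50% errors.
detector = DriftDetector()
warn_at = drift_at = None
for i in range(1000):
    error = int(i % 10 == 0) if i < 500 else int(i % 2 == 0)
    if detector.update(error, i) == "drift":
        warn_at, drift_at = detector.warning_start, i
        break
print(warn_at, drift_at)
```

When `update` returns "drift", a learner following the abstract would be retrained on the examples seen since `warning_start` (the k(w) of the abstract) and the detector restarted with `reset()`.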

2005

Learning decision trees from dynamic data streams

Authors
Gama, J; Medas, P; Rodrigues, P;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
This paper presents a system for inducing a forest of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processes each example in constant time, and performs a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting test. For multi-class problems the algorithm grows a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers that play different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples; naive-Bayes classifiers in inner nodes can be used as multivariate splitting tests if chosen by the splitting criteria, and are used to detect drift in the distribution of the examples that traverse the node. When drift is detected, the entire subtree rooted at that node is pruned. The naive-Bayes classifiers at leaves, the splitting tests based on the outcome of naive Bayes, and the naive-Bayes drift detectors at decision nodes are all obtained directly from the sufficient statistics required to compute the splitting criteria, with no additional computation. This is a major advantage in the context of high-speed data streams. The methodology was tested on artificial and real-world datasets. The experimental results show very good performance compared with a batch decision tree learner, and a high capacity to detect and react to drift. Copyright 2005 ACM.
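
The point that leaf classifiers come "for free" from the split statistics can be illustrated with a small sketch. It assumes, hypothetically, that each continuous attribute is summarized per class by a count, a sum and a sum of squares (enough for a Gaussian estimate), and that a naive-Bayes prediction is computed from exactly those numbers. `GaussianStats` and `LeafStatistics` are illustrative names, not UFFT's actual data structures; the analytical split evaluation and the drift detection at inner nodes are omitted.

```python
import math
from collections import defaultdict


class GaussianStats:
    """Count, sum and sum of squares of one attribute for one class."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def variance(self):
        # Sample variance recovered from the sums; floored to avoid a zero variance.
        if self.n < 2:
            return 1e-6
        v = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        return max(v, 1e-6)

    def log_density(self, x):
        v = self.variance()
        return -0.5 * math.log(2 * math.pi * v) - (x - self.mean()) ** 2 / (2 * v)


class LeafStatistics:
    """Hypothetical leaf: the per-class statistics double as a naive-Bayes model."""

    def __init__(self, n_attributes):
        self.class_counts = defaultdict(int)
        self.stats = defaultdict(lambda: [GaussianStats() for _ in range(n_attributes)])

    def update(self, x, y):
        self.class_counts[y] += 1
        for attribute_stats, value in zip(self.stats[y], x):
            attribute_stats.add(value)

    def predict(self, x):
        """Return the class with the highest naive-Bayes log-posterior."""
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            score += sum(st.log_density(v) for st, v in zip(self.stats[label], x))
            if score > best_score:
                best_class, best_score = label, score
        return best_class


leaf = LeafStatistics(n_attributes=2)
training = [((1.0, 2.0), "a"), ((1.2, 1.8), "a"), ((0.9, 2.1), "a"),
            ((5.0, 7.0), "b"), ((5.3, 6.8), "b"), ((4.9, 7.2), "b")]
for x, y in training:
    leaf.update(x, y)
print(leaf.predict((1.1, 2.0)), leaf.predict((5.1, 7.1)))  # a b
```

Because only running sums are stored, each update is constant time per attribute, which matches the single-pass, constant-time-per-example setting the abstract describes.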

2009

Total Mass TCI driven by Parametric Estimation

Authors
Silva, MM; Sousa, C; Sebastiao, R; Gama, J; Mendonca, T; Rocha, P; Esteves, S;

Publication
MED: 2009 17TH MEDITERRANEAN CONFERENCE ON CONTROL & AUTOMATION, VOLS 1-3

Abstract
This paper presents the Total Mass Target Controlled Infusion (TCI) algorithm. The system comprises an On Line tuned Algorithm for Recovery Detection (OLARD) after an initial bolus administration and a Bayesian identification method for parametric estimation based on sparse measurements of the accessible signal. Two algorithms are proposed to design the drug dosage profile. During the transient phase, an Input Variance Control (IVC) algorithm is used; it is based on the concept of TCI and aims to steer the drug effect to a predefined target value within an a priori fixed time interval. Once the steady-state phase is reached, the drug dose regimen is controlled by a Total Mass Control (TMC) algorithm. The mass control law for compartmental systems is robust even in the presence of parameter uncertainties. The feasibility of the whole system has been evaluated for the case of the Neuromuscular Blockade (NMB) level, and the system was tested both in simulation and in real cases.

2012

Online evaluation of a changes detection algorithm for depth of anesthesia signals

Authors
Sebastiao, R; Silva, MM; Rabico, R; Gama, J; Mendonca, T;

Publication
IFAC Proceedings Volumes (IFAC-PapersOnline)

Abstract
Detecting changes in the signals used to evaluate the depth of anesthesia of patients undergoing surgery is of foremost importance. Such detection supports decisions on how to adapt the doses of hypnotics and analgesics administered to patients during minimally invasive diagnostic and therapeutic procedures. This paper presents an algorithm based on the Page-Hinkley test to automatically detect changes in the depth-of-anesthesia signals of patients undergoing general anesthesia. The performance of the proposed method is evaluated online using data from patients undergoing surgery. The results show that most of the detected changes agree with the clinicians' actions, occurring at times when a change in the hypnotic or analgesic rates had taken place. This detection was performed in the presence of noise and sensor faults. The results encourage the inclusion of the proposed algorithm in a decision support system based on depth-of-anesthesia signals. © 2012 IFAC.
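
A generic two-sided Page-Hinkley test can be sketched in Python to show how such a detector works. It is not the authors' algorithm: the tolerance `delta`, the threshold `threshold`, the restart after each alarm and the step signal are illustrative assumptions. The statistic accumulates deviations of each value from the running mean, minus a tolerance, and raises an alarm when the cumulative sum moves further than the threshold from its historical extreme.

```python
class PageHinkley:
    """Generic two-sided Page-Hinkley change detector (illustrative sketch)."""

    def __init__(self, delta=0.1, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm threshold, often called lambda (assumed value)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.up = 0.0        # cumulative statistic for increases of the mean
        self.up_min = 0.0
        self.down = 0.0      # cumulative statistic for decreases of the mean
        self.down_max = 0.0

    def update(self, x):
        """Add one observation; return "increase", "decrease" or None."""
        self.n += 1
        self.mean += (x - self.mean) / self.n      # running mean of the signal
        self.up += x - self.mean - self.delta
        self.down += x - self.mean + self.delta
        self.up_min = min(self.up_min, self.up)
        self.down_max = max(self.down_max, self.down)
        if self.up - self.up_min > self.threshold:
            return "increase"
        if self.down_max - self.down > self.threshold:
            return "decrease"
        return None


# Hypothetical monitored signal with a step change at sample 200.
signal = [1.0] * 200 + [3.0] * 100
detector = PageHinkley(delta=0.1, threshold=5.0)
alarms = []
for t, x in enumerate(signal):
    change = detector.update(x)
    if change is not None:
        alarms.append((t, change))
        detector.reset()  # restart monitoring after each detected change
print(alarms)  # [(202, 'increase')]
```

Larger `threshold` values trade detection delay for fewer false alarms, which matters for noisy clinical signals with sensor faults.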

2011

Constrained Sequential Pattern Knowledge in Multi-relational Learning

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
In this work we present XMuSer, a multi-relational framework suited to exploring temporal patterns available in multi-relational databases. XMuSer's main idea consists of exploiting frequent sequence mining, using an efficient and direct method to learn temporal patterns in the form of sequences. Building on a coding methodology and on the efficiency of sequential miners, we find the most interesting sequential patterns available and then map these findings into a new table, which encodes the multi-relational timed data using sequential patterns. In the last step of our framework, we use an ILP algorithm to learn a theory on the enlarged relational database, which consists of the original multi-relational database and the new sequence relation. We evaluate our framework on three classification problems. Moreover, we map each of three different types of sequential patterns: frequent sequences, closed sequences, and maximal sequences.
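
To make the three pattern types concrete, the sketch below enumerates, by brute force, the frequent, closed and maximal sequences of a toy database and then encodes each sequence as a row of boolean pattern-presence columns, in the spirit of the new table described in this abstract. It is an illustrative assumption, not XMuSer: a real system would use an efficient sequential miner and the paper's coding methodology, and the database, the minimum support and the function names are hypothetical. A frequent sequence is closed if no proper supersequence has the same support, and maximal if no proper supersequence is frequent.

```python
from itertools import combinations


def is_subsequence(small, big):
    """True if the items of `small` appear in `big` in the same order (gaps allowed)."""
    it = iter(big)
    return all(item in it for item in small)


def frequent_sequences(database, min_support):
    """Brute-force miner: only suitable for tiny illustrative databases."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for positions in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in positions))
    support = {c: sum(is_subsequence(c, s) for s in database) for c in candidates}
    return {c: n for c, n in support.items() if n >= min_support}


def closed_and_maximal(frequent):
    """Closed: no proper supersequence with equal support. Maximal: none frequent."""
    closed, maximal = set(), set()
    for pattern, n in frequent.items():
        supers = [other for other in frequent
                  if len(other) > len(pattern) and is_subsequence(pattern, other)]
        if not supers:
            maximal.add(pattern)
        if all(frequent[other] < n for other in supers):
            closed.add(pattern)
    return closed, maximal


def pattern_table(database, patterns):
    """Hypothetical encoding: one row per sequence, one boolean column per pattern."""
    columns = sorted(patterns)
    rows = [[is_subsequence(p, seq) for p in columns] for seq in database]
    return columns, rows


database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
frequent = frequent_sequences(database, min_support=2)
closed, maximal = closed_and_maximal(frequent)
columns, rows = pattern_table(database, maximal)
print(sorted(closed))
print(columns, rows)
```

Maximal patterns give the most compact table, while closed patterns keep support information that maximal ones discard; which set works best as features is an empirical question the paper evaluates.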

2004

Forest trees for on-line data

Authors
Gama, J; Medas, P; Rocha, R;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
This paper presents a hybrid adaptive system for inducing a forest of trees from data streams. The Ultra Fast Forest Tree system (UFFT) is an incremental algorithm that processes each example in constant time, works online, and uses the Hoeffding bound to decide when to install a splitting test in a leaf, turning it into a decision node. Our system has been designed for continuous data. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting test. The number of examples required to evaluate the splitting criteria is sound, as it is based on the Hoeffding bound. For multi-class problems, the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. During the training phase the algorithm maintains a short-term memory: given a data stream, a fixed number of the most recent examples are kept in a data structure that supports constant-time insertion and deletion. When a test is installed, a leaf is transformed into a decision node with two descendant leaves. The sufficient statistics of these leaves are initialized with the examples in the short-term memory that fall into them. We study the behavior of UFFT on different problems. The experimental results show that UFFT is competitive with a batch decision tree learner on large and medium-sized datasets.
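
The split decision and the short-term memory can be sketched in Python. The Hoeffding bound states that, after n independent observations of a variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the observed mean with probability 1 - delta. The sketch assumes R = 1 (information gain in bits for the two-class trees UFFT builds) and a hypothetical delta of 1e-6; `should_split` and `ShortTermMemory` are illustrative names, not UFFT's API.

```python
import math
from collections import deque


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, delta=1e-6, value_range=1.0):
    """Install a test when the best candidate beats the runner-up by more than epsilon.

    value_range=1.0 assumes information gain in bits on a two-class problem;
    delta=1e-6 is a hypothetical confidence parameter.
    """
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


class ShortTermMemory:
    """Fixed-size buffer of the most recent examples (hypothetical helper)."""

    def __init__(self, size):
        # A bounded deque appends and evicts the oldest element in O(1).
        self.buffer = deque(maxlen=size)

    def add(self, example):
        self.buffer.append(example)  # the oldest example is dropped when full

    def route(self, test):
        """Split the buffered examples between the two leaves of a new test."""
        left = [e for e in self.buffer if test(e)]
        right = [e for e in self.buffer if not test(e)]
        return left, right


print(round(hoeffding_bound(1.0, 1e-6, 1000), 4))  # 0.0831
print(should_split(0.30, 0.20, n=1000), should_split(0.30, 0.20, n=200))  # True False

memory = ShortTermMemory(size=3)
for x in [1, 2, 3, 4, 5]:
    memory.add(x)
print(memory.route(lambda x: x <= 3))  # ([3], [4, 5])
```

The same gain difference of 0.1 is not enough to split after 200 examples but is after 1000, which is how the bound fixes the number of examples needed before committing to a test.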
