Publications

Publications by João Gama

2010

Sequential Pattern Mining in Multi-relational Datasets

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE

Abstract
We present a framework designed to mine sequential temporal patterns from multi-relational databases. In order to exploit logic-relational information without using aggregation methodologies, we convert the multi-relational dataset into what we name a multi-sequence database. Each example in a multi-relational target table is coded into a sequence that combines intra-table and inter-table relational temporal information. This allows us to find heterogeneous temporal patterns through standard sequence miners. Our framework is grounded in the excellent results achieved by previous propositionalization strategies. We follow a pipelined approach, where we first use a sequence miner to find frequent sequences in the multi-sequence database. Next, we select the most interesting findings to augment the representational space of the examples. The most interesting sequence patterns are discriminative and class correlated. In the final step we build a classifier model by taking an enlarged target table as input to a classifier algorithm. We evaluate the performance of this work through a motivating application, the hepatitis multi-relational dataset. We prove the effectiveness of our methodology by addressing two problems of the hepatitis dataset.

2010

Bipartite Graphs for Monitoring Clusters Transitions

Authors
Oliveira, M; Gama, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS IX, PROCEEDINGS

Abstract
The study of evolution has become an important research issue, especially in the last decade, due to a greater awareness of our world's volatility. As a consequence, a new paradigm has emerged to respond more effectively to a class of new problems in Data Mining. In this paper we address the problem of monitoring the evolution of clusters and propose the MClusT framework, which was developed along the lines of this new Change Mining paradigm. MClusT includes a taxonomy of transitions, a tracking method based on Graph Theory, and a transition detection algorithm. To demonstrate its feasibility and applicability we present real-world case studies, using datasets extracted from Banco de Portugal and the Portuguese Institute of Statistics. We also test our approach on a benchmark dataset from TSDL. The results are encouraging and demonstrate the ability of the MClusT framework to provide an efficient diagnosis of cluster transitions.
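The abstract does not spell out how the graph-based tracking works, so the following is only a minimal sketch of one plausible reading: clusters from two consecutive snapshots form the two sides of a bipartite graph, and each edge is weighted by the share of an old cluster's members that land in a new cluster. The function names, the 0.5 threshold, and the weighting rule are assumptions made for illustration, not the MClusT definitions.

```python
from collections import Counter


def transition_graph(labels_t0, labels_t1):
    """Build weighted bipartite edges between clusters of two snapshots.

    Each argument maps an object id to its cluster id at that time point.
    The weight of edge (old, new) is the fraction of the old cluster's
    members that are assigned to the new cluster (an assumed weighting).
    """
    sizes = Counter(labels_t0.values())
    shared = Counter(
        (labels_t0[obj], labels_t1[obj]) for obj in labels_t0 if obj in labels_t1
    )
    return {(old, new): n / sizes[old] for (old, new), n in shared.items()}


def successors(edges, threshold=0.5):
    """For each old cluster, list the new clusters reached with weight >= threshold."""
    result = {}
    for (old, new), weight in edges.items():
        result.setdefault(old, [])
        if weight >= threshold:
            result[old].append(new)
    return result


# Hypothetical toy data: five objects clustered at two time points.
t0 = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
t1 = {1: "X", 2: "X", 3: "Y", 4: "Y", 5: "Y"}
edges = transition_graph(t0, t1)
strong = successors(edges)
```

In this toy case most of cluster A continues as X and all of B continues as Y. An old cluster with an empty successor list, or a new cluster that receives strong edges from several old ones, would be a candidate for a different transition type under whatever taxonomy is adopted.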

2010

A Simple Dense Pixel Visualization for Mobile Sensor Data Mining

Authors
Rodrigues, PP; Gama, J;

Publication
KNOWLEDGE DISCOVERY FROM SENSOR DATA

Abstract
Sensor data is usually represented by streaming time series. Current state-of-the-art systems for visualization include line plots and three-dimensional representations, which most of the time require screen resolutions that are not available in small transient mobile devices. Moreover, when data presents cyclic behaviors, such as in the electricity domain, predictive models may tend to give higher errors at certain recurrent points in time, but the human eye is not trained to notice these cycles in a long stream. In these contexts, information is usually hard to extract from visualization. New visualization techniques may help to detect recurrent faulty predictions. In this paper we inspect visualization techniques in the scope of a real-world sensor network, briefly delving into future trends in visualization on transient mobile devices. We propose a simple dense pixel display visualization system, exploiting the benefits that it may bring to detecting and correcting recurrent faulty predictions. A case study is also presented, where a simple corrective strategy is studied in the context of global electrical load demand, exemplifying the utility of the new visualization method when compared with automatic detection of recurrent errors.
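To illustrate the general idea of a dense pixel display for cyclic data, the sketch below folds a stream of prediction errors into one row per cycle, so that errors recurring at the same position in the cycle line up in a column. It is a simplified stand-in, not the system described in the paper: the text-character rendering, the function names, and the toy period of 4 are all assumptions.

```python
def fold_into_cycles(values, period):
    """Split a stream into consecutive rows of `period` values, dropping an incomplete tail."""
    full = len(values) - len(values) % period
    return [values[i:i + period] for i in range(0, full, period)]


def render(grid, shades=" .:*#"):
    """Map each value to one character; later characters in `shades` mean larger error."""
    peak = max(max(row) for row in grid)
    scale = (len(shades) - 1) / peak if peak > 0 else 0.0
    return ["".join(shades[round(v * scale)] for v in row) for row in grid]


def column_means(grid):
    """Average error at each position of the cycle."""
    return [sum(col) / len(col) for col in zip(*grid)]


# Hypothetical errors: three full cycles of length 4 with a recurrent
# peak at position 2, followed by one incomplete cycle.
errors = [0.1, 0.1, 0.9, 0.1] * 3 + [0.5]
grid = fold_into_cycles(errors, period=4)
picture = render(grid)
means = column_means(grid)
worst_position = means.index(max(means))
```

With hourly electricity data the period would more naturally be 24 or 168, and each cell would be a colored pixel rather than a character. A column that stays dark across rows marks a recurrent faulty prediction, which a corrective strategy could then target.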

2009

Regression Trees from Data Streams with Drift Detection

Authors
Ikonomovska, E; Gama, J; Sebastiao, R; Gjorgjevik, D;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
The problem of extracting meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. We present an algorithm which is able to learn regression trees from fast and unbounded data streams in the presence of concept drifts. To the best of our knowledge there is no other algorithm for incrementally learning regression trees that is equipped with change detection abilities. The FIRT-DD algorithm has mechanisms for drift detection and model adaptation, which enable it to maintain accurate and updated regression models at any time. The drift detection mechanism is based on sequential statistical tests that track the evolution of the local error at each node of the tree and inform the learning process of the detected changes. As a response to a local drift, the algorithm is able to adapt the model only locally, avoiding the need for a global model adaptation. The adaptation strategy consists of building a new tree whenever a change is suspected in a region and replacing the old tree when the new one becomes more accurate. This enables smooth and granular adaptation of the global model. The results of the empirical evaluation, performed over several different types of drift, show that the algorithm is capable of consistent detection and proper adaptation to concept drifts.
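The abstract does not name the sequential test it uses. As a minimal sketch of what tracking the local error with a sequential test can look like, the code below implements the Page-Hinkley test, a standard choice for detecting an increase in the mean of a stream. The class, the `first_alarm` helper, and the parameter values are illustrative assumptions, not taken from FIRT-DD.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if a change is signaled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(errors, **kwargs):
    """Return the index of the first signaled change, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, err in enumerate(errors):
        if detector.update(err):
            return i
    return None


# Hypothetical error stream: stable at 0.1, then jumping to 1.0 at index 200.
drifting = [0.1] * 200 + [1.0] * 200
stable = [0.1] * 400
alarm_index = first_alarm(drifting)
no_alarm = first_alarm(stable)
```

In a tree learner of the kind described, one such detector per node would be fed that node's absolute prediction error, so an alarm localizes the drift to a region of the input space and only that subtree needs to be rebuilt.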

2009

Tracking Recurring Concepts with Meta-learners

Authors
Gama, J; Kosina, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This work addresses data stream mining from dynamic environments where the distribution underlying the observations may change over time. In these contexts, learning algorithms must be equipped with change detection mechanisms. Several methods have been proposed that are able to detect and react to concept drift. When a drift is signaled, most approaches use a forgetting mechanism, releasing the current model and starting to learn a new decision model. Nevertheless, it is not rare for concepts from history to reappear, for example with seasonal changes. In this work we present a method that memorizes learnt decision models whenever a concept drift is signaled. The system uses meta-learning techniques that characterize the domain of applicability of previously learnt models. The meta-learner can detect the re-occurrence of contexts and take pro-active actions by activating previously learnt models. The main benefit of this approach is that the proposed meta-learner is capable of selecting a similar historical concept, if there is one, without knowledge of the true classes of the examples.

2009

An overview on mining data streams

Authors
Gama, J; Rodrigues, PP;

Publication
Studies in Computational Intelligence

Abstract
The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges and the most relevant issues in knowledge discovery from data streams: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks, and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area. © 2009 Springer-Verlag Berlin Heidelberg.
