Publications

Publications by LIAAD

2017

Weightless neural networks for open set recognition

Authors
Cardoso, DO; Gama, J; Franca, FMG;

Publication
MACHINE LEARNING

Abstract
Open set recognition is a classification-like task. It is accomplished not only by the identification of observations which belong to targeted classes (i.e., the classes among those represented in the training sample which should be later recognized) but also by the rejection of inputs from other classes in the problem domain. The need for proper handling of elements of classes beyond those of interest is frequently ignored, even in works found in the literature. This leads to the improper development of learning systems, which may obtain misleading results when evaluated in their test beds, consequently failing to keep the performance level while facing some real challenge. The adaptation of a classifier for open set recognition is not always possible: the probabilistic premises most of them are built upon are not valid in an open-set setting. Still, this paper details how this was realized for WiSARD, a weightless artificial neural network model. Such achievement was based on an elaborate distance-like computation this model provides and the definition of rejection thresholds during training. The proposed methodology was tested through a collection of experiments, with distinct backgrounds and goals. The results obtained confirm the usefulness of this tool for open set recognition.
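As a rough illustration of the general idea of distance-based classification with rejection thresholds fixed during training, the sketch below uses a nearest-centroid classifier as a stand-in. It is not WiSARD and not the authors' method: the functions `fit` and `predict`, the Euclidean distance, and the rule "threshold = largest training distance to the class centroid" are all assumptions made for this example.

```python
import math

def fit(samples):
    """Build a toy open-set model (hypothetical helper, not from the paper).

    samples: dict mapping a class label to a list of points (tuples of floats).
    Returns a dict mapping each label to (centroid, rejection_threshold).
    """
    model = {}
    for label, pts in samples.items():
        dim = len(pts[0])
        centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
        # Assumed rule: accept anything no farther than the farthest training point.
        threshold = max(math.dist(p, centroid) for p in pts)
        model[label] = (centroid, threshold)
    return model

def predict(model, x):
    """Return the closest class whose threshold admits x, or None to reject."""
    best_label, best_dist = None, math.inf
    for label, (centroid, threshold) in model.items():
        d = math.dist(x, centroid)
        if d <= threshold and d < best_dist:
            best_label, best_dist = label, d
    return best_label

train = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    "b": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)],
}
model = fit(train)
near_a = predict(model, (0.4, 0.6))      # close to class "a"
near_b = predict(model, (10.5, 10.5))    # close to class "b"
unknown = predict(model, (5.0, 5.0))     # far from both classes, so rejected
```

A closed-set classifier would be forced to assign the point `(5.0, 5.0)` to one of the two known classes; here it falls outside both thresholds and is rejected.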


2017

Fading histograms in detecting distribution and concept changes

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
I. J. Data Science and Analytics

Abstract

2017

The Initialization and Parameter Setting Problem in Tensor Decomposition-Based Link Prediction

Authors
Silva Fernandes, Sd; Tork, HF; da Gama, JMP;

Publication
2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19-21, 2017

Abstract
Link prediction is the task of social network analysis whose goal is to predict the links that will appear in the network in future instants. Among the link predictors exploiting the time evolution of the networks, we can find the tensor decomposition-based methods. A major limitation of these methods is the lack of appropriate approaches for estimating their parameters and initialization. In this paper, we address this problem by proposing a parameter setting method. Our proposed approach resorts to optimization techniques to drive the search for an adequate parameter and initialization choice. © 2017 IEEE.


2017

An evolutionary algorithm for clustering data streams with a variable number of clusters

Authors
Silva, JD; Hruschka, ER; Gama, J;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analysis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges, such as handling non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect eventual degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.
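For readers unfamiliar with the Page-Hinkley Test mentioned above, the following is a minimal sketch of its standard form for detecting an increase in the mean of a monitored signal. It is not the FEAC-Stream implementation: the class name `PageHinkley`, the helper `first_alarm`, the parameter values, and the synthetic stream are assumptions chosen for illustration. In a clustering setting, the monitored value could be some cluster-quality loss, which is likewise an assumption here.

```python
class PageHinkley:
    """Standard Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.01, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # smallest m_t seen so far, M_t

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold

def first_alarm(stream, detector):
    """Return the index of the first alarm, or None if none is raised."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None

# Synthetic signal: stable at 0.0 for 200 steps, then jumps to 1.0.
shifted = [0.0] * 200 + [1.0] * 200
alarm_at = first_alarm(shifted, PageHinkley())

# A signal that never changes should not raise an alarm.
stable_alarm = first_alarm([0.0] * 400, PageHinkley())
```

In this sketch the alarm is raised a few observations after the jump at index 200. A larger `threshold` delays detection but reduces false alarms.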


2017

Ensemble learning for data stream analysis: A survey

Authors
Krawczyk, B; Minku, LL; Gama, J; Stefanowski, J; Wozniak, M;

Publication
INFORMATION FUSION

Abstract
In many applications of information systems, learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Out of several new proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research. Published by Elsevier B.V.


2017

Proceedings of the First Workshop on Data Science for Social Good co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, SoGood@ECML-PKDD 2016, Riva del Garda, Italy, September 19, 2016

Authors
Gavaldà, Ricard; Zliobaite, Indre; Gama, Joao;

Publication
SoGood@ECML-PKDD

Abstract