Publications - INESC TEC

Publications

Publications by LIAAD

2018

Dynamic graph summarization: a tensor decomposition approach

Authors
Fernandes, S; Fanaee T, H; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Due to the scale and complexity of today's social networks, it becomes infeasible to mine them with traditional approaches. A possible solution to reduce such scale and complexity is to produce a compact (lossy) version of the network that represents its major properties. This task is known as graph summarization, which is the subject of this research. Our focus is on time-evolving graphs, a more complex scenario where the dynamics of the network should also be taken into account. We address this problem using tensor decomposition, which enables us to capture the multi-way structure of the time-evolving network. This property is unique and is impossible to obtain with other approaches such as matrix factorization. Experimental evaluation on five real-world networks yields promising results, demonstrating that tensor decomposition is quite useful for summarizing dynamic networks.


2018

Social network analysis: An overview

Authors
Tabassum, S; Pereira, FSF; Fernandes, S; Gama, J;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Social network analysis (SNA) is a core pursuit of analyzing social networks today. In addition to the usual statistical techniques of data analysis, these networks are investigated using SNA measures. SNA helps in understanding the dependencies between social entities in the data, characterizing their behaviors and their effect on the network as a whole and over time. Therefore, this article attempts to provide a succinct overview of SNA in diverse topological networks (static, temporal, and evolving networks) and perspectives (ego-networks). As one of the primary applications of SNA is networked data mining, we provide a brief overview of network mining models as well; by this, we present the readers with a concise guided tour from analysis to mining of networks. This article is categorized under: Application Areas > Science and Technology; Technologies > Machine Learning; Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction; Commercial, Legal, and Ethical Issues > Social Considerations.


2018

Self Hyper-Parameter Tuning for Data Streams

Authors
Veloso, B; Gama, J; Malheiro, B;

Publication
Discovery Science - 21st International Conference, DS 2018, Limassol, Cyprus, October 29-31, 2018, Proceedings

Abstract
The widespread usage of smart devices and sensors, together with the ubiquity of Internet access, is behind the exponential growth of data streams. Nowadays, there are hundreds of machine learning algorithms able to process high-speed data streams. However, these algorithms rely on human expertise to perform complex processing tasks like hyper-parameter tuning. This paper addresses the problem of data variability modelling in data streams. Specifically, we propose and evaluate a new parameter tuning algorithm called Self Parameter Tuning (SPT). SPT consists of an online adaptation of the Nelder-Mead optimisation algorithm for hyper-parameter tuning. The method uses a dynamic-size sample to evaluate the current solution, and uses the Nelder-Mead operators to update the current set of parameters. The main contribution is the adaptation of the Nelder-Mead algorithm to automatically tune regression hyper-parameters for data streams. Additionally, whenever concept drifts occur in the data stream, it re-initiates the search for new hyper-parameters. The proposed method has been evaluated on a regression scenario. Experiments with well-known time-evolving data streams show that the proposed SPT hyper-parameter optimisation outperforms the results of previous expert hyper-parameter tuning efforts. © 2018, Springer Nature Switzerland AG.
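For readers unfamiliar with the Nelder-Mead operators mentioned in this abstract, the following is a minimal sketch of the classic batch algorithm (reflection, expansion, contraction, shrink). It is not the SPT method: the online adaptation, the dynamic-size sample, and the drift-triggered restart described in the paper are not reproduced here. The function name `nelder_mead`, the coefficient defaults, and the fixed iteration count are illustrative assumptions, and only the inside contraction is used for brevity.

```python
from typing import Callable, List, Sequence


def nelder_mead(
    f: Callable[[List[float]], float],
    simplex: Sequence[Sequence[float]],
    iters: int = 200,
    alpha: float = 1.0,   # reflection coefficient (assumed default)
    gamma: float = 2.0,   # expansion coefficient (assumed default)
    rho: float = 0.5,     # contraction coefficient (assumed default)
    sigma: float = 0.5,   # shrink coefficient (assumed default)
) -> List[float]:
    """Minimise f starting from a simplex of n + 1 points in n dimensions."""
    pts = [list(p) for p in simplex]
    n = len(pts[0])
    for _ in range(iters):
        pts.sort(key=f)  # best point first, worst point last
        best, worst = pts[0], pts[-1]
        # Centroid of every point except the worst one.
        centroid = [sum(p[i] for p in pts[:-1]) / n for i in range(n)]

        def along(t: float) -> List[float]:
            # Point on the line through the worst point and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(alpha)
        if f(best) <= f(reflected) < f(pts[-2]):
            pts[-1] = reflected                      # reflection
        elif f(reflected) < f(best):
            expanded = along(alpha * gamma)          # expansion
            pts[-1] = expanded if f(expanded) < f(reflected) else reflected
        else:
            contracted = along(-rho)                 # inside contraction
            if f(contracted) < f(worst):
                pts[-1] = contracted
            else:                                    # shrink towards the best point
                pts = [best] + [
                    [b + sigma * (x - b) for b, x in zip(best, p)]
                    for p in pts[1:]
                ]
    return min(pts, key=f)
```

In a tuning setting, `f` would be an error estimate for a given hyper-parameter vector; here it can be any function of a list of floats, for example `nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])`.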


2018

Weightless neural modeling for mining data streams

Authors
Cardoso, DO; Gama, J; França, F;

Publication
Data Mining in Time Series and Streaming Databases

Abstract
Learning from data streams can only be realized by systems which are not only effective but also efficient. That is, knowledge discovery in this context is impossible without being aware of the computational resources available. Weightless artificial neural networks (WANNs) are based on an alternative principle to the iterative optimization of weights employed by most mainstream artificial neural network models and related tools. WANNs explicitly manage knowledge pieces, which are stored by RAM nodes. This foundational difference is reflected in the adaptability of these models to streaming inputs: in such a scenario, applying weightless models can be considered more natural than applying their weighted counterparts, with ample control over learning capability as well as resource consumption. This chapter details a WANN-based approach for mining data streams, which allows the maintenance of an up-to-date data summary that can be used for several purposes. The insights and original ideas which power this model are explained as well, enabling novel applications and further development of them.


2018

A comparison of hierarchical multi-output recognition approaches for anuran classification

Authors
Colonna, JG; Gama, J; Nakamura, EF;

Publication
MACHINE LEARNING

Abstract
In bioacoustic recognition approaches, a flat classifier is usually trained to recognize several species of anurans, where the number of classes is equal to the number of species. Consequently, the complexity of the classification function increases proportionally with the number of species. To avoid this issue, we propose a hierarchical approach that decomposes the problem into three taxonomic levels: the family, the genus, and the species. To accomplish this, we transform the original single-labelled problem into a multi-output problem (multi-label and multi-class) considering the biological taxonomy of the species. We then develop a top-down method using a set of classifiers organized as a hierarchical tree. We test and compare two hierarchical methods, using (1) one classifier per parent node and (2) one classifier per level, against a flat approach. Thus, we conclude that it is possible to predict the same set of species as a flat classifier, and additionally obtain new information about the samples and their taxonomic relationship. This helps us to better understand the problem and reach additional conclusions by inspecting the confusion matrices at the three classification levels. In addition, we propose a soft decision rule based on the joint probabilities of hierarchy pathways. With this we are able to identify and reject confusing cases. We carry out our experiments using cross-validation performed by individuals. This form of CV avoids mixing syllables that belong to the same specimens in the testing and training sets, preventing an overestimate of the accuracy and generalizing the predictive capabilities of the system. We tested our methods on a dataset with sixty individual frogs, from ten different species, eight genera, and four families, achieving a final Macro-Fscore of 80% and 70% with and without applying the rejection rule, respectively.
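The soft decision rule mentioned in this abstract can be illustrated with a small sketch. It assumes, as one plausible reading rather than the paper's stated formulation, that the joint probability of a family, genus, species pathway is the product of per-level conditional probabilities, and that a sample is rejected when the best pathway falls below a threshold. The function name `best_path`, the dictionary layout, and the threshold value are hypothetical.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, str, str]  # (family, genus, species)


def best_path(
    p_family: Dict[str, float],
    p_genus: Dict[str, Dict[str, float]],    # family -> {genus: P(genus | family)}
    p_species: Dict[str, Dict[str, float]],  # genus -> {species: P(species | genus)}
    threshold: float = 0.5,                  # assumed rejection threshold
) -> Tuple[Optional[Path], float]:
    """Return the most probable pathway, or None if it is too uncertain."""
    best: Optional[Path] = None
    best_p = 0.0
    for fam, pf in p_family.items():
        for gen, pg in p_genus.get(fam, {}).items():
            for sp, ps in p_species.get(gen, {}).items():
                joint = pf * pg * ps  # joint probability of the whole pathway
                if joint > best_p:
                    best, best_p = (fam, gen, sp), joint
    if best_p < threshold:
        return None, best_p  # reject: no pathway is confident enough
    return best, best_p
```

Rejected samples (a `None` pathway) would be set aside as confusing cases instead of being forced into a species label.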


2018

A local algorithm to approximate the global clustering of streams generated in ubiquitous sensor networks

Authors
Rodrigues, PP; Araujo, J; Gama, J; Lopes, L;

Publication
INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS

Abstract
In ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce gives insights on the phenomenon being monitored. However, centralized algorithms force communication and storage requirements to grow unbounded. This article presents L2GClust, an algorithm to compute local clusterings at each node as an approximation of the global clustering. L2GClust performs local clustering of the sources based on the moving average of each node's data over time: the moving average is approximated using memory-less statistics; clustering is based on the furthest-point algorithm applied to the centroids computed by the node's direct neighbors. Evaluation is performed both on synthetic and real sensor data, using a state-of-the-art sensor network simulator and measuring sensitivity to network size, number of clusters, cluster overlapping, and communication incompleteness. A high level of agreement was found between local and global clusterings, with special emphasis on separability agreement, while an overall robustness to incomplete communications emerged. Communication reduction was also theoretically shown, with communication ratios empirically evaluated for large networks. L2GClust is able to keep a good approximation of the global clustering, using less communication than a centralized alternative, supporting the recommendation to use local algorithms for distributed clustering of streaming data sources.
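The furthest-point algorithm named in this abstract can be sketched as a greedy farthest-first selection of cluster centers. This is a generic illustration under stated assumptions, not the L2GClust implementation: the function name `furthest_point_centers`, the Euclidean distance, and the choice of the first input point as the initial center are all assumptions, and the neighbor communication and moving-average estimation are omitted.

```python
import math
from typing import List, Sequence


def furthest_point_centers(
    points: Sequence[Sequence[float]], k: int
) -> List[Sequence[float]]:
    """Pick up to k centers, each one the point furthest from those already chosen."""
    if not points or k <= 0:
        return []
    centers = [points[0]]  # assumed seed: the first point
    # Distance from every point to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=dist.__getitem__)
        if dist[idx] == 0.0:
            break  # every remaining point coincides with a center
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers
```

In the setting described above, `points` would be the centroids received from a node's direct neighbors; each input would then be assigned to its nearest returned center.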
