Publications - INESC TEC

Publications by João Gama

2001

Parallel Implementation of Decision Tree Learning Algorithms

Authors
Amado, N; Gama, J; Silva, FMA

Publication
Progress in Artificial Intelligence, Knowledge Extraction, Multi-agent Systems, Logic Programming and Constraint Solving, 10th Portuguese Conference on Artificial Intelligence, EPIA 2001, Porto, Portugal, December 17-20, 2001, Proceedings

Abstract
In the fields of data mining and machine learning, the amount of data available for building classifiers is growing very fast. There is therefore a great need for algorithms that can build classifiers from very large datasets while remaining computationally efficient and scalable. One possible solution is to employ parallelism to reduce the time spent building classifiers from very large datasets while preserving classification accuracy. This work first overviews some strategies for implementing decision tree construction algorithms in parallel, based on techniques such as task parallelism, data parallelism and hybrid parallelism. We then describe a new parallel implementation of the C4.5 decision tree construction algorithm. Even though the implementation is still in its final development phase, we present some experimental results that can be used to predict the expected behavior of the algorithm. © Springer-Verlag Berlin Heidelberg 2001.
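The data-parallel strategy mentioned above can be illustrated with a minimal sketch, assuming nominal attributes and plain information gain as the split criterion: each worker counts (attribute value, class) pairs on its own partition of the rows, and summing those counts gives exactly the statistics a sequential pass would produce. This is not the authors' C4.5 implementation (C4.5 uses gain ratio by default and also handles continuous attributes and missing values); the function names and the `"class"` key are hypothetical, and the thread pool only stands in for whatever workers (processes, cluster nodes) a real system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def local_counts(rows, attr):
    """Count (attribute value, class) pairs in one data partition."""
    return Counter((row[attr], row["class"]) for row in rows)


def entropy(counts):
    """Shannon entropy (bits) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_gain(merged):
    """Information gain of a nominal split, computed from merged counts."""
    class_totals = Counter()
    by_value = {}
    for (value, cls), n in merged.items():
        class_totals[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(class_totals.values())
    before = entropy(class_totals.values())
    after = sum(
        sum(c.values()) / total * entropy(c.values()) for c in by_value.values()
    )
    return before - after


def parallel_gain(partitions, attr, workers=4):
    """Count each partition independently, then merge and score the split."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(local_counts, partitions, [attr] * len(partitions))
        merged = sum(partials, Counter())  # Counter addition sums the counts
    return info_gain(merged)
```

Because the per-partition counts are simply added, `parallel_gain` returns the same score however the rows are split across workers; only the counting step is parallel, which is the usual motivation for data parallelism in split evaluation.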

2009

Change Detection in Climate Data over the Iberian Peninsula

Authors
Sebastiao, R; Rodrigues, PP; Gama, J

Publication
2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009)

Abstract
This paper addresses the space-time change detection problem in climate data over the Iberian Peninsula using a 50-year dataset. The data were analyzed with respect to temporal and geographical information, using the following methodology: information about space-time drifts in climate data was obtained by applying a change detection algorithm to all the temporal data available for each physical location considered in this study; the performance and robustness of this algorithm were then assessed by the McNemar nonparametric statistical test on cluster structures; geographical correlations were inferred using visualization tools and graphical representations of the data. Most of the spatio-temporal drifts detected by the algorithm were confirmed by the results of the McNemar test and are in accordance with the visual and graphical representations, supporting the advantage of using interdisciplinary methods. This analysis also shows that some locations do not reveal any change over all the observed years.
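The abstract does not say how the McNemar test is applied to the cluster structures, so the following is only a generic sketch of the test itself. Given two paired sequences of binary outcomes, it uses only the discordant pairs: b counts pairs where the first outcome is true and the second false, c the reverse, and the continuity-corrected statistic is (|b - c| - 1)² / (b + c), compared against a chi-square distribution with one degree of freedom. The function name and the boolean encoding of the paired outcomes are assumptions for illustration.

```python
from math import erfc, sqrt


def mcnemar(first, second, correction=True):
    """McNemar's test on two paired sequences of booleans.

    Returns (statistic, p_value) using the chi-square approximation
    with one degree of freedom.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be paired (same length)")
    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(first, second) if x and not y)
    c = sum(1 for x, y in zip(first, second) if y and not x)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of change
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

For instance, 10 pairs of type (true, false) against 2 of type (false, true) give a statistic of 49/12 ≈ 4.08 and a p-value just below 0.05.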

2012

An overview of social network analysis

Authors
Oliveira, M; Gama, J

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data mining is being increasingly applied to social networks. Two relevant reasons are the growing availability of large volumes of relational data, boosted by the proliferation of social media websites, and the intuition that an individual's connections can yield richer information than his/her isolated attributes. This synergistic combination can prove germane to a variety of applications such as churn prediction, fraud detection and marketing campaigns. This paper attempts to provide a general and succinct overview of the essentials of social network analysis for readers taking a first look at this area who intend to apply data mining to social networks. © 2012 Wiley Periodicals, Inc.

2011

L2GClust: local-to-global clustering of stream sources

Authors
Rodrigues, PP; Gama, J; Araújo, J; Lopes, LMB

Publication
Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), Taichung, Taiwan, March 21 - 24, 2011

Abstract
In ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce is an important problem that gives insight into the phenomenon being monitored by such networks. However, if these techniques require data to be gathered centrally, communication and storage requirements are often unbounded. The goal of this paper is to assess the feasibility of computing a local clustering at each node, using only the neighbors' centroids, as an approximation of the global clustering computed by a centralized process. A local algorithm is proposed to cluster sensors based on the moving average of each node's data over time: the moving average of each node is approximated using a memory-less fading average, and clustering is based on the furthest point algorithm applied to the centroids computed by the node's direct neighbors. The algorithm was evaluated on a state-of-the-art sensor network simulator, measuring the agreement between local and global clustering. Experiments on synthetic data with spherical Gaussian clusters are analyzed consistently across different network sizes, numbers of clusters and degrees of cluster overlap. Results show a high level of agreement between each node's clustering definition and the global clustering definition, with special emphasis on separability agreement. Overall, local approaches are able to keep a good approximation of the global clustering, improving privacy among nodes and decreasing the communication and computation load in the network. Hence, the basic requirements of distributed clustering of streaming sensor data suggest that clustering in these settings should be performed locally. © 2011 ACM.
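The two building blocks named in this abstract can be sketched as follows. This is a minimal sketch, not the authors' L2GClust code: it assumes the fading average takes the common form M_i = S_i / N_i, with S_i = x_i + α·S_(i-1) and N_i = 1 + α·N_(i-1), which needs only two numbers per node (hence "memory-less"), and it assumes the furthest point algorithm is the greedy furthest-first traversal, where each new center is the point furthest from its nearest already-chosen center. The class and function names and the value of α are hypothetical, and the exchange of centroids between neighboring nodes is left out.

```python
from math import dist


class FadingAverage:
    """Fading average: keeps only a weighted sum and a weighted count."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha  # fading factor in (0, 1]; 1 gives the plain mean
        self.total = 0.0
        self.count = 0.0

    def update(self, x):
        """Absorb one reading and return the current fading average."""
        self.total = x + self.alpha * self.total
        self.count = 1.0 + self.alpha * self.count
        return self.total / self.count


def furthest_point(points, k):
    """Furthest-first traversal: greedily pick up to k well-separated centers."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        # Next center: the point whose nearest chosen center is furthest away.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        for p in points
    ]
```

A node could, for example, collect the centroids reported by its direct neighbors as tuples such as `(0.0,), (1.0,), (10.0,), (11.0,)`, call `furthest_point(..., 2)` to obtain the centers `(0.0,)` and `(11.0,)`, and then use `assign` to group the values as `[0, 0, 1, 1]`.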

2011

Data Streams

Authors
Gama, J; Rodrigues, PP

Publication
Encyclopedia of Data Warehousing and Mining, Second Edition

2011

Learning from Data Streams

Authors
Gama, J; Rodrigues, PP

Publication
Encyclopedia of Data Warehousing and Mining, Second Edition
