Publications

Publications by LIAAD

2016

Concept Neurons - Handling Drift Issues for Real-Time Industrial Data Mining

Authors
Moreira Matias, L; Gama, J; Mendes Moreira, J;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2016, PT III

Abstract
Learning from data streams is a challenge faced by data science professionals across multiple industries. Most of them struggle to apply traditional Machine Learning algorithms to these problems. They turn to these algorithms because of their wide availability in ready-to-use software libraries for big data technologies (e.g. SparkML). Nevertheless, most of these algorithms cannot cope with the key characteristics of this type of data, such as high arrival rates and/or non-stationary distributions. In this paper, we introduce a generic yet simple framework, named Concept Neurons, to fill this gap. It leverages a combination of continuous inspection schemas and residual-based updates over the model parameters and/or the model output. Such a framework can strengthen the resistance of most induction learning algorithms to concept drift. Two distinct yet closely related flavors are introduced to handle different drift types. Experimental results from successful applications in different domains across the transportation industry are presented to uncover the hidden potential of this methodology.

2016

Dynamic community detection in evolving networks using locality modularity optimization

Authors
Cordeiro, M; Sarmento, RP; Gama, J;

Publication
SOCIAL NETWORK ANALYSIS AND MINING

Abstract
The amount and variety of data generated by today's online social and telecommunication network services are changing the way researchers analyze social networks. Coping with fast-evolving networks of millions of nodes and edges is, among other factors, the main challenge. Community detection algorithms must also be updated or improved to work under these conditions. Previous state-of-the-art algorithms based on modularity optimization (e.g. the Louvain algorithm) provide fast, efficient and robust community detection on large static networks. Nonetheless, due to the high computational complexity of these algorithms, using batch techniques on dynamic networks requires running community detection on the whole network at every evolution step. This proves to be computationally expensive and unstable in terms of community tracking. Our contribution is a novel technique that keeps the community structure up to date following the addition or removal of nodes and edges. The proposed algorithm performs a local modularity optimization that maximizes the modularity gain function only for those communities where nodes and edges were edited, keeping the rest of the network unchanged. The effectiveness of our algorithm is demonstrated by comparison with other state-of-the-art community detection algorithms with respect to Newman's Modularity, Modularity with Split Penalty, Modularity Density, number of detected communities and running time.
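For readers unfamiliar with the quantity being optimized, the following is a minimal sketch of Newman's modularity for an undirected, unweighted graph. It is not the authors' incremental algorithm; the function name `modularity`, the edge-list representation and the example graph are assumptions made for illustration only.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity Q for an undirected, unweighted graph without self-loops.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Uses Q = sum over communities c of  L_c / m - (d_c / 2m)^2, where m is the
    number of edges, L_c the number of edges inside c, and d_c the sum of the
    degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # L_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: total degree of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in one community
```

Splitting the graph into its two triangles scores higher than merging all nodes into one community. A local method such as the one described in the abstract would re-evaluate this score only for the communities touched by an edit rather than for the whole network.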

2016

Event detection from traffic tensors: A hybrid model

Authors
Fanaee T, H; Gama, J;

Publication
NEUROCOMPUTING

Abstract
A traffic tensor, or simply origin x destination x time, is a new data model for conventional origin/destination (O/D) matrices. Tensor models are traffic data analysis techniques that use this data model to improve performance. Tensors outperform other models because temporal and spatial fluctuations of traffic patterns are taken into account simultaneously, yielding results that follow a more natural pattern. Three major types of fluctuations can occur in traffic tensors: mutations to the overall traffic flows, alterations to the network topology, and chaotic behaviors. How can we detect events in a system that faces all of these fluctuation types during its life cycle? Our initial studies reveal that the current design of tensor models faces some difficulties in dealing with such a realistic scenario. We propose a new hybrid tensor model called HTM that enhances the detection ability of tensor models by using a parallel tracking technique on the traffic's topology. However, tensor decomposition techniques such as Tucker, a key step for tensor models, require a complicated parameter that is not only difficult to choose but also affects the model's quality. We address this problem by examining a recent technique called adjustable core size Tucker decomposition (ACS-Tucker). Experiments on simulated and real-world data sets from different domains, compared against several techniques, indicate that the proposed model is effective and robust and therefore constitutes a viable alternative for the analysis of traffic tensors.
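The origin x destination x time data model can be illustrated with a small sketch. It only shows how trip records might be arranged into a sparse three-way count tensor and sliced back into an O/D matrix; it does not implement HTM or any tensor decomposition. The function names, the record layout `(origin, destination, timestamp)` and the one-hour bin width are assumptions made for illustration.

```python
from collections import Counter


def build_traffic_tensor(trips, bin_seconds=3600):
    """Count trips per (origin, destination, time bin) in a sparse tensor.

    trips: iterable of (origin, destination, timestamp_in_seconds) records.
    bin_seconds: width of each time bin (assumed value: one hour).
    """
    tensor = Counter()
    for origin, destination, timestamp in trips:
        tensor[(origin, destination, int(timestamp // bin_seconds))] += 1
    return tensor


def od_matrix(tensor, time_bin):
    """Slice the tensor at one time bin to get a conventional O/D matrix."""
    return {(o, d): n for (o, d, t), n in tensor.items() if t == time_bin}


# Hypothetical trip records: (origin zone, destination zone, seconds since start).
trips = [("A", "B", 10), ("A", "B", 1200), ("B", "C", 3700), ("A", "B", 4000)]
tensor = build_traffic_tensor(trips)
first_hour = od_matrix(tensor, 0)
second_hour = od_matrix(tensor, 1)
```

Each time slice is an ordinary O/D matrix, which is why the tensor can be read as a sequence of O/D matrices stacked along the time axis.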

2016

IoT Big Data Stream Mining

Authors
Morales, GDF; Bifet, A; Khan, L; Gama, J; Fan, W;

Publication
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016

Abstract
The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in IoT stream mining. This tutorial is a gentle introduction to mining IoT big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part deals with scalability issues inherent in IoT applications, and discusses how to mine data streams on distributed engines such as Spark, Flink, Storm, and Samza. © 2016 Copyright held by the owner/author(s).

2016

Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers

Authors
Borchani, H; Larranaga, P; Gama, J; Bielza, C;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification problems, where each input instance has to be assigned to a single output class variable. The problem of mining multi-dimensional data streams, which include multiple output class variables, is largely unexplored, and only a few streaming multi-dimensional approaches have been introduced recently. In this paper, we propose a novel adaptive method, named Locally Adaptive-MB-MBC (LA-MB-MBC), for mining streaming multi-dimensional data. To this end, we use multi-dimensional Bayesian network classifiers (MBCs) as models. LA-MB-MBC monitors concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a concept drift is detected, LA-MB-MBC adapts the current MBC network locally around each changed node. An experimental study carried out on synthetic multi-dimensional data streams shows the merits of the proposed method in terms of both concept drift detection and classification performance.
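The Page-Hinkley test mentioned in the abstract is a standard sequential change detector, and a minimal sketch of its textbook form is given here. This is not the LA-MB-MBC implementation: the class name, the parameter values `delta` and `threshold`, and the synthetic stream are assumptions for illustration. The sketch signals an upward shift in the mean, so a score that drops under drift, such as an average log-likelihood, would be negated before being fed in.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated change magnitude (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed value)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # running minimum M_t of the cumulative deviation

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in the stream, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical stream: the mean jumps from 0 to 1 at index 50.
drifting = [0.0] * 50 + [1.0] * 50
stable = [0.0] * 100

alarm_drifting = first_alarm(drifting)
alarm_stable = first_alarm(stable)
```

A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay, while `delta` sets how small a fluctuation is ignored.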

2016

Online Social Networks Event Detection: A Survey

Authors
Cordeiro, Mario; Gama, Joao;

Publication
Solving Large Scale Learning Tasks. Challenges and Algorithms - Essays Dedicated to Katharina Morik on the Occasion of Her 60th Birthday

Abstract
Today, online social network services are challenging state-of-the-art social media mining algorithms and techniques due to their real-time nature, scale and the amount of unstructured data generated. The continuous interactions between online social network participants generate streams of unbounded text content and evolving network structures within the social streams, which make classical text mining and network analysis techniques obsolete and unsuitable for such new challenges. Event detection on online social networks is no exception: state-of-the-art algorithms rely on text mining techniques applied to previously known datasets that are processed with no restrictions on computational complexity or on the execution time required per document. Moreover, the network analysis algorithms used to extract knowledge from users' relations and interactions were not designed to handle evolving networks of such magnitude in terms of the number of nodes and edges. This specific problem of event detection becomes even more serious due to the real-time nature of online social networks. New or unforeseen events need to be identified and tracked in real time, providing accurate results as quickly as possible. It makes no sense to have an algorithm that reports detected events a few hours after they have been announced by traditional newswire. © Springer International Publishing Switzerland 2016.
