2016
Authors
Borchani, H; Larranaga, P; Gama, J; Bielza, C;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification problems, where each input instance has to be assigned to a single output class variable. The problem of mining multi-dimensional data streams, which include multiple output class variables, is largely unexplored, and only a few streaming multi-dimensional approaches have been introduced recently. In this paper, we propose a novel adaptive method, named Locally Adaptive-MB-MBC (LA-MB-MBC), for mining streaming multi-dimensional data. To this end, we use multi-dimensional Bayesian network classifiers (MBCs) as models. LA-MB-MBC monitors concept drift over time using the average log-likelihood score and the Page-Hinkley test. If a concept drift is detected, LA-MB-MBC then adapts the current MBC network locally around each changed node. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of the proposed method in terms of both concept drift detection and classification performance.
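The drift-monitoring step described in this abstract can be illustrated with a standard Page-Hinkley test. This is a minimal Python sketch, not LA-MB-MBC itself: it assumes the monitored signal is the negated average log-likelihood of each incoming batch (so that a drop in model fit shows up as an upward shift in the mean), and the `delta` and `threshold` values are illustrative assumptions rather than parameters from the paper.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta          # magnitude of change tolerated without alarm (assumed value)
        self.threshold = threshold  # lambda: alarm threshold (assumed value)
        self.reset()

    def reset(self) -> None:
        """Forget all history, e.g. after the model has been adapted."""
        self.n = 0
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # m_T = sum_t (x_t - mean_T - delta)
        self.cum_min = 0.0  # M_T = min over t of m_t

    def update(self, x: float) -> bool:
        """Add observation x; return True if PH_T = m_T - M_T exceeds the threshold."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold
```

In use, one would call `ph.update(-avg_loglik)` once per batch and trigger local adaptation of the network when it returns `True`; `avg_loglik` is a hypothetical name for the score computed by the current model, and calling `reset()` after adaptation is one possible design choice.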
2016
Authors
Cordeiro, Mario; Gama, Joao;
Publication
Solving Large Scale Learning Tasks. Challenges and Algorithms - Essays Dedicated to Katharina Morik on the Occasion of Her 60th Birthday
Abstract
Today, online social network services challenge state-of-the-art social media mining algorithms and techniques because of their real-time nature, their scale and the amount of unstructured data they generate. The continuous interactions between online social network participants generate unbounded streams of text content and evolving network structures that make classical text mining and network analysis techniques obsolete and unsuitable for these new challenges. Event detection on online social networks is no exception: state-of-the-art algorithms rely on text mining techniques applied to previously known datasets, processed with no restrictions on computational complexity or on the execution time required per analyzed document. Moreover, the network analysis algorithms used to extract knowledge from users' relations and interactions were not designed to handle evolving networks with so many nodes and edges. The real-time nature of online social networks makes event detection even harder: new or unforeseen events must be identified and tracked in real time, with accurate results delivered as quickly as possible. It makes no sense to have an algorithm that reports detected events a few hours after they have been announced by traditional newswire. © Springer International Publishing Switzerland 2016.
2016
Authors
Moreira, R; Bessa, R; Gama, J;
Publication
2016 13TH INTERNATIONAL CONFERENCE ON THE EUROPEAN ENERGY MARKET (EEM)
Abstract
With the liberalization of the electricity markets, price forecasting has become crucial for the decision-making process of market agents. The unique features of electricity prices, such as non-stationarity, non-linearity and high volatility, make this a very difficult task. For this reason, rather than a simple point forecast, market participants are more interested in a probabilistic forecast, which is essential to estimate the uncertainty involved in the price. With this in mind, the aim of this paper is to analyze the impact of external factors on the electricity price and to present a methodology for probabilistic forecasting of day-ahead electricity prices in the Iberian electricity market. The models are built using regression techniques and aim to obtain, for each hour, the quantiles from 5% to 95% in steps of 5%.
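The abstract only states that "regression techniques" are used to obtain the 19 hourly quantiles. One common way to fit and evaluate such quantile forecasts is the pinball (quantile) loss, which quantile regression minimizes. The Python sketch below assumes that choice; the names `QUANTILE_LEVELS`, `pinball_loss` and `mean_pinball` are hypothetical and do not come from the paper.

```python
from typing import Sequence

# Quantile levels 5%, 10%, ..., 95% (19 levels), built from integers to avoid float drift.
QUANTILE_LEVELS = [q / 100 for q in range(5, 100, 5)]


def pinball_loss(y: float, q_pred: float, tau: float) -> float:
    """Pinball loss of forecasting q_pred as the tau-quantile when price y is observed."""
    diff = y - q_pred
    # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
    return max(tau * diff, (tau - 1) * diff)


def mean_pinball(y: float, preds: Sequence[float],
                 taus: Sequence[float] = QUANTILE_LEVELS) -> float:
    """Average pinball loss over all quantile levels for one hourly price."""
    if len(preds) != len(taus):
        raise ValueError("need exactly one prediction per quantile level")
    return sum(pinball_loss(y, p, t) for p, t in zip(preds, taus)) / len(taus)
```

For example, forecasting 8.0 as the 90% quantile when the price turns out to be 10.0 costs 0.9 × 2 = 1.8, while forecasting 12.0 costs only 0.1 × 2 = 0.2, which is why minimizing this loss pushes the 90% forecast toward the upper part of the price distribution.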
2016
Authors
Tabassum, Shazia; Gama, Joao;
Publication
IEEE 17th International Conference on Mobile Data Management, MDM 2016, Porto, Portugal, June 13-16, 2016 - Workshops
Abstract
2016
Authors
Tabassum, S; Gama, J;
Publication
Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016
Abstract
The problem of analyzing massive graph streams in real time grows with the size of the streams. Sampling techniques have been used to analyze these streams in real time. However, it is difficult to answer questions such as: which structures are well preserved by the sampling techniques as the streams evolve? Which sampling techniques yield proper estimates for directed and weighted graphs? Which techniques have the lowest time complexity? In this work, we answer these questions by comparing and analyzing the evolutionary samples of such graph streams. We evaluate sequential sampling techniques by comparing the structural metrics of their samples. We also present a biased version of reservoir sampling, which shows better comparative results in our scenario. We carried out rigorous experiments over a massive stream of 300 million calls made by 11 million anonymous subscribers over 31 days. We evaluated node-based and edge-based sampling methods, comparing the samples generated by sequential algorithms such as the Space-Saving algorithm for finding top-k items, reservoir sampling, and a biased version of reservoir sampling. Our overall results and observations show that edge-based samples perform well in our scenario. We also compared the degree distributions and biases of the evolutionary samples. © 2016 ACM.
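Two of the sequential techniques compared in this abstract, reservoir sampling and the Space-Saving algorithm, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: stream items are hashable (for edge-based sampling, `(caller, callee)` tuples), reservoir sampling follows the classic Algorithm R, and Space-Saving uses a linear scan for the minimum counter instead of the Stream-Summary structure of the original algorithm. The paper's biased reservoir variant is not reproduced here.

```python
import random
from typing import Dict, Hashable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Algorithm R: keep a uniform random sample of k items from a stream of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item  # item i is kept with probability k/(i+1)
    return reservoir


def space_saving(stream: Iterable[Hashable], m: int) -> Dict[Hashable, int]:
    """Space-Saving: approximate frequent-item counts using at most m counters."""
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < m:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer inherits that count + 1.
            victim = min(counters, key=counters.__getitem__)
            counters[item] = counters.pop(victim) + 1
    return counters
```

For edge-based sampling one would pass an iterable of `(caller, callee)` pairs. Space-Saving counts can overestimate true frequencies, but any item occurring more than n/m times in a stream of n items is guaranteed to hold a counter, which is what makes it suitable for top-k queries.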
2016
Authors
Sarmento, R; Oliveira, M; Cordeiro, M; Tabassum, S; Gama, J;
Publication
Studies in Big Data
Abstract
Mobile phones are powerful tools for connecting people. The streams of Call Detail Records (CDRs) generated by these devices provide a powerful abstraction of social interactions between individuals, representing social structures. Call graphs can be derived from these CDRs, with nodes representing subscribers and edges representing the phone calls made. These graphs can easily reach millions of nodes and billions of edges. Besides being large-scale and generated in real time, the underlying social networks are inherently complex and therefore difficult to analyze. Conventional data analysis performed by telecom operators is slow, done on request, and implies heavy data-warehousing costs. Given these challenges, real-time streaming analysis is an ever-increasing need for mobile operators, since it enables them to quickly detect important network events and optimize business operations. Sampling, together with visualization techniques, is required for online exploratory data analysis and event detection in such networks. In this chapter, we report on the burgeoning body of research on network sampling, visualization of streaming social networks and stream analysis, and on the solutions proposed so far. © 2016, Springer International Publishing Switzerland.
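The call-graph abstraction described in this abstract (subscribers as nodes, calls as edges) can be maintained incrementally as CDRs arrive. The Python sketch below assumes each CDR has already been reduced to a hypothetical `(caller, callee)` pair and uses the number of calls as the edge weight; real CDRs carry further fields, such as timestamps and durations, that this sketch ignores, and the `CallGraph` class is illustrative rather than taken from the chapter.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class CallGraph:
    """Weighted, directed call graph updated one CDR at a time."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        # caller -> {callee: number of calls observed so far}
        self.out_edges: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add_call(self, caller: str, callee: str) -> None:
        """Consume one (hypothetical) CDR: register both subscribers, bump the edge weight."""
        self.nodes.update((caller, callee))
        edges = self.out_edges[caller]
        edges[callee] = edges.get(callee, 0) + 1

    def consume(self, cdrs: Iterable[Tuple[str, str]]) -> None:
        """Process a stream of (caller, callee) pairs in arrival order."""
        for caller, callee in cdrs:
            self.add_call(caller, callee)

    def out_degree(self, node: str) -> int:
        """Number of distinct subscribers called by `node`."""
        return len(self.out_edges.get(node, {}))

    def num_edges(self) -> int:
        """Number of distinct directed caller -> callee edges."""
        return sum(len(callees) for callees in self.out_edges.values())
```

At the scale mentioned in the abstract (millions of nodes, billions of edges) an in-memory dictionary like this would not suffice on its own, which is where the sampling techniques the chapter surveys come in.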