Publications

Publications by LIAAD

2018

A local algorithm to approximate the global clustering of streams generated in ubiquitous sensor networks

Authors
Rodrigues, PP; Araujo, J; Gama, J; Lopes, L;

Publication
INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS

Abstract
In ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce gives insights on the phenomenon being monitored. However, centralized algorithms force communication and storage requirements to grow unbounded. This article presents L2GClust, an algorithm to compute local clusterings at each node as an approximation of the global clustering. L2GClust performs local clustering of the sources based on the moving average of each node's data over time: the moving average is approximated using memory-less statistics; clustering is based on the furthest-point algorithm applied to the centroids computed by the node's direct neighbors. Evaluation is performed both on synthetic and real sensor data, using a state-of-the-art sensor network simulator and measuring sensitivity to network size, number of clusters, cluster overlapping, and communication incompleteness. A high level of agreement was found between local and global clusterings, with special emphasis on separability agreement, while an overall robustness to incomplete communications emerged. Communication reduction was also theoretically shown, with communication ratios empirically evaluated for large networks. L2GClust is able to keep a good approximation of the global clustering, using less communication than a centralized alternative, supporting the recommendation to use local algorithms for distributed clustering of streaming data sources.
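The two ingredients named in the abstract, a memory-less moving average and furthest-point clustering, can be illustrated with a minimal sketch. It is not the L2GClust implementation: the exponentially weighted update, the `alpha` parameter, the one-dimensional values, and all function names are assumptions made for illustration.

```python
def update_mean(prev_mean, value, alpha=0.1):
    """Exponentially weighted mean: only the previous estimate is stored.

    Assumption: the paper's memory-less statistics may differ from this update.
    """
    return (1 - alpha) * prev_mean + alpha * value


def furthest_point_centers(points, k):
    """Furthest-point traversal over 1-D values (hypothetical helper).

    Start from the first point, then repeatedly add the point whose
    nearest already-chosen centre is furthest away.
    """
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        nxt = max(points, key=lambda p: min(abs(p - c) for c in centers))
        centers.append(nxt)
    return centers


def assign(points, centers):
    """Index of the closest centre for each point."""
    return [min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            for p in points]


# Moving averages that a node might have received from its neighbours (made up).
neighbour_means = [1.0, 1.2, 9.8, 10.0, 5.1]
centers = furthest_point_centers(neighbour_means, k=3)
labels = assign(neighbour_means, centers)
```

Each node would run something like this on the values of its direct neighbours only, which is what keeps communication local.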

2018

On analyzing user preference dynamics with temporal social networks

Authors
Pereira, FSF; Gama, J; de Amo, S; Oliveira, GMB;

Publication
MACHINE LEARNING

Abstract
The preferences adopted by individuals are constantly modified, driven by new experiences, natural life evolution and, mainly, influence from friends. Studying these temporal dynamics of user preferences has become increasingly important for personalization tasks in the information retrieval and recommendation systems domains. However, existing models are too constrained to capture the complexity of the underlying phenomenon. Online social networks contain rich information about social interactions and relations, and thus become an essential source of knowledge for understanding the evolution of user preferences. In this work, we investigate the interplay between user preferences and social networks over time. First, we propose a temporal preference model able to detect preference change events of a given user. Following this, we use temporal network concepts to analyze the evolution of social relationships and propose strategies to detect changes in the network structure based on node centrality. Finally, we look for a correlation between preference change events and node centrality change events over Twitter and Jam social music datasets. Our findings show that there is a strong correlation between both change events, especially when modeling social interactions by means of a temporal network.
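As a rough illustration of detecting centrality change events in a temporal network, the sketch below tracks one node's degree centrality across snapshots and flags large jumps. The choice of degree centrality, the fixed `threshold`, and the function names are assumptions; the abstract does not state which centrality measures or detection strategies the authors use.

```python
def degree_centrality(edges, nodes):
    """Normalized degree of every node in one undirected snapshot."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    denom = max(len(nodes) - 1, 1)  # avoid division by zero for a single node
    return {n: d / denom for n, d in degree.items()}


def centrality_change_events(snapshots, nodes, node, threshold=0.3):
    """Snapshot indices where the node's centrality jumps by more than threshold.

    Hypothetical detector: compares each snapshot with the previous one.
    """
    events = []
    prev = None
    for t, edges in enumerate(snapshots):
        cur = degree_centrality(edges, nodes)[node]
        if prev is not None and abs(cur - prev) > threshold:
            events.append(t)
        prev = cur
    return events


# Made-up temporal network: node "a" gains two contacts at snapshot 2.
nodes = ["a", "b", "c", "d"]
snapshots = [
    [("a", "b")],
    [("a", "b")],
    [("a", "b"), ("a", "c"), ("a", "d")],
    [("a", "b"), ("a", "c"), ("a", "d")],
]
events_a = centrality_change_events(snapshots, nodes, "a")
events_b = centrality_change_events(snapshots, nodes, "b")
```

The resulting event indices could then be compared against preference change events for the same user to look for co-occurrence.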

2018

Incremental TextRank - Automatic Keyword Extraction for Text Streams

Authors
Sarmento, RP; Cordeiro, M; Brazdil, P; Gama, J;

Publication
Proceedings of the 20th International Conference on Enterprise Information Systems, ICEIS 2018, Funchal, Madeira, Portugal, March 21-24, 2018, Volume 1.

Abstract
Text mining and NLP techniques are a hot topic nowadays. Researchers strive to develop new and faster algorithms to cope with larger amounts of data. In particular, interest in text data analysis has been increasing due to the growth of social network media. Given this, developing new algorithms and/or upgrading existing ones is now a crucial task for dealing with text mining problems under this new scenario. In this paper, we present an update to TextRank, a well-known implementation used for automatic keyword extraction from text, adapted to deal with streams of text. In addition, we present results for this implementation and compare them with the batch version. The major improvement is lower computation time for processing the same text data in a streaming environment, in both sliding window and incremental setups. The speedups obtained in the experimental results are significant. Therefore, the approach was considered valid and useful to the research community.
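A minimal sketch of keyword ranking over a text stream, in the spirit of TextRank: words are nodes in a co-occurrence graph, and a PageRank-style iteration scores them. The class name, the `window` and `damping` parameters, and the warm start from previous scores are assumptions for illustration; the paper's actual incremental update is not reproduced here.

```python
from collections import defaultdict


class StreamingKeywords:
    """Hypothetical streaming keyword ranker (not the authors' code)."""

    def __init__(self, window=2, damping=0.85):
        self.window = window
        self.damping = damping
        # Symmetric co-occurrence weights: weights[u][v] == weights[v][u].
        self.weights = defaultdict(lambda: defaultdict(float))
        self.scores = {}

    def add_text(self, text):
        """Add one chunk of the stream to the graph, then re-rank."""
        words = text.lower().split()
        for i, w in enumerate(words):
            for other in words[i + 1 : i + self.window]:
                if other != w:
                    self.weights[w][other] += 1.0
                    self.weights[other][w] += 1.0
        self._rank()

    def _rank(self, iterations=50):
        nodes = list(self.weights)
        n = len(nodes)
        if n == 0:
            return
        # Warm start: reuse earlier scores instead of restarting from uniform.
        scores = {w: self.scores.get(w, 1.0 / n) for w in nodes}
        for _ in range(iterations):
            new = {}
            for w in nodes:
                incoming = sum(
                    scores[u] * self.weights[u][w] / sum(self.weights[u].values())
                    for u in self.weights[w]
                )
                new[w] = (1 - self.damping) / n + self.damping * incoming
            scores = new
        self.scores = scores

    def top(self, k=3):
        """The k highest-scoring words seen so far."""
        return sorted(self.scores, key=self.scores.get, reverse=True)[:k]


ranker = StreamingKeywords()
ranker.add_text("graph stream graph mining")
first_top = ranker.top(1)
ranker.add_text("graph sampling graph")
second_top = ranker.top(1)
```

Because the graph persists between chunks, each new chunk only adds edges; a sliding-window variant would also need to remove the edges of expired chunks.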

2018

Guest Editorial Special Issue on Knowledge Discovery From Mobility Data for Intelligent Transportation Systems

Authors
Moreira Matias, L; Gama, J; Monreal, CO; Nair, R; Trasarti, R;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Recent technological advances in telecommunications have created a new reality in mobility sensing. Nowadays, we live in an era where ubiquitous digital devices are able to broadcast rich information about human mobility in real time and at a high rate. This has exponentially increased the availability of large-scale mobility data, which has been popularized in the media as the new currency, fueling the future vision of smart cities that will transform our lives. The reality is that we have only just begun to recognize significant research challenges across a spectrum of topics. Consequently, there is increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. However, such availability also raises privacy issues that must be considered by both industrial and academic stakeholders when using these resources. © 2000-2011 IEEE.

2018

Biased Dynamic Sampling for Temporal Network Streams

Authors
Tabassum, S; Gama, J;

Publication
Complex Networks and Their Applications VII - Volume 1 Proceedings The 7th International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2018, Cambridge, UK, December 11-13, 2018.

Abstract
Considering the avalanche of evolving data and memory constraints, the sampling of streaming networks has gained much attention in the recent decade. However, samples that choose data uniformly from the beginning to the end of a temporal stream are not very relevant for temporally evolving networks, where recent activities are more important than old events. Moreover, relationships also change over time. Recent interactions evidently reflect the current status of relationships; nevertheless, some older, stronger relations remain substantially significant. Considering the above issues, we propose a fast, memoryless dynamic sampling mechanism for weighted or multi-graph high-speed streams. For this purpose, we use a forgetting function with two parameters that help introduce biases on the network based on time and relationship strengths. Our experiments on real-world data sets show that our samples not only preserve basic properties such as degree distributions but also maintain the temporal distribution correlations. We also observe that our method generates samples with increased efficiency. It also outperforms current sampling algorithms in the area. © 2019, Springer Nature Switzerland AG.
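One way to picture a two-parameter forgetting function is the sketch below: edge weights decay with elapsed time, repeated interactions reinforce them, and edges whose weight falls below a threshold are dropped. The exact form of the authors' function is not given in the abstract, so `decay`, `threshold`, and the update rule are assumptions.

```python
def biased_sample(stream, decay=0.5, threshold=0.5):
    """Keep a sample of edges biased towards recent and strong relations.

    stream: iterable of (time, u, v) tuples sorted by time.
    decay: multiplicative forgetting factor per time unit (assumed form).
    threshold: edges whose decayed weight falls below this are discarded.
    Returns a dict mapping (u, v) to its current weight.
    """
    weights = {}
    last_t = None
    for t, u, v in stream:
        if last_t is not None and t > last_t:
            factor = decay ** (t - last_t)
            # Forget: shrink every stored edge and drop the weak ones.
            weights = {e: w * factor for e, w in weights.items()
                       if w * factor >= threshold}
        # Reinforce: each occurrence of an edge adds one unit of strength.
        weights[(u, v)] = weights.get((u, v), 0.0) + 1.0
        last_t = t
    return weights


# Made-up stream: a-b interacts four times early on, b-c only once.
stream = [(0, "a", "b")] * 4 + [(0, "b", "c"), (2, "c", "d")]
sample = biased_sample(stream)
```

In this run the old but strong a-b relation survives the decay, the old and weak b-c relation is forgotten, and the recent c-d interaction is kept, which mirrors the time and strength biases described in the abstract.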

2018

Improving acute kidney injury detection with conditional probabilities

Authors
Nogueira, AR; Ferreira, CA; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Acute Kidney Injury (AKI) is a disease that affects the kidneys and is characterized by the rapid deterioration of these organs, usually associated with a pre-existing critical illness. Because it is an acute disease, time is a key element in its prevention. By anticipating a patient's state transition, we can prevent future health complications, such as the development of a chronic disease or the loss of an organ, in addition to decreasing the amount of money spent on the patient's care. The main goal of this paper is to address the problem of correctly predicting the illness path in various patients by studying different methodologies to predict this disease and proposing new, distinct approaches aimed at improving classification performance. Through the comparison of five different approaches (Markov Chain Model ICU Specialists, Markov Chain Model Features, Markov Chain Model Conditional Features, Markov Chain Model and Random Forest), we came to the conclusion that applying conditional probabilities to this problem produces a more accurate prediction, based on common inputs.
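The idea of predicting a patient's next state from conditional probabilities can be sketched with a first-order Markov chain estimated from observed state sequences. The state labels and the data below are invented, and the sketch does not reproduce any of the five models compared in the paper.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate P(next state | current state) from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        state: {nxt: c / sum(counter.values()) for nxt, c in counter.items()}
        for state, counter in counts.items()
    }


def predict_next(model, state):
    """Most probable next state given the current one."""
    return max(model[state], key=model[state].get)


# Hypothetical patient histories; the labels are illustrative only.
histories = [
    ["none", "stage1", "stage2"],
    ["none", "none", "stage1"],
    ["stage1", "stage1", "none"],
]
model = fit_transitions(histories)
prediction = predict_next(model, "none")
```

Conditioning on an additional feature could be approximated by using a `(state, feature)` tuple as the current state, at the cost of needing more data per condition.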
