Publications

Publications by LIAAD

2015

EigenEvent: An algorithm for event detection from complex data streams in syndromic surveillance

Authors
Fanaee T, H; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Syndromic surveillance systems continuously monitor multiple pre-diagnostic daily streams of indicators from different regions with the aim of detecting disease outbreaks early. The main objective of these systems is to detect outbreaks hours or days before clinical and laboratory confirmation. The data generated by these systems is usually multivariate and seasonal, with spatial and temporal dimensions. The algorithm What's Strange About Recent Events (WSARE) is the state-of-the-art method for such problems. It exhaustively searches for contrast sets in the multivariate data and signals an alarm when it finds statistically significant rules. This bottom-up approach has a much lower detection delay than existing top-down approaches. However, WSARE is very sensitive to small-scale changes and consequently comes with a relatively high rate of false alarms. We propose a new approach called EigenEvent that is neither fully top-down nor fully bottom-up. Instead of a top-down or bottom-up search, this method tracks changes in the data correlation structure via eigenspace techniques. This new methodology enables us to detect both overall changes (via eigenvalues) and dimension-level changes (via eigenvectors). Experimental results on one hundred sets of benchmark data reveal that EigenEvent achieves better overall performance than the state of the art, particularly in terms of the false alarm rate.
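
The idea of tracking a correlation structure through its eigenspace can be illustrated with a minimal sketch. This is not the published EigenEvent algorithm: the abstract does not give its details, so the windowing, the power-iteration solver, and the function names (`correlation_matrix`, `principal_eigenpair`, `change_scores`) are all assumptions made for illustration. The sketch compares the principal eigenvalue (an overall change score) and the principal eigenvector (per-dimension change scores) of two windows of multivariate data.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of a window given as a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Root of the sum of squares per column; a constant column falls back to 1.0.
    scale = [math.sqrt(sum(c[j] ** 2 for c in centered)) or 1.0 for j in range(d)]
    return [[sum(c[i] * c[j] for c in centered) / (scale[i] * scale[j])
             for j in range(d)] for i in range(d)]


def principal_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector of a symmetric PSD matrix (power iteration)."""
    d = len(matrix)
    # Non-uniform start vector (assumption) to reduce the chance of starting
    # orthogonal to the principal eigenvector.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        value = norm
    return value, v


def change_scores(baseline, current):
    """Overall change (eigenvalue gap) and per-dimension change (eigenvector gap)."""
    val_b, vec_b = principal_eigenpair(correlation_matrix(baseline))
    val_c, vec_c = principal_eigenpair(correlation_matrix(current))
    # Eigenvector sign is arbitrary, so align the two vectors before comparing.
    sign = -1.0 if sum(a * b for a, b in zip(vec_b, vec_c)) < 0 else 1.0
    per_dimension = [abs(sign * c - b) for b, c in zip(vec_b, vec_c)]
    return abs(val_c - val_b), per_dimension
```

In this sketch, a large overall score suggests that the correlation structure as a whole has shifted, while the largest per-dimension score points at the stream that moved most. A real detector would also need a thresholding rule and a treatment of seasonality, neither of which is shown here.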

2015

Eigenspace method for spatiotemporal hotspot detection

Authors
Fanaee T, H; Gama, J;

Publication
EXPERT SYSTEMS

Abstract
Hotspot detection aims at identifying sub-groups in the observations that are unexpected with respect to some baseline information. For instance, in disease surveillance, the purpose is to detect sub-regions in spatiotemporal space where the count of reported diseases (e.g. cancer) is higher than expected with respect to the population. The state-of-the-art method for this kind of problem is the space-time scan statistic, which exhaustively searches the whole space through a sliding window, looking for significant spatiotemporal clusters. The space-time scan statistic makes some restrictive assumptions about the distribution of the data, the shape of the hotspots and the quality of the data, which can be unrealistic for some non-traditional data sources. We propose a novel methodology called EigenSpot that, instead of searching the space exhaustively, tracks changes in the space-time occurrence structure. The new approach is not only much more computationally efficient but also makes no assumptions about the data distribution, the hotspot shape or the data quality. The principal idea is that the location of hotspots in spatiotemporal space can be approximated by jointly combining the abnormal elements of the principal spatial and temporal singular vectors. The experimental evaluation, on both simulated and real data sets, reveals the effectiveness of the proposed method.
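
The principal idea above can be sketched as follows. This is an illustrative approximation, not the published EigenSpot procedure: the way abnormal elements are flagged (a z-score on the difference between case and baseline singular vectors, with a hypothetical `threshold` of 1.5) and all function names are assumptions. The input is a regions-by-time matrix of case counts and a matching baseline (for example, population) matrix, both with non-negative entries.

```python
import math


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def principal_singular_vectors(matrix, iterations=100):
    """Principal left (spatial) and right (temporal) singular vectors via power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = _normalize([1.0] * cols)
    u = [0.0] * rows
    for _ in range(iterations):
        u = _normalize([sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)])
        v = _normalize([sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)])
    return u, v


def abnormal_elements(case_vec, base_vec, threshold=1.5):
    """Indices where the case vector exceeds the baseline vector by an unusual margin."""
    diffs = [c - b for c, b in zip(case_vec, base_vec)]
    mean = sum(diffs) / len(diffs)
    std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))
    if std == 0.0:
        return []
    return [i for i, d in enumerate(diffs) if (d - mean) / std > threshold]


def find_hotspots(cases, baseline, threshold=1.5):
    """Candidate (region, time) cells: abnormal regions crossed with abnormal times."""
    u_case, v_case = principal_singular_vectors(cases)
    u_base, v_base = principal_singular_vectors(baseline)
    regions = abnormal_elements(u_case, u_base, threshold)
    times = abnormal_elements(v_case, v_base, threshold)
    return [(r, t) for r in regions for t in times]
```

Because only two singular-vector computations are needed, the cost grows with the size of the matrix rather than with the number of candidate windows, which is the kind of saving over exhaustive search that the abstract describes.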

2015

Exploring multi-relational temporal databases with a propositional sequence miner

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
In this work, we introduce MuSer, a propositional framework that explores temporal information available in multi-relational databases. At the core of this system is an encoding technique that translates the temporal information into a propositional sequence of events. By using this technique, we are able to explore the temporal information using a propositional sequence miner. With this framework, we mine each class partition individually and we do not use classical aggregation strategies, like window aggregation. Moreover, in this system we combine feature selection and propositionalization techniques to cast a multi-relational classification problem into a propositional one. We empirically evaluate the MuSer framework using two real databases. The results show that mining each partition individually is a time- and memory-efficient strategy that generates a high number of highly discriminative patterns.

2015

Multi-aspect-streaming tensor analysis

Authors
Fanaee T, H; Gama, J;

Publication
KNOWLEDGE-BASED SYSTEMS

Abstract
Tensor analysis is a powerful tool for multiway problems in data mining, signal processing, pattern recognition and many other areas. Nowadays, the most important challenges in tensor analysis are efficiency and adaptability. Still, the majority of techniques are not scalable or not applicable in streaming settings. One of the promising frameworks that simultaneously addresses these two issues is Incremental Tensor Analysis (ITA), which includes three variants called Dynamic Tensor Analysis (DTA), Streaming Tensor Analysis (STA) and Window-based Tensor Analysis (WTA). However, ITA allows the tensor to grow only in time, which severely constrains the scalability and adaptability of the other modes. We propose a new approach called multi-aspect-streaming tensor analysis (MASTA) that relaxes this constraint and allows the tensor to evolve concurrently through all modes. The new approach, developed for analysis-only purposes, is founded on the histogram approximation concept rather than on expensive linear algebra techniques. This brings simplicity, adaptability, efficiency and flexibility to the tensor analysis task. The empirical evaluation on various data sets from several domains reveals that MASTA is a promising technique that is competitive with ITA algorithms.

2015

Improving Mass Transit Operations by Using AVL-Based Systems: A Survey

Authors
Moreira Matias, L; Mendes Moreira, J; de Sousa, JF; Gama, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Intelligent transportation systems based on automated data collection frameworks are widely used by the major transit companies around the globe. This paper describes the current state of the art in improving both planning and control in public road transportation companies using automatic vehicle location (AVL) data. By surveying this topic, the expectation is to help develop a better understanding of the nature, approaches, challenges, and opportunities with regard to these problems. This paper starts by presenting a brief review of improving the network definition based on historical location-based data. Second, it presents a comprehensive review of AVL-based techniques for evaluating schedule plan (SP) reliability, discussing the existing metrics. Then, the different dimensions of improving SP reliability are presented in detail, as well as the works addressing this problem. Finally, the automatic control strategies are also reviewed, along with the research employed over the location-based data. A comprehensive discussion of the techniques employed is provided to encourage those who are starting research on this topic. It is important to highlight that there are still gaps in the AVL-based literature, such as the following: 1) long-term travel time prediction; 2) finding optimal slack time; or 3) choosing the best control strategy to apply in each situation in the event of schedule instability. Hence, this paper includes introductory model formulations, reference surveys, formal definitions, and an overview of a promising area, which is of interest to any researcher, regardless of the level of expertise.

2015

Visualization for streaming telecommunications networks

Authors
Sarmento, R; Cordeiro, M; Gama, J;

Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

Abstract
Regular services in telecommunications produce massive volumes of relational data. In this work, the data produced in telecommunications is seen as a streaming network, where clients are the nodes and phone calls are the edges. Visualization techniques are required for exploratory data analysis and event detection. In social network visualization and analysis, the goal is to get more information from the data by taking into account actors at the individual level. Previous methods relied on aggregating communities, k-Core decompositions and matrix feature representations to visualize and analyse the massive network data. Our contribution is a technique for group visualization and analysis of influential actors in the network, which samples the full network with a top-k representation of the network data stream. © Springer International Publishing 2015.
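
One common way to maintain an approximate top-k summary over a stream in bounded memory is the Space-Saving algorithm. The abstract does not say which top-k method the paper uses, so the sketch below is only an assumption about how such a sample could be built; the function names and the choice to rank callers (rather than edges) are hypothetical.

```python
def space_saving_top_k(stream, k):
    """Approximate top-k item counts over a stream using at most k counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer inherits that
            # count plus one, so counts may overestimate but never underestimate.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return sorted(counters.items(), key=lambda pair: -pair[1])


def sample_edges(edges, k):
    """Keep only the calls made by the (approximate) top-k most active callers."""
    top = {node for node, _ in space_saving_top_k((caller for caller, _ in edges), k)}
    return [edge for edge in edges if edge[0] in top]
```

Here `edges` is a list of `(caller, callee)` pairs. The filtered edge list is small enough to draw, which is the point of sampling before visualization; a fully streaming version would keep the edges of tracked callers alongside the counters instead of making a second pass.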
