Publications

Publications by João Gama

2020

Spatiotemporal Traffic Anomaly Detection on Urban Road Network Using Tensor Decomposition Method

Authors
Tisljaric, L; Silva Fernandes, Sd; Caric, T; Gama, J;

Publication
Discovery Science - 23rd International Conference, DS 2020, Thessaloniki, Greece, October 19-21, 2020, Proceedings

Abstract
Tensor-based models emerged only recently in modeling and analysis of the spatiotemporal road traffic data. They outperform other data models regarding the property of simultaneously capturing both spatial and temporal components of the observed traffic dataset. In this paper, the nonnegative tensor decomposition method is used to extract traffic patterns in the form of Speed Transition Matrix (STM). The STM is presented as the approach for modeling the large sparse Floating Car Data (FCD). The anomaly of the traffic pattern is estimated using Kullback–Leibler divergence between the observed traffic pattern and the average traffic pattern. Experiments were conducted on the large sparse FCD dataset for the most relevant road segments in the City of Zagreb, which is the capital and largest city in Croatia. Results show that the method was able to detect the most anomalous traffic road segments, and with analysis of the extracted spatial and temporal components, conclusions could be drawn about the causes of the anomalies. Results are validated by using the domain knowledge from the Highway Capacity Manual and achieved a precision score value of more than 90%. Therefore, such valuable traffic information can be used in routing applications and urban traffic planning. © 2020, Springer Nature Switzerland AG.
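
The anomaly score mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each traffic pattern is a nonnegative matrix (such as an STM) that is normalized into a probability distribution before comparison. The function name `kl_anomaly_score`, the epsilon smoothing, and the toy 2x2 matrices are assumptions made for illustration and are not taken from the paper.

```python
import math

def kl_anomaly_score(observed, average, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) between two nonnegative matrices.

    Both matrices are flattened and normalized to sum to 1. The small `eps`
    is an assumed smoothing term that avoids log(0) on sparse data.
    """
    p = [value + eps for row in observed for value in row]
    q = [value + eps for row in average for value in row]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p, q)
    )

# Toy patterns (hypothetical values): rows and columns stand for speed bins.
average = [[0.7, 0.1], [0.1, 0.1]]
typical = [[0.68, 0.12], [0.1, 0.1]]
unusual = [[0.1, 0.1], [0.1, 0.7]]

score_typical = kl_anomaly_score(typical, average)
score_unusual = kl_anomaly_score(unusual, average)
```

A pattern close to the average yields a score near zero, while a pattern whose mass sits in different cells yields a larger score, so segments can be ranked by this value.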

2016

Online Bagging for Recommendation with Incremental Matrix Factorization

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
Proceedings of the Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV 2016) co-located with the 2016 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016), Riva del Garda, Italy, September 23, 2016.

Abstract
Online recommender systems often deal with continuous, potentially fast and unbounded flows of data. Ensemble methods for recommender systems have been used in the past in batch algorithms, however they have never been studied with incremental algorithms, that are capable of processing those data streams on the fly. We propose online bagging, using an incremental matrix factorization algorithm for positive-only data streams. Using prequential evaluation, we show that bagging is able to improve accuracy more than 20% over the baseline with small computational overhead.
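
The idea of online bagging over an incremental learner can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' algorithm: it uses the common online bagging scheme in which each base model sees every incoming event k times with k drawn from Poisson(1), and a simple SGD matrix factorization that treats each observed (user, item) pair as a target of 1. All class names, function names, and hyperparameters are hypothetical.

```python
import math
import random

class IncrementalMF:
    """Toy incremental matrix factorization for positive-only events."""

    def __init__(self, n_factors=4, lr=0.05, reg=0.01, rng=None):
        self.n_factors, self.lr, self.reg = n_factors, lr, reg
        self.rng = rng or random.Random()
        self.users, self.items = {}, {}

    def _vector(self, table, key):
        # Unseen users and items get a small random latent vector.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.n_factors)]
        return table[key]

    def predict(self, user, item):
        p = self._vector(self.users, user)
        q = self._vector(self.items, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item):
        # One SGD step pushing the predicted score toward 1 (positive feedback).
        p = self._vector(self.users, user)
        q = self._vector(self.items, item)
        err = 1.0 - sum(a * b for a, b in zip(p, q))
        for f in range(self.n_factors):
            pf, qf = p[f], q[f]
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)

def poisson1(rng):
    """Sample from Poisson(lambda=1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

class OnlineBagging:
    """Ensemble in which each model sees every event Poisson(1) times."""

    def __init__(self, n_models=8, seed=0):
        self.rng = random.Random(seed)
        self.models = [
            IncrementalMF(rng=random.Random(seed + 1 + i)) for i in range(n_models)
        ]

    def update(self, user, item):
        for model in self.models:
            for _ in range(poisson1(self.rng)):
                model.update(user, item)

    def predict(self, user, item):
        # Average the scores of the base models.
        return sum(m.predict(user, item) for m in self.models) / len(self.models)

# Prequential (test-then-learn) loop over a toy stream of hypothetical events.
stream = [("alice", "book"), ("bob", "book"), ("alice", "film")] * 50
bag = OnlineBagging(n_models=8, seed=42)
scores_before_update = []
for user, item in stream:
    scores_before_update.append(bag.predict(user, item))  # evaluate first
    bag.update(user, item)                                # then learn
```

Sampling the number of updates from Poisson(1) approximates bootstrap resampling without storing the stream, which is why the overhead grows roughly linearly with the number of base models.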

2020

NORMO: A new method for estimating the number of components in CP tensor decomposition

Authors
Fernandes, S; Fanaee T, H; Gama, J;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
Tensor decompositions are multi-way analysis tools which have been successfully applied in a wide range of different fields. However, there are still challenges that remain little explored, namely the following: when applying tensor decomposition techniques, what should we expect from the result? How can we evaluate its quality? It is expected that, when the number of components is suitable, little redundancy is observed in the decomposition result. Based on this assumption, we propose a new method, NORMO, which aims at estimating the number of components in CANDECOMP/PARAFAC (CP) decomposition so that no redundancy is observed in the result. To the best of our knowledge, this work encompasses the first attempt to tackle such a problem. According to our experiments, the number of non-redundant components estimated by NORMO is among the most accurate estimates of the true CP number of components in both synthetic and real-world tensor datasets (thus validating the rationale guiding our method). Moreover, NORMO is more efficient than most of its competitors. Additionally, our method can be used to discover multiple levels of granularity in the patterns discovered.

2020

Assembled Feature Selection for Credit Scoring in Microfinance with Non-traditional Features

Authors
Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
Discovery Science - 23rd International Conference, DS 2020, Thessaloniki, Greece, October 19-21, 2020, Proceedings

Abstract
Since early 2000, Microfinance Institutions (MFI) have been using credit scoring for their risk assessment. However, one of the main problems of credit scoring in microfinance is the lack of structured financial data. To address this problem, MFI have started using non-traditional data which can be extracted from the digital footprint of their users. The non-traditional data can be used to build algorithms that can identify good borrowers as in traditional banking. This paper proposes an assembled method to evaluate the predictive power of the non-traditional method. By using the Weight of Evidence (WoE), a transformation based on the distribution within the feature, as feature transformation method, and then applying extremely randomized trees for feature selection, we were able to improve the accuracy of the credit scoring model by 20.20% when compared to the credit scoring model built with the traditional implementation of WoE. This paper shows how the assembling of WoE with different feature selection criteria can result in more robust credit scoring models in microfinance. © 2020, Springer Nature Switzerland AG.
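
The Weight of Evidence transformation named in the abstract above can be sketched for a single categorical feature. This is a minimal illustration, assuming the common convention WoE = ln(share of good borrowers in the category / share of bad borrowers in the category). The sign convention, the additive smoothing, the function name, and the toy data are assumptions, and the feature selection step with extremely randomized trees is not shown.

```python
import math
from collections import Counter

def weight_of_evidence(categories, is_good, smoothing=0.5):
    """Return a {category: WoE} mapping for one categorical feature.

    `is_good` flags good borrowers (truthy) versus bad ones (falsy).
    `smoothing` is an assumed additive constant that keeps the logarithm
    finite when a category contains only good or only bad borrowers.
    """
    good, bad = Counter(), Counter()
    for category, flag in zip(categories, is_good):
        (good if flag else bad)[category] += 1
    levels = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    return {
        level: math.log(
            ((good[level] + smoothing) / total_good)
            / ((bad[level] + smoothing) / total_bad)
        )
        for level in levels
    }

# Toy data (hypothetical): category "a" is mostly good, "b" is mostly bad.
categories = ["a"] * 4 + ["b"] * 4
is_good = [1, 1, 1, 0, 0, 0, 0, 1]
woe = weight_of_evidence(categories, is_good)
encoded = [woe[c] for c in categories]  # feature values after the transform
```

Under this convention a positive value marks a category where good borrowers are over-represented, and a negative value marks one where bad borrowers are over-represented.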

2020

Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods

Authors
Nosratabadi, S; Mosavi, A; Duan, P; Ghamisi, P; Filip, F; Band, SS; Reuter, U; Gama, J; Gandomi, AH;

Publication
MATHEMATICS

Abstract
This paper provides a comprehensive state-of-the-art investigation of the recent advances in data science in emerging economic applications. The analysis is performed on the novel data science methods in four individual classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a broad and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which outperform other learning algorithms. It is further expected that the trends will converge toward the evolution of sophisticated hybrid deep learning models.

2020

IoT data stream analytics

Authors
Bifet, A; Gama, J;

Publication
ANNALS OF TELECOMMUNICATIONS

Abstract
