Publications

Publications by João Gama

2017

Credit Scoring in Microfinance Using Non-traditional Data

Authors
Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Emerging markets contain the vast majority of the world's population. Despite the huge number of inhabitants, these markets still lack a proper finance infrastructure. One of the main difficulties felt by customers is access to loans. This limitation arises from the fact that most customers lack a verifiable credit history. As such, traditional banks are unable to provide loans. This paper proposes credit scoring modeling based on non-traditional data, acquired from smartphones, for loan classification processes. We use Logistic Regression (LR) and Support Vector Machine (SVM) models, which are the top performers in traditional banking. We then compare two transformations of the training datasets: creating boolean indicators and recoding using Weight of Evidence (WoE). Our models surpassed the performance of the manual loan application selection process: loans granted through the models' criteria presented fewer overdues, and the models' approval criteria increased the number of granted loans substantially. Compared to the baseline, the loans approved by meeting the criteria of the SVM model presented a -196.80% overdue rate. At the same time, the approval criteria of the SVM model generated 251.53% more loans. This paper shows that credit scoring can be useful in emerging markets. Non-traditional data can be used to build algorithms that identify good borrowers, as in traditional banking.

2014

Challenges in Learning from Streaming Data Extended Abstract

Authors
Gama, J;

Publication
ICT Innovations 2014 - World of Data, Ohrid, Macedonia, 1-4 October, 2014

Abstract
Machine learning studies automatic methods for acquisition of domain knowledge with the goal of improving systems performance as the result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models. They are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIT algorithm to learn decision trees [14]. As the tree grows, fewer and fewer examples are available to compute the sufficient statistics, and variance increases, leading to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are mandatory. © Springer International Publishing Switzerland 2015.

2014

Ensembles of Adaptive Model Rules from High-Speed Data Streams

Authors
Duarte, J; Gama, J;

Publication
Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine 2014, New York City, USA, August 24, 2014

Abstract
The volume and velocity of data are increasing at astonishing rates. In order to extract knowledge from this huge amount of information, there is a need for efficient on-line learning algorithms. Rule-based algorithms produce models that are easy to understand and can be used almost offhand. Ensemble methods combine several predictive models to improve the quality of prediction. In this paper, a new on-line ensemble method that combines a set of rule-based models is proposed to solve regression problems from data streams. Experimental results using synthetic and real time-evolving data streams show that the proposed method significantly improves the performance of the single rule-based learner and outperforms two state-of-the-art regression algorithms for data streams.

2015

Prediction Intervals for Electric Load Forecast: Evaluation for Different Profiles

Authors
Almeida, V; Gama, J;

Publication
2015 18TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEM APPLICATION TO POWER SYSTEMS (ISAP)

Abstract
Electricity industries throughout the world have been using load profiles for many years. Electrical load data contain valuable information that can be useful for both electricity producers and consumers. Load forecasting is a fundamental and important task for operating power systems efficiently and economically. Currently, prediction intervals (PIs) are assuming increasing importance compared with point forecasts, which cannot properly handle forecast uncertainties, since PIs are able to balance informativeness and correctness. This paper aims to demonstrate that different demand profiles clearly influence PI reliability and width. The evaluation is performed using data from different customers, grouped on the basis of their electricity behavior using hierarchical clustering and taking the Kullback-Leibler divergence as the distance metric. PIs are obtained using two different strategies: (1) the dual perturb and combine algorithm and (2) conformal prediction. It was possible to demonstrate that different demand profiles clearly influence PI reliability and width for both models. The knowledge retrieved from the analysis of the load patterns is useful and can be used to support the selection of the best method for interval forecasting, considering a specific location. It can also support the selection of an optimum confidence level, considering that a PI that is too wide conveys little information and is of no use for decision making.

2014

Keynote speakers

Authors
Gama, J;

Publication
IEEE Symposium on Computers and Communications, ISCC 2014, Funchal, Madeira, Portugal, June 23-26, 2014

2017

Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Porto, Portugal, September 5-8, 2017, Proceedings

Authors
Oliveira, Eugenio; Gama, Joao; Vale, Zita A.; Cardoso, Henrique Lopes;

Publication
EPIA
