Publications - INESC TEC

Publications by João Gama

2017

Credit Scoring in Microfinance Using Non-traditional Data

Authors
Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Emerging markets contain the vast majority of the world's population. Despite the huge number of inhabitants, these markets still lack a proper financial infrastructure. One of the main difficulties customers face is access to loans. This limitation arises because most customers lack a verifiable credit history, so traditional banks are unable to provide loans. This paper proposes credit scoring models based on non-traditional data, acquired from smartphones, for loan classification. We use Logistic Regression (LR) and Support Vector Machine (SVM) models, which are the top performers in traditional banking. We then compare two transformations of the training datasets: creating boolean indicators and recoding with Weight of Evidence (WoE). Our models surpassed the performance of the manual loan application selection process: loans granted under the models' criteria presented fewer overdues, and the models' approval criteria substantially increased the amount of granted loans. Compared to the baseline, the loans approved under the criteria of the SVM model presented a -196.80% overdue rate, while the same criteria generated 251.53% more loans. This paper shows that credit scoring can be useful in emerging markets: non-traditional data can be used to build algorithms that identify good borrowers, as in traditional banking.

2014

Challenges in Learning from Streaming Data Extended Abstract

Authors
Gama, J;

Publication
ICT Innovations 2014 - World of Data, Ohrid, Macedonia, 1-4 October, 2014

Abstract
Machine learning studies automatic methods for acquiring domain knowledge, with the goal of improving system performance as a result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models. They are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIT algorithm to learn decision trees [14]. As the tree grows, fewer and fewer examples are available to compute the sufficient statistics, and variance increases, leading to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are mandatory. © Springer International Publishing Switzerland 2015.

2014

Ensembles of Adaptive Model Rules from High-Speed Data Streams

Authors
Duarte, J; Gama, J;

Publication
Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine 2014, New York City, USA, August 24, 2014

Abstract

2015

Prediction Intervals for Electric Load Forecast: Evaluation for Different Profiles

Authors
Almeida, V; Gama, J;

Publication
2015 18TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEM APPLICATION TO POWER SYSTEMS (ISAP)

Abstract
Electricity industries throughout the world have been using load profiles for many years. Electrical load data contain valuable information that can be useful for both electricity producers and consumers. Load forecasting is a fundamental task for operating power systems efficiently and economically. Prediction intervals (PIs) are becoming increasingly important compared with point forecasts, which cannot properly handle forecast uncertainty, because PIs can balance informativeness and correctness. This paper aims to demonstrate that different demand profiles clearly influence PI reliability and width. The evaluation uses data from different customers, grouped on the basis of their electricity behavior using hierarchical clustering, with the Kullback-Leibler divergence as the distance metric. PIs are obtained using two different strategies: (1) the dual perturb and combine algorithm and (2) conformal prediction. For both models, different demand profiles clearly influenced PI reliability and width. The knowledge retrieved from the analysis of load patterns can support the selection of the best interval forecasting method for a specific location. It can also support the selection of an optimal confidence level, given that a PI that is too wide conveys little information and is of no use for decision making.

2014

Keynote speakers

Authors
Gama, J;

Publication
IEEE Symposium on Computers and Communications, ISCC 2014, Funchal, Madeira, Portugal, June 23-26, 2014

Abstract

2017

Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Porto, Portugal, September 5-8, 2017, Proceedings

Authors
Oliveira, Eugenio; Gama, Joao; Vale, Zita A.; Cardoso, Henrique Lopes;

Publication
EPIA

Abstract