Publications

Publications by Ricardo Teixeira Sousa

2016

Online Semi-supervised Learning for Multi-target Regression in Data Streams Using AMRules

Authors
Sousa, R; Gama, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
Most data stream systems that use online multi-target regression yield vast amounts of data that are not targeted. Targeting these data is usually impossible, time-consuming, or expensive. Semi-supervised algorithms have been proposed to use this untargeted data (input information only) for model improvement. However, most algorithms are adapted to work in batch mode for classification and require huge computational and memory resources. Therefore, this paper proposes a semi-supervised algorithm for online processing systems, based on the AMRules algorithm, that handles both targeted and untargeted data and improves the regression model. The proposed method was evaluated by comparing a scenario in which the untargeted examples are not used in training with a scenario in which some untargeted examples are used. Evaluation results indicate that using the untargeted examples improved the target predictions by improving the model.


2018

Co-training study for Online Regression

Authors
Sousa, R; Gama, J;

Publication
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING

Abstract
This paper describes the development of a Co-training method (a semi-supervised approach) that uses multiple learners for single-target regression on data streams. The experimental evaluation focused on the comparison between a realistic supervised scenario (all unlabelled examples are discarded) and scenarios where unlabelled examples are used to improve the regression model. The results provide fair evidence that the proposed Co-training method reduces the error measure. However, the error reduction is still relatively small.
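
The co-training idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes two hypothetical learners (`OnlineLinear`), each seeing its own feature view, and a hypothetical agreement `threshold` that decides when an unlabelled example is pseudo-labelled. Labelled examples train both learners directly; for unlabelled examples, each learner is trained on the other's prediction when the two predictions agree.

```python
class OnlineLinear:
    """Hypothetical single-feature linear regressor trained by SGD."""

    def __init__(self, lr=0.2):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, x):
        return self.w * x + self.b

    def learn(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def co_train(stream, threshold=0.1):
    """Process ((view_a, view_b), target) pairs; target is None when unlabelled."""
    a, b = OnlineLinear(), OnlineLinear()
    for (xa, xb), y in stream:
        if y is not None:
            # Labelled example: ordinary supervised update for both learners.
            a.learn(xa, y)
            b.learn(xb, y)
        else:
            # Unlabelled example: compute both predictions first, then let
            # each learner train on the other's prediction if they agree.
            pa, pb = a.predict(xa), b.predict(xb)
            if abs(pa - pb) <= threshold:
                a.learn(xa, pb)
                b.learn(xb, pa)
    return a, b
```

Discarding every example whose target is `None` instead of pseudo-labelling it gives the purely supervised baseline that the abstract compares against.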


2019

Robust cepstral-based features for anomaly detection in ball bearings

Authors
Sousa, R; Antunes, J; Coutinho, F; Silva, E; Santos, J; Ferreira, H;

Publication
INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY

Abstract
This paper proposes the linear frequency cepstral coefficients as highly discriminative features for anomaly detection in ball bearings using vibration sensor data. These features are based on cepstral analysis and are capable of encoding the patterns of a spectral magnitude profile. Incipient damage on bearings can grow rapidly under normal use, resulting in vibration and harsh noise. If left undetected, this damage will worsen, leading to high maintenance costs or even injury. Multiple interferences in an industrial environment contaminate the signal, making it a challenge to correctly identify the bearings' condition. Many studies have attempted to overcome this issue at the signal level. However, the discriminative capacity of the current vibration signal features is still vulnerable to interference, which motivates this work. In order to demonstrate the benefits of these features, we (1) show that they are computationally efficient and suitable for real-time incremental training; (2) conduct discriminative analysis by evaluating the separability performance and comparing it with the state of the art; and (3) test the robustness of the proposed features under noise interference, which is ideal for use in the harsh operating conditions of industrial machinery. The data was obtained from a laboratory workbench setting that reproduces bearing fault scenarios. Results show that the proposed features are fast, competitive when compared to state-of-the-art features, and resilient to high levels of interference. Despite the higher performance when using the quadratic model, the proposed features remain highly discriminative when used with several other discriminant functions.
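
For readers unfamiliar with linear frequency cepstral coefficients, the sketch below follows a common textbook recipe: magnitude spectrum, triangular filters with linearly spaced centres, logarithm, then a DCT-II. The abstract does not give the paper's exact pipeline, so the function names, the filter shape, and the parameters `n_filters` and `n_coeffs` are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def linear_filterbank(spectrum, n_filters):
    """Band energies from triangular filters with linearly spaced centres."""
    step = (len(spectrum) - 1) / (n_filters + 1)
    energies = []
    for m in range(1, n_filters + 1):
        centre = m * step
        energy = 0.0
        for k, mag in enumerate(spectrum):
            weight = max(0.0, 1.0 - abs(k - centre) / step)
            energy += weight * mag * mag
        energies.append(energy)
    return energies


def lfcc(frame, n_filters=8, n_coeffs=4):
    """Log band energies followed by a DCT-II (unnormalised)."""
    energies = linear_filterbank(magnitude_spectrum(frame), n_filters)
    log_e = [math.log(e + 1e-12) for e in energies]  # small offset avoids log(0)
    return [sum(le * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, le in enumerate(log_e))
            for c in range(n_coeffs)]
```

In this recipe, rescaling the input signal shifts every log band energy by the same constant, which changes only coefficient 0; the higher coefficients describe the shape of the spectral magnitude profile rather than its overall level.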


2019

BRIGHT - Drift-Aware Demand Predictions for Taxi Networks

Authors
Saadallah, A; Moreira Matias, L; Sousa, R; Khiari, J; Jenelius, E; Gama, J;

Publication
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019)

Abstract
The dynamic behavior of urban mobility patterns makes matching taxi supply with demand one of the biggest challenges in this industry. Recently, the increasing availability of massive broadcast GPS data has encouraged the exploration of this issue from different perspectives. One possible solution is to build a data-driven real-time taxi-dispatching recommender system. However, existing systems are based on strong assumptions such as stationary demand distributions and finite training sets, which make them inadequate for modeling the dynamic nature of the network. In this paper, we propose BRIGHT: a drift-aware supervised learning framework which aims to provide accurate predictions for short-term horizon taxi demand quantities through a creative ensemble of time series analysis methods that handle distinct types of concept drift. A large experimental set-up, which includes three real-world transportation networks and a synthetic test-bed with artificially inserted concept drifts, was employed to illustrate the advantages of BRIGHT when compared to state-of-the-art methods for this problem.


2020

BRIGHT-Drift-Aware Demand Predictions for Taxi Networks

Authors
Saadallah, A; Moreira Matias, L; Sousa, R; Khiari, J; Jenelius, E; Gama, J;

Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Abstract
Massive data broadcast by GPS-equipped vehicles provide unprecedented opportunities. One of the main tasks in order to optimize our transportation networks is to build data-driven real-time decision support systems. However, the dynamic environments where the networks operate invalidate the traditional assumptions required to put into practice many off-the-shelf supervised learning algorithms, such as finite training sets or stationary distributions. In this paper, we propose BRIGHT: a drift-aware supervised learning framework to predict demand quantities. BRIGHT aims to provide accurate predictions for short-term horizons through a creative ensemble of time series analysis methods that handles distinct types of concept drift. By selecting neighborhoods dynamically, BRIGHT reduces the likelihood of overfitting. By ensuring diversity among the base learners, BRIGHT ensures a high reduction of variance while keeping bias stable. Experiments were conducted using three large-scale heterogeneous real-world transportation networks in Porto (Portugal), Shanghai (China), and Stockholm (Sweden), as well as with controlled experiments using synthetic data where multiple distinct drifts were artificially induced. The obtained results illustrate the advantages of BRIGHT in relation to state-of-the-art methods for this task.


2020

Transfer Learning in urban object classification: Online images to recognize point clouds

Authors
Balado, J; Sousa, R; Diaz Vilarino, L; Arias, P;

Publication
AUTOMATION IN CONSTRUCTION

Abstract
The application of Deep Learning techniques to point clouds for urban object classification is limited by the large number of samples needed. Acquiring and tagging point clouds is more expensive and tedious than the equivalent process for images. Online point cloud datasets contain too few samples for Deep Learning or do not always include the desired classes. This work focuses on minimizing the use of point cloud samples for neural network training in urban object classification. The proposed method is based on the conversion of point clouds to images (pc-images) because it enables: the use of Convolutional Neural Networks, the generation of several samples (images) per object (point cloud) by means of multi-view, and the combination of pc-images with images from online datasets (ImageNet and Google Images). The study is conducted with ten classes of objects extracted from two street point clouds from two different cities. The network selected for the job is InceptionV3. The training set consists of 5000 online images with a variable percentage (0% to 10%) of pc-images. The validation and testing sets are composed exclusively of pc-images. Although the network trained only with online images reached 47% accuracy, the inclusion of a small percentage of pc-images in the training set improves the classification to 99.5% accuracy with 6% pc-images. The network is also applied to the IQmulus & TerraMobilita Contest dataset, where it allows the correct classification of elements with few samples.
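
The multi-view conversion of a point cloud into several images can be sketched as follows. The abstract does not specify the projection used in the paper, so this is only an assumed illustration: an orthographic projection onto a binary occupancy grid from evenly spaced horizontal angles, with hypothetical function names and parameters (`project_view`, `multi_view`, `n_views`, `size`).

```python
import math


def project_view(points, angle_deg, size=16):
    """Rasterise (x, y, z) points into a size x size binary image for one view."""
    a = math.radians(angle_deg)
    # Rotate about the vertical (z) axis and drop the depth axis.
    uv = [(x * math.cos(a) + y * math.sin(a), z) for x, y, z in points]
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    u0, v0 = min(us), min(vs)
    # One common scale for both axes keeps the object's aspect ratio.
    span = max(max(us) - u0, max(vs) - v0) or 1.0
    image = [[0] * size for _ in range(size)]
    for u, v in uv:
        col = min(size - 1, int((u - u0) / span * size))
        row = min(size - 1, int((v - v0) / span * size))
        image[size - 1 - row][col] = 1  # flip rows so height points upwards
    return image


def multi_view(points, n_views=4, size=16):
    """One image per evenly spaced horizontal viewing angle."""
    return [project_view(points, 360.0 * i / n_views, size)
            for i in range(n_views)]
```

Each object then yields `n_views` training images instead of a single sample, which is the sample-multiplying effect the abstract attributes to multi-view.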
